VM won't start after node reboot. Kernel panic - starting switch root

RalphUK

New Member
May 6, 2024
Hi,

I am relatively new to Proxmox. About 3 weeks ago I migrated about half a dozen VMs that I had running on a Hyper-V setup on my desktop PC over to a dedicated Proxmox (8.2.4) machine I had built. I run my business on these VMs, so they are critical to me.

Today, after 19 days of flawless running, I noticed a strange performance drop-off with no obvious cause (basically my 14700's CPU clocks weren't getting anywhere near max). Because it made no sense, I decided to reboot the node. This was when the trouble started. One of the VMs (Oracle Linux 9.4) didn't start and just sat there burning its two CPU cores. I took the attached screenshot from the console. I tried a few things in rescue mode, but quickly decided to restore a backup - which didn't fix the issue.

Fortunately I had kept all my old Hyper-V VMs switched off, and this VM contains no user data - so I am back up and running with the luxury of time to work out what happened, how to fix it, and how to stop it happening again.

I do back up all the data and code that I write for my platform; however, configuring some of these VMs is quite a big job, and this whole "your backups are useless" type of failure scares the #$!# out of me!

I would be really appreciative of any help anyone can give.

Thanks

Ralph
 

Attachments

  • Screenshot 2024-07-13 193616.png (142.1 KB)
For anyone to try & help you, you are going to have to provide more info.

What is the output of qm config {VMID} for the affected VM?
What is your general HW/NW setup?
What has happened during the 19-day period?
What version of PVE are you running? Output of pveversion -v.
Has your node been fully updated?

Please try & provide the console output for the above within code-tags (choose from the formatting bar with the </> sign).
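
For example (a sketch - replace {VMID} with the affected VM's ID), run these on the node and paste the output:

Code:
qm config {VMID}
pveversion -v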
 
Thanks for your reply.

Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
efidisk0: vmdata:vm-203-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: iso:iso/OracleLinux-R9-U3-x86_64-boot.iso,media=cdrom,size=909M
machine: q35
memory: 4096
meta: creation-qemu=8.1.5,ctime=1719597477
name: TomcatVM
net0: virtio=BC:24:11:00:07:CA,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: vmdata:vm-203-disk-1,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=9f34f8bb-c0f1-4605-a808-25f439ed9c40
sockets: 1
vmgenid: f698b6d2-88df-4f60-acdf-02588ff29aed

Intel 14700, 96GB DDR5 @ 6600, 2 x 500GB SSDs for the PVE boot drives (ZFS), a 2TB NVMe for the VM OS disks (LVM), and a 2TB NVMe passed through to another one of the VMs.
10Gb Fiber (vmbr1) + 2.5Gb Ethernet

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.4-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

I updated the node after installation, so the packages are about 3 weeks old.

As to what has gone on in the last 19 days: quite a lot. I have created 5 extra VMs to increase the throughput of a single-threaded, CPU-intensive part of the platform (one of the main reasons for the migration).

One of the issues I was trying to resolve during the reboot: I remembered that the last time I rebooted, I had to press return at a boot manager prompt, so I had set GRUB_TIMEOUT=0 in an attempt to fix this - it didn't work. I haven't looked into it further since the problem with this VM started.
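
A sketch of that change, assuming it was made in the usual way via GRUB on the PVE host:

Code:
# /etc/default/grub
GRUB_TIMEOUT=0

# regenerate the boot config so the change takes effect
update-grub

(Worth noting: a ZFS-on-root UEFI install of Proxmox often boots via systemd-boot managed by proxmox-boot-tool rather than GRUB, in which case editing the GRUB config has no effect - which might explain why the change didn't work.)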
 
Thank you for your suggestions.

I have done the PVE upgrade and still no joy.

I looked at the Oracle Linux upgrade but couldn't see an option to upgrade the existing installation.
 
Does this help? (As I said, I don't use Oracle Linux.)
No. When I run it, it tells me a reboot won't be necessary.

This failure happens so early in the boot sequence that it seems hard to believe an "OS upgrade required" solution is going to resolve it.

I spent yesterday morning building a new TomcatVM (on Ubuntu this time...) so I am back up and running properly.

It does still leave me with no confidence that it won't happen again, and if it does happen again, my backups are useless.
 
Your problem may be specific to your Tomcat setup etc. I would try running that same image bare-metal (if possible) & see if it is more stable.

I decided to reboot the node. This was when the trouble started. One of the VMs didn't start
Just to clarify; during the 19 days did you ever reboot the VM itself? This problem only occurred when rebooting the node?
 
Once the configuration of the VM was complete, I rebooted the OS from within the VM - but looking at the task history, I had never rebooted the VM from Proxmox. So this node reboot was likely the first time the VM had been through a full stop/start.
 
OK. On the new VM (working) that you've setup:

1. Shut down the VM properly (from within the VM) - so that Proxmox actually shows it as shut down.
2. Backup the VM.
3. Restart the VM.

If it works - you always have a backup that actually (should) work. (You can test this by restoring it with a different VMID for testing purposes.) If not - something is wrong with the VM setup, so correct & reconfigure till working. A rough CLI sketch of these steps follows below.
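
A minimal sketch of those steps on the CLI, assuming VMID 203 and a backup storage named local (adjust both to your setup; the dump filename below is a placeholder):

Code:
# 1. Clean shutdown via the guest (needs the QEMU guest agent, which this VM has enabled)
qm shutdown 203

# 2. Back up the stopped VM
vzdump 203 --storage local --mode stopped

# 3. Start it again
qm start 203

# Optional: verify the backup actually restores, using a spare VMID (999 here)
qmrestore /var/lib/vz/dump/vzdump-qemu-203-<timestamp>.vma.zst 999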
 
