VM won't start after node reboot. Kernel panic - starting switch root

RalphUK

New Member
May 6, 2024
Hi,

I am relatively new to Proxmox. About 3 weeks ago I migrated about half a dozen VMs that I had running on a Hyper-V setup on my desktop PC over to a dedicated Proxmox (8.2.4) machine I had built. I run my business on these VMs, so they are critical to me.

Today, after 19 days of flawless running, I noticed a strange performance drop-off with no obvious cause (basically my 14700's CPU clocks weren't getting anywhere near max). Because it made no sense, I decided to reboot the node. This was when the trouble started. One of the VMs (Oracle Linux 9.4) didn't start and just sat there burning its two CPU cores. I took the attached screenshot from the console. I tried a few things in rescue mode, but quickly decided to restore a backup - which didn't fix the issue.

Fortunately I had kept all my old Hyper-V VMs switched off, and this VM contains no user data - so I am back up and running with the luxury of time to work out what happened, how to fix it, and how to stop it happening again.

I do back up all the data and code that I write for my platform; however, configuring some of these VMs is quite a big job, and this whole "your backups are useless" type of failure scares the #$!# out of me!

I would be really appreciative of any help anyone can give.

Thanks

Ralph
 

Attachments

  • Screenshot 2024-07-13 193616.png (142.1 KB)
For anyone to try & help you, you are going to have to provide more info.

What is the output of qm config {VMID} for the affected VM?
What is your general HW/NW setup?
What has happened during the 19-day period?
What version of PVE are you running? Output of pveversion -v.
Has your node been fully updated?

Please try & provide the console output for the above within code-tags (choose from the formatting bar with the </> sign).
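
For example (a sketch - replace {VMID} with the affected VM's ID), run these on the node and paste the output:

Code:
qm config {VMID}
pveversion -v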
 
Thanks for your reply.

Code:
agent: 1
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 2
cpu: host
efidisk0: vmdata:vm-203-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
ide2: iso:iso/OracleLinux-R9-U3-x86_64-boot.iso,media=cdrom,size=909M
machine: q35
memory: 4096
meta: creation-qemu=8.1.5,ctime=1719597477
name: TomcatVM
net0: virtio=BC:24:11:00:07:CA,bridge=vmbr1,firewall=1
numa: 0
ostype: l26
scsi0: vmdata:vm-203-disk-1,iothread=1,size=32G
scsihw: virtio-scsi-single
smbios1: uuid=9f34f8bb-c0f1-4605-a808-25f439ed9c40
sockets: 1
vmgenid: f698b6d2-88df-4f60-acdf-02588ff29aed

Intel 14700, 96GB DDR5 @ 6600, 2 x 500GB SSDs for the PVE boot drives (ZFS), a 2TB NVMe for the VM OS disks (LVM), and a 2TB NVMe passed through to another one of the VMs.
10Gb Fiber (vmbr1) + 2.5Gb Ethernet

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.4-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

I updated the node after installation, so the packages are about 3 weeks old.

As to what has gone on in the last 19 days: quite a lot. I have created 5 extra VMs to increase the throughput of a single-threaded, CPU-intensive part of the platform (one of the main reasons for the migration).

One of the issues I was trying to resolve during the reboot: I remembered that the last time I rebooted, I had to press return at a boot manager prompt, so I had set GRUB_TIMEOUT=0 in an attempt to fix this - it didn't work. I haven't looked into it further since the problem with this VM started.
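
A sketch of that change, assuming it was made in the usual way via GRUB on the PVE host:

Code:
# /etc/default/grub
GRUB_TIMEOUT=0

# regenerate the boot config so the change takes effect
update-grub

(Worth noting: a ZFS-on-root UEFI install of Proxmox often boots via systemd-boot managed by proxmox-boot-tool rather than GRUB, in which case editing the GRUB config has no effect - which might explain why the change didn't work.)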
 
Thank you for your suggestions.

I have done the PVE upgrade and still no joy.

I looked at the Oracle Linux upgrade but couldn't see an option to upgrade the existing installation.
 
Does this help? (As I said, I don't use Oracle Linux.)
No. When I run it, it tells me a reboot won't be necessary.

This failure happens so early in the boot sequence that it seems hard to believe an "OS upgrade required" solution is going to resolve it.

I spent yesterday morning building a new TomcatVM (on Ubuntu this time...) so I am back up and running properly.

It does still leave me with no confidence that it won't happen again, and if it does happen again, my backups are useless.
 
Your problem may be specific to your Tomcat setup etc. I would try running that same image bare-metal (if possible) & see if it is more stable.

I decided to reboot the node. This was when the trouble started. One of the VMs didn't start
Just to clarify; during the 19 days did you ever reboot the VM itself? This problem only occurred when rebooting the node?
 
Once the configuration of the VM was complete, I rebooted the OS from within the VM - but looking at the task history, I had never rebooted the VM from Proxmox. So this node reboot was likely the first time the VM had been through a full stop/start.
 
OK. On the new VM (working) that you've setup:

1. Shut down the VM properly (from within the VM) - so that Proxmox actually shows it as shut down.
2. Backup the VM.
3. Restart the VM.

If it works - you always have a backup that actually (should) work. (You can test this by restoring it with a different VMID for testing purposes.) If not - something is wrong with the VM setup, so correct & reconfigure till working. A rough CLI sketch of these steps follows below.
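
A minimal sketch of those steps on the CLI, assuming VMID 203 and a backup storage named local (adjust both to your setup; the dump filename below is a placeholder):

Code:
# 1. Clean shutdown via the guest (needs the QEMU guest agent, which this VM has enabled)
qm shutdown 203

# 2. Back up the stopped VM
vzdump 203 --storage local --mode stopped

# 3. Start it again
qm start 203

# Optional: verify the backup actually restores, using a spare VMID (999 here)
qmrestore /var/lib/vz/dump/vzdump-qemu-203-<timestamp>.vma.zst 999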
 
