[SOLVED] VM not starting after upgrading kernel and reboot - timeout waiting on systemd

vale.maio2
Feb 4, 2022
Hi all, quite a noob here and running Proxmox on my own home server, so please bear with me.
After upgrading the Proxmox kernel this morning (via a simple apt upgrade) and rebooting, one of my 2 VMs is refusing to start. Following this post I've tried running a systemctl stop 100.scope command (100 is the ID of the offending VM), but it's still showing up as

Bash:
root@server:~# systemctl status qemu.slice
● qemu.slice
     Loaded: loaded
     Active: active since Fri 2022-02-04 08:44:52 GMT; 9min ago
      Tasks: 31
     Memory: 586.6M
        CPU: 33.558s
     CGroup: /qemu.slice
             ├─100.scope
             │ └─1691 [kvm]
             └─101.scope
               └─2015 /usr/bin/kvm -id 101 -name Pihole -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reco>

And of course, when trying to restart it I get the following error:

Bash:
root@server:~# qm start 100
timeout waiting on systemd

For what it's worth, I'm running pve-manager/7.1-10/6ddebafe (running kernel: 5.13.19-4-pve) on a Dell T710 server, with a hardware RAID5 configuration.
If you need any more details I'll be happy to oblige.
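For reference, the cleanup I attempted looks roughly like this (a sketch; the unit is the 100.scope from the CGroup listing above, and it didn't help in my case):

Bash:
# stop the per-VM scope left over under qemu.slice
systemctl stop 100.scope
# clear any failed/stale state for the unit
systemctl reset-failed 100.scope
# then try to start the VM again
qm start 100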

EDIT: just in case you need pveversion --verbose:

Bash:
root@server:~# pveversion --verbose
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-9
pve-kernel-5.13: 7.1-7
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-4-pve: 5.13.19-8
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
root@server:~#
 
Ok, so it could have been the kernel. The update installed Linux 5.13.19-4-pve, which is when my VM broke. I've now restarted the server using Linux 5.13.19-3-pve, and the VM is up and running as if nothing happened. For now I'll remove that kernel version and mark it as not to be installed.
Would there be any explanation for this?
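For reference, the removal and hold mentioned above would look roughly like this (a sketch; note that, as a later reply in this thread shows, apt may want to drag the proxmox-ve meta-package along, so review its removal list before confirming):

Bash:
# confirm which kernel is currently running
uname -r
# remove the broken kernel package (check apt's removal list first!)
apt remove pve-kernel-5.13.19-4-pve
# stop apt from reinstalling or upgrading this exact version
apt-mark hold pve-kernel-5.13.19-4-pve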
 
How to do that?
Open a terminal to your Proxmox server. Launch the command
grep menuentry /boot/grub/grub.cfg
and in there you'll need two things:
[Screenshot Cattura.PNG: the grep output, with the two id_option values circled in red]
The first one is the id_option from the Advanced options for Proxmox VE GNU/Linux line (circled in red, number 1); the second is the id_option of the last known working kernel (circled in red, number 2, which for me was 5.13.19-3-pve). Make sure not to grab the ID for the recovery mode entry.
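The output will look roughly like this (abridged and illustrative; your filesystem UUID and kernel versions will differ):

Bash:
root@server:~# grep menuentry /boot/grub/grub.cfg
menuentry 'Proxmox VE GNU/Linux' ... $menuentry_id_option 'gnulinux-simple-9912e5fa-300a-4311-a7df-612754946075' {
submenu 'Advanced options for Proxmox VE GNU/Linux' $menuentry_id_option 'gnulinux-advanced-9912e5fa-300a-4311-a7df-612754946075' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-4-pve' ... $menuentry_id_option 'gnulinux-5.13.19-4-pve-advanced-9912e5fa-...' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-3-pve' ... $menuentry_id_option 'gnulinux-5.13.19-3-pve-advanced-9912e5fa-...' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-3-pve (recovery mode)' ... $menuentry_id_option 'gnulinux-5.13.19-3-pve-recovery-9912e5fa-...' {

Here number 1 is the ID on the submenu line, and number 2 is the ID on the plain (non-recovery) 5.13.19-3-pve entry.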

Modify the /etc/default/grub file with
nano /etc/default/grub
(or whatever your favourite text editor is). From there, delete the line that says
GRUB_DEFAULT=0
and replace it with

GRUB_DEFAULT="menu entry ID>kernel ID"

and use the two IDs you grabbed above. Don't forget to separate the two IDs with a > character.
In my case, the line looks like this:
GRUB_DEFAULT="gnulinux-advanced-9912e5fa-300a-4311-a7df-612754946075>gnulinux-5.13.19-3-pve-advanced-9912e5fa-300a-4311-a7df-612754946075"
Save and close the file, update GRUB with
update-grub
and reboot.
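To double-check before and after the reboot (illustrative):

Bash:
# confirm the edit took
grep GRUB_DEFAULT /etc/default/grub
# after update-grub and the reboot, verify the older kernel is running
uname -r   # should now print 5.13.19-3-pve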
 

Yep, roll back to the 5.13.19-3-pve kernel.

(quoting the GRUB rollback instructions above)

Thanks, works!
 
I can confirm I had the same issue, and rolling back to the 5.13.19-3-pve kernel solved the problem. Thanks a lot!
Just a note: in the rollback instructions above there is a slight error in the picture. Circle 1 should be one line lower, on the entry that starts with "gnulinux-advanced-...".
 
Thanks all for the advice; my HBA passthrough stopped working after rebooting into the new kernel. Rolling back to 5.13.19-3-pve is working fine.
 
I just followed your instructions, but this won't work for a ZFS installation with EFI stub booting. Any suggestions how to do this? Also, there seems to be a mismatch between the instructions in the image and the example: in the image you recommend the menuentry gnulinux-simple-********, while in the example you choose a submenu entry like gnulinux-advanced-******.
 
Also, there seems to be a mismatch between the instructions in the image and the example: in the image you recommend the menuentry gnulinux-simple-********, while in the example you choose a submenu entry like gnulinux-advanced-******.
Woopsie you're right, I've corrected the image, thanks for spotting that.

I just followed your instructions, but this won't work for a ZFS installation with EFI stub booting. Any suggestions how to do this?
I'm afraid not; I'm not too familiar with ZFS.
 
Thanks all for the advice; my HBA passthrough stopped working after rebooting into the new kernel. Rolling back to 5.13.19-3-pve is working fine.
Mine did as well! I had to roll back to 5.13.19-3-pve to get it working again.
 
Hello, I came looking for this issue. I have the exact same problem, and the VM was even refusing to restart. Rolling back the kernel worked. Hope this gets fixed.
 
I also have the same issue after upgrading to kernel 5.13.19-4.
My Linux VMs boot fine, but a Windows 10 and a Windows 11 VM do not boot. Strangely, a Windows Server 2019 VM does boot fine.
I had to revert back to kernel 5.13.19-3 to get everything working again.

It would be really nice if Proxmox could release a simple kernel removal tool to remove recently installed buggy kernels, so that a previously working kernel can be booted by default. At present there is no simple way to do this when booting from ZFS with systemd-boot.
proxmox-boot-tool also does not actually remove a buggy kernel; maybe Proxmox should update this tool to do this.
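For reference, what the tool can do today is list and re-sync the kernels it keeps bootable (a sketch; it has no subcommand that uninstalls a kernel package):

Code:
# show which kernels proxmox-boot-tool keeps on the ESP(s)
proxmox-boot-tool kernel list
# re-copy kernels/initrds and regenerate the boot entries
proxmox-boot-tool refresh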

When trying to remove the buggy kernel with:

Code:
apt remove pve-kernel-5.13.19-4-pve

I get a message that it also wants to remove the following:

Code:
root@pve0:~# apt remove pve-kernel-5.13.19-4-pve
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libzpool4linux pve-kernel-5.11.22-3-pve pve-kernel-5.11.22-5-pve pve-kernel-5.13.19-1-pve
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  proxmox-ve pve-kernel-5.13 pve-kernel-5.13.19-4-pve
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 328 MB disk space will be freed.
Do you want to continue? [Y/n]

I do not want to proceed with this, as it also wants to remove the proxmox-ve and pve-kernel-5.13 packages, which I need to run kernel 5.13.19-3.
 
(quoting the GRUB rollback instructions above)
Unfortunately this did not work for me. I no longer get the "timeout waiting on systemd" error, but it throws up:

"TASK ERROR: start failed: command '/usr/bin/kvm -id 123 -name emby -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/123.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/123.pid -daemonize -smbios 'type=1,uuid=eda27b91-dee9-40e3-a080-d0724a987a80' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/123.vnc,password=on' -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=4678ff30-3a41-4fc0-8481-679a6a6107a7' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'nec-usb-xhci,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'usb-host,bus=xhci.0,hostbus=3,hostport=4,id=usb0' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:ec04b86f0bb' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-123-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap123i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=56:2A:4B:46:14:31,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -machine 'type=pc+pve0'' failed: got timeout"

I'm using ZFS, like a previous user who still had the issue.

EDIT: for anyone else with this issue using ZFS, all you have to do is what the user above mentioned (Proxmox also said I should run "proxmox-boot-tool refresh" to refresh the boot options, so I did that too). I rebooted, chose the correct kernel from the boot options, and I can run the VMs normally. Fun times.
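In command form, that is roughly (illustrative):

Bash:
# regenerate the boot entries on all configured ESPs
proxmox-boot-tool refresh
# sanity-check the boot setup (configured ESPs, UEFI vs. legacy)
proxmox-boot-tool status
# then reboot and pick the 5.13.19-3-pve entry in the boot menu
reboot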
 
(quoting the "TASK ERROR: start failed ... got timeout" report above)

On a ZFS installation, systemd-boot is used for booting, not GRUB. As it turns out, you can define the default boot entry by selecting the entry in the systemd-boot menu and hitting "d" for default. Would have been nice to see this on the wiki page.

d: select the default entry to boot (stored in a non-volatile EFI variable)

Quote from here.
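If you'd rather set it from a shell than at the boot menu, bootctl can write the same EFI variable (a sketch; the entry name below is illustrative, take the real one from the list command):

Bash:
# list the entries systemd-boot knows about, with their IDs
bootctl list
# make the older kernel the persistent default (same effect as pressing "d")
bootctl set-default proxmox-5.13.19-3-pve.conf   # entry name is illustrative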
 
(quoting the GRUB rollback instructions above)
Thanks, this helped me too
 
