[SOLVED] VM not starting after upgrading kernel and reboot - timeout waiting on systemd

vale.maio2
Feb 4, 2022
Hi all, quite a noob here and running Proxmox on my own home server, so please bear with me.
After upgrading the Proxmox kernel this morning (via a simple apt upgrade) and rebooting, one of my 2 VMs is refusing to start. Following this post I've tried running a systemctl stop 100.scope command (100 is the ID of the offending VM), but it's still showing up as

Bash:
root@server:~# systemctl status qemu.slice
● qemu.slice
     Loaded: loaded
     Active: active since Fri 2022-02-04 08:44:52 GMT; 9min ago
      Tasks: 31
     Memory: 586.6M
        CPU: 33.558s
     CGroup: /qemu.slice
             ├─100.scope
             │ └─1691 [kvm]
             └─101.scope
               └─2015 /usr/bin/kvm -id 101 -name Pihole -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/101.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reco>

And of course, when trying to restart it I get the following error:

Bash:
root@server:~# qm start 100
timeout waiting on systemd

For what it's worth, I'm running pve-manager/7.1-10/6ddebafe (running kernel: 5.13.19-4-pve) on a Dell T710 server, with a hardware RAID5 configuration.
If you need any more details I'll be happy to oblige.
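For reference, the cleanup I attempted looks roughly like this (a sketch; the unit is the 100.scope from the CGroup listing above, and it didn't help in my case):

Bash:
# stop the per-VM scope left over under qemu.slice
systemctl stop 100.scope
# clear any failed/stale state for the unit
systemctl reset-failed 100.scope
# then try to start the VM again
qm start 100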

EDIT: just in case you need pveversion --verbose:

Bash:
root@server:~# pveversion --verbose
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-9
pve-kernel-5.13: 7.1-7
pve-kernel-5.11: 7.0-10
pve-kernel-5.13.19-4-pve: 5.13.19-8
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-4-pve: 5.11.22-9
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-2
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-1
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-5
pve-cluster: 7.1-3
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-4
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-1
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
root@server:~#
 
Ok, so it could have been the kernel. The update installed Linux 5.13.19-4-pve, which is when my VM broke. I've now restarted the server using Linux 5.13.19-3-pve, and the VM is up and running as if nothing happened. For now I'll remove that kernel version and mark it as not to be installed.
Would there be any explanation for this?
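For reference, the removal and hold mentioned above would look roughly like this (a sketch; note that, as a later reply in this thread shows, apt may want to drag the proxmox-ve meta-package along, so review its removal list before confirming):

Bash:
# confirm which kernel is currently running
uname -r
# remove the broken kernel package (check apt's removal list first!)
apt remove pve-kernel-5.13.19-4-pve
# stop apt from reinstalling or upgrading this exact version
apt-mark hold pve-kernel-5.13.19-4-pve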
 
How to do that?
Open a terminal to your Proxmox server. Launch the command
grep menuentry /boot/grub/grub.cfg
and in there you'll need two things:
[Screenshot Cattura.PNG: the grep output, with the two id_option values circled in red]
The first one is the id_option from the Advanced options for Proxmox VE GNU/Linux line (circled in red, number 1); the second is the id_option of the last known working kernel (circled in red, number 2, which for me was 5.13.19-3-pve). Make sure not to grab the ID for the recovery mode entry.
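The output will look roughly like this (abridged and illustrative; your filesystem UUID and kernel versions will differ):

Bash:
root@server:~# grep menuentry /boot/grub/grub.cfg
menuentry 'Proxmox VE GNU/Linux' ... $menuentry_id_option 'gnulinux-simple-9912e5fa-300a-4311-a7df-612754946075' {
submenu 'Advanced options for Proxmox VE GNU/Linux' $menuentry_id_option 'gnulinux-advanced-9912e5fa-300a-4311-a7df-612754946075' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-4-pve' ... $menuentry_id_option 'gnulinux-5.13.19-4-pve-advanced-9912e5fa-...' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-3-pve' ... $menuentry_id_option 'gnulinux-5.13.19-3-pve-advanced-9912e5fa-...' {
	menuentry 'Proxmox VE GNU/Linux, with Linux 5.13.19-3-pve (recovery mode)' ... $menuentry_id_option 'gnulinux-5.13.19-3-pve-recovery-9912e5fa-...' {

Here number 1 is the ID on the submenu line, and number 2 is the ID on the plain (non-recovery) 5.13.19-3-pve entry.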

Modify the /etc/default/grub file with
nano /etc/default/grub
(or whatever your favourite text editor is). From there, delete the line that says
GRUB_DEFAULT=0
and replace it with

GRUB_DEFAULT="menu entry ID>kernel ID"

and use the two IDs you grabbed above. Don't forget to separate the two IDs with a > character.
In my case, the line looks like this:
GRUB_DEFAULT="gnulinux-advanced-9912e5fa-300a-4311-a7df-612754946075>gnulinux-5.13.19-3-pve-advanced-9912e5fa-300a-4311-a7df-612754946075"
Save and close the file, update GRUB with
update-grub
and reboot.
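To double-check before and after the reboot (illustrative):

Bash:
# confirm the edit took
grep GRUB_DEFAULT /etc/default/grub
# after update-grub and the reboot, verify the older kernel is running
uname -r   # should now print 5.13.19-3-pve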
 

Yep, roll back to the 5.13.19-3-pve kernel.

(quoting the GRUB rollback instructions above)

Thanks, works!
 
I can confirm I had the same issue, and rolling back to the 5.13.19-3-pve kernel solved the problem. Thanks a lot!
Just a note: in the rollback instructions above there is a slight error in the picture. Circle 1 should be one line lower, on the entry that starts with "gnulinux-advanced-...".
 
Thanks all for the advice; my HBA passthrough stopped working after rebooting into the new kernel. Rolling back to 5.13.19-3-pve is working fine.
 
I just followed your instructions, but this won't work for a ZFS installation with EFI stub booting. Any suggestions how to do this? Also, there seems to be a mismatch between the instructions in the image and the example: in the image you recommend the menuentry gnulinux-simple-********, while in the example you choose a submenu entry like gnulinux-advanced-******.
 
Also, there seems to be a mismatch between the instructions in the image and the example: in the image you recommend the menuentry gnulinux-simple-********, while in the example you choose a submenu entry like gnulinux-advanced-******.
Woopsie you're right, I've corrected the image, thanks for spotting that.

I just followed your instructions, but this won't work for a ZFS installation with EFI stub booting. Any suggestions how to do this?
I'm afraid not; I'm not too familiar with ZFS.
 
Thanks all for the advice; my HBA passthrough stopped working after rebooting into the new kernel. Rolling back to 5.13.19-3-pve is working fine.
Mine did as well! I had to roll back to 5.13.19-3-pve to get it working again.
 
Hello, I came looking for this issue. I have the exact same problem, and the VM was even refusing to restart. Rolling back the kernel worked. Hope this gets fixed.
 
I also have the same issue after upgrading to kernel 5.13.19-4.
My Linux VMs boot fine, but a Windows 10 and a Windows 11 VM do not boot. Strangely, a Windows Server 2019 VM does boot fine.
I had to revert back to kernel 5.13.19-3 to get everything working again.

It would be really nice if Proxmox could release a simple kernel removal tool to remove recently installed buggy kernels, so that a previously working kernel can be booted by default. At present there is no simple way to do this when booting from ZFS with systemd-boot.
proxmox-boot-tool also does not actually remove a buggy kernel; maybe Proxmox should update this tool to do this.
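For reference, what the tool can do today is list and re-sync the kernels it keeps bootable (a sketch; it has no subcommand that uninstalls a kernel package):

Code:
# show which kernels proxmox-boot-tool keeps on the ESP(s)
proxmox-boot-tool kernel list
# re-copy kernels/initrds and regenerate the boot entries
proxmox-boot-tool refresh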

When trying to remove the buggy kernel with:

Code:
apt remove pve-kernel-5.13.19-4-pve

I get a message that it also wants to remove the following:

Code:
root@pve0:~# apt remove pve-kernel-5.13.19-4-pve
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libzpool4linux pve-kernel-5.11.22-3-pve pve-kernel-5.11.22-5-pve pve-kernel-5.13.19-1-pve
Use 'apt autoremove' to remove them.
The following packages will be REMOVED:
  proxmox-ve pve-kernel-5.13 pve-kernel-5.13.19-4-pve
0 upgraded, 0 newly installed, 3 to remove and 0 not upgraded.
After this operation, 328 MB disk space will be freed.
Do you want to continue? [Y/n]

I do not want to proceed with this, as it also wants to remove the proxmox-ve and pve-kernel-5.13 packages, which I need to run kernel 5.13.19-3.
 
(quoting the GRUB rollback instructions above)
Unfortunately this did not work for me. I no longer get the "timeout waiting on systemd" error, but it throws up:

"TASK ERROR: start failed: command '/usr/bin/kvm -id 123 -name emby -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/123.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/123.pid -daemonize -smbios 'type=1,uuid=eda27b91-dee9-40e3-a080-d0724a987a80' -smp '4,sockets=1,cores=4,maxcpus=4' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/123.vnc,password=on' -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'vmgenid,guid=4678ff30-3a41-4fc0-8481-679a6a6107a7' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'nec-usb-xhci,id=xhci,bus=pci.1,addr=0x1b' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'usb-host,bus=xhci.0,hostbus=3,hostport=4,id=usb0' -device 'VGA,id=vga,bus=pci.0,addr=0x2' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:ec04b86f0bb' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' -device 'virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5' -drive 'file=/dev/zvol/rpool/data/vm-123-disk-0,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on' -device 'scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap123i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=56:2A:4B:46:14:31,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' -machine 'type=pc+pve0'' failed: got timeout"

I'm using ZFS, like a previous user who still had the issue.

EDIT: for anyone else with this issue using ZFS, all you have to do is what the user above mentioned (Proxmox also said I should run "proxmox-boot-tool refresh" to refresh the boot options, so I did that too). I rebooted, chose the correct kernel from the boot options, and I can run the VMs normally. Fun times.
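In command form, that is roughly (illustrative):

Bash:
# regenerate the boot entries on all configured ESPs
proxmox-boot-tool refresh
# sanity-check the boot setup (configured ESPs, UEFI vs. legacy)
proxmox-boot-tool status
# then reboot and pick the 5.13.19-3-pve entry in the boot menu
reboot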
 
(quoting the "TASK ERROR: start failed ... got timeout" report above)

On a ZFS installation, systemd-boot is used for booting, not GRUB. As it turns out, you can define the default boot entry by selecting the entry in the systemd-boot menu and hitting "d" for default. Would have been nice to see this on the wiki page.

d: select the default entry to boot (stored in a non-volatile EFI variable)

Quote from here.
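If you'd rather set it from a shell than at the boot menu, bootctl can write the same EFI variable (a sketch; the entry name below is illustrative, take the real one from the list command):

Bash:
# list the entries systemd-boot knows about, with their IDs
bootctl list
# make the older kernel the persistent default (same effect as pressing "d")
bootctl set-default proxmox-5.13.19-3-pve.conf   # entry name is illustrative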
 
(quoting the GRUB rollback instructions above)
Thanks, this helped me too
 
