VM doesn't start on Proxmox 6 - timeout waiting on systemd

Just to be clear: Did you make this change on the PVE host or the guest VMs?

Edit: I changed that cstate setting on my PVE host, updated grub, and rebooted. I've already got three VMs that won't start up. No dice.
On the whole host, of course. I can't say exactly why, but it has solved all my issues.
 
I moved everything to plain Ubuntu with KVM and libvirtd/virt-manager. I didn't have any containers, just KVM machines. Now I can actually go on holiday without things locking up on me. I started with Proxmox on version 2. Pity they won't accept that there is an issue here.
 
Just this week I encountered a similar problem on a single virtual machine (Linux). While tracking down the cause, I found that the problem was due to the "Ballooning Device" memory option being enabled. This option is enabled by default and I had simply forgotten to disable it.
 
I had ballooning off on guests that locked up.
 
Does the release of 6.3 solve this issue? Anyone here that had the issue and upgraded?

Gerald
 
I started having this problem recently on 6.2 after changing my storage to be ZFS on the proxmox host rather than LVM on iSCSI. I upgraded to 6.3 yesterday and am having the problem again this morning.
 
Hi there,

I've got the same problem on 6.3-2, just tonight. The web GUI is reachable, but two VMs are not (out of 6). Only the error TASK ERROR: timeout waiting on systemd comes up.
I can't tell what is causing it, because there is no further info and no log. For now I've added the /etc/modprobe.d/vhost-net.conf workaround from the PVE 5 issue and hope that things will improve.

Any news to this?
 
We were getting this on snapshots that we were taking with a script. We had to add a sleep between each qm command to resolve the issue. With backups it doesn't happen often, but it was suggested we switch from the Stop backup mode to the Snapshot backup mode.
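
For illustration, a minimal sketch of the kind of script we ended up with (the VM IDs, snapshot name, and 30-second pause are placeholders, not our exact values):
Code:
#!/bin/bash
# snapshot each VM, pausing between qm calls so the previous
# systemd scope/unit has time to settle before the next command
for vmid in 101 102 103; do
    qm snapshot "$vmid" "nightly-$(date +%Y%m%d)"
    sleep 30
done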
 
I can duplicate this pretty easily. Sorry for the lack of detail; I'm a home lab user and my troubleshooting is a little limited.

Fresh install of Proxmox, single node.

Server Specs:
SuperMicro Server
Dual E5-2630 (6 core)
32GB RAM
Seagate Constellation ST91000640NS 1TB 7200 RPM 64MB Cache SATA drives (guest storage 6 drives in ZFS RAID 10)
Kingston 120GB SATA SSD drives (Proxmox system, ZFS RAID 1)

I set up two Ubuntu 20.04 VMs, each with 8GB RAM and a 200GB drive. On both VMs I run the stress command:
Code:
stress -d 2 --hdd-bytes 512M
Eventually I'll start to see the "timeout waiting on systemd" errors on the guest console. Sometimes it takes as little as 10 minutes, other times it can be an hour before it happens.

I couldn't reboot or shutdown the guests that had the "timeout waiting on systemd" error, rebooting the Proxmox server was the only way I could get them to power up again.

My system is Intel, so I added "intel_idle.max_cstate=1" to my grub config and it did help. It took longer for a guest to show "waiting on systemd" on the console, but the guest still eventually froze and I had to reboot the host before I could do anything with it again.
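
For reference, this is roughly the change, done the standard Debian/PVE way of setting kernel parameters (double-check your own /etc/default/grub before copying anything):
Code:
# /etc/default/grub on the PVE host
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1"

# then apply the change and reboot
update-grub
reboot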

I then installed the qemu-guest-agent on the Ubuntu guests and enabled the QEMU Guest Agent in the Proxmox guest config. I also disabled Memory Ballooning. After doing these two things I'll get the "waiting on systemd" error in the guest console but I can cancel the stress command, do a reboot, etc. with the guests. The system stays operational with those guests.
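
In case it's useful to anyone, roughly what I did (VMID 100 is just an example):
Code:
# inside the Ubuntu guest
apt install qemu-guest-agent
systemctl enable --now qemu-guest-agent

# on the PVE host: enable the agent and disable ballooning for this VM
qm set 100 --agent enabled=1
qm set 100 --balloon 0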

Is anyone from Proxmox interested in getting access to the system to test with? I read somewhere that Proxmox can't reproduce the issue, so maybe there's something specific about my system that can help.
 
When does the error appear? After shutting down the VM and then starting it again?
If so, could you get the PID after starting the VM (qm list shows it once the VM is running)?
Get the output of ps ax | grep <PID>.
Shutdown the VM.
Get the output of ps ax | grep <PID>.
Then start the VM.

The output of all commands would be of interest when the start runs into the timeout.
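
Something along these lines, using 100 as an example VMID:
Code:
qm start 100
qm list                  # note the VM's PID
ps ax | grep <PID>
qm shutdown 100
ps ax | grep <PID>
qm start 100             # if this runs into the timeout, the output above is what's needed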

Once the shutdown has completed, there should no longer be any scope present, but sometimes when starting a VM the scope is still there for some reason. Most likely it is because something is still running in that scope. On startup we kill the scope just to make sure, and when it is still there 5 seconds after that, we run into this timeout.
 
For me, the issue appears to come up when backups are being taken at night. I've also noticed it's always the same virtual machine; this is also my only BIOS virtual machine, all others are UEFI.

The virtual machine doesn't respond to qm stop <vmid>, and if I try running systemctl stop <vmid>.slice it doesn't do anything. If I kill the PID of the virtual machine, the PID disappears, but the scope stays until the host is rebooted, and the virtual machine cannot be started again.
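
For what it's worth, this is roughly the sequence I see (101 stands in for the VMID; on my host the unit shows up as <vmid>.scope under qemu.slice, so adjust to whatever systemctl lists for you):
Code:
qm list                      # note the VM's PID
kill -9 <PID>                # the kvm process disappears...
systemctl status 101.scope   # ...but the scope stays listed until the host is rebooted
qm start 101                 # fails with "timeout waiting on systemd"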

Currently my only workaround is to reboot the Proxmox host every day.

My virtual machine storage is ZFS striped mirrors
Backup mode is snapshot
Problem VM is BIOS and not UEFI
The problem VM shows no errors in its dmesg
VM doesn't respond to stop/shutdown commands nor systemctl stop <vmid>.slice
Scope stays until host reboot after killing the vm's PID

I'm going to convert this VM from seabios to OVMF and see if it changes anything.
 
This has happened to me since Proxmox 4, I think it was. My setup is a home lab, but when I first tried Proxmox I had a workload that did a lot of heavy disk IO with video files running under Ubuntu. I'd run the jobs overnight and wake up in the morning to find my pfSense and other guests weren't responding.

I tried all day to get this to act up and everything is working fine. I get a few "waiting on systemd" errors on the guest console, but I can still stop and start the guests. I even turned memory ballooning back on, uninstalled the qemu-guest-agent from the Ubuntu guests, and removed "intel_idle.max_cstate=1" from the host grub config.

Mira, I'll get you the requested information but it might take a while. I had this happen to me twice yesterday and today it's happy...
 
I have this problem as well with Exchange VMs.
My solution (after the timeout message) is always to stop the VM, then kill the slice using systemctl, then wait. The slice takes anywhere from 5-20 minutes to go away, then I try to start the VM again and it works. I have this problem with every single Exchange server I host across four Proxmox servers. All other Windows servers start fine.
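
In case it helps, the sequence looks roughly like this (101 stands in for the Exchange VM's ID; the unit name may differ on your hosts, use whatever systemctl shows):
Code:
qm stop 101                  # after the timeout message
systemctl kill 101.scope     # or whatever slice/scope unit systemctl lists for the VM
systemctl status 101.scope   # check again until the unit is gone (5-20 minutes here)
qm start 101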
 
I converted from SeaBIOS to OVMF and have backed up three times now, once manually. During the conversion I copied my raw VM disk to a fresh VM. Since doing this I haven't had the issue. I'll continue to monitor, but it may be solved for me. Hopefully this helps someone else who stumbles into this thread.
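
For anyone trying the same conversion, the gist of it was something like this (VMID and storage name are examples from my setup, and as noted above I also copied the raw disk to a fresh VM rather than converting everything in place):
Code:
# switch the firmware from SeaBIOS to OVMF and add an EFI disk for it
qm set 101 --bios ovmf
qm set 101 --efidisk0 local-zfs:1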
 
Doh! I just upgraded my GPU-est host (4 x Quadro RTX 8000) to 6.3. No problems yet, but it doesn't usually become a problem until she's got a bit of uptime under her belt. :(
 
Good morning everyone, I've got the same problem... my VM doesn't start anymore, please help; I need to restore the data anyway.
 
A little feedback from here...
I upgraded Proxmox to pve-manager/6.3-3/eee5f901 (running kernel: 5.4.73-1-pve). I still get freezes; my root server hangs unpredictably after an indeterminate number of days of uptime. There is no reproducible error to be found in the logs. I have been running Proxmox without any Windows machines for two days now; yesterday it suddenly rebooted two VMs that are not overprovisioned. And yes, it is running as nested virtualization.
My host-specs:
netcup root server with AMD-V flag for virtualization
6x AMD EPYC cores
32GB RAM
800GB SSD RAID 1
All VMs are currently running with the CPU type "host" - I switched away from "kvm", because there were far more problems with it (more freezes, more hangs, more reboots). VirtIO is used for ethernet and SCSI. Ballooning is off. Every VM runs on 1 core (6 VMs total). Backups are off. Everything runs on LVM (and LVM-thin; I don't know why - it was there after setup).
I cannot tell if c-states are on, because netcup will not tell me.
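
To make the setup concrete, the relevant part of one of the VM configs looks roughly like this (values are illustrative, reconstructed from memory rather than copied verbatim):
Code:
# /etc/pve/qemu-server/<vmid>.conf (excerpt)
cores: 1
cpu: host
balloon: 0
scsihw: virtio-scsi-pci
scsi0: local-lvm:vm-101-disk-0,size=50G
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0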

My plans:
- backup every vm (roughly as sketched below)
- install a new proxmox node
- setup the new system
- restore every vm
- monitoring
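
The backup/restore part would be plain vzdump/qmrestore, something like this (the storage name, path, and VMID are examples, not my actual values):
Code:
# on the current node: dump the VM to a backup storage
vzdump 101 --storage backup-nfs --mode stop --compress zstd

# on the freshly installed node: restore from the dump file
qmrestore /mnt/backup/dump/vzdump-qemu-101-*.vma.zst 101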

For now it would be nicer to just make change XY and have everything running fine - but that seems to be impossible.
 
netcup root server with AMD-V flag for virtualization
So not a physical server? What kernel runs on the host? This could well be an issue with netcup's environment - not saying it has to be, but it can be. Getting the info about the distro and kernel they use would help.
 
