VM doesn't start on Proxmox 6 - timeout waiting on systemd

Just to be clear: Did you make this change on the PVE host or the guest VMs?

Edit: I changed that cstate setting on my PVE host, updated grub, and rebooted. I've already got three VMs that won't start up. No dice.
On the whole host, of course. I can't say exactly why, but it has solved all my issues.
 
I moved everything to plain Ubuntu with KVM and libvirtd/virt-manager. I didn't have any containers, just KVM machines. Now I can actually go on holiday without things locking up on me. I started with Proxmox on version 2. Pity they won't accept that there is an issue here.
 
Just this week I encountered a similar problem on a single virtual machine (Linux). While tracking down the cause, I found that the problem was due to the "Ballooning Device" memory option being enabled. This option is enabled by default and I had simply forgotten to disable it.
 
I had ballooning off on guests that locked up.
 
Does the release of 6.3 solve this issue? Anyone here that had the issue and upgraded?

Gerald
 
I started having this problem recently on 6.2 after changing my storage to be ZFS on the proxmox host rather than LVM on iSCSI. I upgraded to 6.3 yesterday and am having the problem again this morning.
 
Hi there,

I've got the same problem on 6.3-2, just tonight. The web GUI is reachable, but two VMs are not (out of 6). Only the error TASK ERROR: timeout waiting on systemd comes up.
I can't tell what is causing it, because there is no further info and no log. For now I've added the /etc/modprobe.d/vhost-net.conf workaround from the PVE 5 issue and hope that things will improve.

Any news to this?
 
We were getting this on snapshots that we were taking with a script. We had to add a sleep between each qm command to resolve the issue. With backups it doesn't happen often, but it was suggested we switch from the Stop backup mode to the Snapshot backup mode.
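
For illustration, a minimal sketch of the kind of script we ended up with (the VM IDs, snapshot name, and 30-second pause are placeholders, not our exact values):
Code:
#!/bin/bash
# snapshot each VM, pausing between qm calls so the previous
# systemd scope/unit has time to settle before the next command
for vmid in 101 102 103; do
    qm snapshot "$vmid" "nightly-$(date +%Y%m%d)"
    sleep 30
done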
 
I can duplicate this pretty easily. Sorry for the lack of detail; I'm a home lab user and my troubleshooting is a little limited.

Fresh install of Proxmox, single node.

Server Specs:
SuperMicro Server
Dual E5-2630 (6 core)
32GB RAM
Seagate Constellation ST91000640NS 1TB 7200 RPM 64MB Cache SATA drives (guest storage 6 drives in ZFS RAID 10)
Kingston 120GB SATA SSD drives (Proxmox system, ZFS RAID 1)

I set up two Ubuntu 20.04 VMs, each with 8GB RAM and a 200GB drive. On both VMs I run the stress command:
Code:
stress -d 2 --hdd-bytes 512M
Eventually I'll start to see the "timeout waiting on systemd" errors on the guest console. Sometimes it takes as little as 10 minutes, other times it can be an hour before it happens.

I couldn't reboot or shutdown the guests that had the "timeout waiting on systemd" error, rebooting the Proxmox server was the only way I could get them to power up again.

My system is Intel, so I added "intel_idle.max_cstate=1" to my grub config and it did help. It took longer for a guest to show "waiting on systemd" on the console, but the guest still eventually froze and I had to reboot the host before I could do anything with it again.
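
For reference, this is roughly the change, done the standard Debian/PVE way of setting kernel parameters (double-check your own /etc/default/grub before copying anything):
Code:
# /etc/default/grub on the PVE host
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=1"

# then apply the change and reboot
update-grub
reboot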

I then installed the qemu-guest-agent on the Ubuntu guests and enabled the QEMU Guest Agent in the Proxmox guest config. I also disabled Memory Ballooning. After doing these two things I'll get the "waiting on systemd" error in the guest console but I can cancel the stress command, do a reboot, etc. with the guests. The system stays operational with those guests.
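
In case it's useful to anyone, roughly what I did (VMID 100 is just an example):
Code:
# inside the Ubuntu guest
apt install qemu-guest-agent
systemctl enable --now qemu-guest-agent

# on the PVE host: enable the agent and disable ballooning for this VM
qm set 100 --agent enabled=1
qm set 100 --balloon 0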

Is anyone from Proxmox interested in getting access to the system to test with? I read somewhere that Proxmox can't reproduce the issue, so maybe there's something specific about my system that can help.
 
When does the error appear? After shutting down the VM and then starting it again?
If so, could you get the PID after starting the VM (qm list shows it once the VM is running)?
Get the output of ps ax | grep <PID>.
Shutdown the VM.
Get the output of ps ax | grep <PID>.
Then start the VM.

The output of all commands would be of interest when the start runs into the timeout.
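
Something along these lines, using 100 as an example VMID:
Code:
qm start 100
qm list                  # note the VM's PID
ps ax | grep <PID>
qm shutdown 100
ps ax | grep <PID>
qm start 100             # if this runs into the timeout, the output above is what's needed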

Once the shutdown has completed, there should no longer be any scope present, but sometimes when starting a VM the scope is still there for some reason. Most likely it is because something is still running in that scope. On startup we kill the scope just to make sure, and when it is still there 5 seconds after that, we run into this timeout.
 
For me, the issue appears to come up when backups are being taken at night. I've also noticed it's always the same virtual machine; this is also my only BIOS virtual machine, all others are UEFI.

The virtual machine doesn't respond to qm stop <vmid>, and if I try running systemctl stop <vmid>.slice it doesn't do anything. If I kill the PID of the virtual machine, the PID disappears, but the scope stays until the host is rebooted, and the virtual machine cannot be started again.
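
For what it's worth, this is roughly the sequence I see (101 stands in for the VMID; on my host the unit shows up as <vmid>.scope under qemu.slice, so adjust to whatever systemctl lists for you):
Code:
qm list                      # note the VM's PID
kill -9 <PID>                # the kvm process disappears...
systemctl status 101.scope   # ...but the scope stays listed until the host is rebooted
qm start 101                 # fails with "timeout waiting on systemd"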

Currently my only workaround is to reboot the Proxmox host every day.

My virtual machine storage is ZFS striped mirrors
Backup mode is snapshot
Problem VM is BIOS and not UEFI
The problem VM shows no errors in its dmesg
VM doesn't respond to stop/shutdown commands nor systemctl stop <vmid>.slice
Scope stays until host reboot after killing the vm's PID

I'm going to convert this VM from seabios to OVMF and see if it changes anything.
 
This has happened to me since Proxmox 4, I think it was. My setup is a home lab, but when I first tried Proxmox I had a workload that did a lot of heavy disk IO with video files running under Ubuntu. I'd run the jobs overnight and wake up in the morning to find my pfSense and other guests weren't responding.

I tried all day to get this to act up and everything is working fine. I get a few "waiting on systemd" errors on the guest console, but I can still stop and start the guests. I even turned memory ballooning back on, uninstalled the qemu-guest-agent from the Ubuntu guests, and removed "intel_idle.max_cstate=1" from the host grub config.

Mira, I'll get you the requested information but it might take a while. I had this happen to me twice yesterday and today it's happy...
 
I have this problem as well with Exchange VMs.
My solution (after the timeout message) is always to stop the VM, then kill the slice using systemctl, then wait. The slice takes anywhere from 5-20 minutes to go away, then I try to start the VM again and it works. I have this problem with every single Exchange server I host across four Proxmox servers. All other Windows servers start fine.
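
In case it helps, the sequence looks roughly like this (101 stands in for the Exchange VM's ID; the unit name may differ on your hosts, use whatever systemctl shows):
Code:
qm stop 101                  # after the timeout message
systemctl kill 101.scope     # or whatever slice/scope unit systemctl lists for the VM
systemctl status 101.scope   # check again until the unit is gone (5-20 minutes here)
qm start 101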
 
I converted from SeaBIOS to OVMF and have backed up three times now, once manually. During the conversion I copied my raw VM disk to a fresh VM. Since doing this I haven't had the issue. I'll continue to monitor, but it may be solved for me. Hopefully this helps someone else who stumbles into this thread.
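
For anyone trying the same conversion, the gist of it was something like this (VMID and storage name are examples from my setup, and as noted above I also copied the raw disk to a fresh VM rather than converting everything in place):
Code:
# switch the firmware from SeaBIOS to OVMF and add an EFI disk for it
qm set 101 --bios ovmf
qm set 101 --efidisk0 local-zfs:1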
 
Doh! I just upgraded my GPU-est host (4 x Quadro RTX 8000) to 6.3. No problems yet, but it doesn't usually become a problem until she's got a bit of uptime under her belt. :(
 
Good morning everyone, I've got the same problem... my VM doesn't start anymore, please help; I need to restore the data anyway.
 
A little feedback from here...
I upgraded Proxmox to pve-manager/6.3-3/eee5f901 (running kernel: 5.4.73-1-pve). I still get freezes; my root server hangs unpredictably after an indeterminate number of days of uptime. There is no reproducible error to be found in the logs. I have been running Proxmox without any Windows machines for two days now; yesterday it suddenly rebooted two VMs that are not overprovisioned. And yes, it is running as nested virtualization.
My host-specs:
netcup root server with AMD-V flag for virtualization
6x AMD EPYC cores
32GB RAM
800GB SSD RAID 1
All VMs are currently running with the CPU type "host" - I switched away from "kvm", because there were far more problems with it (more freezes, more hangs, more reboots). VirtIO is used for ethernet and SCSI. Ballooning is off. Every VM runs on 1 core (6 VMs total). Backups are off. Everything runs on LVM (and LVM-thin; I don't know why - it was there after setup).
I cannot tell if c-states are on, because netcup will not tell me.
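
To make the setup concrete, the relevant part of one of the VM configs looks roughly like this (values are illustrative, reconstructed from memory rather than copied verbatim):
Code:
# /etc/pve/qemu-server/<vmid>.conf (excerpt)
cores: 1
cpu: host
balloon: 0
scsihw: virtio-scsi-pci
scsi0: local-lvm:vm-101-disk-0,size=50G
net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0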

My plans:
- backup every vm (roughly as sketched below)
- install a new proxmox node
- setup the new system
- restore every vm
- monitoring
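
The backup/restore part would be plain vzdump/qmrestore, something like this (the storage name, path, and VMID are examples, not my actual values):
Code:
# on the current node: dump the VM to a backup storage
vzdump 101 --storage backup-nfs --mode stop --compress zstd

# on the freshly installed node: restore from the dump file
qmrestore /mnt/backup/dump/vzdump-qemu-101-*.vma.zst 101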

For now it would be nicer to just make change XY and have everything running fine - but that seems to be impossible.
 
netcup root server with AMD-V flag for virtualization
So not a physical server? What kernel runs on the host? This could well be an issue with netcup's environment - not saying it has to be, but it can be. Getting the info about the distro and kernel they use would help.
 
