Guest CPU soft lockups after update

runlevel · Apr 30, 2025

I recently updated a cluster from PVE 7.3 to 8.3 (updating to 7.4 first, of course, but then upgrading to 8 fairly quickly afterwards). Since then we've been seeing "freezes" of up to about 25 seconds on one of the Rocky Linux 8 QEMU guests. During a freeze the guest is unresponsive both via the web console and over the network. The longer freezes are also accompanied by the message

Code:

watchdog: BUG: soft lockup - CPU #0 stuck for 21s!

printed to the console and to any users logged in over SSH. I've been searching the forums and wiki for any solution and so far I've tried the following:

1. Set all storage devices to iothread=1,aio=threads per https://bugzilla.proxmox.com/show_bug.cgi?id=1453
2. After noticing log messages to the effect of

Code:

Apr 25 14:27:40 hercules5 pmxcfs[4643]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/hercules5/herc5-sata: -1

which seemed to line up, to the second, with the lockups, I deleted the RRD DBs and let the system regenerate them, without success
3. Set intel_iommu=off in the kernel parameters per https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.2
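The timestamp correlation mentioned in step 2 can be checked mechanically rather than by eye. The following is only a rough sketch: it assumes the guest's soft lockup messages are available in the same syslog-style format as the host line quoted above (the guest line in the usage note is a made-up example, not taken from the real logs), that host and guest clocks are in sync, and that the year is supplied by hand because syslog timestamps do not carry one. The function names and the `slack_s` tolerance are illustrative.

```python
import re
from datetime import datetime, timedelta

# Syslog-style prefix, e.g. "Apr 25 14:27:40 hercules5 ..."
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")
# Tolerates both "CPU#0" and "CPU #0" as written in the post.
LOCKUP = re.compile(r"soft lockup - CPU\s*#(\d+) stuck for (\d+)s")
RRDC_MARKER = "RRDC update error"


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def lockups(lines, year):
    """Yield (timestamp, cpu, stuck_seconds) for each soft lockup line."""
    for line in lines:
        match = LOCKUP.search(line)
        stamp = parse_stamp(line, year)
        if match and stamp:
            yield stamp, int(match.group(1)), int(match.group(2))


def rrdc_errors(lines, year):
    """Yield the timestamp of each pmxcfs RRDC update error line."""
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp and RRDC_MARKER in line:
            yield stamp


def correlate(guest_lines, host_lines, year=2025, slack_s=5):
    """Pair each lockup with host RRDC errors logged during the stall.

    The lockup message is printed after the CPU has already been stuck
    for the reported time, so the window reaches back that far.
    """
    errors = list(rrdc_errors(host_lines, year))
    pairs = []
    for stamp, cpu, stuck in lockups(guest_lines, year):
        start = stamp - timedelta(seconds=stuck + slack_s)
        end = stamp + timedelta(seconds=slack_s)
        hits = [e for e in errors if start <= e <= end]
        pairs.append((stamp, cpu, stuck, hits))
    return pairs
```

Feeding it the guest's kernel log and the host's syslog as lists of lines gives one entry per lockup; an empty `hits` list for most lockups would suggest the RRDC errors are coincidental rather than related.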

None of these attempts has made any apparent difference to the frequency or duration of the freezes. There is only one other guest on the same node, and it does not appear to have the same issue. Below are some hardware details about the node:

- Dell PowerEdge R820
- 4 x Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz
- 16 x 32GB DDR3 1866 MHz RAM

and the config for the affected VM:

Code:

boot: order=virtio0
cores: 42
ide2: none,media=cdrom
memory: 393216
name: zeus
net0: virtio=A2:1A:38:0B:19:B4,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=51b73292-6b94-473b-8a68-b0636b0f70ae
sockets: 2
virtio0: herc5-ssd:144/vm-144-disk-0.raw,aio=threads,iothread=1,size=32G
virtio1: herc5-ssd:144/vm-144-disk-2.raw,aio=threads,iothread=1,size=5T
virtio2: herc5-ssd:144/vm-144-disk-1.raw,aio=threads,iothread=1,size=2T
virtio3: herc5-sata:144/vm-144-disk-1.raw,aio=threads,iothread=1,size=2000G
virtio4: herc5-sata:144/vm-144-disk-0.raw,aio=threads,iothread=1,size=5T
vmgenid: 151eedb0-c0d2-43b4-8fef-6ff679d5e764
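For anyone reading along, the sizing this config implies can be worked out from the values above. The sketch below is illustrative only: it treats the config as simple `key: value` lines, takes the total vCPU count as `sockets` times `cores`, reads `memory` as MiB, and uses the 16 x 32GB from the hardware list as the host RAM figure. The helper names are made up for this example.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of a VM config into a dict of strings."""
    conf = {}
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def summarize(conf, host_ram_gb):
    """Derive vCPU count, guest RAM and share of host RAM from the config."""
    vcpus = int(conf.get("sockets", 1)) * int(conf.get("cores", 1))
    memory_gib = int(conf["memory"]) / 1024  # 'memory' is given in MiB
    return {
        "vcpus": vcpus,
        "memory_gib": memory_gib,
        "ram_fraction": memory_gib / host_ram_gb,
        "numa_enabled": conf.get("numa", "0") == "1",
    }


if __name__ == "__main__":
    sample = "cores: 42\nmemory: 393216\nnuma: 0\nsockets: 2\n"
    print(summarize(parse_vm_config(sample), host_ram_gb=16 * 32))
```

With the values posted, that comes to 84 vCPUs and 384 GiB of RAM, about three quarters of the node's memory, with `numa: 0` on a four-socket host.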

Any help is appreciated and I'd be happy to provide any other details that would help.

runlevel · May 12, 2025

Some new information: I've observed that after a cold boot of the VM it takes close to 24 hours before the soft lockups start occurring. Looking at the memory graph, this correlates at least roughly with RAM usage reaching its peak. I wouldn't expect usage to stay at the peak like this. Any theories as to why it might be behaving this way?
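One way to narrow this down would be to check, inside the guest, how much of the memory shown as used is actually page cache. This is only a sketch of that check, not a diagnosis: it assumes the plateau might be cache that Linux is holding on to, which has not been confirmed here, and it reads the standard `MemTotal`, `MemAvailable`, `Buffers` and `Cached` fields from `/proc/meminfo`. The function names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def breakdown(info):
    """Return page cache and available memory as percentages of MemTotal."""
    total = info["MemTotal"]
    cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return {
        "cache_pct": 100 * cache / total,
        "available_pct": 100 * info["MemAvailable"] / total,
    }


if __name__ == "__main__":
    # Run inside the guest, e.g. once shortly after boot and once at the plateau.
    with open("/proc/meminfo") as fh:
        print(breakdown(parse_meminfo(fh.read())))
```

A high `cache_pct` together with a high `available_pct` at the plateau would point to cache rather than application memory filling the graph.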
