I recently upgraded a cluster from PVE 7.3 to 8.3 (going to 7.4 first, of course, but moving on to 8 fairly quickly after). Since then we've been seeing "freezes" of up to about 25 seconds on one of the Rocky Linux 8 QEMU guests. During a freeze the guest is unresponsive both via the web console and over the network. The longer freezes are also accompanied by the message
Code:
watchdog: BUG: soft lockup - CPU #0 stuck for 21s!
printed to the console and to any users logged in over SSH. I've been searching the forums and wiki for a solution, and so far I've tried the following:
1. Set all storage devices to iothread=1,aio=threads per https://bugzilla.proxmox.com/show_bug.cgi?id=1453
2. After noticing log messages to the effect of
Code:
Apr 25 14:27:40 hercules5 pmxcfs[4643]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/hercules5/herc5-sata: -1
which seemed to correlate to the second with the lockups, I deleted the RRD DBs and let the system regenerate them, with no success
3. Set intel_iommu=off in the kernel parameters per https://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_8.2
None of these attempts has made any apparent difference to the frequency or duration of the freezes the guest experiences. There is only one other guest on the same node, and it does not appear to have the same issue. Below are some hardware details about the node:
- Dell PowerEdge R820
- 4 x Intel(R) Xeon(R) CPU E5-4657L v2 @ 2.40GHz
- 16 x 32GB DDR3 1866 MHz RAM
and the config for the affected VM
Code:
boot: order=virtio0
cores: 42
ide2: none,media=cdrom
memory: 393216
name: zeus
net0: virtio=A2:1A:38:0B:19:B4,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=51b73292-6b94-473b-8a68-b0636b0f70ae
sockets: 2
virtio0: herc5-ssd:144/vm-144-disk-0.raw,aio=threads,iothread=1,size=32G
virtio1: herc5-ssd:144/vm-144-disk-2.raw,aio=threads,iothread=1,size=5T
virtio2: herc5-ssd:144/vm-144-disk-1.raw,aio=threads,iothread=1,size=2T
virtio3: herc5-sata:144/vm-144-disk-1.raw,aio=threads,iothread=1,size=2000G
virtio4: herc5-sata:144/vm-144-disk-0.raw,aio=threads,iothread=1,size=5T
vmgenid: 151eedb0-c0d2-43b4-8fef-6ff679d5e764
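For a sense of scale, the sizing in this config can be compared with the host using a quick calculation. This is only a rough sketch: the per-CPU figures (12 cores and 24 threads per E5-4657L v2) are an assumption taken from Intel's published specs rather than read from the node, and `parse_vm_config` is a hypothetical helper, not a Proxmox tool.

```python
# Rough sizing check: guest vCPUs and memory vs. the host hardware listed above.
# Only the relevant keys from the VM config are repeated here.
VM_CONFIG = """\
cores: 42
memory: 393216
numa: 0
sockets: 2
"""


def parse_vm_config(text):
    """Hypothetical helper: turn 'key: value' lines into a dict of strings."""
    cfg = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


cfg = parse_vm_config(VM_CONFIG)
vcpus = int(cfg["sockets"]) * int(cfg["cores"])
mem_gib = int(cfg["memory"]) // 1024  # the memory value is in MiB

# Host figures: 4 sockets and 16 x 32 GB DIMMs are from the hardware list;
# cores/threads per CPU are an ASSUMPTION based on Intel's spec sheet.
HOST_SOCKETS = 4
CORES_PER_CPU = 12
THREADS_PER_CORE = 2
host_cores = HOST_SOCKETS * CORES_PER_CPU
host_threads = host_cores * THREADS_PER_CORE
host_mem_gib = 16 * 32

print(f"guest vCPUs: {vcpus} (host: {host_cores} cores / {host_threads} threads)")
print(f"guest memory: {mem_gib} GiB of {host_mem_gib} GiB, numa={cfg['numa']}")
```

Under those assumptions the guest is configured with 84 vCPUs against 48 physical cores (96 threads) and 384 GiB of the host's 512 GiB of RAM, with numa set to 0.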
Any help is appreciated and I'd be happy to provide any other details that would help.
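In case it helps anyone reproduce the correlation mentioned in step 2, here is a minimal sketch of how the timestamps could be lined up. The host line is the one quoted above; the guest line's timestamp and syslog prefix are hypothetical placeholders, the year is an assumption (syslog lines carry none), and the comparison assumes host and guest clocks are in sync.

```python
import re
from datetime import datetime, timedelta

YEAR = 2025  # ASSUMPTION: classic syslog timestamps do not include a year

SYSLOG_TS = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")
LOCKUP = re.compile(r"soft lockup - CPU ?#(\d+) stuck for (\d+)s")


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a syslog-style line."""
    stamp = SYSLOG_TS.match(line).group(1)
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def lockup_window(line):
    """Return (start, end) of the stall implied by a soft lockup message."""
    end = parse_ts(line)
    stuck_for = int(LOCKUP.search(line).group(2))
    return end - timedelta(seconds=stuck_for), end


# Host line as quoted in the post.
host_line = (
    "Apr 25 14:27:40 hercules5 pmxcfs[4643]: [status] notice: RRDC update error "
    "/var/lib/rrdcached/db/pve2-storage/hercules5/herc5-sata: -1"
)
# HYPOTHETICAL guest line: the message text is from the post, but the
# timestamp and "zeus kernel:" prefix are placeholders for illustration.
guest_line = (
    "Apr 25 14:27:52 zeus kernel: watchdog: BUG: soft lockup - CPU #0 stuck for 21s!"
)

start, end = lockup_window(guest_line)
overlaps = start <= parse_ts(host_line) <= end
print(f"lockup window {start:%H:%M:%S} to {end:%H:%M:%S}, RRDC error inside: {overlaps}")
```

Feeding it the real host and guest journal lines would show whether each RRDC error actually falls inside a lockup window or just lands near one.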