VMs Freezing randomly - Live migration unfreezes them.

eddor

Member
Hello everyone.
We are having this issue on a couple of clusters.
VMs will randomly freeze: no network response, and CPU stuck between 100 and 102%.
When we start a live migration to a different node, the migration works flawlessly
and the VM starts working just fine on the new node.
While a VM is frozen, there is no log activity in the OS inside the VM.
The guest OS varies: Ubuntu versions from 12 to 20.
This also happens on different storage types: local-lvm, NFS, and LVM over an iSCSI LUN.

This is the version information from the node where the issue last occurred:

# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.19.17-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-helper: 7.3-2
pve-kernel-5.15: 7.3-1
pve-kernel-5.19: 7.2-14
pve-kernel-5.19.17-1-pve: 5.19.17-1
pve-kernel-5.19.7-2-pve: 5.19.7-2
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.64-1-pve: 5.15.64-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u2
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve3


# qm config 7110
agent: 1
balloon: 8192
boot: order=virtio0;ide2
cores: 8
ide2: none,media=cdrom
memory: 65536
meta: creation-qemu=7.1.0,ctime=1675781477
name: statsdata03.sw
net0: virtio=C6:BB:7B:39:03:00,bridge=vmbr0,mtu=9000,tag=25
numa: 0
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=e1996e5b-0152-4590-8caf-83229d59e377
sockets: 1
virtio0: local-lvm:vm-7110-disk-0,backup=0,discard=on,iothread=1,size=3584G
vmgenid: 7785e57f-9800-4ab9-bf8b-81a5cf7024d9


While the VM was frozen, we captured strace output from its QEMU process and from other VMs that were running fine.
This is the summary over 10 seconds; we thought it looked strange (the capture command is sketched after the two summaries):

Frozen VM:

Code:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.52   32.214249        1482     21735           ppoll
  1.86    0.615001          40     15164           write
  0.38    0.125828          34      3700           recvmsg
  0.24    0.077628          19      3900           read
  0.00    0.000074           5        14         6 futex
  0.00    0.000069           0        73           sendmsg
  0.00    0.000022           1        16           close
  0.00    0.000017           1        15           accept4
  0.00    0.000010           0        30           fcntl
  0.00    0.000005           0        15           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   33.032903         739     44662         6 total

Running VM:

Code:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 43.51    0.325470        1179       276           ioctl
 30.79    0.230316          11     19617           ppoll
 19.08    0.142736        3244        44         3 futex
  5.93    0.044384       44384         1         1 restart_syscall
  0.41    0.003098           1      2177           write
  0.12    0.000900           1       581           read
  0.11    0.000823           1       523           recvmsg
  0.01    0.000095          11         8           io_submit
  0.01    0.000075          75         1           clone
  0.00    0.000035           3        10           sendmsg
  0.00    0.000025           4         6           fdatasync
  0.00    0.000016           8         2           rt_sigprocmask
  0.00    0.000010           5         2           close
  0.00    0.000007           7         1           prctl
  0.00    0.000007           3         2           accept4
  0.00    0.000005           5         1           madvise
  0.00    0.000003           1         2           getsockname
  0.00    0.000003           0         4           fcntl
  0.00    0.000003           3         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.748011          32     23259         4 total
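
For reference, this is roughly how the 10-second summaries above were collected. It is a sketch only: the exact flags we used may have differed, and <ID> is a placeholder for the VMID.

Code:
# Sketch of the capture, assuming the standard Proxmox PID file location; <ID> is the VMID.
PID=$(cat /var/run/qemu-server/<ID>.pid)
# -c aggregates a per-syscall summary; timeout stops the capture after 10 seconds.
# Add -f to also follow the vCPU and iothread threads.
timeout 10 strace -c -p "$PID"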

I hope someone has seen this before and can give us a clue.
It's a strange issue, especially because it gets resolved just by migrating.
Thanks.
 
What does your virsh dumpxml vmname look like?
 
Hi,
That does sound strange indeed. FYI, the opt-in kernel 5.19 has been superseded by 6.1. Does the issue occur with different kernel versions? What CPUs do the nodes in your clusters have?
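
If you want to try the newer opt-in kernel, installing it roughly looks like this (assuming the standard Proxmox package repositories are already configured):

Code:
# Install the opt-in 6.1 kernel and reboot into it
apt update
apt install pve-kernel-6.1
reboot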

From the strace output it seems that the QEMU process is stuck in a loop, polling for something. You could install a debugger and the debug symbols with apt install pve-qemu-kvm-dbg gdb and then run
Code:
gdb --batch --ex 't a a bt' --ex 'break qemu_poll_ns' --ex 'c' --ex 'set $n = nfds' --ex 'p *fds@$n' -p $(cat /var/run/qemu-server/<ID>.pid)
ls -l /proc/$(cat /var/run/qemu-server/<ID>.pid)/fd
replacing <ID> with the ID of the frozen VM in both commands. Maybe this tells us more.
 
Hi, thank you for your response.
We made a series of changes: on some hosts we upgraded to the 6.1 kernel, and on some we added custom limits as suggested here.
So far the issue has re-occurred on 6.1 kernels that didn't have the custom limits config, but that might be a coincidence.
We are now in the process of changing the default AIO policy to "native", since we use LVM over iSCSI LUNs.
We are hoping this might be related as well.
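
For completeness, this is roughly what those two changes look like. The limits values come from the thread we followed and should be treated as an assumption rather than a recommendation; the storage, VMID and size names are placeholders.

Code:
# Assumed limits change (values taken from the linked thread, not verified here):
# /etc/security/limits.conf
#   * soft nofile 65535
#   * hard nofile 65535

# Resulting disk line with aio=native in /etc/pve/qemu-server/<VMID>.conf
# (storage name, disk and size are placeholders):
virtio0: <storage>:vm-<VMID>-disk-0,aio=native,size=100G
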
These are the CPU types we have:

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz
Stepping: 4
CPU MHz: 3300.059
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
BogoMIPS: 5200.00
Virtualization: VT-x
L1d cache: 896 KiB
L1i cache: 896 KiB
L2 cache: 28 MiB
L3 cache: 38.5 MiB
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d

# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
Stepping: 4
CPU MHz: 3700.000
CPU max MHz: 3700.0000
CPU min MHz: 1200.0000
BogoMIPS: 6800.00
Virtualization: VT-x
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 12 MiB
L3 cache: 38.5 MiB
NUMA node0 CPU(s): 0-5,12-17
NUMA node1 CPU(s): 6-11,18-23
Vulnerability Itlb multihit: KVM: Mitigation: Split huge pages
Vulnerability L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke md_clear flush_l1d

I'll try the gdb command the next time it happens, and I'll update here.
Thank you very much.
 
We had another frozen VM; this is the output of gdb:

root@pvea42:~# gdb --batch --ex 't a a bt' --ex 'break qemu_poll_ns' --ex 'c' --ex 'set $n = nfds' --ex 'p *fds@$n' -p $(cat /var/run/qemu-server/$VMID.pid)
[New LWP 3335572]
[New LWP 3335593]
[New LWP 3335594]
[New LWP 3335595]
[New LWP 3335596]
[New LWP 3335597]
[New LWP 3335598]
[New LWP 3335601]

warning: Could not load vsyscall page because no executable was specified
0x00007f5e4a82ce26 in ?? ()

Thread 9 (LWP 3335601 "vnc_worker"):
#0 0x00007f5e4a91f7b2 in ?? ()
#1 0x0000000000043c64 in ?? ()
#2 0x0000000000000000 in ?? ()

Thread 8 (LWP 3335598 "CPU 5/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x000055a462516e10 in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3c969330 in ?? ()
#6 0x00007f5e3c9692c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462516e10 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 7 (LWP 3335597 "CPU 4/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x000055a462516e10 in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3d16a330 in ?? ()
#6 0x00007f5e3d16a2c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462516e10 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 6 (LWP 3335596 "CPU 3/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x000055a462516e10 in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3d96b330 in ?? ()
#6 0x00007f5e3d96b2c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462516e10 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 5 (LWP 3335595 "CPU 2/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x000055a462516e10 in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3e16c330 in ?? ()
#6 0x00007f5e3e16c2c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462516e10 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 4 (LWP 3335594 "CPU 1/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x000055a462516e10 in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3e96d330 in ?? ()
#6 0x00007f5e3e96d2c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462516e10 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 3 (LWP 3335593 "CPU 0/KVM"):
#0 0x00007f5e4a82e5f7 in ?? ()
#1 0x000055a46106e817 in ?? ()
#2 0x000055a461a1e140 in ?? ()
#3 0x00000000fffffffc in ?? ()
#4 0x000055a400000010 in ?? ()
#5 0x00007f5e3f3ae330 in ?? ()
#6 0x00007f5e3f3ae2c0 in ?? ()
#7 0x314761b2bcd76c00 in ?? ()
#8 0x000055a462a0d110 in ?? ()
#9 0x314761b2bcd76c00 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 2 (LWP 3335572 "call_rcu"):
#0 0x00007f5e4a8322e9 in ?? ()
#1 0x000055a4611f0bda in ?? ()
#2 0x0000000000000000 in ?? ()

Thread 1 (LWP 3335571 "kvm"):
#0 0x00007f5e4a82ce26 in ?? ()
#1 0xffffffff641130a9 in ?? ()
#2 0x000055a462ab1890 in ?? ()
#3 0x0000000000000051 in ?? ()
#4 0x0000000000000000 in ?? ()
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) [answered N; input not from terminal]

[New LWP 479189]
[New LWP 479190]
[LWP 479189 exited]
[New LWP 479189]
[New LWP 479190]
[LWP 479189 exited]

Thread 3 "CPU 0/KVM" received signal SIGUSR1, User defined signal 1.
[Switching to LWP 3335593]
0x00007f5e4a82e5f7 in ?? ()
No symbol table is loaded. Use the "file" command.
No symbol table is loaded. Use the "file" command.
[Inferior 1 (process 3335571) detached]

root@pvea42:~# ls -l /proc/$(cat /var/run/qemu-server/$VMID.pid)/fd
total 0
lrwx------ 1 root root 64 Mar 15 03:39 0 -> /dev/null
lrwx------ 1 root root 64 Mar 15 03:41 1 -> /dev/null
lrwx------ 1 root root 64 Mar 15 03:41 10 -> 'socket:[95833406]'
lrwx------ 1 root root 64 Mar 15 03:41 100 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 101 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 102 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 103 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 104 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 105 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 106 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 107 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 108 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 109 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 11 -> 'socket:[95838364]'
lrwx------ 1 root root 64 Mar 15 03:41 110 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 111 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 112 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 113 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 12 -> 'socket:[95821028]'
lrwx------ 1 root root 64 Mar 15 03:41 13 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 14 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 15 -> /dev/kvm
lr-x------ 1 root root 64 Mar 15 03:41 16 -> 'pipe:[95856816]'
lrwx------ 1 root root 64 Mar 15 03:41 17 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Mar 15 03:41 18 -> /dev/net/tun
lrwx------ 1 root root 64 Mar 15 03:41 19 -> /dev/vhost-net
lrwx------ 1 root root 64 Mar 15 03:41 2 -> 'socket:[95821033]'
lrwx------ 1 root root 64 Mar 15 03:41 20 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 21 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 22 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 23 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 24 -> anon_inode:kvm-vcpu:0
lrwx------ 1 root root 64 Mar 15 03:41 25 -> anon_inode:kvm-vcpu:1
lrwx------ 1 root root 64 Mar 15 03:41 26 -> anon_inode:kvm-vcpu:2
lrwx------ 1 root root 64 Mar 15 03:41 27 -> anon_inode:kvm-vcpu:3
lrwx------ 1 root root 64 Mar 15 03:41 28 -> anon_inode:kvm-vcpu:4
lrwx------ 1 root root 64 Mar 15 03:41 29 -> anon_inode:kvm-vcpu:5
lrwx------ 1 root root 64 Mar 15 03:41 3 -> 'socket:[95833405]'
lrwx------ 1 root root 64 Mar 15 03:41 30 -> /dev/dm-62
lrwx------ 1 root root 64 Mar 15 03:41 31 -> 'socket:[95821032]'
lrwx------ 1 root root 64 Mar 15 03:41 32 -> 'socket:[95821033]'
lrwx------ 1 root root 64 Mar 15 03:41 33 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 34 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 35 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 36 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 37 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 38 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 39 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 4 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 40 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 41 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 42 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 43 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 46 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 47 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 48 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 49 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Mar 15 03:41 5 -> /run/qemu-server/1099.pid
lrwx------ 1 root root 64 Mar 15 03:41 50 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 51 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 52 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 53 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 54 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 55 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 56 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 57 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 58 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 59 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Mar 15 03:41 6 -> 'pipe:[95859741]'
lrwx------ 1 root root 64 Mar 15 03:41 60 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 61 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 62 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 63 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 64 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 65 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 66 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 67 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 68 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 69 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 7 -> 'anon_inode:[signalfd]'
lrwx------ 1 root root 64 Mar 15 03:41 70 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 71 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 72 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 73 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 74 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 75 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 76 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 77 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 78 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 79 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 8 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 80 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 81 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 82 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 83 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 84 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 85 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 86 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 87 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 88 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 89 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 9 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 90 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 91 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 92 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 93 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 94 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 95 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 96 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 97 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 98 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Mar 15 03:41 99 -> 'anon_inode:[eventfd]'
 
Seems like the debug symbols pve-qemu-kvm-dbg are not yet installed.

Regarding the file descriptor limit, it seems the process still has a lot of room (the maximum in your output is 113) until reaching 1024.
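
If you want to double-check the limit the running process actually has, something along these lines should work, with <ID> again being the VMID:

Code:
# Shows the soft and hard open-file limits of the running QEMU process
grep 'Max open files' /proc/$(cat /var/run/qemu-server/<ID>.pid)/limits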
 
Hi,
It was installed, but on this node we installed it while the process was already frozen.
I'll have it ready for the next re-occurrence.
 
Hi
That shouldn't actually matter. If you run the GDB command with the debug symbols installed, it should work. What could be the case is that the VM was started with an older version, and thus the installed debug symbols didn't match the already running binary.
 
We went without a single issue for a few weeks, but it just happened again.
This is a 7.3 cluster; we upgraded it a few weeks ago and also applied the limits.conf settings.
This is the gdb command output:

root@pveb21:~# gdb --batch --ex 't a a bt' --ex 'break qemu_poll_ns' --ex 'c' --ex 'set $n = nfds' --ex 'p *fds@$n' -p $(cat /var/run/qemu-server/419.pid)
[New LWP 40291]
[New LWP 40326]
[New LWP 40327]
[New LWP 40328]
[New LWP 40329]
[New LWP 40330]
[New LWP 40331]
[New LWP 40332]
[New LWP 40333]
[New LWP 40336]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f5cf8e69e26 in __ppoll (fds=0x55db9f5f1600, nfds=94, timeout=<optimized out>, timeout@entry=0x7ffe2e6dce70, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 11 (Thread 0x7f4cc67bf700 (LWP 40336) "vnc_worker"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55db9fb1f22c) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55db9fb1f238, cond=0x55db9fb1f200) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x55db9fb1f200, mutex=mutex@entry=0x55db9fb1f238) at pthread_cond_wait.c:638
#3 0x000055db9cdee9cb in qemu_cond_wait_impl (cond=0x55db9fb1f200, mutex=0x55db9fb1f238, file=0x55db9ce65434 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:220
#4 0x000055db9c87d5c3 in vnc_worker_thread_loop (queue=0x55db9fb1f200) at ../ui/vnc-jobs.c:248
#5 0x000055db9c87e288 in vnc_worker_thread (arg=arg@entry=0x55db9fb1f200) at ../ui/vnc-jobs.c:361
#6 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cc67ba3f0) at ../util/qemu-thread-posix.c:505
#7 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 10 (Thread 0x7f4cdddfb700 (LWP 40333) "CPU 7/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f642710, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f642710) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f642710) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cdddf63f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7f4cde5fc700 (LWP 40332) "CPU 6/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f63a710, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f63a710) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f63a710) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cde5f73f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f4cdedfd700 (LWP 40331) "CPU 5/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f6325f0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f6325f0) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f6325f0) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cdedf83f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f4cdf5fe700 (LWP 40330) "CPU 4/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f62a930, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f62a930) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f62a930) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cdf5f93f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f4cdfdff700 (LWP 40329) "CPU 3/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f622b40, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f622b40) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f622b40) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f4cdfdfa3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f5cec86f700 (LWP 40328) "CPU 2/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f61ae90, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f61ae90) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f61ae90) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f5cec86a3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f5ced070700 (LWP 40327) "CPU 1/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f613070, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f613070) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f613070) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f5ced06b3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f5cedab1700 (LWP 40326) "CPU 0/KVM"):
#0 0x00007f5cf8e6b5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055db9cc66997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55db9f5e3230, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055db9cc66b01 in kvm_cpu_exec (cpu=cpu@entry=0x55db9f5e3230) at ../accel/kvm/kvm-all.c:2850
#3 0x000055db9cc6817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55db9f5e3230) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f5cedaac3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f5cee3b3700 (LWP 40291) "call_rcu"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000055db9cdef04a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-7.2.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x55db9d650328 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:430
#3 0x000055db9cdf794a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:261
#4 0x000055db9cdede89 in qemu_thread_start (args=0x7f5cee3ae3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f5cf9829ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f5cf8e75a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f5cee51e1c0 (LWP 40290) "kvm"):
#0 0x00007f5cf8e69e26 in __ppoll (fds=0x55db9f5f1600, nfds=94, timeout=<optimized out>, timeout@entry=0x7ffe2e6dce70, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000055db9ce02e11 in ppoll (__ss=0x0, __timeout=0x7ffe2e6dce70, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=988384) at ../util/qemu-timer.c:351
#3 0x000055db9ce00675 in os_host_main_loop_wait (timeout=988384) at ../util/main-loop.c:315
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:606
#5 0x000055db9ca1d191 in qemu_main_loop () at ../softmmu/runstate.c:739
#6 0x000055db9c856aa7 in qemu_default_main () at ../softmmu/main.c:37
#7 0x00007f5cf8d9cd0a in __libc_start_main (main=0x55db9c851c60 <main>, argc=73, argv=0x7ffe2e6dd038, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe2e6dd028) at ../csu/libc-start.c:308
#8 0x000055db9c8569da in _start ()
Breakpoint 1 at 0x55db9ce02da0: file ../util/qemu-timer.c, line 336.

Thread 1 "kvm" hit Breakpoint 1, qemu_poll_ns (fds=0x55db9f5f1600, nfds=94, timeout=timeout@entry=0) at ../util/qemu-timer.c:336
336 ../util/qemu-timer.c: No such file or directory.
$1 = {{fd = 4, events = 1, revents = 0}, {fd = 9, events = 1, revents = 0}, {fd = 11, events = 1, revents = 0}, {fd = 12, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 16, events = 1, revents = 0}, {fd = 17, events = 1, revents = 0}, {fd = 23, events = 1, revents = 0}, {fd = 25, events = 1, revents = 0}, {fd = 29, events = 1, revents = 0}, {fd = 31, events = 1, revents = 0}, {fd = 40, events = 1, revents = 0}, {fd = 41, events = 1, revents = 0}, {fd = 42, events = 1, revents = 0}, {fd = 45, events = 1, revents = 0}, {fd = 46, events = 1, revents = 0}, {fd = 47, events = 1, revents = 0}, {fd = 48, events = 1, revents = 0}, {fd = 49, events = 1, revents = 0}, {fd = 50, events = 1, revents = 0}, {fd = 51, events = 1, revents = 0}, {fd = 52, events = 1, revents = 0}, {fd = 53, events = 1, revents = 0}, {fd = 54, events = 1, revents = 0}, {fd = 55, events = 1, revents = 0}, {fd = 56, events = 1, revents = 0}, {fd = 57, events = 1, revents = 0}, {fd = 58, events = 1, revents = 0}, {fd = 59, events = 1, revents = 0}, {fd = 60, events = 1, revents = 0}, {fd = 61, events = 1, revents = 0}, {fd = 62, events = 1, revents = 0}, {fd = 63, events = 1, revents = 0}, {fd = 64, events = 1, revents = 0}, {fd = 65, events = 1, revents = 0}, {fd = 66, events = 1, revents = 0}, {fd = 67, events = 1, revents = 0}, {fd = 68, events = 1, revents = 0}, {fd = 69, events = 1, revents = 0}, {fd = 70, events = 1, revents = 0}, {fd = 71, events = 1, revents = 0}, {fd = 72, events = 1, revents = 0}, {fd = 73, events = 1, revents = 0}, {fd = 74, events = 1, revents = 0}, {fd = 75, events = 1, revents = 0}, {fd = 76, events = 1, revents = 0}, {fd = 77, events = 1, revents = 0}, {fd = 78, events = 1, revents = 0}, {fd = 79, events = 1, revents = 0}, {fd = 80, events = 1, revents = 0}, {fd = 81, events = 1, revents = 0}, {fd = 82, events = 1, revents = 0}, {fd = 83, events = 1, revents = 0}, {fd = 84, events = 1, revents = 0}, {fd = 85, events = 1, revents = 0}, {fd = 86, events = 1, revents = 0}, {fd = 87, events = 1, revents = 0}, {fd = 88, events = 1, revents = 0}, {fd = 89, events = 1, revents = 0}, {fd = 90, events = 1, revents = 0}, {fd = 91, events = 1, revents = 0}, {fd = 92, events = 1, revents = 0}, {fd = 93, events = 1, revents = 0}, {fd = 94, events = 1, revents = 0}, {fd = 95, events = 1, revents = 0}, {fd = 96, events = 1, revents = 0}, {fd = 97, events = 1, revents = 0}, {fd = 98, events = 1, revents = 0}, {fd = 99, events = 1, revents = 0}, {fd = 100, events = 1, revents = 0}, {fd = 101, events = 1, revents = 0}, {fd = 102, events = 1, revents = 0}, {fd = 103, events = 1, revents = 0}, {fd = 104, events = 1, revents = 0}, {fd = 105, events = 1, revents = 0}, {fd = 106, events = 1, revents = 0}, {fd = 115, events = 1, revents = 0}, {fd = 116, events = 1, revents = 0}, {fd = 117, events = 1, revents = 0}, {fd = 118, events = 1, revents = 0}, {fd = 119, events = 1, revents = 0}, {fd = 120, events = 1, revents = 0}, {fd = 121, events = 1, revents = 0}, {fd = 122, events = 1, revents = 0}, {fd = 131, events = 1, revents = 0}, {fd = 132, events = 1, revents = 0}, {fd = 133, events = 1, revents = 0}, {fd = 134, events = 1, revents = 0}, {fd = 135, events = 1, revents = 0}, {fd = 136, events = 1, revents = 0}, {fd = 137, events = 1, revents = 0}, {fd = 138, events = 1, revents = 0}, {fd = 147, events = 1, revents = 0}, {fd = 148, events = 1, revents = 0}}
[Inferior 1 (process 40290) detached]
 
Also, the list of file descriptors:

root@pveb21:~# ls -l /proc/$(cat /var/run/qemu-server/419.pid)/fd
total 0
lrwx------ 1 root root 64 Apr 5 23:29 0 -> /dev/null
lrwx------ 1 root root 64 Apr 5 23:30 1 -> /dev/null
l-wx------ 1 root root 64 Apr 5 23:30 10 -> 'pipe:[403893]'
lrwx------ 1 root root 64 Apr 5 23:30 100 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 101 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 102 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 103 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 104 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 105 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 106 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 107 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 108 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 109 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 11 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 110 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 111 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 112 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 113 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 114 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 115 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 116 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 117 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 118 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 119 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 12 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 120 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 121 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 122 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 123 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 124 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 125 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 126 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 127 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 128 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 129 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 13 -> 'socket:[403916]'
lrwx------ 1 root root 64 Apr 5 23:30 130 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 131 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 132 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 133 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 134 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 135 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 136 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 137 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 138 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 139 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 14 -> 'socket:[378203]'
lrwx------ 1 root root 64 Apr 5 23:30 140 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 141 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 142 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 143 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 144 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 145 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 146 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 147 -> 'socket:[61128500]'
lrwx------ 1 root root 64 Apr 5 23:30 15 -> /dev/dm-118
lrwx------ 1 root root 64 Apr 5 23:30 16 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 18 -> /dev/kvm
lrwx------ 1 root root 64 Apr 5 23:30 19 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 Apr 5 23:30 2 -> 'socket:[403983]'
lrwx------ 1 root root 64 Apr 5 23:30 20 -> /dev/net/tun
lrwx------ 1 root root 64 Apr 5 23:30 21 -> /dev/vhost-net
lrwx------ 1 root root 64 Apr 5 23:30 22 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 23 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 24 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 25 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 26 -> /dev/net/tun
lrwx------ 1 root root 64 Apr 5 23:30 27 -> /dev/vhost-net
lrwx------ 1 root root 64 Apr 5 23:30 28 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 29 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 3 -> 'socket:[403915]'
lrwx------ 1 root root 64 Apr 5 23:30 30 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 31 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 32 -> anon_inode:kvm-vcpu:0
lrwx------ 1 root root 64 Apr 5 23:30 33 -> anon_inode:kvm-vcpu:1
lrwx------ 1 root root 64 Apr 5 23:30 34 -> anon_inode:kvm-vcpu:2
lrwx------ 1 root root 64 Apr 5 23:30 35 -> anon_inode:kvm-vcpu:3
lrwx------ 1 root root 64 Apr 5 23:30 36 -> anon_inode:kvm-vcpu:4
lrwx------ 1 root root 64 Apr 5 23:30 37 -> anon_inode:kvm-vcpu:5
lrwx------ 1 root root 64 Apr 5 23:30 38 -> anon_inode:kvm-vcpu:6
lrwx------ 1 root root 64 Apr 5 23:30 39 -> anon_inode:kvm-vcpu:7
lrwx------ 1 root root 64 Apr 5 23:30 4 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 40 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 41 -> 'socket:[403982]'
lrwx------ 1 root root 64 Apr 5 23:30 42 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 43 -> 'socket:[403983]'
lrwx------ 1 root root 64 Apr 5 23:30 44 -> /dev/dm-117
lrwx------ 1 root root 64 Apr 5 23:30 45 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 46 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 47 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 48 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 49 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Apr 5 23:30 5 -> /run/qemu-server/419.pid
lrwx------ 1 root root 64 Apr 5 23:30 50 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 51 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 52 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 53 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 54 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 55 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 56 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 57 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 58 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 59 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Apr 5 23:30 6 -> 'pipe:[403891]'
lrwx------ 1 root root 64 Apr 5 23:30 60 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 61 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 62 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 63 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 64 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 65 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 66 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 67 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 68 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 69 -> 'anon_inode:[eventfd]'
lr-x------ 1 root root 64 Apr 5 23:30 7 -> 'pipe:[403892]'
lrwx------ 1 root root 64 Apr 5 23:30 70 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 71 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 72 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 73 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 74 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 75 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 76 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 77 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 78 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 79 -> 'anon_inode:[eventfd]'
l-wx------ 1 root root 64 Apr 5 23:30 8 -> 'pipe:[402786]'
lrwx------ 1 root root 64 Apr 5 23:30 80 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 81 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 82 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 83 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 84 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 85 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 86 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 87 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 88 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 89 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 9 -> 'anon_inode:[signalfd]'
lrwx------ 1 root root 64 Apr 5 23:30 90 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 91 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 92 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 93 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 94 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 95 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 96 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 97 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 98 -> 'anon_inode:[eventfd]'
lrwx------ 1 root root 64 Apr 5 23:30 99 -> 'anon_inode:[eventfd]'
 
I'm adding information on the node and the VM:

root@pveb21:~# pveversion -v
proxmox-ve: 7.3-1 (running kernel: 6.1.15-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-helper: 7.3-8
pve-kernel-6.1: 7.3-6
pve-kernel-5.15: 7.3-3
pve-kernel-5.19: 7.2-15
pve-kernel-6.1.15-1-pve: 6.1.15-1
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.19.17-1-pve: 5.19.17-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-6
libpve-storage-perl: 7.3-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u2.1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.2-7
pve-firmware: 3.6-4
pve-ha-manager: 3.5.1
pve-i18n: 2.8-3
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

# cat /etc/pve/nodes/pveb41/qemu-server/419.conf
agent: 1
balloon: 16384
boot: order=virtio0
cipassword: $5$1lKEV1zn$Z0ufVzU7VobiDAQCKYbQ12xKPpF79orrFpPQwFYYvtD
ciuser: ubuntu
cores: 4
cpu: qemu64,flags=+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+aes
ipconfig0: ip=172.25.0.121/16,gw=172.25.0.254
memory: 65536
name: dragondb02.sw
nameserver: 172.26.4.11
net0: virtio=AA:71:57:A2:D8:45,bridge=vmbr0,tag=313
net1: virtio=72:F5:75:34:B1:7C,bridge=vmbr0,tag=604
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
searchdomain: sw
smbios1: uuid=13e6a902-232a-45ad-aef2-84dc9bf37ef5
sockets: 2
virtio0: lunpxprdb01:vm-419-disk-0,aio=native,size=30G
virtio1: lunpxprdb01:vm-419-disk-1,aio=native,size=1000G
vmgenid: 73e3ee98-1b4a-405e-a6ba-6518278dc45e
 
Another occurrence: a VM freezes, and live migration unfreezes it.
The node has the latest 6.2 kernel and PVE 7.4.

root@pvea44:~# VMID=1099
root@pvea44:~# gdb --batch --ex 't a a bt' --ex 'break qemu_poll_ns' --ex 'c' --ex 'set $n = nfds' --ex 'p *fds@$n' -p $(cat /var/run/qemu-server/${VMID}.pid)
[New LWP 15556]
[New LWP 15582]
[New LWP 15583]
[New LWP 15584]
[New LWP 15585]
[New LWP 15586]
[New LWP 15587]
[New LWP 15590]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f27b0a68e26 in __ppoll (fds=0x55e9c3eecee0, nfds=82, timeout=<optimized out>, timeout@entry=0x7ffc99d203b0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 9 (Thread 0x7f1e7f3bf700 (LWP 15590) "vnc_worker"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55e9c3fa27fc) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55e9c3fa2808, cond=0x55e9c3fa27d0) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x55e9c3fa27d0, mutex=mutex@entry=0x55e9c3fa2808) at pthread_cond_wait.c:638
#3 0x000055e9c09dd9cb in qemu_cond_wait_impl (cond=0x55e9c3fa27d0, mutex=0x55e9c3fa2808, file=0x55e9c0a54434 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:220
#4 0x000055e9c046c5c3 in vnc_worker_thread_loop (queue=0x55e9c3fa27d0) at ../ui/vnc-jobs.c:248
#5 0x000055e9c046d288 in vnc_worker_thread (arg=arg@entry=0x55e9c3fa27d0) at ../ui/vnc-jobs.c:361
#6 0x000055e9c09dce89 in qemu_thread_start (args=0x7f1e7f3ba3f0) at ../util/qemu-thread-posix.c:505
#7 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f1e965fc700 (LWP 15587) "CPU 5/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c33682e0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c33682e0) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c33682e0) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f1e965f73f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f1e96dfd700 (LWP 15586) "CPU 4/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c33605a0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c33605a0) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c33605a0) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f1e96df83f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f1e975fe700 (LWP 15585) "CPU 3/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c3358770, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c3358770) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c3358770) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f1e975f93f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f1e97dff700 (LWP 15584) "CPU 2/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c3350ac0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c3350ac0) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c3350ac0) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f1e97dfa3f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f27a4c4a700 (LWP 15583) "CPU 1/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c3348ca0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c3348ca0) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c3348ca0) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f27a4c453f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f27a568b700 (LWP 15582) "CPU 0/KVM"):
#0 0x00007f27b0a6a5f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055e9c0855997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55e9c3319800, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055e9c0855b01 in kvm_cpu_exec (cpu=cpu@entry=0x55e9c3319800) at ../accel/kvm/kvm-all.c:2850
#3 0x000055e9c085717d in kvm_vcpu_thread_fn (arg=arg@entry=0x55e9c3319800) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f27a56863f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f27a5f8d700 (LWP 15556) "call_rcu"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000055e9c09de04a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-7.2.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x55e9c123f328 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:430
#3 0x000055e9c09e694a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:261
#4 0x000055e9c09dce89 in qemu_thread_start (args=0x7f27a5f883f0) at ../util/qemu-thread-posix.c:505
#5 0x00007f27b13ebea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f27b0a74a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f27a60f81c0 (LWP 15555) "kvm"):
#0 0x00007f27b0a68e26 in __ppoll (fds=0x55e9c3eecee0, nfds=82, timeout=<optimized out>, timeout@entry=0x7ffc99d203b0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000055e9c09f1e11 in ppoll (__ss=0x0, __timeout=0x7ffc99d203b0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=990590) at ../util/qemu-timer.c:351
#3 0x000055e9c09ef675 in os_host_main_loop_wait (timeout=990590) at ../util/main-loop.c:315
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:606
#5 0x000055e9c060c191 in qemu_main_loop () at ../softmmu/runstate.c:739
#6 0x000055e9c0445aa7 in qemu_default_main () at ../softmmu/main.c:37
#7 0x00007f27b099bd0a in __libc_start_main (main=0x55e9c0440c60 <main>, argc=65, argv=0x7ffc99d20578, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc99d20568) at ../csu/libc-start.c:308
#8 0x000055e9c04459da in _start ()
Breakpoint 1 at 0x55e9c09f1da0: file ../util/qemu-timer.c, line 336.

Thread 1 "kvm" hit Breakpoint 1, qemu_poll_ns (fds=0x55e9c3eecee0, nfds=82, timeout=timeout@entry=0) at ../util/qemu-timer.c:336
336 ../util/qemu-timer.c: No such file or directory.
$1 = {{fd = 4, events = 1, revents = 0}, {fd = 9, events = 1, revents = 0}, {fd = 11, events = 1, revents = 0}, {fd = 12, events = 1, revents = 0}, {fd = 13, events = 1, revents = 0}, {fd = 14, events = 1, revents = 0}, {fd = 15, events = 1, revents = 0}, {fd = 16, events = 1, revents = 0}, {fd = 22, events = 1, revents = 0}, {fd = 24, events = 1, revents = 0}, {fd = 31, events = 1, revents = 0}, {fd = 32, events = 1, revents = 0}, {fd = 33, events = 1, revents = 0}, {fd = 36, events = 1, revents = 0}, {fd = 37, events = 1, revents = 0}, {fd = 38, events = 1, revents = 0}, {fd = 39, events = 1, revents = 0}, {fd = 40, events = 1, revents = 0}, {fd = 41, events = 1, revents = 0}, {fd = 42, events = 1, revents = 0}, {fd = 43, events = 1, revents = 0}, {fd = 44, events = 1, revents = 0}, {fd = 45, events = 1, revents = 0}, {fd = 46, events = 1, revents = 0}, {fd = 47, events = 1, revents = 0}, {fd = 48, events = 1, revents = 0}, {fd = 49, events = 1, revents = 0}, {fd = 50, events = 1, revents = 0}, {fd = 51, events = 1, revents = 0}, {fd = 52, events = 1, revents = 0}, {fd = 53, events = 1, revents = 0}, {fd = 54, events = 1, revents = 0}, {fd = 55, events = 1, revents = 0}, {fd = 56, events = 1, revents = 0}, {fd = 57, events = 1, revents = 0}, {fd = 58, events = 1, revents = 0}, {fd = 59, events = 1, revents = 0}, {fd = 60, events = 1, revents = 0}, {fd = 61, events = 1, revents = 0}, {fd = 62, events = 1, revents = 0}, {fd = 63, events = 1, revents = 0}, {fd = 64, events = 1, revents = 0}, {fd = 65, events = 1, revents = 0}, {fd = 66, events = 1, revents = 0}, {fd = 67, events = 1, revents = 0}, {fd = 68, events = 1, revents = 0}, {fd = 69, events = 1, revents = 0}, {fd = 70, events = 1, revents = 0}, {fd = 71, events = 1, revents = 0}, {fd = 72, events = 1, revents = 0}, {fd = 73, events = 1, revents = 0}, {fd = 74, events = 1, revents = 0}, {fd = 75, events = 1, revents = 0}, {fd = 76, events = 1, revents = 0}, {fd = 77, events = 1, revents = 0}, {fd = 78, events = 1, revents = 0}, {fd = 79, events = 1, revents = 0}, {fd = 80, events = 1, revents = 0}, {fd = 81, events = 1, revents = 0}, {fd = 82, events = 1, revents = 0}, {fd = 83, events = 1, revents = 0}, {fd = 84, events = 1, revents = 0}, {fd = 85, events = 1, revents = 0}, {fd = 86, events = 1, revents = 0}, {fd = 87, events = 1, revents = 0}, {fd = 88, events = 1, revents = 0}, {fd = 89, events = 1, revents = 0}, {fd = 90, events = 1, revents = 0}, {fd = 91, events = 1, revents = 0}, {fd = 92, events = 1, revents = 0}, {fd = 93, events = 1, revents = 0}, {fd = 94, events = 1, revents = 0}, {fd = 95, events = 1, revents = 0}, {fd = 96, events = 1, revents = 0}, {fd = 97, events = 1, revents = 0}, {fd = 104, events = 1, revents = 0}, {fd = 105, events = 1, revents = 0}, {fd = 106, events = 1, revents = 0}, {fd = 107, events = 1, revents = 0}, {fd = 108, events = 1, revents = 0}, {fd = 109, events = 1, revents = 0}, {fd = 114, events = 1, revents = 0}}
[Inferior 1 (process 15555) detached]
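
In case it helps anyone hitting the same symptom, these are the kind of quick checks that can be run from the host while the VM is still frozen (just a sketch, reusing VMID 1099 from above; the guest-agent ping only works if the QEMU guest agent is installed and enabled in the VM options):

Code:
# Does the QEMU monitor (QMP) still answer? This queries QEMU itself, not the guest.
qm status 1099 --verbose

# Does the guest agent still answer? This times out when the guest is truly hung.
timeout 10 qm guest cmd 1099 ping && echo "agent responding" || echo "agent not responding"

# Which threads of the VM process are burning CPU right now?
top -b -n 1 -H -p "$(cat /var/run/qemu-server/1099.pid)" | head -n 25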
 
Unfortunately, I'm not able to see anything particularly special in the GDB output. I get similar output with a VM that's running fine.

Is there anything in /var/log/syslog (host or guest) around the time the VM gets stuck?
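
For example, something along these lines on the host can narrow down the window (just a sketch; adjust the VMID and the time range to the actual freeze):

Code:
# Everything the host journal logged about the VM around the freeze window
journalctl --since "06:30" --until "07:10" | grep -w 1099

# Kernel messages only, in case the host itself reported something
journalctl -k --since "06:30" --until "07:10"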

and also applied limits.conf settings.
What exactly do you mean here?

The configs for the two VMs you posted are rather different. A shot in the dark, but did the issue ever happen with a VM without a virtio<N> disk (or do all of your VMs have one)?
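
A quick way to check which VMs on a node have a virtio<N> disk at all (sketch):

Code:
# VM configs on this node that define at least one virtio<N> disk
grep -l '^virtio[0-9]' /etc/pve/qemu-server/*.conf

# ...and those that do not (candidates for comparison)
grep -L '^virtio[0-9]' /etc/pve/qemu-server/*.conf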
 
Hi
Thanks for your response.
In the guest's syslog there are no events at all for the entire time the VM is frozen; there is just a gap in the timestamps.
The gap can be minutes or hours, depending on how long we wait before migrating the VM.
On the host node we see this:


Code:
Apr 29 06:59:40 pvea44 pvedaemon[2015313]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:41 pvea44 pvedaemon[1999193]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:43 pvea44 pvedaemon[1869176]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:43 pvea44 pvedaemon[2015313]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:46 pvea44 pvedaemon[1999193]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:49 pvea44 pvedaemon[1869176]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:49 pvea44 pvedaemon[1999193]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 06:59:49 pvea44 pvedaemon[2015313]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 07:00:09 pvea44 pvedaemon[1999193]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 07:00:29 pvea44 pvedaemon[1869176]: VM 1099 qmp command failed - VM 1099 qmp command 'guest-ping' failed - got timeout
Apr 29 07:06:40 pvea44 pvedaemon[1999193]: <systems@pve> starting task UPID:pvea44:001EF36F:0995CE83:644CA5E0:qmigrate:1099:systems@pve:
Apr 29 07:08:33 pvea44 ovs-vsctl[2028872]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port fwln1099i1
Apr 29 07:08:33 pvea44 ovs-vsctl[2028872]: ovs|00002|db_ctl_base|ERR|no port named fwln1099i1
Apr 29 07:08:33 pvea44 ovs-vsctl[2028873]: ovs|00001|vsctl|INFO|Called as /usr/bin/ovs-vsctl del-port tap1099i1
Apr 29 07:08:35 pvea44 pvedaemon[1999193]: <systems@pve> end task UPID:pvea44:001EF36F:0995CE83:644CA5E0:qmigrate:1099:systems@pve: OK
Apr 29 07:08:36 pvea44 systemd[1]: 1099.scope: Succeeded.
Apr 29 07:08:36 pvea44 systemd[1]: 1099.scope: Consumed 1w 2d 3h 14min 35.974s CPU time.

Regarding the limits.conf I mentioned before: we applied the solution found here.

All of our VMs use virtio disks, although we have switched AIO from the default to "native".
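(For reference, the per-disk AIO mode is changed by re-issuing the disk option, roughly like this - a sketch using the disk from the VM 419 config above; it only takes effect after a full stop/start of the VM:)

Code:
# Re-specify the existing volume with aio=native (the volume itself is kept)
qm set 419 --virtio1 lunpxprdb01:vm-419-disk-1,aio=native,size=1000G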

We are currently installing intel-microcode on all nodes, after enabling the Debian non-free repository.
We tried it on one of our clusters and it seems to be OK; no recurrence so far.
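
Roughly, the steps per node look like this (a sketch for Debian 11 "bullseye" / PVE 7.x - adjust the mirrors to your own setup, and reboot the node afterwards so the new microcode is loaded early at boot):

Code:
# /etc/apt/sources.list - add the non-free component to the Debian entries
deb http://deb.debian.org/debian bullseye main contrib non-free
deb http://deb.debian.org/debian bullseye-updates main contrib non-free
deb http://security.debian.org/debian-security bullseye-security main contrib non-free

# then install the microcode package and reboot
apt update
apt install intel-microcode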
 
We are currently installing intel-microcode on all nodes, after enabling the Debian non-free repository.
We tried it on one of our clusters and it seems to be OK; no recurrence so far.
Hope that it helps!
 
@eddor - I'm struggling with a similar issue, although I haven't tried to migrate the frozen guests to see if they spring back into life.

To rule this out as the same issue: do you assign multiple cores to the guests, and if so, is it a single core that registers 100% utilisation? I've been reading the threads about the Jasper Lake issue with interest, but that doesn't apply to me (Intel i7-7700). I've been dealing with this by hard-resetting the guests, which can run for anywhere from a couple of minutes to days before locking up like this.
 
Hi @walacio, yes, we normally assign multiple CPUs to VMs, and yes, we can see CPU usage go up when it happens.
Our last few issues were on a cluster that still doesn't have intel-microcode installed, so we suspect it might be related.
We have another cluster that has the microcode installed, and it has been fine for a few weeks now.
 
Oh, another detail I forgot to mention: one of our clusters is on an old version, 7.1.
It has never failed - it has never had this issue, and it has been running for years.
 
Hi @walacio, yes, we normally assign multiple CPUs to VMs, and yes, we can see CPU usage go up when it happens.
Our last few issues were on a cluster that still doesn't have intel-microcode installed, so we suspect it might be related.
We have another cluster that has the microcode installed, and it has been fine for a few weeks now.
Sorry - to clarify: an artefact of this particular issue for me is that *one* of the multiple virtual cores assigned to the guest sits at 100% while the others are idle. Interestingly, watching htop on the host, the QEMU process associated with the VM that's using 100% of a core moves around between different physical cores.

So: in the virtual environment, when it freezes, it's a single core (at random) that shows 100% use, and as long as the guest is hung, that never changes.
On the physical host, the corresponding process running at 100% hops between the physical CPU cores every few seconds. What that means, I don't know - it suggests to me that *something* is happening, or at least that Proxmox is trying to service the process (polling?).
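
(An alternative to htop for watching this: listing the VM's threads together with the physical CPU they last ran on - a sketch, with the VMID as a placeholder:)

Code:
VMID=1099   # placeholder - use the ID of the hung guest
# psr = physical CPU the thread last ran on; re-run to see the busy thread move around
ps -L -o tid,psr,pcpu,comm -p "$(cat /var/run/qemu-server/${VMID}.pid)"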

There's no updated microcode available for me; I'm pretty sure it's already included in the Dell BIOS for the host. I've installed the intel-microcode package, and although it doesn't show as being loaded by the kernel, lscpu shows that the correct microcode has been loaded, so presumably it comes from the BIOS.
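
(Two ways to double-check which microcode revision is actually active - a sketch:)

Code:
# Revision currently in use (all cores normally report the same value)
grep -m1 microcode /proc/cpuinfo

# Whether the kernel applied a microcode update during this boot
journalctl -k -b | grep -i microcode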
 
