[SOLVED] VMs freeze with 100% CPU

So, today I was "lucky", too:

About a month ago, I cloned the problematic Debian VM and let it run without a network device so it couldn't interfere with the live VM. And today, after this VM had been up for 28 days, it finally hung, too.

Here is the requested information:

Code:
strace -c -p $(cat /var/run/qemu-server/191.pid)
strace: Process 50940 attached
^Cstrace: Process 50940 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.71   19.739996       10527      1875           ppoll
  0.17    0.034197           4      6938           write
  0.07    0.013513           7      1698           recvmsg
  0.05    0.009642           5      1876        40 read
  0.00    0.000076           2        33           sendmsg
  0.00    0.000036           4         8           accept4
  0.00    0.000030           3         8           close
  0.00    0.000013           0        16           fcntl
  0.00    0.000009           1         8           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   19.797512        1588     12460        40 total







 gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/191.pid)
[New LWP 50941]
[New LWP 50948]
[New LWP 50949]
[New LWP 51267]
[New LWP 51269]
[New LWP 51270]
[New LWP 51271]
[New LWP 51274]
[New LWP 51283]
[New LWP 2639435]
[New LWP 1133766]
[New LWP 1148438]
[New LWP 1153824]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007efd30182a66 in __ppoll (fds=0x565319eca320, nfds=145, timeout=<optimized out>, timeout@entry=0x7fff11a5e1e0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 14 (Thread 0x7efd24d54700 (LWP 1153824) "iou-wrk-50948"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 13 (Thread 0x7efd1ffff700 (LWP 1148438) "iou-wrk-50949"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 12 (Thread 0x7efd1ffff700 (LWP 1133766) "iou-wrk-50949"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 11 (Thread 0x7efd24d54700 (LWP 2639435) "iou-wrk-50948"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 10 (Thread 0x7ef90afbf700 (LWP 51283) "vnc_worker"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x56531971294c) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x565319712958, cond=0x565319712920) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x565319712920, mutex=mutex@entry=0x565319712958) at pthread_cond_wait.c:638
#3  0x00005653187a99cb in qemu_cond_wait_impl (cond=0x565319712920, mutex=0x565319712958, file=0x565318820434 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:220
#4  0x00005653182385c3 in vnc_worker_thread_loop (queue=0x565319712920) at ../ui/vnc-jobs.c:248
#5  0x0000565318239288 in vnc_worker_thread (arg=arg@entry=0x565319712920) at ../ui/vnc-jobs.c:361
#6  0x00005653187a8e89 in qemu_thread_start (args=0x7ef90afba3f0) at ../util/qemu-thread-posix.c:505
#7  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7efd1d1ff700 (LWP 51274) "SPICE Worker"):
#0  0x00007efd3018296f in __GI___poll (fds=0x7ef900001ff0, nfds=2, timeout=2147483647) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007efd315f80ae in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007efd315f840b in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007efd31ca1fe7 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#4  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7efd1dffb700 (LWP 51271) "CPU 3/KVM"):
#0  0x00007efd30184237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x0000565318621997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5653196eea10, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x0000565318621b01 in kvm_cpu_exec (cpu=cpu@entry=0x5653196eea10) at ../accel/kvm/kvm-all.c:2850
#3  0x000056531862317d in kvm_vcpu_thread_fn (arg=arg@entry=0x5653196eea10) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x00005653187a8e89 in qemu_thread_start (args=0x7efd1dff63f0) at ../util/qemu-thread-posix.c:505
#5  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7efd1e7fc700 (LWP 51270) "CPU 2/KVM"):
#0  0x00007efd30184237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x0000565318621997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5653196e6d20, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x0000565318621b01 in kvm_cpu_exec (cpu=cpu@entry=0x5653196e6d20) at ../accel/kvm/kvm-all.c:2850
#3  0x000056531862317d in kvm_vcpu_thread_fn (arg=arg@entry=0x5653196e6d20) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x00005653187a8e89 in qemu_thread_start (args=0x7efd1e7f73f0) at ../util/qemu-thread-posix.c:505
#5  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7efd1effd700 (LWP 51269) "CPU 1/KVM"):
#0  0x00007efd30184237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x0000565318621997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5653196deee0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x0000565318621b01 in kvm_cpu_exec (cpu=cpu@entry=0x5653196deee0) at ../accel/kvm/kvm-all.c:2850
#3  0x000056531862317d in kvm_vcpu_thread_fn (arg=arg@entry=0x5653196deee0) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x00005653187a8e89 in qemu_thread_start (args=0x7efd1eff83f0) at ../util/qemu-thread-posix.c:505
#5  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7efd1f7fe700 (LWP 51267) "CPU 0/KVM"):
#0  0x00007efd30184237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x0000565318621997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5653196af6c0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x0000565318621b01 in kvm_cpu_exec (cpu=cpu@entry=0x5653196af6c0) at ../accel/kvm/kvm-all.c:2850
#3  0x000056531862317d in kvm_vcpu_thread_fn (arg=arg@entry=0x5653196af6c0) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x00005653187a8e89 in qemu_thread_start (args=0x7efd1f7f93f0) at ../util/qemu-thread-posix.c:505
#5  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7efd1ffff700 (LWP 50949) "kvm"):
#0  0x00007efd30182a66 in __ppoll (fds=0x7efd1741a310, nfds=8, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x00005653187bde6d in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x00005653187a6999 in fdmon_poll_wait (ctx=0x5653194cd000, ready_list=0x7efd1fffa368, timeout=-1) at ../util/fdmon-poll.c:80
#3  0x00005653187a6076 in aio_poll (ctx=0x5653194cd000, blocking=blocking@entry=true) at ../util/aio-posix.c:660
#4  0x000056531865f946 in iothread_run (opaque=opaque@entry=0x5653194cb260) at ../iothread.c:67
#5  0x00005653187a8e89 in qemu_thread_start (args=0x7efd1fffa3f0) at ../util/qemu-thread-posix.c:505
#6  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7efd24d54700 (LWP 50948) "kvm"):
#0  0x00007efd30182a66 in __ppoll (fds=0x7efd18003000, nfds=8, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x00005653187bde6d in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  0x00005653187a6999 in fdmon_poll_wait (ctx=0x5653194ca670, ready_list=0x7efd24d4f368, timeout=-1) at ../util/fdmon-poll.c:80
#3  0x00005653187a6076 in aio_poll (ctx=0x5653194ca670, blocking=blocking@entry=true) at ../util/aio-posix.c:660
#4  0x000056531865f946 in iothread_run (opaque=opaque@entry=0x56531939fd00) at ../iothread.c:67
#5  0x00005653187a8e89 in qemu_thread_start (args=0x7efd24d4f3f0) at ../util/qemu-thread-posix.c:505
#6  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7efd25656700 (LWP 50941) "call_rcu"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005653187aa04a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-7.2.0/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x56531900b328 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:430
#3  0x00005653187b294a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:261
#4  0x00005653187a8e89 in qemu_thread_start (args=0x7efd256513f0) at ../util/qemu-thread-posix.c:505
#5  0x00007efd3026eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007efd3018ea2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7efd257b91c0 (LWP 50940) "kvm"):
#0  0x00007efd30182a66 in __ppoll (fds=0x565319eca320, nfds=145, timeout=<optimized out>, timeout@entry=0x7fff11a5e1e0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x00005653187bde11 in ppoll (__ss=0x0, __timeout=0x7fff11a5e1e0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=1552585705) at ../util/qemu-timer.c:351
#3  0x00005653187bb675 in os_host_main_loop_wait (timeout=1552585705) at ../util/main-loop.c:315
#4  main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:606
#5  0x00005653183d8191 in qemu_main_loop () at ../softmmu/runstate.c:739
#6  0x0000565318211aa7 in qemu_default_main () at ../softmmu/main.c:37
#7  0x00007efd300b6d0a in __libc_start_main (main=0x56531820cc60 <main>, argc=82, argv=0x7fff11a5e3a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff11a5e398) at ../csu/libc-start.c:308
#8  0x00005653182119da in _start ()
[Inferior 1 (process 50940) detached]





 qm config 191
agent: 1
balloon: 1024
boot: cdn
bootdisk: scsi0
cores: 2
description: fileserver
ide2: none,media=cdrom
memory: 16384
name: solferino-test
net0: virtio=22:1A:64:5B:76:1E,bridge=vmbr0,link_down=1
numa: 0
onboot: 1
ostype: l26
scsi0: nvme01:vm-191-disk-0,discard=on,iothread=1,size=32G
scsi1: nvme01:vm-191-disk-1,discard=on,iothread=1,size=1T
scsihw: virtio-scsi-single
sockets: 2
startup: order=190
tablet: 0
vga: qxl






 pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.19.17-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.19: 7.2-15
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.19.17-1-pve: 5.19.17-1
pve-kernel-5.19.7-2-pve: 5.19.7-2
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.85-1-pve: 5.15.85-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
I can provide the same data for a VM that just got frozen:


Code:
# strace -c -p $(cat /var/run/qemu-server/375.pid)
strace: Process 3239800 attached

^Cstrace: Process 3239800 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.61   19.594097         317     61691           ppoll
  0.78    0.155846           7     21064           write
  0.31    0.061357          11      5142           recvmsg
  0.29    0.057971          10      5404           read
  0.00    0.000095           0       100           sendmsg
  0.00    0.000033           1        20           close
  0.00    0.000024           1        20           accept4
  0.00    0.000015           0        40           fcntl
  0.00    0.000011           0        20           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   19.869449         212     93501           total

# gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/375.pid)
[New LWP 3239801]
[New LWP 3239823]
[New LWP 3239824]
[New LWP 3239826]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f8b8012ee26 in __ppoll (fds=0x55f46f8345f0, nfds=77, timeout=<optimized out>, timeout@entry=0x7ffcacaa56d0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44    ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 5 (Thread 0x7f8b751bf700 (LWP 3239826) "vnc_worker"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55f46f83444c) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55f46f834458, cond=0x55f46f834420) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x55f46f834420, mutex=mutex@entry=0x55f46f834458) at pthread_cond_wait.c:638
#3  0x000055f46d0ae9cb in qemu_cond_wait_impl (cond=0x55f46f834420, mutex=0x55f46f834458, file=0x55f46d125434 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:220
#4  0x000055f46cb3d5c3 in vnc_worker_thread_loop (queue=0x55f46f834420) at ../ui/vnc-jobs.c:248
#5  0x000055f46cb3e288 in vnc_worker_thread (arg=arg@entry=0x55f46f834420) at ../ui/vnc-jobs.c:361
#6  0x000055f46d0ade89 in qemu_thread_start (args=0x7f8b751ba3b0) at ../util/qemu-thread-posix.c:505
#7  0x00007f8b8021aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007f8b8013aa2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f8b77dbf700 (LWP 3239824) "CPU 1/KVM"):
#0  0x00007f8b801305f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x000055f46cf26997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55f46f807af0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x000055f46cf26b01 in kvm_cpu_exec (cpu=cpu@entry=0x55f46f807af0) at ../accel/kvm/kvm-all.c:2850
#3  0x000055f46cf2817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55f46f807af0) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x000055f46d0ade89 in qemu_thread_start (args=0x7f8b77dba3b0) at ../util/qemu-thread-posix.c:505
#5  0x00007f8b8021aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f8b8013aa2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f8b7cc59700 (LWP 3239823) "CPU 0/KVM"):
#0  0x00007f8b801305f7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x000055f46cf26997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55f46f7da7b0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2  0x000055f46cf26b01 in kvm_cpu_exec (cpu=cpu@entry=0x55f46f7da7b0) at ../accel/kvm/kvm-all.c:2850
#3  0x000055f46cf2817d in kvm_vcpu_thread_fn (arg=arg@entry=0x55f46f7da7b0) at ../accel/kvm/kvm-accel-ops.c:51
#4  0x000055f46d0ade89 in qemu_thread_start (args=0x7f8b7cc543b0) at ../util/qemu-thread-posix.c:505
#5  0x00007f8b8021aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f8b8013aa2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f8b7d55b700 (LWP 3239801) "call_rcu"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x000055f46d0af04a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-7.2.0/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x55f46d910328 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:430
#3  0x000055f46d0b794a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:261
#4  0x000055f46d0ade89 in qemu_thread_start (args=0x7f8b7d5563b0) at ../util/qemu-thread-posix.c:505
#5  0x00007f8b8021aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f8b8013aa2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f8b7d7c5200 (LWP 3239800) "kvm"):
#0  0x00007f8b8012ee26 in __ppoll (fds=0x55f46f8345f0, nfds=77, timeout=<optimized out>, timeout@entry=0x7ffcacaa56d0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1  0x000055f46d0c2e11 in ppoll (__ss=0x0, __timeout=0x7ffcacaa56d0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2  qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=990431) at ../util/qemu-timer.c:351
#3  0x000055f46d0c0675 in os_host_main_loop_wait (timeout=990431) at ../util/main-loop.c:315
#4  main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:606
#5  0x000055f46ccdd191 in qemu_main_loop () at ../softmmu/runstate.c:739
#6  0x000055f46cb16aa7 in qemu_default_main () at ../softmmu/main.c:37
#7  0x00007f8b80061d0a in __libc_start_main (main=0x55f46cb11c60 <main>, argc=65, argv=0x7ffcacaa5898, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcacaa5888) at ../csu/libc-start.c:308
#8  0x000055f46cb169da in _start ()
[Inferior 1 (process 3239800) detached]




# qm config 375
agent: 1
boot: order=virtio0;ide0
cipassword: **********
ciuser: ubuntu
cores: 2
ipconfig0: ip=dhcp
memory: 4096
name: qbert02.pp2
net0: virtio=9E:96:2D:6C:67:31,bridge=vmbr0,tag=917
onboot: 1
smbios1: uuid=c5d9d825-11f9-4f6a-9f0d-d133bb4ad6d5
sockets: 1
tags: u18
virtio0: ProxmoxPreProdNFS:375/vm-375-disk-0.qcow2,aio=native,size=12G
vmgenid: ab691b99-fb28-463c-b118-df21770a8c38




# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.4-1
pve-kernel-5.15: 7.4-1
pve-kernel-6.1: 7.3-6
pve-kernel-5.19: 7.2-15
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-6.1.15-1-pve: 6.1.15-1
pve-kernel-5.19.17-2-pve: 5.19.17-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

I hope it helps.
We are starting the process of moving all prod to 7.1, and will also try PVE 8 for lower environments.
Thanks
 
While we have been "lucky" during the last few days and have not seen any recent freezes, what puzzles me is that even a watchdog is unable to interact with such a frozen VM. That is, no matter what action has been configured (e.g. reboot), the VM remains frozen.

As far as I understand it, the kernel watchdog waits for a specific event (usually a kernel panic) and acts accordingly (e.g. reboot).

The fact that even the watchdog is unable to do anything adds even more to the suspicion that the issue is not located within the VM but in QEMU itself.
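
For completeness: Proxmox can also expose an emulated hardware watchdog to the guest (a rough sketch below, assuming VMID 191 and the i6300esb model, adjust to your setup). That device is emulated by the very QEMU process that is stuck here, so it likely won't help with this particular freeze either, but it makes the distinction clear: a watchdog inside the guest can only react to events the guest kernel still gets to observe.

Code:
# on the Proxmox host: attach a virtual watchdog that resets the VM when the guest stops feeding it
qm set 191 --watchdog model=i6300esb,action=reset

# inside the Linux guest: load the driver and run a daemon that feeds /dev/watchdog
modprobe i6300esb
apt-get install watchdog
systemctl enable --now watchdog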
 
Okay, can you provide some "hardware details"?

Check the following:
selinux enabled?
apparmor enabled?

Code:
Try: PROXMOX host

$> apt-get install apparmor-utils policycoreutils
$> sestatus
$> aa-status

/etc/default/grub
------------------------------------------------------------------------------------------------
GRUB_CMDLINE_LINUX="...  apparmor=0 selinux=0"
------------------------------------------------------------------------------------------------

$> update-grub
$> update-initramfs -c -d -u
reboot node

$> sestatus
$> aa-status

Test/Check Proxmox host with these settings (disabled apparmor, disabled selinux).
 
I changed the default SCSI controller to VirtIO SCSI and the VMs have stopped hanging hard, at least for now.
 
All of my VMs have been using VirtIO SCSI from the beginning, so I doubt that's the solution.
At least for us, changing it from "VirtIO SCSI" to "VirtIO SCSI single" has made the biggest difference. We've been seeing far fewer freezes than before ... but as we all know, only time will tell if this really improves things ...

And "VirtIO SCSI single" is actually also the default now.
 
I have "VirtIO SCSI single" from beginning of my 2 vms - they hangs too in 100% time of CPU: one vm with Windows Server 2022 and one vm with Almalinux 9, so i think this is not a solution, but time will tell ... This config seems to be stable, from some time now (i disabled baloon memory).
AlmaLinux 9:
1688468256926.png
 
Maybe you are on to something. Although ballooning is active on all of my VMs, almost every VM has the same value for memory and minimum memory - except that one Linux VM that hangs most of the time. This VM goes from 1 to 16 GB. So as a first step I will set the minimum memory to 16 GB, too.
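
For reference, the "minimum memory" field in the GUI corresponds to the balloon option in the VM config, so this change can be made from the CLI roughly like this (a sketch, assuming VMID 191 with 16 GB of RAM; setting balloon to 0 would disable the ballooning device entirely instead):

Code:
# pin the minimum memory to the full 16 GB
qm set 191 --balloon 16384

# or turn ballooning off completely
qm set 191 --balloon 0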
 
Bad news: I had my first crash today since upgrading to PVE 8. Here is the information @fiona had asked for:

root@xenon:~# strace -c -p $(cat /var/run/qemu-server/100.pid)
strace: Process 1483 attached
^Cstrace: Process 1483 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.72   11.380835        3724      3056           ppoll
  0.95    0.109469           9     11632           write
  0.18    0.021113           7      2847           recvmsg
  0.08    0.008799           2      2991           read
  0.07    0.008591         153        56           sendmsg
  0.00    0.000049           4        12           close
  0.00    0.000048           0        60           ioctl
  0.00    0.000021           1        12           accept4
  0.00    0.000012           0        24           fcntl
  0.00    0.000008           0        12           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   11.528945         556     20702           total

root@xenon:~# gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/100.pid)
[New LWP 1484]
[New LWP 1506]
[New LWP 1509]
[New LWP 1760]
[New LWP 2343438]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fbb31e3c0f6 in __ppoll (fds=0x565308e82010, nfds=80, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
42 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 6 (Thread 0x7fbb26f10280 (LWP 2343438) "iou-wrk-1483"):
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (Thread 0x7fb9154dd6c0 (LWP 1760) "vnc_worker"):
#0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x565309376b98) at ./nptl/futex-internal.c:57
#1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x565309376b98, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2 0x00007fbb31dc5d9b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x565309376b98, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3 0x00007fbb31dc83f8 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x565309376ba8, cond=0x565309376b70) at ./nptl/pthread_cond_wait.c:503
#4 ___pthread_cond_wait (cond=0x565309376b70, mutex=0x565309376ba8) at ./nptl/pthread_cond_wait.c:618
#5 0x000056530600c6fb in ?? ()
#6 0x0000565305a72fdd in ?? ()
#7 0x0000565305a73ce8 in ?? ()
#8 0x000056530600bbe8 in ?? ()
#9 0x00007fbb31dc8fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#10 0x00007fbb31e495bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 4 (Thread 0x7fbb1f5ff6c0 (LWP 1509) "CPU 1/KVM"):
#0 __GI___ioctl (fd=15, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x0000565305e7855f in ?? ()
#2 0x0000565305e786b5 in ?? ()
#3 0x0000565305e79cfd in ?? ()
#4 0x000056530600bbe8 in ?? ()
#5 0x00007fbb31dc8fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6 0x00007fbb31e495bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x7fbb264a06c0 (LWP 1506) "CPU 0/KVM"):
#0 __GI___ioctl (fd=27, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1 0x0000565305e7855f in ?? ()
#2 0x0000565305e786b5 in ?? ()
#3 0x0000565305e79cfd in ?? ()
#4 0x000056530600bbe8 in ?? ()
#5 0x00007fbb31dc8fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6 0x00007fbb31e495bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7fbb26da26c0 (LWP 1484) "call_rcu"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000056530600cd6a in ?? ()
#2 0x00005653060165c2 in ?? ()
#3 0x000056530600bbe8 in ?? ()
#4 0x00007fbb31dc8fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5 0x00007fbb31e495bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7fbb26f10280 (LWP 1483) "kvm"):
#0 0x00007fbb31e3c0f6 in __ppoll (fds=0x565308e82010, nfds=80, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1 0x0000565306021bee in ?? ()
#2 0x000056530601f4ee in ?? ()
#3 0x0000565305c3baf7 in ?? ()
#4 0x0000565305e82a46 in ?? ()
#5 0x00007fbb31d6718a in __libc_start_call_main (main=main@entry=0x565305a48390 <main>, argc=argc@entry=80, argv=argv@entry=0x7fff379eb008) at ../sysdeps/nptl/libc_start_call_main.h:58
#6 0x00007fbb31d67245 in __libc_start_main_impl (main=0x565305a48390 <main>, argc=80, argv=0x7fff379eb008, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff379eaff8) at ../csu/libc-start.c:381
#7 0x0000565305a49e71 in ?? ()
[Inferior 1 (process 1483) detached]
root@xenon:~# qm config 100
agent: 1
balloon: 2048
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 2
cpu: kvm64
efidisk0: local-lvm:vm-100-disk-2,efitype=4m,pre-enrolled-keys=1,size=4M
hotplug: disk,network,usb,cpu
ide2: none,media=cdrom
machine: pc-q35-7.2
memory: 8192
name: win10-insider
net0: virtio=22:0A:07:F2:62:E1,bridge=vmbr0
numa: 0
ostype: win11
scsi0: local-lvm:vm-100-disk-1,discard=on,size=192G
scsihw: virtio-scsi-pci
smbios1: uuid=892a4948-9c70-4d86-a37a-55db4ee035b2
sockets: 1
tpmstate0: local-lvm:vm-100-disk-0,size=4M,version=v2.0
root@xenon:~# pveversion -v
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-5.15: 7.4-4
pve-kernel-6.2.16-3-pve: 6.2.16-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-5.15.108-1-pve: 5.15.108-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.2
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.1
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
 
My VM also crashed after the upgrade to the newest PVE version:


Code:
root@proxmox2:~# strace -c -p $(cat /var/run/qemu-server/109.pid)
strace: Process 1611012 attached
^Cstrace: Process 1611012 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.58  181.696637       32451      5599       325 ppoll
  1.58    2.979040         168     17702           write
  0.57    1.074990         220      4876           read
  0.51    0.950389        2582       368           io_uring_enter
  0.29    0.542830         126      4280           recvmsg
  0.26    0.493407        1912       258           futex
  0.17    0.312506        3063       102           sendmsg
  0.03    0.063388        2112        30           accept4
  0.00    0.006165         205        30           getsockname
  0.00    0.001107          18        60           fcntl
  0.00    0.000628          20        30           close
------ ----------- ----------- --------- --------- ----------------
100.00  188.121087        5643     33335       325 total
 
Code:
root@proxmox2:~# gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/109.pid)
[New LWP 1611013]
[New LWP 1611038]
[New LWP 1611039]
[New LWP 1611040]
[New LWP 1611041]
[New LWP 1611042]
[New LWP 1611043]
[New LWP 1611044]
[New LWP 1611045]
[New LWP 1611046]
[New LWP 1611047]
[New LWP 1611048]
[New LWP 1611049]
[New LWP 1611050]
[New LWP 1611051]
[New LWP 1611052]
[New LWP 1611053]
[New LWP 1611056]
[New LWP 1611058]
[New LWP 346499]
[New LWP 346500]
[New LWP 346501]
[New LWP 346502]
[New LWP 369791]
[New LWP 392966]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fe97f9550f6 in __ppoll (fds=0x563f57a5dca0, nfds=160, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
42      ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 26 (Thread 0x7fe97cc08400 (LWP 392966) "iou-wrk-1611012"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 25 (Thread 0x7fe97cc08400 (LWP 369791) "iou-wrk-1611012"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 24 (Thread 0x7fe4bb7fe6c0 (LWP 346502) "proxmox-backup-"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fe981010640 in ?? () from /lib/libproxmox_backup_qemu.so.0
#2  0x00007fe980f58116 in ?? () from /lib/libproxmox_backup_qemu.so.0
#3  0x00007fe980f501bd in ?? () from /lib/libproxmox_backup_qemu.so.0
#4  0x00007fe980f4f358 in ?? () from /lib/libproxmox_backup_qemu.so.0
#5  0x00007fe980f3f5e9 in ?? () from /lib/libproxmox_backup_qemu.so.0
#6  0x00007fe980f4efe0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#7  0x00007fe980f45b22 in ?? () from /lib/libproxmox_backup_qemu.so.0
#8  0x00007fe980f3da1c in ?? () from /lib/libproxmox_backup_qemu.so.0
#9  0x00007fe980f5069f in ?? () from /lib/libproxmox_backup_qemu.so.0
#10 0x00007fe980f4ba09 in ?? () from /lib/libproxmox_backup_qemu.so.0
#11 0x00007fe980f41187 in ?? () from /lib/libproxmox_backup_qemu.so.0
#12 0x00007fe980f4c6f5 in ?? () from /lib/libproxmox_backup_qemu.so.0
#13 0x00007fe981031313 in ?? () from /lib/libproxmox_backup_qemu.so.0
#14 0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 23 (Thread 0x7fe4c9c996c0 (LWP 346501) "proxmox-backup-"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007fe981010640 in ?? () from /lib/libproxmox_backup_qemu.so.0
#2  0x00007fe980f58116 in ?? () from /lib/libproxmox_backup_qemu.so.0
#3  0x00007fe980f501bd in ?? () from /lib/libproxmox_backup_qemu.so.0
#4  0x00007fe980f4f358 in ?? () from /lib/libproxmox_backup_qemu.so.0
#5  0x00007fe980f3f5e9 in ?? () from /lib/libproxmox_backup_qemu.so.0
#6  0x00007fe980f4efe0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#7  0x00007fe980f45b22 in ?? () from /lib/libproxmox_backup_qemu.so.0
#8  0x00007fe980f3da1c in ?? () from /lib/libproxmox_backup_qemu.so.0
#9  0x00007fe980f5069f in ?? () from /lib/libproxmox_backup_qemu.so.0
#10 0x00007fe980f4ba09 in ?? () from /lib/libproxmox_backup_qemu.so.0
#11 0x00007fe980f41187 in ?? () from /lib/libproxmox_backup_qemu.so.0
#12 0x00007fe980f4c6f5 in ?? () from /lib/libproxmox_backup_qemu.so.0
#13 0x00007fe981031313 in ?? () from /lib/libproxmox_backup_qemu.so.0
#14 0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#15 0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 22 (Thread 0x7fe4b8cf26c0 (LWP 346500) "proxmox-backup-"):
#0  0x00007fe97f961c06 in epoll_wait (epfd=230, events=0x563f577a5140, maxevents=1024, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
#1  0x00007fe980f63307 in ?? () from /lib/libproxmox_backup_qemu.so.0
#2  0x00007fe980f56efd in ?? () from /lib/libproxmox_backup_qemu.so.0
#3  0x00007fe980f40c15 in ?? () from /lib/libproxmox_backup_qemu.so.0
#4  0x00007fe980f581a1 in ?? () from /lib/libproxmox_backup_qemu.so.0
#5  0x00007fe980f501bd in ?? () from /lib/libproxmox_backup_qemu.so.0
#6  0x00007fe980f4f358 in ?? () from /lib/libproxmox_backup_qemu.so.0
#7  0x00007fe980f3f5e9 in ?? () from /lib/libproxmox_backup_qemu.so.0
#8  0x00007fe980f4efe0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#9  0x00007fe980f45b22 in ?? () from /lib/libproxmox_backup_qemu.so.0
#10 0x00007fe980f3da1c in ?? () from /lib/libproxmox_backup_qemu.so.0
#11 0x00007fe980f5069f in ?? () from /lib/libproxmox_backup_qemu.so.0
#12 0x00007fe980f4ba09 in ?? () from /lib/libproxmox_backup_qemu.so.0
#13 0x00007fe980f41187 in ?? () from /lib/libproxmox_backup_qemu.so.0
#14 0x00007fe980f4c6f5 in ?? () from /lib/libproxmox_backup_qemu.so.0
#15 0x00007fe981031313 in ?? () from /lib/libproxmox_backup_qemu.so.0
#16 0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#17 0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 21 (Thread 0x7fe4b74006c0 (LWP 346499) "proxmox-backup-"):
#0  0x00007fe980b63131 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#1  0x00007fe980b63595 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#2  0x00007fe980b63648 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#3  0x00007fe980b64a5a in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#4  0x00007fe980b64c30 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#5  0x00007fe980b72f6e in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#6  0x00007fe980b68d62 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#7  0x00007fe980b69001 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#8  0x00007fe980b6b7e7 in ?? () from /lib/x86_64-linux-gnu/libzstd.so.1
#9  0x00007fe980b6c271 in ZSTD_compressContinue () from /lib/x86_64-linux-gnu/libzstd.so.1
#10 0x00007fe980b71a69 in ZSTD_compressStream2 () from /lib/x86_64-linux-gnu/libzstd.so.1
#11 0x00007fe980b71b6b in ZSTD_compressStream () from /lib/x86_64-linux-gnu/libzstd.so.1
#12 0x00007fe980eb1740 in ?? () from /lib/libproxmox_backup_qemu.so.0
#13 0x00007fe980eb6350 in ?? () from /lib/libproxmox_backup_qemu.so.0
#14 0x00007fe980ebb30a in ?? () from /lib/libproxmox_backup_qemu.so.0
#15 0x00007fe980eb81c2 in ?? () from /lib/libproxmox_backup_qemu.so.0
#16 0x00007fe980eb604b in ?? () from /lib/libproxmox_backup_qemu.so.0
#17 0x00007fe980eb8fc0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#18 0x00007fe980ebaf6e in ?? () from /lib/libproxmox_backup_qemu.so.0
#19 0x00007fe980d8ea6a in ?? () from /lib/libproxmox_backup_qemu.so.0
#20 0x00007fe980ceeec9 in ?? () from /lib/libproxmox_backup_qemu.so.0
#21 0x00007fe980d0c9fa in ?? () from /lib/libproxmox_backup_qemu.so.0
#22 0x00007fe980d9b019 in ?? () from /lib/libproxmox_backup_qemu.so.0
#23 0x00007fe980ccef99 in ?? () from /lib/libproxmox_backup_qemu.so.0
#24 0x00007fe980ca6256 in ?? () from /lib/libproxmox_backup_qemu.so.0
#25 0x00007fe980f4fcdb in ?? () from /lib/libproxmox_backup_qemu.so.0
#26 0x00007fe980f4f8c0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#27 0x00007fe980f3f5e9 in ?? () from /lib/libproxmox_backup_qemu.so.0
#28 0x00007fe980f4efe0 in ?? () from /lib/libproxmox_backup_qemu.so.0
#29 0x00007fe980f45b22 in ?? () from /lib/libproxmox_backup_qemu.so.0
#30 0x00007fe980f3da1c in ?? () from /lib/libproxmox_backup_qemu.so.0
#31 0x00007fe980f5069f in ?? () from /lib/libproxmox_backup_qemu.so.0
#32 0x00007fe980f4ba09 in ?? () from /lib/libproxmox_backup_qemu.so.0
#33 0x00007fe980f41187 in ?? () from /lib/libproxmox_backup_qemu.so.0
#34 0x00007fe980f4c6f5 in ?? () from /lib/libproxmox_backup_qemu.so.0
#35 0x00007fe981031313 in ?? () from /lib/libproxmox_backup_qemu.so.0
#36 0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#37 0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 20 (Thread 0x7fe526bbf6c0 (LWP 1611058) "vnc_worker"):
#0  __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x563f589bd9ec) at ./nptl/futex-internal.c:57
#1  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x563f589bd9ec, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87
#2  0x00007fe97f8ded9b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x563f589bd9ec, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139
#3  0x00007fe97f8e13f8 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x563f589bd9f8, cond=0x563f589bd9c0) at ./nptl/pthread_cond_wait.c:503
#4  ___pthread_cond_wait (cond=0x563f589bd9c0, mutex=0x563f589bd9f8) at ./nptl/pthread_cond_wait.c:618
#5  0x0000563f55dae6fb in ?? ()
#6  0x0000563f55814fdd in ?? ()
#7  0x0000563f55815ce8 in ?? ()
#8  0x0000563f55dadbe8 in ?? ()
#9  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#10 0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 19 (Thread 0x7fe527fff6c0 (LWP 1611056) "SPICE Worker"):
#0  0x00007fe97f954fff in __GI___poll (fds=0x7fe5100014f0, nfds=2, timeout=2147483647) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00007fe9812b29ae in ?? () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007fe9812b2cef in g_main_loop_run () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#3  0x00007fe9819b1fa9 in ?? () from /lib/x86_64-linux-gnu/libspice-server.so.1
#4  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 18 (Thread 0x7fe53a3ff6c0 (LWP 1611053) "CPU 15/KVM"):
#0  __GI___ioctl (fd=52, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 17 (Thread 0x7fe53afff6c0 (LWP 1611052) "CPU 14/KVM"):
#0  __GI___ioctl (fd=51, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 16 (Thread 0x7fe53bbff6c0 (LWP 1611051) "CPU 13/KVM"):
#0  __GI___ioctl (fd=50, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 15 (Thread 0x7fe5549ff6c0 (LWP 1611050) "CPU 12/KVM"):
#0  __GI___ioctl (fd=49, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 14 (Thread 0x7fe5555ff6c0 (LWP 1611049) "CPU 11/KVM"):
#0  __GI___ioctl (fd=48, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
 
Code:
Thread 13 (Thread 0x7fe5561ff6c0 (LWP 1611048) "CPU 10/KVM"):
#0  __GI___ioctl (fd=47, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 12 (Thread 0x7fe556dff6c0 (LWP 1611047) "CPU 9/KVM"):
#0  __GI___ioctl (fd=46, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 11 (Thread 0x7fe5577fe6c0 (LWP 1611046) "CPU 8/KVM"):
#0  __GI___ioctl (fd=45, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 10 (Thread 0x7fe557fff6c0 (LWP 1611045) "CPU 7/KVM"):
#0  __GI___ioctl (fd=44, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 9 (Thread 0x7fe56d3ff6c0 (LWP 1611044) "CPU 6/KVM"):
#0  __GI___ioctl (fd=43, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 8 (Thread 0x7fe56dfff6c0 (LWP 1611043) "CPU 5/KVM"):
#0  __GI___ioctl (fd=42, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 7 (Thread 0x7fe56ebff6c0 (LWP 1611042) "CPU 4/KVM"):
#0  __GI___ioctl (fd=41, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 6 (Thread 0x7fe56f7ff6c0 (LWP 1611041) "CPU 3/KVM"):
#0  __GI___ioctl (fd=40, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 5 (Thread 0x7fe9749ff6c0 (LWP 1611040) "CPU 2/KVM"):
#0  __GI___ioctl (fd=39, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 4 (Thread 0x7fe9755ff6c0 (LWP 1611039) "CPU 1/KVM"):
#0  __GI___ioctl (fd=38, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 3 (Thread 0x7fe977e3d6c0 (LWP 1611038) "CPU 0/KVM"):
#0  __GI___ioctl (fd=37, request=44672) at ../sysdeps/unix/sysv/linux/ioctl.c:36
#1  0x0000563f55c1a55f in ?? ()
#2  0x0000563f55c1a6b5 in ?? ()
#3  0x0000563f55c1bcfd in ?? ()
#4  0x0000563f55dadbe8 in ?? ()
#5  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#6  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 2 (Thread 0x7fe97c9a46c0 (LWP 1611013) "call_rcu"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x0000563f55daed6a in ?? ()
#2  0x0000563f55db85c2 in ?? ()
#3  0x0000563f55dadbe8 in ?? ()
#4  0x00007fe97f8e1fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#5  0x00007fe97f9625bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Thread 1 (Thread 0x7fe97cc08400 (LWP 1611012) "kvm"):
#0  0x00007fe97f9550f6 in __ppoll (fds=0x563f57a5dca0, nfds=160, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:42
#1  0x0000563f55dc3bee in ?? ()
#2  0x0000563f55dc14ee in ?? ()
#3  0x0000563f559ddaf7 in ?? ()
#4  0x0000563f55c24a46 in ?? ()
#5  0x00007fe97f88018a in __libc_start_call_main (main=main@entry=0x563f557ea390 <main>, argc=argc@entry=110, argv=argv@entry=0x7ffe735acb58) at ../sysdeps/nptl/libc_start_call_main.h:58
#6  0x00007fe97f880245 in __libc_start_main_impl (main=0x563f557ea390 <main>, argc=110, argv=0x7ffe735acb58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe735acb48) at ../csu/libc-start.c:381
#7  0x0000563f557ebe71 in ?? ()
[Inferior 1 (process 1611012) detached]
 
@showiproute @VivienM please make sure you have the pve-qemu-kvm-dbgsym package installed. If you have, are you sure the VM was already running with the new QEMU binary? With matching debug symbols, the debugger's trace output shouldn't consist of nothing but question marks.
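
A small sketch of both checks, assuming the dbgsym package is available in your configured repositories and using VMID 100 as an example (a "(deleted)" suffix on the symlink target means the process was started from a binary that has since been replaced by an upgrade):

Code:
# install the QEMU debug symbols on the host
apt install pve-qemu-kvm-dbgsym

# check whether a running VM is still executing an old, since-replaced binary
ls -l /proc/$(cat /var/run/qemu-server/100.pid)/exe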

Questions for all:
  • Are the hangs related to certain actions inside the guest, e.g. reboot?
  • Do you see any messages about failing QMP commands in the system logs?
  • What can you see when accessing the VM's Console in the web UI?
  • Any interesting logs inside the guest (after you reset it or get it working by live-migration if that works for you)?
 
@fiona: I hadn't had pve-qemu-kvm-dbgsym installed yet - but I have done that now.
In general, my Windows VMs are running the newest QEMU 8.0 version.

According to the server overview, my 100% peak started at midnight.
If I click on the console tab on the VM page I just see a black screen, but there are no errors in the logs.
 
I'm very interested in resolving this bug, as it stops me from deploying Proxmox in a large installation. Has this issue been propagated beyond this forum? Or are we alone here?
 
At least I didn't find any upstream reports about this issue and no promising patches on the QEMU development mailing list. But more eyes certainly can't hurt. Neither I nor any of my coworkers have run into this bug yet, so it's hard for us to debug further. We did look at a customer's machine, but in one case it was the PLT corruption bug, which is likely more low-level than QEMU, and the other time it was a different bug (the QEMU process was not stuck in ppoll).
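
As a side note for anyone comparing their own case: whether the main thread is stuck in ppoll can also be checked quickly without attaching gdb, e.g. via /proc (a sketch, assuming VMID 100; on x86_64, syscall number 271 in the first field is ppoll):

Code:
PID=$(cat /var/run/qemu-server/100.pid)
# first field is the number of the syscall the main thread is currently blocked in
cat /proc/$PID/syscall
# kernel wait channel of the main thread, e.g. do_sys_poll while sitting in ppoll
cat /proc/$PID/wchan; echo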
 
Thank you for the answer. Maybe it's time to take this bug, together with the info gathered here from users (gdb stack traces, strace output, etc.), to the QEMU or kernel devel lists? Maybe it will help ... I'm a regular user and unfortunately don't have enough technical knowledge to drive such a topic on the devel lists. My 2 cents on the topic.
 
Questions for all:
  • Are the hangs related to certain actions inside the guest, e.g. reboot?
  • Do you see any messages about failing QMP commands in the system logs?
  • What can you see when accessing the VM's Console in the web UI?
  • Any interesting logs inside the guest (after you reset it or get it working by live-migration if that works for you)?


- Hangs seem completely unrelated to guest actions: VM uptime ranges from a few hours to ~50 days, and CPU and memory usage at hang time varies a lot, from almost nothing to heavy load, same with disk and network I/O. It has happened with Windows guests (10, 2019, 2022) and Linux guests (Ubuntu 18, 20, 22, Debian Buster and Bullseye). All have the QEMU Agent configured and running. I've tried to reproduce the issue by generating different workloads on guest VMs, without success.

- No QMP messages in syslog. When the VM is hung it does not reply to QMP commands (ping, reboot, stop).

- The web UI console of the VM shows a frozen screen with whatever the VM was displaying at that moment, but it's unresponsive and neither keyboard nor mouse input works. The date/time shown on the Windows welcome screen does not refresh, i.e. graphics output doesn't work either.

- Nothing relevant in any OS. From the guest perspective it seems as if time had paused and simply unpaused after the live migration ended. Nothing inside the guest works while the VM is hung (tested with logger writing to syslog every second).

Adding to that:

- It has happened on at least 5 different clusters, some Intel, some AMD Epyc, of different generations.

- It seems to happen more often on memory-constrained hosts, where KSM has some amount of memory merged, though it has happened on hosts with lots of free memory too (see the quick KSM check sketched below).
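
A quick way to see how much memory KSM has merged on a given host (a sketch; the values are in 4 KiB pages, so pages_sharing roughly corresponds to the number of duplicate pages that were merged away):

Code:
grep . /sys/kernel/mm/ksm/run /sys/kernel/mm/ksm/pages_shared /sys/kernel/mm/ksm/pages_sharing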


Any chance we could get kernel 5.15 on PVE 8 while this issue gets sorted out? For me PVE has been very stable for years, until this issue arose. Having the option to use 5.15 on PVE 8 would let us deploy the newer version without the risk of suffering these hangs.
 
