[SOLVED] VMs freeze with 100% CPU

Hi,
when a VM gets stuck, you can run strace -c -p $(cat /var/run/qemu-server/<ID>.pid) with the ID of the VM. Press Ctrl+C after about 10 seconds to get the output.

You can also install the debugger and debug symbols with apt install pve-qemu-kvm-dbg gdb and then run gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/<ID>.pid).

When you share this information, please also share the output of qm config <ID> and pveversion -v to make it easier to correlate things.

If we're lucky, those will give some idea where it's stuck.

If you don't have the latest microcode and BIOS updates installed, please try that first.
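For convenience, the whole collection can be done in one go, roughly like this (a sketch; replace <ID> with the numeric ID of the stuck VM):
Code:
ID=<ID>                                     # numeric ID of the stuck VM
PID=$(cat /var/run/qemu-server/${ID}.pid)   # PID of the QEMU process
strace -c -p ${PID}                         # press Ctrl+C after about 10 seconds
apt install pve-qemu-kvm-dbg gdb            # debugger and debug symbols
gdb --batch --ex 't a a bt' -p ${PID}       # backtrace of all threads
qm config ${ID}                             # VM configuration
pveversion -v                               # package versions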
Hi Fiona,
we are "lucky" and have the issue on one VM (win).

The output:
Code:
strace -c -p $(cat /var/run/qemu-server/$ID.pid)
strace: Process 38494 attached
^Cstrace: Process 38494 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.71   19.803374       34621       572           ppoll
  0.14    0.028442          13      2136           write
  0.08    0.016147          29       549           read
  0.06    0.012145          23       523           recvmsg
  0.00    0.000007           0        10           sendmsg
  0.00    0.000002           1         2           close
  0.00    0.000001           0         2           accept4
  0.00    0.000000           0         2           getsockname
  0.00    0.000000           0         4           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   19.860118        5226      3800           total

gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/$ID.pid)
[New LWP 38495]
[New LWP 38670]
[New LWP 38671]
[New LWP 38672]
[New LWP 38673]
[New LWP 38678]
[New LWP 531848]
[New LWP 532278]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007ff555ef0e26 in internal_fallocate64 (fd=-137380896, offset=80, len=140735731015936) at ../sysdeps/posix/posix_fallocate64.c:36
36      ../sysdeps/posix/posix_fallocate64.c: No such file or directory.

Thread 9 (Thread 0x7ff54b2df040 (LWP 532278) "iou-wrk-38494"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 8 (Thread 0x7ff54b2df040 (LWP 531848) "iou-wrk-38494"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (Thread 0x7ff32bfff700 (LWP 38678) "vnc_worker"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x558ff7cf6648) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x558ff7cf6658, cond=0x558ff7cf6620) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x558ff7cf6620, mutex=0x558ff7cf6658) at pthread_cond_wait.c:638
#3  0x0000558ff511155b in ?? ()
#4  0x0000558ff4bdf5e3 in ?? ()
#5  0x0000558ff4be02a8 in ?? ()
#6  0x0000558ff5110a19 in ?? ()
#7  0x00007ff555fdeea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007ff555efca2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7ff33bdff700 (LWP 38673) "CPU 3/KVM"):
#0  0x00007ff555ef25f7 in preadv64v2 (fd=-137491808, vector=0x558ff4f8f817, count=0, offset=1, flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
#1  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7ff548fff700 (LWP 38672) "CPU 2/KVM"):
#0  0x00007ff555ef25f7 in preadv64v2 (fd=-137555216, vector=0x558ff4f8f817, count=0, offset=1, flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
#1  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7ff549bff700 (LWP 38671) "CPU 1/KVM"):
#0  0x00007ff555ef25f7 in preadv64v2 (fd=-137621056, vector=0x558ff4f8f817, count=0, offset=1, flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
#1  0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7ff54a87b700 (LWP 38670) "CPU 0/KVM"):
#0  0x00007ff555ef25f7 in preadv64v2 (fd=-137845280, vector=0x558ff4f8f817, count=0, offset=1, flags=44672) at ../sysdeps/unix/sysv/linux/preadv64v2.c:31
#1  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7ff54b17d700 (LWP 38495) "call_rcu"):
#0  0x00007ff555ef62e9 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000558ff5111bda in ?? ()
#2  0x0000558ff511a16a in ?? ()
#3  0x0000558ff5110a19 in ?? ()
#4  0x00007ff555fdeea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#5  0x00007ff555efca2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7ff54b2df040 (LWP 38494) "kvm"):
#0  0x00007ff555ef0e26 in internal_fallocate64 (fd=-137380896, offset=80, len=140735731015936) at ../sysdeps/posix/posix_fallocate64.c:36
#1  0x0000000000000000 in ?? ()
[Inferior 1 (process 38494) detached]
This pve-node had the issue for the first time.
Code:
pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.19.17-1-pve)
pve-manager: 7.3-3 (running version: 7.3-3/c3928077)
pve-kernel-5.15: 7.2-14
pve-kernel-5.19: 7.2-14
pve-kernel-helper: 7.2-14
pve-kernel-5.13: 7.1-9
pve-kernel-5.19.17-1-pve: 5.19.17-1
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
openvswitch-switch: 2.15.0+ds1-2+deb11u1
proxmox-backup-client: 2.3.1-1
proxmox-backup-file-restore: 2.3.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.5-6
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
pve-zsync: 2.2.3
qemu-server: 7.3-1
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

Hope it's helpful.

Udo
 
Code:
Thread 1 (Thread 0x7ff54b2df040 (LWP 38494) "kvm"):
#0  0x00007ff555ef0e26 in internal_fallocate64 (fd=-137380896, offset=80, len=140735731015936) at ../sysdeps/posix/posix_fallocate64.c:36
#1  0x0000000000000000 in ?? ()
[Inferior 1 (process 38494) detached]
Hope it's helpful.
Yes it is. I've seen this issue once before and @mira and I had the chance to debug it on a customer's machine. What happened there is that the Procedure Linkage Table got corrupted, and when the address of __ppoll was looked up, it ended up with the address of internal_fallocate64 instead because of the corruption. But the memory region is mapped read-only, so the QEMU process shouldn't be able to mess it up, and so it most likely is a low-level, i.e. kernel/firmware/hardware, bug.

From the first post it sounds like you also experienced the issue with the 6.2 kernel and on both AMD and Intel? Would still be good to verify with the backtrace, but that likely rules out firmware/hardware bugs.

Can you try using aio=threads instead of the default aio=io_uring in the advanced disk options?
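For reference, the same change can also be made on the CLI with qm set by re-specifying the drive with aio=threads appended (a rough sketch with placeholder values; keep the drive's existing options, and the change only takes effect once the VM has been fully stopped and started again):
Code:
qm set <ID> --scsi0 <storage>:vm-<ID>-disk-0,discard=on,size=80G,aio=threads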
 
Yes it is. I've seen this issue once before and @mira and I had the chance to debug it on a customer's machine. What happened there is that the Procedure Linkage Table got corrupted, and when the address of __ppoll was looked up, it ended up with the address of internal_fallocate64 instead because of the corruption. But the memory region is mapped read-only, so the QEMU process shouldn't be able to mess it up, and so it most likely is a low-level, i.e. kernel/firmware/hardware, bug.

From the first post it sounds like you also experienced the issue with the 6.2 kernel and on both AMD and Intel? Would still be good to verify with the backtrace, but that likely rules out firmware/hardware bugs.

Can you try using aio=threads instead of the default aio=io_uring in the advanced disk options?
Hi,
yes, this happens on Intel* and AMD (AMD EPYC 7542 32-Core Processor).

* Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz + Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz

iothread is active on almost all our VMs due to earlier issues with migration - including the frozen VMs from my first post:
Code:
aio=threads,discard=on,iothread=1
But the VM that froze today doesn't have these settings, and since the same issue also occurred with aio=threads, it doesn't look like that is the solution…
Settings from the straced VM
Code:
cpu: kvm64,flags=+pcid;+spec-ctrl
scsihw: virtio-scsi-pci
scsi0: pve01pool:vm-103-disk-1,discard=on,size=80G
scsi1: pve01pool:vm-103-disk-2,discard=on,size=500G
ballooning was enabled before but is now disabled.

We see the issue with 6.2 and 5.19 kernel.


Udo
 
Hi,
when a VM gets stuck, you can run strace -c -p $(cat /var/run/qemu-server/<ID>.pid) with the ID of the VM. Press Ctrl+C after about 10 seconds to get the output.



When you share this information, please also share the output of qm config <ID> and pveversion -v to make it easier to correlate things.

We are also suffering from random freezes on a few Windows VMs (it seems like 3 out of around 90 VMs are affected here).

Output:
strace:
Code:
root@pve291:~# strace -c -p $(cat /var/run/qemu-server/2441.pid)
strace: Process 1240671 attached
^Cstrace: Process 1240671 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.65    0.791175         142      5560           ppoll
  6.04    0.052695           2     21018           write
  1.62    0.014107           2      5407           read
  1.58    0.013752           2      5144           recvmsg
  0.07    0.000612           6       100           sendmsg
  0.02    0.000211          10        21           close
  0.01    0.000083           3        21           accept4
  0.01    0.000052           1        42           fcntl
  0.01    0.000044           2        21           getsockname
  0.00    0.000011           1         9           futex
  0.00    0.000010           2         4           ioctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.872752          23     37347           total

root@pve291:~#
qm config:
Code:
root@pve291:~# qm config 2441
agent: 1
boot: order=scsi0;ide2;net0
cores: 6
cpu: Broadwell
ide2: none,media=cdrom
ipconfig0: ip=<removed>
machine: pc-i440fx-7.1
memory: 65536
meta: creation-qemu=7.1.0,ctime=1679561676
name: adm-wts10-pw
net0: virtio=06:7D:EA:9D:3F:5D,bridge=vmbr0,firewall=1,tag=212
numa: 1
ostype: win11
scsi0: bco:vm-2441-disk-0,aio=native,cache=none,discard=on,iothread=1,size=100G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=744442b2-fc16-4afd-84f8-011efc82efa2
sockets: 2
tablet: 1
vcpus: 10
vmgenid: 31cc461c-da95-48d7-ba75-24cc265b9c96
root@pve291:~#

pveversion -v
Code:
root@pve291:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.13.19-6-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.102-1-pve: 5.15.102-1
pve-kernel-5.13.19-6-pve: 5.13.19-15
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u2.1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1
root@pve291:~#

The host running this VM is a Dell server with Intel Xeon CPUs (CPU details were shown in an attached screenshot).

Storage for the VM is on Ceph
 
Hi,
when a VM gets stuck, you can run strace -c -p $(cat /var/run/qemu-server/<ID>.pid) with the ID of the VM. Press Ctrl+C after about 10 seconds to get the output.

You can also install the debugger and debug symbols with apt install pve-qemu-kvm-dbg gdb and then run gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/<ID>.pid).

When you share this information, please also share the output of qm config <ID> and pveversion -v to make it easier to correlate things.

If we're lucky, those will give some idea where it's stuck.

If you don't have the latest microcode and BIOS updates installed, please try that first.
Hi Fiona,

I've been having the same issue, primarily with Windows guest VMs. It's happened to all my Windows VMs; the one that the data is for is running the insider builds of Windows 11, but I've had this happen with the release version of Windows 10 too. I think it may have happened once to a Linux VM; FreeBSD VMs have been rock solid. This is a homelab so nothing too serious, but rather annoying. Running 6.2 kernels now, but I am pretty sure I saw this with 5.whatever it was, which is why I went to the newer kernel branch.

The data you have asked for, hope this is helpful:
root@xenon:~# strace -c -p $(cat /var/run/qemu-server/100.pid)
strace: Process 2604346 attached
^Cstrace: Process 2604346 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.86   10.178630        3363      3026           ppoll
  0.88    0.090523           7     11488           write
  0.20    0.021076           7      2811           recvmsg
  0.05    0.005662           1      2955           read
  0.00    0.000353           6        56           sendmsg
  0.00    0.000031           2        12           close
  0.00    0.000021           1        12           accept4
  0.00    0.000007           0        12           getsockname
  0.00    0.000007           0        24           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00   10.296310         504     20396           total

root@xenon:~# gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/100.pid)
[New LWP 2604347]
[New LWP 2604368]
[New LWP 2604371]
[New LWP 2604373]
[New LWP 736069]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f7c8b5d7a66 in __ppoll (fds=0x55c2863c2810, nfds=79, timeout=<optimized out>, timeout@entry=0x7ffc7b228aa0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
44 ../sysdeps/unix/sysv/linux/ppoll.c: No such file or directory.

Thread 6 (Thread 0x7f7c809d6040 (LWP 736069) "iou-wrk-2604346"):
#0 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (Thread 0x7f7a72fbf700 (LWP 2604373) "vnc_worker"):
#0 futex_wait_cancelable (private=0, expected=0, futex_word=0x55c2868d6aa8) at ../sysdeps/nptl/futex-internal.h:186
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55c2868d6ab8, cond=0x55c2868d6a80) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=cond@entry=0x55c2868d6a80, mutex=mutex@entry=0x55c2868d6ab8) at pthread_cond_wait.c:638
#3 0x000055c2846d19cb in qemu_cond_wait_impl (cond=0x55c2868d6a80, mutex=0x55c2868d6ab8, file=0x55c284748434 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:220
#4 0x000055c2841605c3 in vnc_worker_thread_loop (queue=0x55c2868d6a80) at ../ui/vnc-jobs.c:248
#5 0x000055c284161288 in vnc_worker_thread (arg=arg@entry=0x55c2868d6a80) at ../ui/vnc-jobs.c:361
#6 0x000055c2846d0e89 in qemu_thread_start (args=0x7f7a72fba570) at ../util/qemu-thread-posix.c:505
#7 0x00007f7c8bcdbea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8 0x00007f7c8b5e3a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f7c79fff700 (LWP 2604371) "CPU 1/KVM"):
#0 0x00007f7c8b5d9237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055c284549997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55c2863c6030, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055c284549b01 in kvm_cpu_exec (cpu=cpu@entry=0x55c2863c6030) at ../accel/kvm/kvm-all.c:2850
#3 0x000055c28454b17d in kvm_vcpu_thread_fn (arg=arg@entry=0x55c2863c6030) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055c2846d0e89 in qemu_thread_start (args=0x7f7c79ffa570) at ../util/qemu-thread-posix.c:505
#5 0x00007f7c8bcdbea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f7c8b5e3a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f7c7befe700 (LWP 2604368) "CPU 0/KVM"):
#0 0x00007f7c8b5d9237 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055c284549997 in kvm_vcpu_ioctl (cpu=cpu@entry=0x55c2858e6d20, type=type@entry=44672) at ../accel/kvm/kvm-all.c:3035
#2 0x000055c284549b01 in kvm_cpu_exec (cpu=cpu@entry=0x55c2858e6d20) at ../accel/kvm/kvm-all.c:2850
#3 0x000055c28454b17d in kvm_vcpu_thread_fn (arg=arg@entry=0x55c2858e6d20) at ../accel/kvm/kvm-accel-ops.c:51
#4 0x000055c2846d0e89 in qemu_thread_start (args=0x7f7c7bef9570) at ../util/qemu-thread-posix.c:505
#5 0x00007f7c8bcdbea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f7c8b5e3a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f7c80874700 (LWP 2604347) "call_rcu"):
#0 syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1 0x000055c2846d204a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-7.2.0/include/qemu/futex.h:29
#2 qemu_event_wait (ev=ev@entry=0x55c284f33328 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:430
#3 0x000055c2846da94a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:261
#4 0x000055c2846d0e89 in qemu_thread_start (args=0x7f7c8086f570) at ../util/qemu-thread-posix.c:505
#5 0x00007f7c8bcdbea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007f7c8b5e3a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f7c809d6040 (LWP 2604346) "kvm"):
#0 0x00007f7c8b5d7a66 in __ppoll (fds=0x55c2863c2810, nfds=79, timeout=<optimized out>, timeout@entry=0x7ffc7b228aa0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000055c2846e5e11 in ppoll (__ss=0x0, __timeout=0x7ffc7b228aa0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/x86_64-linux-gnu/bits/poll2.h:77
#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=timeout@entry=2999188054) at ../util/qemu-timer.c:351
#3 0x000055c2846e3675 in os_host_main_loop_wait (timeout=2999188054) at ../util/main-loop.c:315
#4 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:606
#5 0x000055c284300191 in qemu_main_loop () at ../softmmu/runstate.c:739
#6 0x000055c284139aa7 in qemu_default_main () at ../softmmu/main.c:37
#7 0x00007f7c8b50bd0a in __libc_start_main (main=0x55c284134c60 <main>, argc=79, argv=0x7ffc7b228c68, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc7b228c58) at ../csu/libc-start.c:308
#8 0x000055c2841399da in _start ()
[Inferior 1 (process 2604346) detached]

root@xenon:~# qm config 100
agent: 1
balloon: 2048
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 2
cpu: kvm64
efidisk0: local-lvm:vm-100-disk-2,efitype=4m,pre-enrolled-keys=1,size=4M
hotplug: disk,network,usb,cpu
ide2: none,media=cdrom
machine: pc-q35-7.2
memory: 8192
name: win10-insider
net0: virtio=22:0A:07:F2:62:E1,bridge=vmbr0
numa: 0
ostype: win11
scsi0: local-lvm:vm-100-disk-1,discard=on,size=192G
scsihw: virtio-scsi-pci
smbios1: uuid=892a4948-9c70-4d86-a37a-55db4ee035b2
sockets: 1
tpmstate0: local-lvm:vm-100-disk-0,size=4M,version=v2.0
root@xenon:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
pve-manager: 7.4-13 (running version: 7.4-13/46c37d9c)
pve-kernel-6.2: 7.4-3
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-6.2.11-1-pve: 6.2.11-1
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.7.2
pve-cluster: 7.3-3
pve-container: 4.4-4
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-4
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
 
Hi,
when a VM gets stuck, you can run strace -c -p $(cat /var/run/qemu-server/<ID>.pid) with the ID of the VM. Press Ctrl+C after about 10 seconds to get the output.

You can also install the debugger and debug symbols with apt install pve-qemu-kvm-dbg gdb and then run gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/<ID>.pid).

When you share this information, please also share the output of qm config <ID> and pveversion -v to make it easier to correlate things.

If we're lucky, those will give some idea where it's stuck.

If you don't have the latest microcode and BIOS updates installed, please try that first.

I also got another frozen VM today. The output of the commands you asked for is as follows:

Strace:
Code:
strace -c -p $(cat /var/run/qemu-server/144.pid)
strace: Process 3703171 attached
^Cstrace: Process 3703171 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.72   42.483064       15711      2704           ppoll
  0.90    0.389272          38     10180           write
  0.31    0.134712          54      2492           recvmsg
  0.06    0.027261          10      2617           read
  0.00    0.000144           2        49           sendmsg
  0.00    0.000041           3        11           accept4
  0.00    0.000037           3        11           close
  0.00    0.000022           1        22           fcntl
  0.00    0.000010           0        11           getsockname
------ ----------- ----------- --------- --------- ----------------
100.00   43.034563        2377     18097           total

gdb:
*Too long to paste here, so attached as a file*

qm config:
Code:
qm config 144
balloon: 4096
boot: order=scsi0
cores: 4
cpu: Broadwell
ide2: none,media=cdrom
memory: 32768
name: GEIS-beta-15
net0: virtio=46:D3:97:24:58:FF,bridge=vmbr0,tag=37
numa: 1
onboot: 1
ostype: l26
scsi0: hesi-storage:vm-144-disk-0,size=2000G
scsihw: virtio-scsi-pci
smbios1: uuid=0da1ad38-77db-4165-8bbd-83c568f8cbcc
sockets: 2

pveversion:
Code:
pveversion -v
proxmox-ve: 7.4-1 (running kernel: 6.2.11-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.4-3
pve-kernel-5.15: 7.4-3
pve-kernel-6.2.11-2-pve: 6.2.11-2
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.2-1
proxmox-backup-file-restore: 2.4.2-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.0
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

I have now disabled ballooning and use aio=threads as you suggested. I will keep you in the loop on whether this helps.
 

Attachments

  • hanging_vm_debug.txt (18.4 KB)
There might be at least two different issues here:
  • the one reported by @udo and one customer, which is the strange PLT corruption where the main loop wrongly jumps to internal_fallocate64 and the vCPU threads end up in preadv64v2
  • the one reported by @VivienM and @coenvl, where it hangs in ppoll. It might be waiting for some event that never happens. It might also be some other form of the PLT corruption, but where it's not obvious that it jumps to some strange place in the code. The former is more likely IMHO, but I really can't tell just from the log. Can you tell us what CPU model your host has?
 
Hi, on a hunch: Since at least @coenvl and @hans-olav are using Ceph storage: If you experience freezes and some VM disks are stored on Ceph, it would be interesting to check the number of open file descriptors of the QEMU processes. If this number is close to the file descriptor limit, the freezes could be related to bug 4507 [1]. Could you run the following command and post the output? It queries the current file descriptor limit and number of open file descriptors for every QEMU process.
Code:
for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=4507
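To keep an eye on one specific VM over time, something along these lines should also work (a sketch, using the PID file path from earlier in the thread; replace <ID> with the VM ID):
Code:
PID=$(cat /var/run/qemu-server/<ID>.pid)      # PID of the QEMU process
prlimit -p ${PID} | grep NOFILE               # current file descriptor limit
watch -n 60 "ls -1 /proc/${PID}/fd | wc -l"   # open file descriptors, refreshed every minute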
 
There might be at least two different issues here:
  • the one reported by @udo and one customer, which is the strange PLT corruption where the main loop wrongly jumps to internal_fallocate64 and the vCPU threads end up in preadv64v2
  • the one reported by @VivienM and @coenvl, where it hangs in ppoll. It might be waiting for some event that never happens. It might also be some other form of the PLT corruption, but where it's not obvious that it jumps to some strange place in the code. The former is more likely IMHO, but I really can't tell just from the log. Can you tell us what CPU model your host has?

Thank you so much already for helping out here. The host that the log was from this morning has 2 Intel Xeon Silver 4114T CPUs, which use the Skylake architecture. The problem also occurs on other nodes that have Intel Xeon E5-2640 v4 CPUs with the Broadwell architecture.


@fweber, as far as I can tell the file descriptors are okay right now, but of course at the moment the VM is running fine. The output of the file descriptor check is as follows:

Code:
for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
91
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
91
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
91
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
91
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                1024    524288 files
90
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                1024    524288 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                  1024     1048576 files
79
NOFILE     max number of open files                1024      4096 files
0
NOFILE     max number of open files                1024    524288 files
80
 
Hi, on a hunch: Since at least @coenvl and @hans-olav are using Ceph storage: If you experience freezes and some VM disks are stored on Ceph, it would be interesting to check the number of open file descriptors of the QEMU processes. If this number is close to the file descriptor limit, the freezes could be related to bug 4507 [1]. Could you run the following command and post the output? It queries the current file descriptor limit and number of open file descriptors for every QEMU process.
Code:
for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=4507

I will try this the next time I catch a live freeze situation. It usually ends in a reboot after 1 hour of freeze if left untouched.
 
There might be at least two different issues here:
  • the one reported by @udo and one customer, which is the strange PLT corruption where the main loop wrongly jumps to internal_fallocate64 and the vCPU threads end up in preadv64v2
  • the one reported by @VivienM and @coenvl, where it hangs in ppoll. It might be waiting for some event that never happens. It might also be some other form of the PLT corruption, but where it's not obvious that it jumps to some strange place in the code. The former is more likely IMHO, but I really can't tell just from the log. Can you tell us what CPU model your host has?

This is an Intel i5-4590 (Haswell). (It's a homelab, nothing fancy here)

I should add in case it's relevant that I have intel-microcode installed; this machine should be running the latest (last?) BIOS from Dell.
 
hi to all !

I also use Proxmox and Ceph, and encountered this problem maybe 2 months ago. I have an installation that has worked for 5 years without problems, and I haven't upgraded the hardware in those 5 years. My processors are 48 x Intel(R) Xeon(R) CPU E5-4607 0 @ 2.20GHz (4 Sockets)

And my debug of freezing vm:
Code:
strace -c -p $(cat /var/run/qemu-server/104.pid)
strace: Process 2919614 attached
^Cstrace: Process 2919614 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 38.29    0.044231          26      1664           ppoll
 22.90    0.026459           4      6344           write
 18.21    0.021038           6      3341           clock_gettime
 16.59    0.019163          11      1631           read
  2.90    0.003354         139        24           ioctl
  1.04    0.001200           0      1553           recvmsg
  0.04    0.000050           1        30           sendmsg
  0.01    0.000012           2         6           accept4
  0.01    0.000010           1         6           close
  0.01    0.000007           1         6           getsockname
  0.01    0.000006           0        12           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.115530           7     14617           total



gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/104.pid)
[New LWP 2919615]
[New LWP 2919679]
[New LWP 2919685]
[New LWP 2919824]
[New LWP 2919832]
[New LWP 2919896]
[New LWP 2920041]
[New LWP 2921091]
[New LWP 2961283]
[New LWP 2977065]
[New LWP 1096174]

warning: Could not load vsyscall page because no executable was specified
0x00007fc2d2e354f6 in ?? ()

Thread 12 (LWP 1096174 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 11 (LWP 2977065 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 10 (LWP 2961283 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 9 (LWP 2921091 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 8 (LWP 2920041 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (LWP 2919896 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 6 (LWP 2919832 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (LWP 2919824 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 4 (LWP 2919685 "kvm"):
#0  0x00007fc2d2f187b2 in ?? ()
#1  0x0000000000002eff in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x0000000000000004 in ?? ()
#4  0x00007fc277de9f50 in ?? ()
#5  0x0000000100000010 in ?? ()
#6  0x0000001800093400 in ?? ()
#7  0x00007fc2d2f18540 in ?? ()
#8  0x00007fc277de9f50 in ?? ()
#9  0x00007fc277dea080 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 3 (LWP 2919679 "kvm"):
#0  0x00007fc2d2e36cc7 in ?? ()
#1  0x000056266b816f87 in ?? ()
#2  0x0000000000000400 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (LWP 2919615 "kvm"):
#0  0x00007fc2d2e3a9b9 in ?? ()
#1  0x000056266b98a9fa in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 2919614 "kvm"):
#0  0x00007fc2d2e354f6 in ?? ()
#1  0xffffffff002a0e5f in ?? ()
#2  0x000056266f869800 in ?? ()
#3  0x000000000000004a in ?? ()
#4  0x0000000000000000 in ?? ()
[Inferior 1 (process 2919614) detached]


qm config 104
agent: 1
balloon: 0
boot: order=scsi0;ide2
cores: 1
ide2: none,media=cdrom
memory: 1024
meta: creation-qemu=6.2.0,ctime=1660818675
name: nfs-storage
net0: virtio=C6:54:0A:3E:C0:27,bridge=vmbr1
net1: virtio=22:2A:6F:F2:14:02,bridge=vmbr0
numa: 0
ostype: l26
scsi0: ceph_storage:vm-104-disk-0,aio=native,size=50G
scsi1: ceph_storage:vm-104-disk-1,aio=native,size=130G
smbios1: uuid=ac8af423-c341-4d85-bd15-86986832fa6e
sockets: 1
unused0: hdd_storage:vm-104-disk-0
unused1: hdd_storage:vm-104-disk-1
vmgenid: 27799ca6-f4f4-478e-abb1-eff84827c98b


pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.35-2-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-4
pve-kernel-helper: 7.2-4
pve-kernel-5.4: 6.4-16
pve-kernel-5.15.35-2-pve: 5.15.35-5
pve-kernel-5.4.178-1-pve: 5.4.178-1
pve-kernel-5.4.174-2-pve: 5.4.174-2
pve-kernel-4.4.134-1-pve: 4.4.134-112
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.95-1-pve: 4.4.95-99
ceph: 16.2.9-pve1
ceph-fuse: 16.2.9-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.12-1
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1

soft limits debug:
Code:
for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done
NOFILE     max number of open files                1024    524288 files
46
NOFILE     max number of open files                  1024     1048576 files
45
NOFILE     max number of open files                1024    524288 files
101
NOFILE     max number of open files                1024    524288 files
38
NOFILE     max number of open files                1024    524288 files
104
NOFILE     max number of open files                1024    524288 files
53
NOFILE     max number of open files                1024    524288 files
38
NOFILE     max number of open files                1024    524288 files
47
NOFILE     max number of open files                1024    524288 files
101
NOFILE     max number of open files                1024    524288 files
44
 
hi to all !

I also use Proxmox and Ceph, and encountered this problem maybe 2 months ago. I have an installation that has worked for 5 years without problems, and I haven't upgraded the hardware in those 5 years. My processors are 48 x Intel(R) Xeon(R) CPU E5-4607 0 @ 2.20GHz (4 Sockets)

And my debug of freezing vm:
Code:
strace -c -p $(cat /var/run/qemu-server/104.pid)
strace: Process 2919614 attached
^Cstrace: Process 2919614 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 38.29    0.044231          26      1664           ppoll
 22.90    0.026459           4      6344           write
 18.21    0.021038           6      3341           clock_gettime
 16.59    0.019163          11      1631           read
  2.90    0.003354         139        24           ioctl
  1.04    0.001200           0      1553           recvmsg
  0.04    0.000050           1        30           sendmsg
  0.01    0.000012           2         6           accept4
  0.01    0.000010           1         6           close
  0.01    0.000007           1         6           getsockname
  0.01    0.000006           0        12           fcntl
------ ----------- ----------- --------- --------- ----------------
100.00    0.115530           7     14617           total
This doesn't look like the QEMU process is stuck. How is the VM stuck? What do the logs within the guest say?

Code:
gdb --batch --ex 't a a bt' -p $(cat /var/run/qemu-server/104.pid)
[New LWP 2919615]
[New LWP 2919679]
[New LWP 2919685]
[New LWP 2919824]
[New LWP 2919832]
[New LWP 2919896]
[New LWP 2920041]
[New LWP 2921091]
[New LWP 2961283]
[New LWP 2977065]
[New LWP 1096174]

warning: Could not load vsyscall page because no executable was specified
0x00007fc2d2e354f6 in ?? ()

Thread 12 (LWP 1096174 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 11 (LWP 2977065 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 10 (LWP 2961283 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 9 (LWP 2921091 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 8 (LWP 2920041 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 7 (LWP 2919896 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 6 (LWP 2919832 "iou-wrk-2919614"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 5 (LWP 2919824 "iou-wrk-2919679"):
#0  0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x0

Thread 4 (LWP 2919685 "kvm"):
#0  0x00007fc2d2f187b2 in ?? ()
#1  0x0000000000002eff in ?? ()
#2  0x0000000000000001 in ?? ()
#3  0x0000000000000004 in ?? ()
#4  0x00007fc277de9f50 in ?? ()
#5  0x0000000100000010 in ?? ()
#6  0x0000001800093400 in ?? ()
#7  0x00007fc2d2f18540 in ?? ()
#8  0x00007fc277de9f50 in ?? ()
#9  0x00007fc277dea080 in ?? ()
#10 0x0000000000000000 in ?? ()

Thread 3 (LWP 2919679 "kvm"):
#0  0x00007fc2d2e36cc7 in ?? ()
#1  0x000056266b816f87 in ?? ()
#2  0x0000000000000400 in ?? ()
#3  0x0000000000000000 in ?? ()

Thread 2 (LWP 2919615 "kvm"):
#0  0x00007fc2d2e3a9b9 in ?? ()
#1  0x000056266b98a9fa in ?? ()
#2  0x0000000000000000 in ?? ()

Thread 1 (LWP 2919614 "kvm"):
#0  0x00007fc2d2e354f6 in ?? ()
#1  0xffffffff002a0e5f in ?? ()
#2  0x000056266f869800 in ?? ()
#3  0x000000000000004a in ?? ()
#4  0x0000000000000000 in ?? ()
[Inferior 1 (process 2919614) detached]
There's no debug information here. It could be that the VM was still running with an older version than the installed one, otherwise make sure to install the pve-qemu-kvm-dbg package.

soft limits debug:
Code:
for pid in $(pidof kvm); do prlimit -p $pid | grep NOFILE; ls -1 /proc/$pid/fd/ | wc -l; done
NOFILE     max number of open files                1024    524288 files
46
NOFILE     max number of open files                  1024     1048576 files
45
NOFILE     max number of open files                1024    524288 files
101
NOFILE     max number of open files                1024    524288 files
38
NOFILE     max number of open files                1024    524288 files
104
NOFILE     max number of open files                1024    524288 files
53
NOFILE     max number of open files                1024    524288 files
38
NOFILE     max number of open files                1024    524288 files
47
NOFILE     max number of open files                1024    524288 files
101
NOFILE     max number of open files                1024    524288 files
44
Doesn't look problematic. Anything in the Ceph logs?
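A few places to look, for example (a sketch; the OSD ID is just an example):
Code:
ceph status                    # overall cluster health
ceph crash ls                  # recently crashed daemons
ceph crash info <crash-id>     # backtrace of a specific crash
journalctl -u ceph-osd@54      # log of a specific OSD on its host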
 
Adding a +1 here.

In my case I'm having this issue in a 5 node cluster with Intel CPUs of different generations, all Windows 10 or 2019 VMs. vCPU is "Broadwell-noTSX-IBRS" to allow live migration (although it does happen with vCPU host too). NUMA is enabled in the VM settings. Latest BIOS and microcode packages are installed. Storage is Ceph with 16 OSDs. Tried with kernels 6.1 and 6.2 and currently testing with 5.15.

This issue is happening with the VMs on two of the five nodes. KSM on these two hosts usually deduplicates quite a bit of memory, 40GB+. When a VM hangs with 100% CPU, if I live migrate it to another node it recovers and works again. I can live migrate it back to the original host and it keeps working for some time (which varies from a few days to weeks). Feels like this is somehow related to the way QEMU tries to get memory at some point.

Does your host(s) also have 40GB+ KSM usage? Have you tried live migrating the VMs?
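For comparison, KSM sharing on a host can be read from sysfs, roughly like this (a sketch assuming the usual 4 KiB page size):
Code:
cat /sys/kernel/mm/ksm/pages_sharing   # 4 KiB pages currently deduplicated by KSM
echo "$(( $(cat /sys/kernel/mm/ksm/pages_sharing) * 4 / 1024 )) MiB KSM sharing"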

Will get the requested details next time I see this issue.
 
This doesn't look like the QEMU process is stuck. How is the VM stuck? What do the logs within the guest say?


There's no debug information here. It could be that the VM was still running with an older version than the installed one, otherwise make sure to install the pve-qemu-kvm-dbg package.


Doesn't look problematic. Anything in the Ceph logs?


How is the VM stuck? What do the logs within the guest say?
There is no SSH access and the VNC console does not respond to requests; it just shows a static picture. The log on the machine itself is abruptly cut off and resumes after restarting:

Code:
Jun 16 20:12:22 dev-kafka-2-kraft kafka-server-start.sh[23704]: [2023-06-16 20:12:22,285] INFO [RaftManager nodeId=2] Vote request VoteRequestDa>
Jun 16 20:12:22 dev-kafka-2-kraft kafka-server-start.sh[23704]: [2023-06-16 20:12:22,368] INFO [RaftManager nodeId=2] Completed transition to Fo>
Jun 16 20:12:22 dev-kafka-2-kraft kafka-server-start.sh[23704]: [2023-06-16 20:12:22,370] INFO [BrokerToControllerChannelManager broker=2 name=h>
-- Boot fbd87f21176742cb8ab0717732d2b6bc --
Jun 16 21:27:04 dev-kafka-2-kraft kernel: Linux version 5.15.0-73-generic (buildd@bos03-amd64-060) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0>
Jun 16 21:27:04 dev-kafka-2-kraft kernel: Command line: BOOT_IMAGE=/vmlinuz-5.15.0-73-generic root=/dev/mapper/ap--vg-ap--lv--root ro ip>
Jun 16 21:27:04 dev-kafka-2-kraft kernel: KERNEL supported cpus:
On the Proxmox node itself, in the syslog, I see this:

Code:
Jun 16 18:36:32 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Currently unreadable (pending) sectors
Jun 16 18:36:32 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Offline uncorrectable sectors
Jun 16 18:36:32 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_24], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED
Jun 16 19:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Currently unreadable (pending) sectors
Jun 16 19:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Offline uncorrectable sectors
Jun 16 19:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_24], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED
Jun 16 19:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Currently unreadable (pending) sectors
Jun 16 19:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Offline uncorrectable sectors
Jun 16 19:36:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_24], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED
Jun 16 20:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Currently unreadable (pending) sectors
Jun 16 20:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_18] [SAT], 16 Offline uncorrectable sectors
Jun 16 20:06:33 petr-stor4 smartd[1489]: Device: /dev/bus/0 [megaraid_disk_24], SMART Failure: FAILURE PREDICTION THRESHOLD EXCEEDED
Jun 16 20:08:14 petr-stor4 pvestatd[2860]: VM 105 qmp command failed - VM 105 qmp command 'query-proxmox-support' failed - got timeout
Jun 16 20:18:49 petr-stor4 pvedaemon[1718506]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout
Jun 16 20:19:08 petr-stor4 pvedaemon[1727264]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout
Jun 16 20:19:28 petr-stor4 pvedaemon[1717223]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout
Jun 16 20:21:24 petr-stor4 pvedaemon[1727264]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout
Jun 16 20:21:43 petr-stor4 pvedaemon[1717223]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout
Jun 16 20:22:05 petr-stor4 pvedaemon[1718506]: VM 121 qmp command failed - VM 121 qmp command 'guest-ping' failed - got timeout

Maybe my osd.54 is slowly dying and that's why the machines freeze? But I have replication factor 2/3 in my Ceph... I got a Ceph warning in the PVE UI yesterday which said "1 daemons have recently crashed: osd.54 crashed on host *****". But for now the osd.54 service works fine. Backtrace of the osd.54 crash:

Code:
{
    "backtrace": [
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f69cb483140]",
        "(BlueStore::Extent::~Extent()+0x27) [0x55d6e1ebb8e7]",
        "(BlueStore::Onode::put()+0x2c5) [0x55d6e1e32f25]",
        "(std::_Hashtable<ghobject_t, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, mempool::pool_allocator<(mempool::pool_index_t)4, std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> > >, std::__detail::_Select1st, std::equal_to<ghobject_t>, std::hash<ghobject_t>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase(unsigned long, std::__detail::_Hash_node_base*, std::__detail::_Hash_node<std::pair<ghobject_t const, boost::intrusive_ptr<BlueStore::Onode> >, true>*)+0x67) [0x55d6e1ebc2c7]",
        "(LruOnodeCacheShard::_trim_to(unsigned long)+0xca) [0x55d6e1ebfb5a]",
        "(BlueStore::OnodeSpace::add(ghobject_t const&, boost::intrusive_ptr<BlueStore::Onode>&)+0x15d) [0x55d6e1e3371d]",
        "(BlueStore::Collection::get_onode(ghobject_t const&, bool, bool)+0x399) [0x55d6e1e3a309]",
        "(BlueStore::_txc_add_transaction(BlueStore::TransContext*, ceph::os::Transaction*)+0x154d) [0x55d6e1e814dd]",
        "(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x2e0) [0x55d6e1e82430]",
        "(non-virtual thunk to PrimaryLogPG::queue_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, boost::intrusive_ptr<OpRequest>)+0x52) [0x55d6e1aa8412]",
        "(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x7b4) [0x55d6e1cb8ef4]",
        "(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x53d) [0x55d6e1a2418d]",
        "(PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xd46) [0x55d6e1a80326]",
        "(PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x334a) [0x55d6e1a87c6a]",
        "(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1bc) [0x55d6e18f789c]",
        "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x65) [0x55d6e1b77505]",
        "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa27) [0x55d6e1924367]",
        "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x41a) [0x55d6e1fcd3da]",
        "(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55d6e1fcf9b0]",
        "/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f69cb477ea7]",
        "clone()"
    ],
    "ceph_version": "16.2.9",
    "crash_id": "2023-06-15T20:43:02.275025Z_aad0cf01-3839-41a3-b8bd-d516080722b1",
    "entity_name": "osd.54",
    "os_id": "11",
    "os_name": "Debian GNU/Linux 11 (bullseye)",
    "os_version": "11 (bullseye)",
    "os_version_id": "11",
    "process_name": "ceph-osd",
    "stack_sig": "f33237076f54d8500909a0c8c279f6639d4e914520f35b288af4429eebfd958e",
    "timestamp": "2023-06-15T20:43:02.275025Z",
    "utsname_hostname": "petr-stor4",
    "utsname_machine": "x86_64",
    "utsname_release": "5.15.35-2-pve",
    "utsname_sysname": "Linux",
    "utsname_version": "#1 SMP PVE 5.15.35-5 (Wed, 08 Jun 2022 15:02:51 +0200)"
}
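To check the disks from the smartd messages above directly, something like this should work (a sketch; the megaraid device numbers are taken from the smartd log):
Code:
smartctl -a -d megaraid,18 /dev/bus/0   # disk with the unreadable/uncorrectable sectors
smartctl -a -d megaraid,24 /dev/bus/0   # disk reporting the failure prediction threshold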
 
Hi!

From my experience, these settings are the most stable configuration:
> BIOS:
-disable C-States/Freq.Spectrum;
-disable PowerSaving;
-disable SMP/Hyperthreading;

> Windows Guest:
- Disable memory ballooning (use static allocation from the HOST memory, balloon=0),
- Do not use any "virtio-*" type of hardware,
- Storage emulation use "sata0/sata1/...", with "default" scsi-controller and "default" No cache,
- Use e1000 type of network card.

> Linux/BSD:
- Disable memory ballooning (use static allocation from the HOST memory, balloon=0),
- Use virtio-* type of hardware,
- Storage emulation use "scsi0/scsi1/...", with "virtio-scsi" scsi-controller and "default" No cache,
- Use virtio type of network card.
 
Hi!

From my experience, these settings are the most stable configuration:
> BIOS:
-disable C-States/Freq.Spectrum;
-disable PowerSaving;
-disable SMP/Hyperthreading;

> Windows Guest:
- Disable memory ballooning (use static allocation from the HOST memory, balloon=0),
- Do not use any "virtio-*" type of hardware,
- Storage emulation use "sata0/sata1/...", with "default" scsi-controller and "default" No cache,
- Use e1000 type of network card.

> Linux/BSD:
- Disable memory ballooning (use static allocation from the HOST memory, balloon=0),
- Use virtio-* type of hardware,
- Storage emulation use "scsi0/scsi1/...", with "virtio-scsi" scsi-controller and "default" No cache,
- Use virtio type of network card.
In my case at least (homelab with a fairly memory-constrained but CPU-underused host, which I presume is typical for most homelabs), these settings are not really ideal...
 
Got a freeze today in another cluster. This time I tried to suspend the VM, but it was still frozen when I started it again. Hibernating the VM, however, did allow it to work properly after starting it again. So as a workaround we can try to either live migrate to another node or hibernate/start the VM. This cluster uses Xeon(R) Gold 5218 CPUs with the latest microcode installed and running.
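For reference, that workaround looks roughly like this on the CLI (a sketch; <ID> and <target-node> are placeholders):
Code:
qm migrate <ID> <target-node> --online   # live migrate the hung VM to another node
# or alternatively:
qm suspend <ID> --todisk 1               # hibernate (suspend to disk)
qm start <ID>                            # start it again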

This is the information that I've gathered:

pveversion -v
Code:
proxmox-ve: 7.4-1 (running kernel: 6.2.9-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-6.2: 7.4-1
pve-kernel-5.15: 7.4-1
pve-kernel-5.11: 7.0-10
pve-kernel-6.2.9-1-pve: 6.2.9-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.15.104-1-pve: 5.15.104-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Open files for QEMU process
Code:
NOFILE     max number of open files                1024    524288 files
145

Strace for 12 seconds
Code:
strace: Process 3242204 attached
strace: Process 3242204 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97,99   43,116466        8673      4971           ppoll
  1,18    0,520868          27     18715           write
  0,46    0,201465          41      4814           read
  0,37    0,160786          35      4580           recvmsg
  0,00    0,000068           0        90           sendmsg
  0,00    0,000022           1        19           close
  0,00    0,000021           1        18           accept4
  0,00    0,000009           0        36           fcntl
  0,00    0,000004           0        18           getsockname
  0,00    0,000001           1         1           ioctl
------ ----------- ----------- --------- --------- ----------------
100,00   43,999710        1322     33262           total

GDB info
(attached)
Looks similar to the one posted by @udo

KSM status
KSM sharing 4k*4991264 bytes (approx 19497 MBytes)

Memory detail (echo m > /proc/sysrq-trigger)
Code:
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  unevictable:44806 dirty:1953 writeback:4
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  slab_reclaimable:629607 slab_unreclaimable:429928
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  mapped:56183 shmem:19149 pagetables:130632
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  sec_pagetables:99925 bounce:0
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  kernel_misc_reclaimable:0
Jun 18 12:07:09 magallanes02 kernel: [4804375.885048]  free:897594 free_pcp:31700 free_cma:0
Jun 18 12:07:09 magallanes02 kernel: [4804375.885054] Node 0 active_anon:94954592kB inactive_anon:21663136kB active_file:1463264kB inactive_file:8165648kB unevictable:175928kB isolated(anon):0kB isolated(file):0kB mapped:44232kB dirty:7508kB writeback:16kB shmem:25688kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 3928064kB writeback_tmp:0kB kernel_stack:20520kB pagetables:343868kB sec_pagetables:218896kB all_unreclaimable? no
Jun 18 12:07:09 magallanes02 kernel: [4804375.885059] Node 1 active_anon:79694956kB inactive_anon:16951816kB active_file:2755984kB inactive_file:27130656kB unevictable:3296kB isolated(anon):0kB isolated(file):0kB mapped:180500kB dirty:304kB writeback:0kB shmem:50908kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1867776kB writeback_tmp:0kB kernel_stack:14184kB pagetables:178660kB sec_pagetables:180804kB all_unreclaimable? no
Jun 18 12:07:09 magallanes02 kernel: [4804375.885064] Node 0 DMA free:11264kB boost:0kB min:4kB low:16kB high:28kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15976kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885070] lowmem_reserve[]: 0 1544 128464 128464 128464
Jun 18 12:07:09 magallanes02 kernel: [4804375.885075] Node 0 DMA32 free:509616kB boost:0kB min:540kB low:2120kB high:3700kB reserved_highatomic:0KB active_anon:1000112kB inactive_anon:120212kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:1716092kB managed:1650556kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885081] lowmem_reserve[]: 0 0 126920 126920 126920
Jun 18 12:07:09 magallanes02 kernel: [4804375.885085] Node 0 Normal free:1171436kB boost:4096kB min:48512kB low:178476kB high:308440kB reserved_highatomic:247808KB active_anon:93954480kB inactive_anon:21542924kB active_file:1463264kB inactive_file:8165648kB unevictable:175928kB writepending:7524kB present:132120576kB managed:129973948kB mlocked:175928kB bounce:0kB free_pcp:75920kB local_pcp:0kB free_cma:0kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885090] lowmem_reserve[]: 0 0 0 0 0
Jun 18 12:07:09 magallanes02 kernel: [4804375.885094] Node 1 Normal free:1898060kB boost:22528kB min:67676kB low:199784kB high:331892kB reserved_highatomic:2048KB active_anon:79694956kB inactive_anon:16951816kB active_file:2755984kB inactive_file:27130656kB unevictable:3296kB writepending:304kB present:134217728kB managed:132108892kB mlocked:3296kB bounce:0kB free_pcp:50880kB local_pcp:672kB free_cma:0kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885099] lowmem_reserve[]: 0 0 0 0 0
Jun 18 12:07:09 magallanes02 kernel: [4804375.885102] Node 0 DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11264kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885113] Node 0 DMA32: 13606*4kB (UME) 5975*8kB (UM) 2870*16kB (UME) 1394*32kB (UME) 699*64kB (UME) 366*128kB (UME) 184*256kB (UME) 90*512kB (UME) 31*1024kB (UME) 1*2048kB (M) 24*4096kB (ME) = 509616kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885128] Node 0 Normal: 168009*4kB (UMEH) 10802*8kB (UMEH) 7987*16kB (UMEH) 4467*32kB (UMEH) 1445*64kB (UMEH) 314*128kB (UMEH) 26*256kB (UMEH) 2*512kB (H) 0*1024kB 1*2048kB (H) 0*4096kB = 1171588kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885141] Node 1 Normal: 81378*4kB (UMEH) 13114*8kB (UMEH) 8182*16kB (UMEH) 6338*32kB (UMEH) 3794*64kB (UMEH) 2079*128kB (UMEH) 1043*256kB (UMEH) 426*512kB (UMH) 136*1024kB (UMEH) 0*2048kB 0*4096kB = 1897464kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885154] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885156] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885158] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885159] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885160] 10356785 total pagecache pages
Jun 18 12:07:09 magallanes02 kernel: [4804375.885161] 453649 pages in swap cache
Jun 18 12:07:09 magallanes02 kernel: [4804375.885162] Free swap  = 557640kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885163] Total swap = 8388604kB
Jun 18 12:07:09 magallanes02 kernel: [4804375.885170] 67017593 pages RAM
Jun 18 12:07:09 magallanes02 kernel: [4804375.885171] 0 pages HighMem/MovableOnly
Jun 18 12:07:09 magallanes02 kernel: [4804375.885172] 1080404 pages reserved
Jun 18 12:07:09 magallanes02 kernel: [4804375.885172] 0 pages hwpoisoned
 

Attachments

  • gdb-210.txt (18.4 KB)
