WSUS cleanup crashes VM

showiproute

Hello everyone,

I am currently struggling with a problem on a Windows Server 2019 VM.
Config: 8 cores/2 sockets (= 16 vCPUs) & 32 GB RAM (fixed value).

Usually the memory within the VM is consumed by IIS & SQL Server for WSUS.

Last night I started a cleanup process:
1) Checked and defragmented the SQL database (worked fine)
2) Cleaned up obsolete/superseded updates. As this process takes ages, I left it running overnight.

This morning I noticed that the VM had crashed and was no longer marked as "running" in the PVE GUI.
Hitting the regular start button did not do the job either.

In the end I had to restart the whole physical server to get the VM running again.


I remember having similar issues before, which is why I tend to avoid doing any maintenance on my WSUS server.



Any ideas what could be causing this, or where to start debugging?
 
check the PVE task logs and the system journal from the night for pointers on what went wrong. pveversion -v and the VM config might also be helpful information.
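for example, something along these lines should show the relevant part of the system journal (the timestamps are just placeholders - adjust them to the night in question):
Code:
# everything the host logged during that night
journalctl --since "2021-09-20 20:00" --until "2021-09-21 08:00"

# or narrowed down to QEMU/KVM related messages
journalctl --since "2021-09-20 20:00" --until "2021-09-21 08:00" | grep -i -e qemu -e kvm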
 
Hello @fabian.
The task log is what I have already checked. The start command tells me:
Code:
stopped: unexpected status
TASK ERROR: start failed: command '/usr/bin/kvm -id 109 -name WindowsServer -no-shutdown -chardev 'socket,id=qmp,path=/var/run/qemu-server/109.qmp,server=on,wait=off' -mon 'chardev=qmp,mode=control' -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' -mon 'chardev=qmp-event,mode=control' -pidfile /var/run/qemu-server/109.pid -daemonize -smbios 'type=1,uuid=4e7d194d-448c-4585-b349-766632e7df21' -drive 'if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=/tmp/109-ovmf.fd' -smp '16,sockets=2,cores=8,maxcpus=16' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vnc 'unix:/var/run/qemu-server/109.vnc,password=on' -no-hpet -cpu 'host,hv_ipi,hv_relaxed,hv_reset,hv_runtime,hv_spinlocks=0x1fff,hv_stimer,hv_synic,hv_time,hv_vapic,hv_vpindex,+kvm_pv_eoi,+kvm_pv_unhalt' -m 32768 -object 'memory-backend-ram,id=ram-node0,size=16384M' -numa 'node,nodeid=0,cpus=0-7,memdev=ram-node0' -object 'memory-backend-ram,id=ram-node1,size=16384M' -numa 'node,nodeid=1,cpus=8-15,memdev=ram-node1' -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg -device 'vmgenid,guid=fa9d8f08-4ff4-4984-83d4-e249e8a34cbd' -device 'vfio-pci,host=0000:83:10.0,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0' -device 'qxl-vga,id=vga,bus=pcie.0,addr=0x1' -chardev 'socket,path=/var/run/qemu-server/109.qga,server=on,wait=off,id=qga0' -device 'virtio-serial,id=qga0,bus=pci.0,addr=0x8' -device 'virtserialport,chardev=qga0,name=org.qemu.guest_agent.0' -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' -chardev 'spicevmc,id=vdagent,name=vdagent' -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' -spice 'tls-port=61003,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:958980882779' -drive 'if=none,id=drive-ide2,media=cdrom,aio=io_uring' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2' -device 'virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1' -drive 'file=/dev/pve/vm-109-disk-0,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,rotation_rate=1' -device 'virtio-scsi-pci,id=virtioscsi1,bus=pci.3,addr=0x2' -drive 'file=/dev/zvol/Storage_14TB/vm-109-disk-1,if=none,id=drive-scsi1,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1' -device 'virtio-scsi-pci,id=virtioscsi2,bus=pci.3,addr=0x3' -drive 'file=/dev/zvol/Storage_8TB/vm-109-disk-0,if=none,id=drive-scsi2,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi2.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi2,id=scsi2' -device 'virtio-scsi-pci,id=virtioscsi3,bus=pci.3,addr=0x4' -drive 'file=/dev/zvol/Storage_14TB/vm-109-disk-0,if=none,id=drive-scsi3,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi3.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi3,id=scsi3' -device 'virtio-scsi-pci,id=virtioscsi4,bus=pci.3,addr=0x5' -drive 'file=/dev/zvol/Storage_12TB/vm-109-disk-2,if=none,id=drive-scsi4,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi4.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi4,id=scsi4' -device 
'virtio-scsi-pci,id=virtioscsi5,bus=pci.3,addr=0x6' -drive 'file=/dev/zvol/RAID_Storage_4TB/vm-109-disk-1,if=none,id=drive-scsi5,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi5.0,channel=0,scsi-id=0,lun=5,drive=drive-scsi5,id=scsi5' -device 'virtio-scsi-pci,id=virtioscsi6,bus=pci.3,addr=0x7' -drive 'file=/dev/zvol/RAID_Storage_4TB/vm-109-disk-0,if=none,id=drive-scsi6,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi6.0,channel=0,scsi-id=0,lun=6,drive=drive-scsi6,id=scsi6' -device 'virtio-scsi-pci,id=virtioscsi7,bus=pci.3,addr=0x8' -drive 'file=/dev/zvol/RAID_Storage_4TB/vm-109-disk-3,if=none,id=drive-scsi7,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi7.0,channel=0,scsi-id=0,lun=7,drive=drive-scsi7,id=scsi7' -device 'virtio-scsi-pci,id=virtioscsi8,bus=pci.3,addr=0x9' -drive 'file=/dev/zvol/Storage_12TB/vm-109-disk-1,if=none,id=drive-scsi8,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi8.0,channel=0,scsi-id=0,lun=8,drive=drive-scsi8,id=scsi8' -device 'virtio-scsi-pci,id=virtioscsi9,bus=pci.3,addr=0xa' -drive 'file=/dev/zvol/Storage_12TB/vm-109-disk-0,if=none,id=drive-scsi9,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi9.0,channel=0,scsi-id=0,lun=9,drive=drive-scsi9,id=scsi9' -device 'virtio-scsi-pci,id=virtioscsi10,bus=pci.3,addr=0xb' -drive 'file=/dev/zvol/Storage_10TB/vm-109-disk-0,if=none,id=drive-scsi10,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap' -device 'scsi-hd,bus=virtioscsi10.0,channel=0,scsi-id=0,lun=10,drive=drive-scsi10,id=scsi10' -rtc 'driftfix=slew,base=localtime' -machine 'type=pc-q35-6.0+pve0' -global 'kvm-pit.lost_tick_policy=discard'' failed: got timeout


Regarding the system journal - can you guide me on how to open the correct one?
journalctl -xe just shows me this morning, but not the night when the issue happened.


Regarding your other questions:
pveversion -v:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.11.22-3-pve: 5.11.22-7
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-3
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

VM specifications:
Code:
agent: 1
bios: ovmf
boot:
cores: 8
cpu: host
hostpci0: 83:10.0,pcie=1
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 32768
name: WindowsServer
numa: 1
onboot: 1
ostype: win10
scsi0: local-lvm:vm-109-disk-0,discard=on,size=200G,ssd=1
scsi1: Storage_14TB:vm-109-disk-1,backup=0,discard=on,size=4T
scsi10: Storage_10TB:vm-109-disk-0,backup=0,discard=on,size=7T
scsi2: Storage_8TB:vm-109-disk-0,backup=0,discard=on,size=7T
scsi3: Storage_14TB:vm-109-disk-0,backup=0,discard=on,size=6T
scsi4: Storage_12TB:vm-109-disk-2,backup=0,discard=on,size=1000G
scsi5: RAID_Storage_4TB:vm-109-disk-1,backup=0,discard=on,size=100G
scsi6: RAID_Storage_4TB:vm-109-disk-0,backup=0,discard=on,size=2T
scsi7: RAID_Storage_4TB:vm-109-disk-3,discard=on,size=70G
scsi8: Storage_12TB:vm-109-disk-1,backup=0,discard=on,size=4T
scsi9: Storage_12TB:vm-109-disk-0,backup=0,discard=on,size=4T
scsihw: virtio-scsi-single
smbios1: uuid=4e7d194d-448c-4585-b349-766632e7df21
sockets: 2
tablet: 0
vga: qxl
vmgenid: fa9d8f08-4ff4-4984-83d4-e249e8a34cbd
 
that shows that it fails to start, not why it stopped/crashed..
 
@fabian: Okay, I think I found something interesting.
According to the Windows Event Viewer (Ereignisanzeige), the VM crashed at 21:57:26 CEST.

For this specific time range I found the following in journalctl:
Code:
Sep 20 21:57:54 proxmox2 QEMU[8442]: kvm: ../util/iov.c:335: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
Sep 20 21:57:54 proxmox2 kernel:  zd16: p1 p2
Sep 20 21:57:54 proxmox2 kernel:  zd48: p1 p2
Sep 20 21:57:55 proxmox2 kernel:  zd64: p1 p2
 
This certainly looks like a bug to me. If you're up for further debugging, could you attach 'gdb' to your VM and try to trigger it again (i.e. run your WSUS cleanup workload)?

Attaching would work as follows:
Code:
apt install gdb pve-qemu-kvm-dbg   # do this beforehand

# start your VM if not already running

VMID=100 # change this!
VM_PID=$(cat /var/run/qemu-server/${VMID}.pid)

gdb attach $VM_PID -ex='handle SIGUSR1 nostop noprint pass' -ex='handle SIGPIPE nostop print pass' -ex='cont'
...and leave the terminal running (or run it within a 'screen' or 'tmux' session). Once a crash occurs, gdb will print some output again and give you a prompt; run thread apply all bt followed by quit there and post the output.
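If you go the tmux route, a rough sketch (the session name is just an example):
Code:
tmux new -s vm109-gdb      # open a new session and run the gdb command from above inside it
# detach with Ctrl-b then d; gdb keeps running in the background
tmux attach -t vm109-gdb   # reattach later to check on it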
 
Hi @Stefan_R ,

sure - I am always happy to lend a helping hand with debugging.
The WSUS cleanup job is already running. Let's see if the VM crashes again.
 
Hello @Stefan_R & @fabian,

it crashed again - the output of gdb is:

Code:
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f80808d14d0, clockid=-2138237856, expected=0, futex_word=0x55cfdb2b5068) at ../sysdeps/nptl/futex-internal.h:323
#1  __pthread_cond_wait_common (abstime=0x7f80808d14d0, clockid=-2138237856, mutex=0x55cfdb2b5018, cond=0x55cfdb2b5040) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=cond@entry=0x55cfdb2b5040, mutex=mutex@entry=0x55cfdb2b5018, abstime=abstime@entry=0x7f80808d14d0) at pthread_cond_wait.c:656
#3  0x000055cfdac7f3cf in qemu_sem_timedwait (sem=sem@entry=0x55cfdb2b5018, ms=ms@entry=10000) at ../util/qemu-thread-posix.c:282
#4  0x000055cfdac75814 in worker_thread (opaque=opaque@entry=0x55cfdb2b4fa0) at ../util/thread-pool.c:91
#5  0x000055cfdac7e6b9 in qemu_thread_start (args=0x7f80808d1570) at ../util/qemu-thread-posix.c:521
#6  0x00007f898b37aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f898b2aadef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 125 (Thread 0x7f80cb9f9700 (LWP 3740345) "kvm"):
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f80cb9f44d0, clockid=-878754720, expected=0, futex_word=0x55cfdb2b506c) at ../sysdeps/nptl/futex-internal.h:323
#1  __pthread_cond_wait_common (abstime=0x7f80cb9f44d0, clockid=-878754720, mutex=0x55cfdb2b5018, cond=0x55cfdb2b5040) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=cond@entry=0x55cfdb2b5040, mutex=mutex@entry=0x55cfdb2b5018, abstime=abstime@entry=0x7f80cb9f44d0) at pthread_cond_wait.c:656
#3  0x000055cfdac7f3cf in qemu_sem_timedwait (sem=sem@entry=0x55cfdb2b5018, ms=ms@entry=10000) at ../util/qemu-thread-posix.c:282
#4  0x000055cfdac75814 in worker_thread (opaque=opaque@entry=0x55cfdb2b4fa0) at ../util/thread-pool.c:91
#5  0x000055cfdac7e6b9 in qemu_thread_start (args=0x7f80cb9f4570) at ../util/qemu-thread-posix.c:521
#6  0x00007f898b37aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f898b2aadef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 122 (Thread 0x7f80ef3fa700 (LWP 3638010) "kvm"):
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f80ef3f54d0, clockid=-281062304, expected=0, futex_word=0x55cfdb2b5068) at ../sysdeps/nptl/futex-internal.h:323
#1  __pthread_cond_wait_common (abstime=0x7f80ef3f54d0, clockid=-281062304, mutex=0x55cfdb2b5018, cond=0x55cfdb2b5040) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=cond@entry=0x55cfdb2b5040, mutex=mutex@entry=0x55cfdb2b5018, abstime=abstime@entry=0x7f80ef3f54d0) at pthread_cond_wait.c:656
#3  0x000055cfdac7f3cf in qemu_sem_timedwait (sem=sem@entry=0x55cfdb2b5018, ms=ms@entry=10000) at ../util/qemu-thread-posix.c:282
#4  0x000055cfdac75814 in worker_thread (opaque=opaque@entry=0x55cfdb2b4fa0) at ../util/thread-pool.c:91
#5  0x000055cfdac7e6b9 in qemu_thread_start (args=0x7f80ef3f5570) at ../util/qemu-thread-posix.c:521
#6  0x00007f898b37aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f898b2aadef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 118 (Thread 0x7f804b7fe700 (LWP 3638003) "kvm"):
#0  futex_abstimed_wait_cancelable (private=0, abstime=0x7f804b7f94d0, clockid=1266652256, expected=0, futex_word=0x55cfdb2b5068) at ../sysdeps/nptl/futex-internal.h:323
#1  __pthread_cond_wait_common (abstime=0x7f804b7f94d0, clockid=1266652256, mutex=0x55cfdb2b5018, cond=0x55cfdb2b5040) at pthread_cond_wait.c:520
#2  __pthread_cond_timedwait (cond=cond@entry=0x55cfdb2b5040, mutex=mutex@entry=0x55cfdb2b5018, abstime=abstime@entry=0x7f804b7f94d0) at pthread_cond_wait.c:656
#3  0x000055cfdac7f3cf in qemu_sem_timedwait (sem=sem@entry=0x55cfdb2b5018, ms=ms@entry=10000) at ../util/qemu-thread-posix.c:282
#4  0x000055cfdac75814 in worker_thread (opaque=opaque@entry=0x55cfdb2b4fa0) at ../util/thread-pool.c:91
#5  0x000055cfdac7e6b9 in qemu_thread_start (args=0x7f804b7f9570) at ../util/qemu-thread-posix.c:521
#6  0x00007f898b37aea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f898b2aadef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 20 (Thread 0x7f81311ff700 (LWP 78089) "kvm"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x55cfdc30c55c) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55cfdc30c568, cond=0x55cfdc30c530) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x55cfdc30c530, mutex=mutex@entry=0x55cfdc30c568) at pthread_cond_wait.c:638
#3  0x000055cfdac7edfb in qemu_cond_wait_impl (cond=0x55cfdc30c530, mutex=0x55cfdc30c568, file=0x55cfdad626f2 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:174
#4  0x000055cfda87ebc3 in vnc_worker_thread_loop (queue=0x55cfdc30c530) at ../ui/vnc-jobs.c:248
--Type <RET> for more, q to quit, c to continue without paging--q
Quit
 
What is interesting in this case: I can start the VM from the shell (qm start 109) without restarting the physical server.
 
that unfortunately wasn't the full output, just the first page, and it only shows threads that are waiting, not the one that caused the crash... could you repeat, but collect the full trace?

either by paging through and copy-pasting every page of output, or by adding -ex='set logging on' -ex='set pagination off' before the -ex='cont' in the gdb command. This will log all output to the file "gdb.txt" (in addition to the terminal) and disable pagination, so that long output like the backtraces is printed in one go.
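for reference, the full command would then look roughly like this (same VMID/PID handling as before):
Code:
VMID=109 # change this if needed!
VM_PID=$(cat /var/run/qemu-server/${VMID}.pid)

# same attach command as before, but with all output logged to ./gdb.txt and pagination disabled
gdb attach $VM_PID -ex='handle SIGUSR1 nostop noprint pass' -ex='handle SIGPIPE nostop print pass' -ex='set logging on' -ex='set pagination off' -ex='cont'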

thanks!
 
yes, gdb.txt ends up in the folder where you started gdb.
 
Okay, so the first run "unfortunately" completed without a new crash.
I will start another, more complex WSUS cleanup to hopefully trigger another crash.
 
I "FUBARed" my VM and restored it from PBS.
Give it another try.

If this does not helps is there any issue when leaving this gdb debug mode running @fabian ?
 
no, there shouldn't be any issue with leaving it running.
 
did you run the thread apply all bt command? the VM will hang until you run quit, but you need to collect the back traces first ;)
 
no... I thought that the output to the .txt file was enough... mea culpa.
So I'll try it again: I will use the command that logs to gdb.txt -> gdb attach $VM_PID -ex='handle SIGUSR1 nostop noprint pass' -ex='handle SIGPIPE nostop print pass' -ex='set logging on' -ex='set pagination off' -ex='cont'
and as soon as the VM crashes I will run thread apply all bt and end it with quit.

Right?
 
