WSUS cleanup crashes VM

yes, exactly.
 
After restoring the VM from backup, it no longer crashes during ongoing high-load workloads.
I will continue monitoring it.
 
Just thought I'd add that I'm also seeing this issue with a Linux guest when running cryptsetup-reencrypt to encrypt my disks. It doesn't always happen, but it has occurred on multiple nodes (all running AMD Ryzen CPUs, one with ECC RAM), so I'm hoping it's a software issue rather than a hardware one.

Code:
Sep 28 09:39:38 node QEMU[159155]: kvm: ../util/iov.c:335: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.

Code:
pveversion -v
proxmox-ve: 7.0-2 (running kernel: 5.11.22-4-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-7
pve-kernel-helper: 7.0-7
pve-kernel-5.11.22-4-pve: 5.11.22-8
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-11
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-3
pve-firmware: 3.3-1
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.0.0-4
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Code:
balloon: 2048
boot: order=net0
cores: 8
cpu: kvm64
memory: 8192
name: MyVm
net0: virtio=xx:xx:xx:xx:xx:xx,bridge=vmbr200
numa: 0
onboot: 1
ostype: l26
runningcpu: kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep
runningmachine: pc-i440fx-6.0+pve0
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=xxx
snaptime: 1632818570
sockets: 1
startup: order=6
virtio0: hdd-img:600/vm-600-disk-0.qcow2,discard=on,size=16G
virtio1: hdd-img:600/vm-600-disk-1.qcow2,discard=on,size=16G
virtio2: hdd-img:600/vm-600-disk-2.qcow2,discard=on,size=128G
vmgenid: xxx
vmstate: hdd-img:600/vm-600-state-vdbdone.raw
 
@shallax if you can trigger it semi-reliably, the backtraces as described earlier in this thread would still help in finding the actual bug!
 
I'm on it, chief! Giving it my best with about 10 VMs to encrypt. I'll let you know how it goes.
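
For anyone else following along, this is roughly how I'm capturing the backtraces (a sketch only - VMID 600 and the standard /var/run/qemu-server/<vmid>.pid pidfile are placeholders for whichever guest you expect to crash):

Code:
# attach gdb to the running QEMU/KVM process and wait for it to abort
VMID=600
gdb --batch -p "$(cat /var/run/qemu-server/${VMID}.pid)" \
    -ex 'set pagination off' \
    -ex 'set logging file gdb.txt' \
    -ex 'set logging on' \
    -ex 'continue' \
    -ex 'thread apply all backtrace' \
    -ex 'detach'

The 'continue' sits there until QEMU hits the assertion; gdb then writes a backtrace of every thread to gdb.txt and detaches.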
 
Interesting, different crash this time:

Code:
Sep 28 10:17:24 node QEMU[171446]: kvm: ../softmmu/physmem.c:3193: address_space_unmap: Assertion `mr != NULL' failed.

Code:
root@node:~/crash# cat gdb.txt
Continuing.

Thread 1 "kvm" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 11 (Thread 0x7f57357ff700 (LWP 171483) "kvm"):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x5638deccdfec) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5638deccdff8, cond=0x5638deccdfc0) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=cond@entry=0x5638deccdfc0, mutex=mutex@entry=0x5638deccdff8) at pthread_cond_wait.c:638
#3  0x00005638db3eadfb in qemu_cond_wait_impl (cond=0x5638deccdfc0, mutex=0x5638deccdff8, file=0x5638db4ce6f2 "../ui/vnc-jobs.c", line=248) at ../util/qemu-thread-posix.c:174
#4  0x00005638dafeabc3 in vnc_worker_thread_loop (queue=0x5638deccdfc0) at ../ui/vnc-jobs.c:248
#5  0x00005638dafeb888 in vnc_worker_thread (arg=arg@entry=0x5638deccdfc0) at ../ui/vnc-jobs.c:361
#6  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f57357fa3f0) at ../util/qemu-thread-posix.c:521
#7  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#8  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 10 (Thread 0x7f5737dff700 (LWP 171481) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd99d9c0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd99d9c0) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd99d9c0) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f5737dfa3f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 9 (Thread 0x7f593ccf6700 (LWP 171480) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd98fea0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd98fea0) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd98fea0) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593ccf13f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7f593d4f7700 (LWP 171479) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd982380, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd982380) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd982380) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593d4f23f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7f593dcf8700 (LWP 171478) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd9748b0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd9748b0) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd9748b0) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593dcf33f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7f593e4f9700 (LWP 171477) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd966d10, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd966d10) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd966d10) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593e4f43f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7f593ecfa700 (LWP 171476) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd9594d0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd9594d0) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd9594d0) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593ecf53f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7f593f4fb700 (LWP 171475) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd94afa0, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd94afa0) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd94afa0) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593f4f63f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f593fcfc700 (LWP 171474) "kvm"):
#0  0x00007f594f7a8cc7 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1  0x00005638db278ac7 in kvm_vcpu_ioctl (cpu=cpu@entry=0x5638dd916610, type=type@entry=44672) at ../accel/kvm/kvm-all.c:2630
#2  0x00005638db278c31 in kvm_cpu_exec (cpu=cpu@entry=0x5638dd916610) at ../accel/kvm/kvm-all.c:2467
#3  0x00005638db216dad in kvm_vcpu_thread_fn (arg=arg@entry=0x5638dd916610) at ../accel/kvm/kvm-accel-ops.c:49
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f593fcf73f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f5945287700 (LWP 171447) "kvm"):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00005638db3eb64a in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /build/pve-qemu/pve-qemu-kvm-6.0.0/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5638db97c148 <rcu_call_ready_event>) at ../util/qemu-thread-posix.c:460
#3  0x00005638db41580a in call_rcu_thread (opaque=opaque@entry=0x0) at ../util/rcu.c:258
#4  0x00005638db3ea6b9 in qemu_thread_start (args=0x7f59452823f0) at ../util/qemu-thread-posix.c:521
#5  0x00007f594f881ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6  0x00007f594f7b1def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f59453f11c0 (LWP 171446) "kvm"):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f594f6d9537 in __GI_abort () at abort.c:79
#2  0x00007f594f6d940f in __assert_fail_base (fmt=0x7f594f842128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5638db52b0bd "mr != NULL", file=0x5638db55e24e "../softmmu/physmem.c", line=3193, function=<optimized out>) at assert.c:92
#3  0x00007f594f6e8662 in __GI___assert_fail (assertion=assertion@entry=0x5638db52b0bd "mr != NULL", file=file@entry=0x5638db55e24e "../softmmu/physmem.c", line=line@entry=3193, function=function@entry=0x5638db55eec0 <__PRETTY_FUNCTION__.3> "address_space_unmap") at assert.c:101
#4  0x00005638db26e697 in address_space_unmap (as=as@entry=0x5638db976780 <address_space_memory>, buffer=<optimized out>, len=<optimized out>, is_write=is_write@entry=false, access_len=0) at ../softmmu/physmem.c:3193
#5  0x00005638db1b7b52 in dma_memory_unmap (access_len=<optimized out>, dir=DMA_DIRECTION_TO_DEVICE, len=<optimized out>, buffer=<optimized out>, as=0x5638db976780 <address_space_memory>) at /build/pve-qemu/pve-qemu-kvm-6.0.0/include/sysemu/dma.h:226
#6  virtqueue_unmap_sg (elem=elem@entry=0x5638dd513e60, len=len@entry=1310721, vq=<optimized out>, vq=<optimized out>) at ../hw/virtio/virtio.c:692
#7  0x00005638db1b97d6 in virtqueue_fill (vq=vq@entry=0x5638dec02700, elem=0x5638dd513e60, len=1310721, idx=idx@entry=0) at ../hw/virtio/virtio.c:845
#8  0x00005638db1b9c59 in virtqueue_push (vq=0x5638dec02700, elem=elem@entry=0x5638dd513e60, len=<optimized out>) at ../hw/virtio/virtio.c:919
#9  0x00005638db1e4d58 in virtio_blk_req_complete (req=req@entry=0x5638dd513e60, status=status@entry=0 '\000') at ../hw/block/virtio-blk.c:85
#10 0x00005638db1e52af in virtio_blk_rw_complete (opaque=<optimized out>, ret=0) at ../hw/block/virtio-blk.c:152
#11 0x00005638db3304a8 in blk_aio_complete (acb=0x5638ddeb92e0) at ../block/block-backend.c:1412
#12 blk_aio_complete (acb=0x5638ddeb92e0) at ../block/block-backend.c:1409
#13 blk_aio_read_entry (opaque=0x5638ddeb92e0) at ../block/block-backend.c:1466
#14 0x00005638db418cf3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173
#15 0x00007f594f704d40 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007fff72f91220 in ?? ()
#17 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7f5704e4f000
Detaching from program: /usr/bin/qemu-system-x86_64, process 171446
[Inferior 1 (process 171446) detached]
 
Another one (see attached)

Anecdotally, it seems like the crashes are more likely to occur if more than one VM is doing a high amount of IO at a time.
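
If it helps anyone trying to reproduce this without the re-encrypt setup, a generic way to generate that kind of concurrent load in a couple of guests would be something like the fio run below. To be clear, this is just a suggestion for hammering the disk, not something I've confirmed triggers the assert, and it overwrites the target device, so only point it at a scratch disk:

Code:
# DESTRUCTIVE: writes random data over /dev/vdb - use a throwaway virtio test disk
fio --name=hammer --filename=/dev/vdb --direct=1 \
    --rw=randrw --bs=64k --iodepth=32 --numjobs=4 \
    --time_based --runtime=600 --group_reporting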
 

Attachments

  • gdb.txt (28.7 KB)
thanks - the trigger seems to be similar in both cases (specific, heavy IO load), and the fallout indicates something going rather severely wrong. we'll see if we can reproduce it with the re-encrypt workload.
 
@shallax: It should not be hardware-related, as I am using a refurbished Supermicro server:
16 x Intel(R) Xeon(R) CPU E5-2637 v3 @ 3.50GHz (2 Sockets)
128 GB DDR3 ECC RAM
mostly 3.5" SATA disks + NVMe cache
1x NVMe SSD which also contains the PVE OS.
 
no luck reproducing so far - so please keep an eye out for further instances and attempt to collect more data! thanks
 
To try and get you some crashes, I've set up 3 VMs on which I run cryptsetup-reencrypt in parallel. I've had a segfault in one so far, but the stack trace isn't so useful (though it does appear to be within code handling IO). See attached.
 

Attachments

  • gdb.txt (26.7 KB)
could you give more details on how exactly you set up these test VMs? which OS, which commands do you run?
 
Sure - unfortunately, you probably won't like the answer, as it's a fairly bespoke setup based on a Gentoo kernel (Linux xxxx 5.10.61-gentoo #2 SMP Tue Sep 28 15:36:05 BST 2021 x86_64 Common KVM processor GenuineIntel GNU/Linux).

I'm actually running in a homemade initramfs that contains statically compiled binaries for cryptsetup and not much else, so that I can encrypt the rootfs of the VM before it's mounted. The scripts are basic:

For swap:
Code:
#!/bin/sh

DEV=${1}

read -p "Totally destroy and encrypt device ${DEV} for LUKS swap? (y/n)> " doContinue

if [ "${doContinue}" = "y" ]; then
  DEVNODE=`echo ${DEV} | sed "s/^.*\///"`
  echo "Encrypting..."
  echo "mypassword" | cryptsetup-reencrypt --new ${DEV} --reduce-device-size 32MiB
  echo "Opening..."
  echo "mypassword" | cryptsetup --allow-discards --persistent luksOpen ${DEV} decrypt${DEVNODE}

  NEW_DEV=`echo ${DEV} | sed "s/^.*\//\/dev\/mapper\/decrypt/"`
  echo "Creating swap at ${NEW_DEV}..."
  mkswap ${NEW_DEV}
fi

For ext4 partitions:
Code:
#!/bin/sh

DEV=${1}

read -p "Encrypt device ${DEV} for LUKS ext4 partition? (y/n)> " doContinue

if [ "${doContinue}" = "y" ]; then
  cd /
  umount ${DEV} 2> /dev/null
  DEVNODE=`echo ${DEV} | sed "s/^.*\///"`

  echo "Checking file system..."
  e2fsck.static -f ${DEV}
  echo "Shrinking file system to minimum size..."
  resize2fs.static -M ${DEV}

  echo "Encrypting..."
  echo "mypassword" | cryptsetup-reencrypt --new ${DEV} --reduce-device-size 32MiB

  echo "Opening..."
  echo "mypassword" | cryptsetup --allow-discards --persistent luksOpen ${DEV} decrypt${DEVNODE}

  NEW_DEV=`echo ${DEV} | sed "s/^.*\//\/dev\/mapper\/decrypt/"`
  echo "Re-checking file system at ${NEW_DEV}..."
  e2fsck.static -f ${NEW_DEV}
  echo "Enlarging file system to maximum size ${NEW_DEV}..."
  resize2fs.static ${NEW_DEV}
fi

The swap device is 2 GiB in size and the ext4 device is 64 GiB. Both are attached as VirtIO block devices (rather than via a SCSI controller). There's no partition table on either device, so the whole block device is used for the file system / swap.
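
If you want to mirror the disk layout, attaching equivalent whole-device virtio disks could look roughly like this (a sketch - the VMID and the 'hdd-img' storage name are placeholders, and this isn't the exact command I used):

Code:
# allocate a 2 GiB swap disk and a 64 GiB ext4 disk as whole virtio block devices
qm set 601 --virtio1 hdd-img:2 --virtio2 hdd-img:64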

Anything else you'd specifically find useful?
 
I'm observing a similar issue on two VMs, one with Windows 10 and the other with Windows Server 2019. I can reliably attempt to trigger it (though it only succeeds some of the time) by opening Chrome/Chromium/Edge, even on a completely idle system. This definitely only started happening after the upgrade to PVE 7.

Observed issues so far:
Code:
Sep 29 01:58:02 riko QEMU[3534840]: kvm: ../util/iov.c:335: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
Oct  3 18:38:55 riko kernel: [1715763.644247] kvm[43833]: segfault at ffffffffffffffe4 ip 0000558b1defd4e6 sp 00007f09a16f95f0 error 7 in qemu-system-x86_64[558b1dab9000+532000]
Oct 11 12:31:34 riko QEMU[731068]: kvm: ../block/aio_task.c:42: aio_task_co: Assertion `pool->busy_tasks < pool->max_busy_tasks' failed.
Oct 11 18:42:34 riko QEMU[2092231]: kvm: ../util/iov.c:335: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.
Oct 13 00:19:13 riko QEMU[750212]: kvm: ../util/iov.c:335: qemu_iovec_concat_iov: Assertion `soffset == 0' failed.

I do not think this is necessarily related to heavy I/O; however, it is possible that heavy I/O simply increases the odds of hitting it.

The Server 2019 VM runs a Veeam B&R server and handles quite a lot of I/O daily, yet it never crashes while doing so. The only crashes observed there were caused by opening Edge, and they happened outside of the I/O-heavy hours. The other VM is a workstation that generally idles with a single OpenVPN tunnel and an RDP connection open. It tends to crash on its own from time to time despite doing nothing, but the majority of crashes occur when opening Chrome.

I'm attaching the first GDB log I gathered for now; however, here is the interesting part:

Code:
Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
qemu_coroutine_entered (co=0x1) at ../util/qemu-coroutine.c:198
198     ../util/qemu-coroutine.c: No such file or directory.
which is called from:
Code:
Thread 1 (Thread 0x7fda57d47040 (LWP 2212172) "kvm"):
#0  qemu_coroutine_entered (co=0x1) at ../util/qemu-coroutine.c:198
#1  0x000055d78438ed65 in luring_process_completions (s=s@entry=0x55d7850228c0) at ../block/io_uring.c:217
#2  0x000055d78438f128 in ioq_submit (s=0x55d7850228c0) at ../block/io_uring.c:263
#3  0x000055d78438f2b0 in qemu_luring_completion_cb () at ../block/io_uring.c:276
#4  0x000055d78444927a in aio_dispatch_handler (ctx=ctx@entry=0x55d784da6210, node=0x55d7850229a0) at ../util/aio-posix.c:329
#5  0x000055d7844499b2 in aio_dispatch_handlers (ctx=0x55d784da6210) at ../util/aio-posix.c:372
#6  aio_dispatch (ctx=0x55d784da6210) at ../util/aio-posix.c:382
#7  0x000055d78446a36e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at ../util/async.c:306
#8  0x00007fda63898e6b in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
#9  0x000055d78445a890 in glib_pollfds_poll () at ../util/main-loop.c:231
#10 os_host_main_loop_wait (timeout=0) at ../util/main-loop.c:254
#11 main_loop_wait (nonblocking=nonblocking@entry=0) at ../util/main-loop.c:530
#12 0x000055d7842a6b91 in qemu_main_loop () at ../softmmu/runstate.c:725
#13 0x000055d783f67c0e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at ../softmmu/main.c:50

I will reboot this node to apply the new kernel update, in case the missing symbols are caused by that.
 

Attachments

  • gdb_109-202110131546.txt (37.3 KB)
  • 100.conf.txt (868 bytes)
  • 109.conf.txt (687 bytes)
Anyone who can reproduce it, could you try setting all disks to use aio=native and attempt to trigger it again?

That is, edit your config and add aio=native at the end of every disk line, e.g. from '100.conf.txt' in the post above:
Code:
scsi0: local:100/vm-100-disk-0.qcow2,discard=on,iothread=1,size=128G,ssd=1,aio=native
scsi1: /dev/mapper/raid-veeam,backup=0,iothread=1,size=4T,aio=native
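
The same change can also be made from the CLI instead of editing the file by hand - a sketch reusing the two lines above (as far as I know, the VM then needs a full stop/start, not just a reboot from inside the guest, for the new aio setting to take effect):

Code:
qm set 100 --scsi0 local:100/vm-100-disk-0.qcow2,discard=on,iothread=1,size=128G,ssd=1,aio=native
qm set 100 --scsi1 /dev/mapper/raid-veeam,backup=0,iothread=1,size=4T,aio=native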
 
I had applied this setting on 109 shortly after writing the initial post, as I remembered io_uring from the changelog. So far I have been unable to reproduce the crash. I will continue observing for the next few days and report back.
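
For reference, this is roughly how I double-checked that the setting is actually in effect (the VMID is mine; the pidfile location is the standard Proxmox one):

Code:
# config side
qm config 109 | grep -i aio
# running process side - the -drive arguments of the live kvm process should show aio=native
grep -ao 'aio=[a-z]*' /proc/$(cat /var/run/qemu-server/109.pid)/cmdline | sort | uniq -c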
 
