CPU soft lockup: Watchdog: Bug: soft lockup - CPU#0 stuck for 24s!

vix9

New Member
Feb 17, 2022
you need to use virtio-scsi-single for iothread=1 to take effect at all. iothread=1 without virtio-scsi-single is meaningless (though it's possible to configure it in the gui)

using virtio-scsi-single does not mean that you can only use a single virtual disk. it means each disk gets its own virtio-scsi controller
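for illustration, a minimal sketch of what this could look like on the Proxmox command line for a VM with two disks. the VM id 100, the storage name local-lvm and the volume names are made up, and the function name iothread_cmds is just a helper for this example. the commands are only printed, not executed:

```shell
#!/bin/sh
# Sketch only: VM id, storage and volume names are hypothetical.
# The commands are printed, not run; remove the "echo" to apply them.
iothread_cmds() {
    vmid=$1
    # controller type is set once per VM
    echo qm set "$vmid" --scsihw virtio-scsi-single
    # two disks on the same VM, each with iothread=1
    echo qm set "$vmid" --scsi0 "local-lvm:vm-$vmid-disk-0,iothread=1"
    echo qm set "$vmid" --scsi1 "local-lvm:vm-$vmid-disk-1,iothread=1"
}

iothread_cmds 100
```

a change like this probably only takes effect after a full stop and start of the VM, not after a reboot from inside the guest.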

https://qemu-devel.nongnu.narkive.com/I59Sm5TH/lock-contention-in-qemu
<snip>
I find the timeslice of vCPU thread in QEMU/KVM is unstable when there
are lots of read requests (for example, read 4KB each time (8GB in
total) from one file) from Guest OS. I also find that this phenomenon
may be caused by lock contention in QEMU layer. I find this problem
under following workload.
<snip>
Yes, there is a way to reduce jitter caused by the QEMU global mutex:

qemu -object iothread,id=iothread0 \
-drive if=none,id=drive0,file=test.img,format=raw,cache=none \
-device virtio-blk-pci,iothread=iothread0,drive=drive0

Now the ioeventfd and thread pool completions will be processed in
iothread0 instead of the QEMU main loop thread. This thread does not
take the QEMU global mutex so vcpu execution is not hindered.

This feature is called virtio-blk dataplane.
<snip>

https://forum.proxmox.com/threads/virtio-scsi-vs-virtio-scsi-single.28426/
Ahhh, I guess I only halfway understood. This helps. Thank you!
 
Jan 27, 2020
apparently, setting virtio-scsi-single & iothread & aio=threads cured all our vm freeze & hiccup issues.

i added this information to:

https://bugzilla.kernel.org/show_bug.cgi?id=199727#c8
https://bugzilla.proxmox.com/show_bug.cgi?id=1453

apparently, with ordinary/default qemu io processing, there is a chance of running into larger lock contention that blocks the entire vm execution and thus freezes the guest cpu for a while. this also explains why ping jitters that much.

with virtio-scsi-single & iothread & aio=native, ping jitter gets cured, too, but the jitter/freeze moves into the iothread instead and i'm still getting kernel traces/oopses about stuck processes/cpu.

adding aio=threads solves this entirely.
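a rough sketch of switching an existing disk from aio=native to aio=threads. set_opt is a hypothetical helper written for this example, and the VM id and volume name are made up. the sketch assumes that `qm set --scsi0` takes the complete drive spec, so the existing options are carried over instead of only passing the new one:

```shell
#!/bin/sh
# set_opt SPEC KEY VALUE
# Print a comma-separated drive spec with KEY set to VALUE,
# replacing an existing KEY=... entry if there is one.
set_opt() {
    spec=$1
    key=$2
    val=$3
    # split on commas, drop the old KEY entry, join again
    rest=$(printf '%s\n' "$spec" | tr ',' '\n' | grep -v "^$key=" | paste -sd, -)
    printf '%s,%s=%s\n' "$rest" "$key" "$val"
}

# Hypothetical VM id and spec; the real spec would come from `qm config`.
VMID=100
OLD="local-lvm:vm-100-disk-0,aio=native,iothread=1"
NEW=$(set_opt "$OLD" aio threads)

# Printed, not executed: remove the "echo" to apply it for real.
echo qm set "$VMID" --scsi0 "$NEW"
```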

the following information sheds some light on the whole picture. apparently the "qemu_global_mutex" can slam hard in your face, and this seems to be little known:

https://docs.openeuler.org/en/docs/.../best-practices.html#i-o-thread-configuration

"The QEMU global lock (qemu_global_mutex) is used when VM I/O requests are processed by the QEMU main thread. If the I/O processing takes a long time, the QEMU main thread will occupy the global lock for a long time. As a result, the VM vCPU cannot be scheduled properly, affecting the overall VM performance and user experience."


i have never seen a problem again with virtio-scsi-single & iothread & aio=threads. ping is absolutely stable with that, and ioping in the VM during vm migration or virtual disk move stays within a reasonable range. it's slow under high io pressure, but there are no errors in kernel dmesg inside the guests.

i'm really curious why this problem doesn't affect more people and why it is so hard to find information on it, to the point that even the proxmox folks don't give a hint in this direction (at least i didn't find one, and i searched really long and hard)

I'm still searching for deeper information on what exactly happens in qemu/kvm and what is going on in detail when freezes of several tens of seconds occur. even in the qemu project, detailed information along the lines of "virtio dataplane cures vm hiccups/freezes and removes a big qemu locking problem" is close to non-existent. the main context is "it improves performance and user experience".

anyway, i consider this finding important enough to be added to the docs/faqs. for us, this finding is sort of essential for survival; our whole xen to proxmox migration was delayed for months because of those vm hiccup/freeze issues.

what do you think @proxmox-team ?
Finally!!! Thank you!
 

vix9

Are you guys setting *all* of your VMs with virtio-scsi-single/iothread/threads or just the problem machines?
 

mdo

Active Member
Dec 5, 2010
New Zealand
i set all VMs with these params
Thank you to all involved with this thread and the research work. We have seen freezes on three systems in recent days (on PVE 7.2) which were all rock solid before. We have now applied the combination of settings that RolandK suggests (virtio-scsi-single & iothread & aio=threads) and hope for the best.
An excellent, helpful community here.
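for anyone who wants to check which VMs still deviate from the combination discussed in this thread, here is a small sketch. check_conf is a hypothetical helper name; it assumes the config format printed by `qm config <vmid>` (the same as /etc/pve/qemu-server/<vmid>.conf) and only looks at scsiN disks:

```shell
#!/bin/sh
# check_conf: read a VM config on stdin and print one line for every
# setting that differs from virtio-scsi-single & iothread & aio=threads.
check_conf() {
    awk '
        /^\[/ { exit }                      # stop at snapshot sections
        /^scsihw:/ { hw = $2 }
        /^scsi[0-9]+:/ {
            if ($0 ~ /media=cdrom/) next    # ignore CD-ROM drives
            disk = $1
            sub(/:$/, "", disk)
            if ($0 !~ /iothread=1/)  print disk ": iothread=1 missing"
            if ($0 !~ /aio=threads/) print disk ": aio=threads missing"
        }
        END { if (hw != "virtio-scsi-single") print "scsihw: not virtio-scsi-single" }
    '
}

# Example with a made-up config; prints "scsi1: aio=threads missing".
check_conf <<'EOF'
scsihw: virtio-scsi-single
scsi0: local-lvm:vm-100-disk-0,aio=threads,iothread=1,size=32G
scsi1: local-lvm:vm-100-disk-1,iothread=1,size=100G
EOF
```

on a real host the input would be something like `qm config 100 | check_conf`, repeated for each VM id. no output means the VM already uses all three settings.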
 
