CPU soft lockup: watchdog: BUG: soft lockup - CPU#0 stuck for 24s!

Good Day,

I have Ubuntu VMs locking up recently. The VMs are on different hosts in the cluster, with a warning/error message on the console when I open it:

[Attached screenshot: Screenshot 2021-02-13 at 10.58.45.png]

There are also VMs with this message that are not experiencing any issues.

[Attached screenshot: Screenshot 2021-02-14 at 09.32.09.png]

These errors are not appearing in the host syslog, and there is nothing else that would indicate a problem with the CPU.

[Attached screenshot: Screenshot 2021-02-14 at 09.38.30.png]

One of the hosts running VMs with the issue appears to be normal.

I only have one customer whose VMs have locked up on more than one occasion, and he is using Ubuntu 18.x.x LTS.

I am not sure if this is an issue on the host or in the VM; any guidance on where to start looking would be a great help.

Thanks

Zaid
 
> I am not sure if this is an issue on the host or in the VM; any guidance on where to start looking would be a great help.
I mean, it can be a combination of those, but if this only started happening now it may be an issue with the VM's kernel, as PVE 5.4 has not received a kernel update in 6-7 months.

From your screenshot of the PVE summary panel I can see that this node is still on Proxmox VE 5.x, which went End-of-Life (EOL) at the end of July 2020.
https://pve.proxmox.com/pve-docs/chapter-pve-faq.html#faq-support-table

Please upgrade soon to get many new bug and security fixes and to run a supported system again:
https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0
 
Thanks for the feedback Thomas, but yes, this only started recently, and it is isolated to this one customer's Ubuntu 18.x.x VMs.

There are no error messages on the host or on its other VMs. I am not sure how this is related to 5.x not receiving kernel updates for 6-7 months?

Thanks

Zaid
 
> There are no error messages on the host or on its other VMs. I am not sure how this is related to 5.x not receiving kernel updates for 6-7 months?
Normally, if some issue pops up it is due to some change in the environment. With issues inside VMs this can have a few triggers, but the most common ones are:
  1. PVE host QEMU
  2. PVE host kernel
  3. Guest kernel

As you use PVE 5.x, and we stopped providing updates for that version in July 2020, the first two points have not changed in quite a while, so I deduced that it was probably a change in the third point that caused this to start happening. That is the relation between not getting updates and a probable cause, in my eyes.
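
One way to check whether the guest kernel actually changed recently, assuming a standard Ubuntu/Debian guest (the log paths below are just the distro defaults), would be something like:

# inside the guest: currently running kernel
uname -r
# installed kernel images, newest first
ls -lt /boot/vmlinuz-*
# when kernel packages were installed/upgraded via apt (including rotated logs)
grep -h linux-image /var/log/apt/history.log
zgrep -h linux-image /var/log/apt/history.log.*.gz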
 
This normally just means that the PVE host was under (way) too high a load, CPU- or IO-wise.

I can observe such things when running very intensive compile tasks on an already loaded host; there is not much one can do besides increasing the available resources or limiting the high load.
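
If one wants to verify that, a rough way to see whether the host was CPU- or IO-bound around the time of a lockup (generic tools; iostat comes from the sysstat package) could look like:

# on the PVE host
uptime                                   # current load averages
vmstat 5                                 # CPU usage, run queue and IO wait over time
iostat -x 5                              # per-device utilisation/latency (sysstat package)
journalctl -k | grep -i "soft lockup"    # soft lockup messages in the host kernel log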
 
> This normally just means that the PVE host was under (way) too high a load, CPU- or IO-wise.

Hello @Thomas Lamprecht,

I can confirm that IO load plays an essential role here, but are you sure that CPU load alone can trigger this?

So far I have been unable to reproduce any noticeable VM hiccup or even lockups by overloading CPU resources, not even when pushing the load average above 50 on the host and inside a VM.
 
Apparently, setting virtio-scsi-single & iothread & aio=threads cured all our VM freeze & hiccup issues.

I added this information to:

https://bugzilla.kernel.org/show_bug.cgi?id=199727#c8
https://bugzilla.proxmox.com/show_bug.cgi?id=1453

Apparently, with ordinary/default QEMU IO processing, there is a chance of running into longer locking conditions which block the entire VM execution and thus completely freeze the guest CPU for a while. This also explains why ping jitters that much.

With virtio-scsi-single & iothread & aio=native, the ping jitter is cured too, but the jitter/freeze moves into the iothread instead, and I am still getting kernel traces/oopses about stuck processes/CPUs.

Adding aio=threads solves this entirely.
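
For anyone who wants to try the same, a minimal sketch of how these settings can be applied on the CLI, assuming a hypothetical VMID 100 and a scsi0 disk on a storage called local-lvm (check your actual disk spec with "qm config 100" first, and keep any existing options like size or cache, since re-setting the disk replaces its whole option string):

# one virtio-scsi controller (and thus one possible IO thread) per disk
qm set 100 --scsihw virtio-scsi-single
# dedicated IO thread plus the threads AIO backend for the disk
qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1,aio=threads

The same options are available in the GUI (SCSI controller type, and IO thread / Async IO in the disk's advanced settings); the VM needs a full stop and start for them to take effect.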

The following information sheds some light on the whole picture; apparently the "qemu_global_mutex" can slam you hard in the face, and this seems to be largely unknown:

https://docs.openeuler.org/en/docs/.../best-practices.html#i-o-thread-configuration

"The QEMU global lock (qemu_global_mutex) is used when VM I/O requests are processed by the QEMU main thread. If the I/O processing takes a long time, the QEMU main thread will occupy the global lock for a long time. As a result, the VM vCPU cannot be scheduled properly, affecting the overall VM performance and user experience."


I have never seen the problem again with virtio-scsi-single & iothread & aio=threads; ping is absolutely stable with that, and ioping in the VM during a VM migration or virtual disk move is within a reasonable range. It is slow under high IO pressure, but there are no errors in the kernel dmesg inside the guests.
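
In case someone wants to reproduce this kind of check, a simple sketch (the intervals/counts are arbitrary choices, and ioping has to be installed in the guest):

# from another machine: watch for ping gaps/jitter towards the VM while it is under IO load
ping -i 0.2 <vm-ip>
# inside the guest: disk request latency while a backup/migration/disk move is running
ioping -c 60 /
# inside the guest: check afterwards for new traces
dmesg | grep -iE "soft lockup|hung task"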

I am really curious why this problem does not affect more people and why it is so hard to find information on it, to the point that even the Proxmox folks do not give a hint in this direction (at least I did not find one, and I searched really long and hard).

I am still searching for some deeper information/knowledge about what exactly happens in QEMU/KVM, and what goes on in detail so that freezes of several tens of seconds occur. Even in the QEMU project, detailed information along the lines of "virtio dataplane cures VM hiccups/freezes and removes the big QEMU locking problem" is nearly non-existent; the usual framing is just "it improves performance and user experience".

Anyway, I consider this finding important enough to be added to the docs/FAQs. For us, this finding is sort of essential for survival; our whole Xen-to-Proxmox migration was delayed for months because of those VM hiccup/freeze issues.

What do you think, @proxmox-team?
 
> Anyway, I consider this finding important enough to be added to the docs/FAQs. For us, this finding is sort of essential for survival; our whole Xen-to-Proxmox migration was delayed for months because of those VM hiccup/freeze issues.
Yes, it may deserve some slight addition to the existing paragraph in PVE's VM documentation.

FWIW, we knew about the general behavior of IO Thread, i.e., that doing potentially blocking IO-related work in the main thread can block other things, e.g., QMP processing before that was done out of band (pre 6.0 IIRC?). The biggest reason we did not promote IO Thread more is that it was plagued by issues for a long time: some general bugs in QEMU and some with backup, partially in our clean-state backup implementation for QEMU. Those stemmed mostly from the fact that the complex co-routine (async) model that QEMU uses, paired and mixed with arbitrary threads, is hard to get right, especially in a relatively unsafe programming language like C.

With QEMU 5.0 (PVE 6.2) most of that got fixed, and with QEMU 5.1 (PVE 6.3) the remaining parts got ironed out and made more robust. Since then, using IO Thread has been deemed quite stable internally, and we run explicit tests on it to check for known regressions on every QEMU release; but with the history it had in PVE, it was naturally something we wanted to observe for a while before recommending it in general, or even enabling it by default. Besides that, we did not observe the specific effects you do. In other words, while this is probably quite an OK remedy and releases enough pressure for quite a few cases like yours, it cannot fix all underlying causes: hangs can still happen on an IOThread-enabled system due to way too much IO pressure, but that is probably rare and noticed in other ways too, unlike your symptoms.

IMO your post is a good reminder that we should reevaluate conveying the usefulness and stability of IOThreads in some situations and extend the recommendations and information about it in the docs, so thanks for sticking with this issue and relaying your observations here!

> What do you think, @proxmox-team?
Note that neither this account nor the one you mentioned in the older, now-removed post is me or otherwise a Proxmox-affiliated account (albeit their naming is definitely unfortunate; I'll be looking into resolving that).
 
Wow, thanks for that long and in-depth reply and for the insider information.

Regarding the @mention: I just did not notice that @prox... refers to a third-party user account; I did not want to use @mentioning at all.
 
Testing the IO Thread / VirtIO SCSI single / threads solution today. After we upgraded our 5-node cluster from 6 to 7.1, we noticed that backups of a VM living on node 1 would start locking up a machine on node 3. The only common thread was that they shared network storage. The backup setup had not changed in 6 months or so, so it was puzzling why the 7.1 upgrade would cause this issue. It seems to make sense now.
 
Note here though that the IOThread behavior stayed the same between the QEMU versions in PVE 6.x and 7.x; what changed is that io_uring became the default async-IO backend in PVE 7.0, at least when not on LVM or RBD storage types (which initially had some issues due to a kernel bug). You can override this per VM disk in the advanced settings when editing it.

Just mentioning this as the regression RolandK sees without IO Threads had already happened with PVE 6.4, IIRC, so there may be two different underlying causes. But moving the IO into a separate thread can certainly avoid some hang potential in the main VM CPU thread, and as it can be switched on/off relatively easily it should be worth a shot.
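
For completeness, a sketch of how to check and override the async-IO backend per disk on the CLI as well (hypothetical VMID 100 and disk spec; the same setting is in the GUI under the disk's Advanced options):

# show the current disk line, including any aio=... setting
qm config 100 | grep scsi0
# force a specific backend for that disk, e.g. the pre-7.0 default
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=native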
 
The VMs are all configured with shared NFS storage. On 6.x, we had zero issues with our PBS (or even backing up to another NFS share). It looks like the lockup issue is mostly mitigated by the switch to IO Thread and async IO set to threads. Before I found this thread I had tried switching the async IO to threads without enabling IO Thread, and the issue seemed to persist. For us, it was a three-node MySQL cluster that would have one node fall behind by minutes during a backup. Whichever node was falling behind would always be one that shared storage with the machine that was currently being backed up.
 

Well, the switch to io_thread/async threads did not mitigate the problem entirely. Machine lockups are still happening. An interesting note, though: the worst of the issues comes when we are backing up machines running CentOS. This may be entirely coincidental, but we seem to have issues any time we have heavy IO on a CentOS disk. Just as a test, I migrated our Mattermost server from a CentOS VM to an Ubuntu 20 VM. Backing up the older CentOS machine would cause lockups on other machines using the same shared storage. When I migrated the machine fully to Ubuntu, the lockups ceased to happen. I know there are other factors (a brand-new QCOW2 disk, for one), but I would be curious to hear if others have had a similar experience.
 
> Well, the switch to io_thread/async threads did not mitigate the problem entirely

For me, it did. Mind that "virtio scsi single" makes the difference...
 
I've changed some VMs to virtio scsi single, but not all. Maybe a project for next week.
 
Can you help explain why a machine with a single virtual disk would benefit from virtio scsi single?
 
You need to use virtio-scsi-single for iothread=1 to be effective at all; iothread=1 without virtio-scsi-single is meaningless (though it is possible to configure in the GUI).

Using virtio-scsi-single does not mean that you use a single virtual disk; it means each disk gets its own virtio-scsi controller and can thus get its own IO thread.
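
As an illustration, the relevant lines of a VM config (e.g. /etc/pve/qemu-server/100.conf, with made-up volume names and sizes) using the settings discussed in this thread would look roughly like this, each scsiN disk getting its own controller and IO thread:

scsihw: virtio-scsi-single
scsi0: local-lvm:vm-100-disk-0,iothread=1,aio=threads,size=32G
scsi1: local-lvm:vm-100-disk-1,iothread=1,aio=threads,size=100G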

https://qemu-devel.nongnu.narkive.com/I59Sm5TH/lock-contention-in-qemu
<snip>
I find the timeslice of vCPU thread in QEMU/KVM is unstable when there
are lots of read requests (for example, read 4KB each time (8GB in
total) from one file) from Guest OS. I also find that this phenomenon
may be caused by lock contention in QEMU layer. I find this problem
under following workload.
<snip>
Yes, there is a way to reduce jitter caused by the QEMU global mutex:

qemu -object iothread,id=iothread0 \
-drive if=none,id=drive0,file=test.img,format=raw,cache=none \
-device virtio-blk-pci,iothread=iothread0,drive=drive0

Now the ioeventfd and thread pool completions will be processed in
iothread0 instead of the QEMU main loop thread. This thread does not
take the QEMU global mutex so vcpu execution is not hindered.

This feature is called virtio-blk dataplane.
<snip>

https://forum.proxmox.com/threads/virtio-scsi-vs-virtio-scsi-single.28426/
 
