rcu_sched self-detected stall on CPU

signalcodec · Apr 13, 2020

I'm getting "rcu_sched self-detected stall on CPU" errors in every VM at startup, and if the VMs are under heavy load, the Proxmox host machine also locks up and starts printing the same message.

I'm not even sure how to figure out the root cause.

Numerous Google searches have turned up similar complaints, but from much, much older versions of the kernel.

help?

wolfgang · Apr 17, 2020

Hi,

What kernel do you use, and which CPU?
Is this a NUMA machine?

signalcodec · Apr 18, 2020

5.3.10-1-pve, 2x Intel Xeon E5-2670
And I'm not sure if it's NUMA; I don't know how to check. It's a Dell R720, if that helps.

Thank you for replying, I appreciate it.

wolfgang · Apr 20, 2020

Yes, you have a NUMA (non-uniform memory access) machine.
All multi-socket systems are NUMA.
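For anyone who wants to confirm this on their own host, here is a minimal sketch. It assumes `lscpu` from util-linux is installed (it normally is on a Debian-based Proxmox host); the helper name `numa_node_count` is made up for this example.

```shell
# Hypothetical helper: read lscpu-style output on stdin and print the
# number of NUMA nodes it reports.
numa_node_count() {
    awk -F: '/NUMA node\(s\)/ { gsub(/[[:space:]]/, "", $2); print $2 }'
}

# Run against the real host only if lscpu is available.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | numa_node_count
fi
```

A value greater than 1 means the host exposes more than one NUMA node, which is what you would expect from a dual-socket box like the R720.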

I do not find any bug related to your description in the log.
Try the following.
Ensure that all power savings are disabled in the BIOS.
Update to Kernel 5.4.

Code:

apt install pve-kernel-5.4
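Note that the new kernel only becomes active after a reboot. A rough way to check which kernel is actually running afterwards is sketched below. It assumes GNU `sort` with `-V` (version sort), as shipped on Debian; the helper name `kernel_at_least` is made up for this example.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
# Relies on GNU sort -V putting the lower version first.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare the running kernel against 5.4.
if kernel_at_least "$(uname -r)" 5.4; then
    echo "running kernel $(uname -r) is 5.4 or newer"
else
    echo "running kernel $(uname -r) is older than 5.4; reboot into the new kernel"
fi
```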

signalcodec · Apr 24, 2020

So I'm now on 5.4.27.1-pve. I checked, and I have no power savings enabled. The problem hasn't gone away.

Is there anything I can do on my end to help find the cause of this?

wolfgang · Apr 24, 2020

What do you use as vCPU type?

signalcodec · Apr 24, 2020

The vCPU type is Default (kvm64)

I have no flags enabled either; here's an example from a VM:

wolfgang · Apr 27, 2020

Please try vCPU Type "host".
Or set it to the oldest model in your cluster if it is a cluster with different Host CPUs.
This is necessary on a mixed Cluster to allow live migration.
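The vCPU type can also be checked and changed from the host shell with `qm`. The sketch below is only an illustration: the VM ID 100 is a placeholder and the helper name `vcpu_type` is made up. It assumes `qm config` prints a `cpu:` line only when a type has been set explicitly, so a missing line is reported as "default".

```shell
# Hypothetical helper: read `qm config <vmid>` output on stdin and print
# the configured vCPU type, or "default" when no cpu: line is present.
vcpu_type() {
    awk '
        /^cpu:/ { split($2, parts, ","); t = parts[1]; sub(/^cputype=/, "", t) }
        END { print (t == "" ? "default" : t) }
    '
}

# On the Proxmox host (VM ID 100 is a placeholder):
#   qm config 100 | vcpu_type
#   qm set 100 --cpu host
```

The VM has to be stopped and started again (not just rebooted from inside the guest) before a changed CPU type takes effect.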

signalcodec · Apr 30, 2020

Sorry for the delay; it took me a few days to get a chance to change them.

So, I set them all to 'host' like you suggested, but I'm still getting the same issue:

I also want to clarify I only have 1 host, so I don't have a cluster setup. Just 1 host with 2 CPUs.

TheAnimaL · Apr 30, 2020

Same issue here with 6.1-8 on pve-kernel-5.3.18-3-pve and pve-kernel-5.4.27-1-pve on an i7-3720 CPU.
Tried vCPU kvm64, qemu64 and host. All had the same lock-up issue :-(
I do have a mixed cluster, with AMD and Intel CPUs, but this also happens when spinning up a new VM on the i7-3720 host.

TheAnimaL · Apr 30, 2020

Update: I reinstalled the machine (this time not with root on ZFS) and the issue seems to have disappeared.

signalcodec · May 1, 2020

I wanna say I'm happy for that person, but my issue is still active (for anyone confused).

Xeata_James · Sep 23, 2020

This just started happening to me this week on servers that have never seen a problem. Today four of my VMs are all console-locked with the "rcu_sched self-detected stalls on...". Important note, however: they are all RUNNING just fine, I just no longer have console access. It kind of coincides with having upgraded my 2-server cluster to the latest Proxmox... but that's a stretch.
So @wolfgang if you're looking for clues, this seems to be one.

Nemesiz · Mar 12, 2023

This is an old thread, but I ran into this problem today too.

Choosing recovery mode from GRUB, I saw a controller problem. Switching from the default (LSI 53C895A) to VirtIO SCSI solved the problem.

It was the last VM still running with the LSI 53C895A controller, and the problem somehow coincided with a kernel upgrade.

fiona · Mar 13, 2023

Hi,

Nemesiz said:
This is an old thread, but I ran into this problem today too.

Choosing recovery mode from GRUB, I saw a controller problem. Switching from the default (LSI 53C895A) to VirtIO SCSI solved the problem.

It was the last VM still running with the LSI 53C895A controller, and the problem somehow coincided with a kernel upgrade.

Glad you found a workaround! This is a regression in our QEMU 7.2 package with the LSI SCSI controller.
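For anyone checking whether a VM is affected, here is a rough sketch. It assumes that a missing `scsihw:` line in `qm config` output means the VM is on the default LSI controller, and it does not check whether the VM actually has any SCSI disks attached. The VM ID 100 is a placeholder and the helper name `uses_lsi_controller` is made up.

```shell
# Hypothetical helper: read `qm config <vmid>` output on stdin and succeed
# if the VM uses an LSI SCSI controller (explicitly, or by default because
# no scsihw: line is set).
uses_lsi_controller() {
    awk '
        /^scsihw:/ { hw = $2 }
        END { exit !(hw == "" || hw ~ /^lsi/) }
    '
}

# On the Proxmox host (VM ID 100 is a placeholder):
#   if qm config 100 | uses_lsi_controller; then
#       qm set 100 --scsihw virtio-scsi-pci
#   fi
```

Switching the controller assumes the guest has VirtIO SCSI drivers available; recent Linux kernels include them, while Windows guests need the drivers installed first.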
