Server lag spikes at 80% -- rcu: INFO: rcu_preempt self-detected stall on CPU

deezly

New Member
Feb 13, 2024
Hello all, I just got a (new*) Dell PowerEdge R630 to use as a virtualization host and was looking for good software to run on it. I heard how great Proxmox is, and about the great community behind it, so I installed it and gave it a try.

Specs (not the most impressive, I know, and older hardware, but I thought it would be modest enough):
Dell PowerEdge R630
3x 960GB SSDs (RAID0)
RAID Controller - PERC H730p Mini Mono 12GB
2x Intel Xeon E5-2696 v3 (36 cores total, 72 threads)
192GB DDR4 RAM
Network Card - Broadcom BCM57800-T 2x 10Gb
10 Gbps Upload / Download

So far I love the rich features and the UI, and everything was going smoothly. But I noticed that during the install on my one VM, VNC would completely stop and lag out. I thought nothing of it. My use case for the Proxmox server is all my little projects, plus a few game servers. So far I only have one VM set up.

VM Settings:
Memory: 64GB
Processors - 1 socket, 65 [host]
BIOS - OVMF
Machine - q35
SCSI Controller - VirtIO SCSI single


So I set up an Enshrouded server to start, and it's been a mess. I get extreme lag spikes where I do stuff and then everything jumps back in time 10-15 seconds. Weird. I wanted to make sure this isn't an issue with a single game, so I tried Palworld and Minecraft instead: same thing. Extremely unstable, with random lag spikes. I'm kind of scratching my head.
I installed ESXi 7 on this same setup and the issues disappeared. But I'd really like to keep using Proxmox.

I have no idea where to begin troubleshooting, so I figured I'd ask here and see if there is anything I can do.

I booted up the VM, played for around 10 minutes, and had multiple lag spikes. Here are the VM graphs. Nothing obvious jumped out at me.

One thing I noticed is that the whole server sits at around 8% CPU usage, but every minute or so it spikes to 85%, and this is when the lag starts.
1707797671921.png
1707797720903.png
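
To line the spikes up with the lag, it may help to log them with timestamps on the Proxmox host. Below is a minimal Python sketch, assuming a Linux host where the aggregate `cpu` line of `/proc/stat` is readable. The function names, the 50% threshold, and the one-second interval are illustrative choices, not anything Proxmox provides.

```python
import time


def parse_cpu_line(stat_text):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Only the first 8 fields (user..steal) are summed: the trailing
            # guest/guest_nice fields are already included in user/nice.
            fields = [int(x) for x in line.split()[1:9]]
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)  # idle + iowait
            return idle, sum(fields)
    raise ValueError("no aggregate cpu line found")


def busy_percent(before, after):
    """Percent of non-idle CPU time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def watch(threshold=50.0, interval=1.0, samples=60, path="/proc/stat"):
    """Print a timestamped line whenever host CPU usage crosses the threshold."""
    with open(path) as f:
        prev = parse_cpu_line(f.read())
    for _ in range(samples):
        time.sleep(interval)
        with open(path) as f:
            cur = parse_cpu_line(f.read())
        pct = busy_percent(prev, cur)
        if pct >= threshold:
            print(time.strftime("%H:%M:%S"), f"{pct:.1f}% busy")
        prev = cur


if __name__ == "__main__":
    watch()
```

Running it on the host while playing in the VM would show whether the spikes really recur about once a minute and whether they coincide with the rollbacks seen in game.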

Thanks for your time.

Edit: I got some error messages on the console of my VM:

rcu: INFO: rcu_preempt self-detected stall on CPU

rcu: 0...!: (150 GPs behind) idle=cff4/0/0x1 softirq=248/248 fqs=0

rcu: rcu_preempt kthread starved for 5505 jiffies! g3041 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=9

rcu: RCU grace-period kthread stack dump:

rcu: Stack dump where RCU GP kthread last ran:
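
For anyone comparing stall reports across boots, here is a minimal Python sketch that pulls the key numbers out of console text like the above. The function name and the `hz` parameter are hypothetical. Converting jiffies to seconds depends on the guest kernel's tick rate, which is not stated in this thread, so the default of 250 is only an assumption.

```python
import re

# Patterns match the "kthread starved" line and the CPU it last ran on.
STARVED = re.compile(r"(\w+) kthread starved for (\d+) jiffies")
LAST_CPU = re.compile(r"->cpu=(\d+)")


def summarize_rcu_stall(log_text, hz=250):
    """Summarize RCU stall messages; hz is an ASSUMED guest tick rate."""
    summary = {
        "stall_reports": log_text.count("self-detected stall on CPU"),
        "flavor": None,
        "starved_jiffies": None,
        "starved_seconds": None,
        "cpu": None,
    }
    starved = STARVED.search(log_text)
    if starved:
        jiffies = int(starved.group(2))
        summary["flavor"] = starved.group(1)
        summary["starved_jiffies"] = jiffies
        summary["starved_seconds"] = jiffies / hz
    cpu = LAST_CPU.search(log_text)
    if cpu:
        summary["cpu"] = int(cpu.group(1))
    return summary


if __name__ == "__main__":
    import sys
    # Usage idea: dmesg | python3 this_script.py
    print(summarize_rcu_stall(sys.stdin.read()))
```

A grace-period kthread starved for thousands of jiffies suggests the guest vCPU was not getting scheduled for a long stretch, which fits the multi-second freezes described above.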
 
About a week ago, I started demoing Proxmox, XCP-ng, and Hyper-V (in that order) to see which one is the best fit for me, now that the whole Broadcom thing is going down. So I'm no expert here, but it seems to me you might want to reconfigure your RAID controller by switching it to HBA mode and use ZFS instead.

While I was trying out Proxmox on my R740xd with RAID 5 (3x 6TB) on the PERC H730p Mini, I had lockups too. Even though my disk benchmarks showed huge performance increases over any of the other hypervisors, I believe the lockups were caused by the RAID cache filling up, with the system pausing while the data was written to disk. The benchmarks would start out great, but when I ran them repeatedly, write speeds dropped with each subsequent run. And whenever I transferred a 4GB ISO from one VM to another, the system would hang for a bit every single time.
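
The repeated-benchmark pattern can be roughly reproduced with a minimal Python sketch like the one below. It is not the benchmark tool I used, and every name in it (`timed_write`, `repeated_write_test`, the run count, the sizes) is made up for illustration. It writes the same amount of data several times in a row, forces it to disk with `fsync`, and reports the throughput of each run so a drop across runs becomes visible.

```python
import os
import tempfile
import time


def timed_write(path, size_mib, block):
    """Write size_mib blocks of 1 MiB, fsync, and return throughput in MiB/s."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(size_mib):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has left the page cache
    elapsed = time.perf_counter() - start
    return size_mib / elapsed


def repeated_write_test(directory, runs=5, size_mib=1024):
    """Run the same sequential write several times; return MiB/s per run."""
    block = os.urandom(1024 * 1024)  # random data, so compression cannot help
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    results = []
    try:
        for _ in range(runs):
            results.append(timed_write(path, size_mib, block))
    finally:
        os.remove(path)
    return results


if __name__ == "__main__":
    for i, rate in enumerate(repeated_write_test("."), start=1):
        print(f"run {i}: {rate:.0f} MiB/s")
```

To say anything about a controller cache, the total written across runs would need to be well above the cache size. If the later runs come out clearly slower than the first, that would be consistent with the cache absorbing the early writes and then stalling while it flushes.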

Again, I'm no expert, so take that experience with a grain of salt. I should note, though, that I did not experience the lockups with any of the other hypervisors, but I also didn't see their benchmarks come anywhere close to some of my first-run benchmarks on Proxmox.

Since Proxmox seemed to do everything I need it to do, I'll likely be trying it out once more with ZFS and my H730p in HBA mode. I'll be finishing up my testing with Hyper-V tomorrow, so I'll probably switch back tomorrow night or Tuesday, unless Hyper-V suddenly becomes my final pick for some reason. I suspect my lag would have been less noticeable had I used NVMe drives or SSDs, but I definitely wasn't running the configuration everyone here seems to promote (ZFS with the controller in HBA mode, or rather a separate HBA instead of a RAID controller).
 
