Proxmox freezes when CPU is under low load conditions

simon098

New Member
Jan 8, 2025
Hi,

Been having this issue for a while now. When the CPU is under low load (say less than 1-2%), Proxmox will freeze randomly. The journalctl output doesn't seem to show anything meaningful either.

I can run this for days without issue if I keep the CPU at a high idle, say >3%. OPNsense is usually running at all times, although it doesn't use enough of the CPU to bring the load up.


I don't think it's a hardware issue like RAM or PSU, as that would show up more under higher loads.

CPU: Ryzen 9 6900HX
Memory: 32GB DDR5
Storage: SSD (Proxmox) + NVMe (VM Storage)

Proxmox Version: 8.3.2
Kernel Version: Linux 6.8.12-5-pve

Here are the journalctl logs from the latest crash:

Code:
Jan 08 01:17:01 pve CRON[32326]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 08 01:17:01 pve CRON[32327]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jan 08 01:17:01 pve CRON[32326]: pam_unix(cron:session): session closed for user root
Jan 08 01:25:15 pve pvedaemon[1350]: <root@pam> successful auth for user 'simon@pve'
Jan 08 01:27:55 pve pveproxy[23068]: worker exit
Jan 08 01:27:55 pve pveproxy[1357]: worker 23068 finished
Jan 08 01:27:55 pve pveproxy[1357]: starting 1 worker(s)
Jan 08 01:27:55 pve pveproxy[1357]: worker 35146 started
Jan 08 01:40:15 pve pvedaemon[1349]: <root@pam> successful auth for user 'simon@pve'
Jan 08 01:49:44 pve smartd[924]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 72
Jan 08 01:50:11 pve pveproxy[21244]: worker exit
Jan 08 01:50:11 pve pveproxy[1357]: worker 21244 finished
Jan 08 01:50:11 pve pveproxy[1357]: starting 1 worker(s)
Jan 08 01:50:11 pve pveproxy[1357]: worker 40956 started
Jan 08 01:55:16 pve pvedaemon[1350]: <root@pam> successful auth for user 'simon@pve'
Jan 08 02:02:21 pve systemd[1]: Starting pve-daily-update.service - Daily PVE download activities...
Jan 08 02:02:22 pve pveupdate[44121]: <root@pam> starting task UPID:pve:0000AC5E:000EE6B9:677DDCAE:aptupdate::root@pam:
Jan 08 02:02:23 pve pveupdate[44126]: update new package list: /var/lib/pve-manager/pkgupdates
Jan 08 02:02:24 pve pveupdate[44121]: <root@pam> end task UPID:pve:0000AC5E:000EE6B9:677DDCAE:aptupdate::root@pam: OK
Jan 08 02:02:24 pve systemd[1]: pve-daily-update.service: Deactivated successfully.
Jan 08 02:02:24 pve systemd[1]: Finished pve-daily-update.service - Daily PVE download activities.
Jan 08 02:02:24 pve systemd[1]: pve-daily-update.service: Consumed 2.261s CPU time.
Jan 08 02:10:47 pve pvedaemon[1351]: <root@pam> successful auth for user 'simon@pve'
Jan 08 02:13:26 pve pveproxy[25464]: worker exit
Jan 08 02:13:26 pve pveproxy[1357]: worker 25464 finished
Jan 08 02:13:26 pve pveproxy[1357]: starting 1 worker(s)
Jan 08 02:13:26 pve pveproxy[1357]: worker 47405 started
Jan 08 02:17:01 pve CRON[48323]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jan 08 02:17:01 pve CRON[48324]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jan 08 02:17:01 pve CRON[48323]: pam_unix(cron:session): session closed for user root
-- Boot 2e8e62781a5d4069a5670f524c44ad2d --
Jan 08 09:32:18 pve kernel: Linux version 6.8.12-5-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-5 (2024-12-03T10:26Z) ()
Jan 08 09:32:18 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-5-pve root=/dev/mapper/pve-root ro quiet
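
For anyone wanting to pull the same kind of excerpt after a hard freeze, here is a minimal sketch. It assumes the journal is persistent, so the previous boot is still available (as it is in the log above). The two function names are made up for illustration. A freeze like this one may well leave nothing useful behind, since the machine can stop before anything gets written.

```shell
# Hypothetical helper names; only the journalctl options are real.

# Show the last N lines logged before the most recent reboot (default 50).
tail_previous_boot() {
    journalctl --boot=-1 --lines="${1:-50}" --no-pager
}

# Show only kernel messages of priority "warning" or worse from the previous boot.
kernel_warnings_previous_boot() {
    journalctl --boot=-1 --dmesg --priority=warning --no-pager
}
```

Running `tail_previous_boot 100` right after the machine comes back up shows what was logged just before the freeze, and `journalctl --list-boots` lists the boots the journal still knows about.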


Anyone with any ideas? I was thinking maybe it's to do with C-States or something, but I can't see anything about them being enabled in the BIOS, although it's pretty limited on what it shows in the BIOS for this machine...
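
When the BIOS shows little about C-states, one way to see which idle states the kernel is actually using is the cpuidle directory in sysfs. A small sketch, assuming the usual /sys/devices/system/cpu/cpu0/cpuidle layout. The function name is made up here, and the directory can be passed as an argument:

```shell
# List the idle (C-)states the kernel exposes for one CPU via sysfs.
# Hypothetical helper; the sysfs path is the default but can be overridden.
list_cstates() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for dir in "$base"/state*; do
        # Skip the unexpanded glob when no state directories exist.
        [ -d "$dir" ] || continue
        printf '%s: %s (disabled=%s)\n' \
            "${dir##*/}" "$(cat "$dir/name")" "$(cat "$dir/disable")"
    done
}
```

Calling `list_cstates` with no argument prints one line per state for cpu0. If nothing is printed, the kernel is not exposing any idle states for that CPU.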
 
So far so good. I'd normally leave it 4-5 hrs and it would have frozen. It's been running a bit more than 24hrs so far and not frozen up.

I did find an option to disable Global C-States in the BIOS. I disabled this too. I will leave it for a few days and see if it crashes. If it doesn't, I'll re-enable it in the BIOS and see if the GRUB processor.max_cstate=1 works on its own. :cool:

P.S. Also, interestingly... since disabling the C-States, not only does it seem more reliable, it also seems to be consuming less power!
 
C-States on AMD hardware should be a sticky - it is flat out broken on Linux, and AMD and their board manufacturers refuse to properly fix it and point the finger at each other. It is broken on Windows as well, although in many cases Microsoft has forced AMD's hand to ship a fix through Windows Update.

There may be some fixes but it needs a coordinated fix between your CPU and motherboard firmware which is not always available. AMD just doesn’t care about Linux. On servers, yeah, but desktop, no.
 
Jan 08 01:49:44 pve smartd[924]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 72
Have a look at the cooling of your server. 72 degrees is quite hot for a drive.
 
I looked into this. The SMART readings don't give you the actual disk temp; the disks are actually around 40C.
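
A quick way to compare the two numbers is to print both the normalized and the raw value of attribute 194. This is only a sketch: it assumes the standard attribute table printed by `smartctl -A`, and the function name is made up. As far as I understand smartd's default reporting, the 71/72 in the log is most likely the normalized value, not degrees Celsius, but that depends on the drive.

```shell
# Print the normalized and raw values of SMART attribute 194 for a device.
# Hypothetical helper; columns follow the attribute table from `smartctl -A`
# (column 4 = normalized VALUE, column 10 = RAW_VALUE).
smart_temperature() {
    smartctl -A "${1:-/dev/sda}" |
        awk '$1 == 194 { printf "normalized=%s raw=%s\n", $4, $10 }'
}
```

Run it as root, for example `smart_temperature /dev/sda`. On many drives the raw value is the one that reads as a temperature.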
 
So, an interim update.

I've been running with Global C-States disabled in the BIOS and the GRUB processor.max_cstate=1 for 5 days now. This is far longer than the usual 3-4 hrs before freezing, so this seems to have solved my issue.

I reset the BIOS to factory defaults (re-enabling Global C-States in the BIOS) last night and it still hasn't fallen over. So far, the GRUB processor.max_cstate=1 seems to be effective on its own.
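
After a BIOS reset it is worth confirming that the kernel parameter is still active for the running boot. A minimal sketch that checks /proc/cmdline (the function name is made up, and the file is a parameter so it can be tried against a copy):

```shell
# Succeed if the given parameter appears as a whole word on the kernel command line.
# Hypothetical helper; defaults to the running kernel's /proc/cmdline.
has_boot_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # One parameter per line, then require an exact fixed-string match.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example, `has_boot_param processor.max_cstate=1 && echo active` prints "active" only if the parameter was passed to the kernel at boot.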

I will update in a few more days for any other poor soul that's been suffering these problems :)
 
Final update:

I left the BIOS reset to default and tested with just the GRUB processor.max_cstate=1 for the last 5-6 days and it hasn't frozen up once.
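
For anyone wanting to apply the same setting, here is a rough sketch of adding the parameter on a GRUB-booted install. It assumes the stock /etc/default/grub with a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and GNU sed (as on Debian/Proxmox). The function name is made up for illustration.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
# Hypothetical helper; assumes a double-quoted value and GNU sed (-i).
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Nothing to do if the parameter is already on that line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing quote.
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 $param\"/" "$file"
}
```

As root, `add_kernel_param processor.max_cstate=1` followed by `update-grub` and a reboot should apply it. Installs that boot via systemd-boot instead of GRUB keep the command line in /etc/kernel/cmdline and need `proxmox-boot-tool refresh` instead, so check which one your system uses first.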

Interestingly, I read a lot of people complaining that this would prevent turbo / dynamic clock frequencies... can't say this is something I experienced. Also, the power usage is no worse than before changing the max C-state.

I'm sure disabling Global C-States in the BIOS will probably work just the same too. But for anyone else having these issues, at least it's documented now :p

Here's to hoping this helps others :)

Thanks @Fantu
 
For what it's worth, I can confirm that disabling C-states solved the issue of my Proxmox server freezing every day without any logs in syslog or journalctl.
AMD Ryzen 5 3600 and an ASRock MB.
 
Always good to have more people confirm.

Mine has been rock solid ever since, and that's over 6 weeks ago now :)
 
Spoke too soon. After three weeks the problem came back, and now it freezes every day - can't describe how tired I am of this shit.
This was the end of my troubleshooting journey.

I have:
- Switched SSD and SATA cable for the Proxmox disk.
- Done a memory test that turned out green.
- Cleaned entire server.
- Confirmed the system is not running too hot.

I will buy a used i7-8700K and see if it is the same with Intel. If it is, then I have no words for this shit.