max Cstate=1 fixed freezing on VE 8.X but Version 9+ upgrade/fresh seems to bring the issue back (2400GE)

Koseph

New Member
Nov 28, 2025
Hi all,

I am looking for some assistance in understanding stability issues on my home cluster (at worst, this info can hopefully help someone find a temporary fix for their own environment). I am less knowledgeable about troubleshooting Linux systems than I wish I were, and have been throwing spaghetti against the wall (hopefully in a controlled fashion) to try to figure out where the issue lies.

The primary symptom of the issue:

Complete host system hang after short periods of uptime: usually 1-3 hours, though the shortest observed was about 10 minutes. (Potentially aggravated by running VMs, but I can't verify that beyond anecdote.)

  • The display continues to show the frozen screen if a monitor was already plugged in and powered on, but plugging in a monitor while running headless results in the monitor falling into power-saving mode.
  • Keyboard input is not registered, including switching TTYs and Ctrl+Alt+Del spam.
  • On hang, the host IP becomes unreachable and all VMs (if any were running) become unreachable, but the network switch still shows 1000/Full negotiated and the link layer up.
The Hardware: (It is a home lab/small use case environment)
PVE Install:
  • Proxmox VE 9.1.1 – UEFI install (haven't tried legacy BIOS)
  • One node was an upgrade from 8.X
  • One is a direct installation from 9.0
  • Currently one system is installed on an LVM partition, the other on a ZFS partition
  • Various kernels:
    • 6.8.12-9-pve – Functional with processor.max_cstate=1, failure without.
    • 6.8.12-15-pve – Functional with processor.max_cstate=1, failure without.
    • 6.14.11-2-pve – Non-functional even with processor.max_cstate=1
    • 6.17.2-2-pve – Non-functional even with processor.max_cstate=1
  • Usually clustered, but in troubleshooting, issues still occur standalone.
History/Steps taken:

This is an issue I have been fighting since I decided to try running Proxmox VE on these systems. This was before 9.0 was out, so I was on a base version of 8 on both nodes. The initial information I found suggested that C-state switching on the Zen 1-2 architecture was not the most stable in early supporting kernel versions, and folks had found stability by disabling the deeper C-states.
  • My first step was to disable C6 in the BIOS and see if we had any luck. – This yielded no observable change.
  • Next I modified /etc/default/grub to add "processor.max_cstate=1" to GRUB_CMDLINE_LINUX_DEFAULT, regenerated the GRUB config, and rebooted, only to find when reading /sys/module/processor/parameters/max_cstate that the change didn't take effect – unsurprisingly, this yielded no observable change. (Unless I am misremembering)
  • After a little bit of RTFM… I modified /etc/kernel/cmdline, appended "processor.max_cstate=1" to the entry, ran "pve-efiboot-tool refresh", and rebooted. Reading /sys/module/processor/parameters/max_cstate then returned 1 as expected (both methods are sketched below). – Though I didn't love the idea of increased power consumption, this was a valid workaround.
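For anyone following along, here are both methods side by side (a sketch; which file applies depends on whether the install boots via systemd-boot or GRUB, and the example root= line is illustrative only):

Code:
# systemd-boot (typical for ZFS-on-root UEFI installs): args live on the one line in /etc/kernel/cmdline
#   root=ZFS=rpool/ROOT/pve-1 boot=zfs processor.max_cstate=1
nano /etc/kernel/cmdline
proxmox-boot-tool refresh   # pve-efiboot-tool is the older name for the same tool

# GRUB (typical for LVM/ext4 installs): append inside GRUB_CMDLINE_LINUX_DEFAULT
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1"
nano /etc/default/grub
update-grub

# verify after reboot
cat /sys/module/processor/parameters/max_cstate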
Instead of trying to figure out why max_cstate=1 improves stability like I should have, I paid my electric company a little extra money and forgot about it…

It is a home lab, so on release of 9.0 I went ahead and updated my nodes. The issue immediately seemed to return. I tried rebuilding a node from a fresh install in case I had messed up the upgrade, but had no luck. I then set processor.max_cstate=1 again in /etc/kernel/cmdline (or it was already set, I can't remember) and verified that it read back as 1. I figured it might be the kernel change, as that was a major difference between versions, but said… I will deal with this another day, and I have been too busy since, so I let it sit.

Well, the new kernel option got me curious, so I hopped back into my home lab to play, hoping the new kernel would make it stable again. No luck… Since it is now top of mind, I also figured I'd roll the kernel back to a previous version and see what happens. After pinning the kernel with "pve-efiboot-tool kernel pin 6.8.12-15-pve --next-boot" and verifying /sys/module/processor/parameters/max_cstate returns 1, the system has been stable for 24 hours…
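In case it is useful to someone else, the pinning commands (a sketch; proxmox-boot-tool and pve-efiboot-tool are the same tool under two names):

Code:
proxmox-boot-tool kernel list                            # installed kernels and any current pin
proxmox-boot-tool kernel pin 6.8.12-15-pve --next-boot   # pin for the next boot only
proxmox-boot-tool kernel pin 6.8.12-15-pve               # or pin persistently
proxmox-boot-tool kernel unpin                           # revert to the newest kernel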

You may have noticed a lack of mention of journalctl, dmesg, and syslog… This is where I wish I were better at Linux administration and could use a little help, as there are still open questions and I don't know what to look for in the logs.

Since the logs don't seem to persist through a reboot, I can set up a remote syslog collector. But journalctl and dmesg seem to clear on reboot, so I don't know what messages lead up to the fault. I do know that the one time I left a monitor plugged in until the system froze, I got the following message multiple times, which I assume is related to CPU interrupts.
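For reference, what I plan to try so the journal tail survives a hard reset (a minimal sketch, assuming journald's default Storage=auto, which goes persistent once /var/log/journal exists):

Code:
mkdir -p /var/log/journal           # Storage=auto persists once this directory exists
systemctl restart systemd-journald

# after the next hang and power cycle:
journalctl --list-boots             # previous boots should now be listed
journalctl -b -1 -e                 # jump to the end of the previous boot's log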

“[Timestamp] perf: interrupt took too long (XXXX > YYYY), lowering kernel.perf_event_max_sample_rate to ZZZZZ”
  • Where XXXX is the time the interrupt took, YYYY is the threshold, and ZZZZZ is the new sample rate.
This happened 4 times before the freeze, with gaps of roughly 4800 seconds, 6700 seconds, and 29000 seconds between them, and the sample rate decreasing from 78000 to 62000 to 49000 to 39000. These events were preceded (by about 8500 seconds) by the message "hrtimer: interrupt took 41067 ns".

I am aware that interrupts wake the processor out of its C-states, so a shallower C-state reducing interrupt latency makes sense to me (and I have yet to step down through the C-states to figure out whether 2/3/4 are stable), but I would still love to know two things…
  • The R5 2400GE was a relatively mainstream CPU and seems to be supported in modern kernels without widely reported issues. What interaction between the kernel and Proxmox could be causing this, and is limiting the max C-state the most appropriate fix?
  • What changed between 6.8.X and later kernels to cause it to only work on 6.8?
--

Any guidance y’all can provide in my efforts to find these answers is appreciated!

(Edited Spelling and Clarity - 11/29)
(Edited Another Typo - 12/1)
 
Alright, Update time…

First, I had some errors and assumptions in my troubleshooting above, and I am a little lost on some things.

To bring the systems back to a consistent state, I wiped both away and installed with the same parameters on both:
  • Fresh install of 9.1.1
  • LVM-EXT4 on root partition
  • Rolled kernel back to 6.8.12-15

I don't know the specifics of why, but to modify boot parameters I now had to go back to editing /etc/default/grub for changes to affect the system (presumably because an LVM/ext4 install boots via GRUB rather than systemd-boot). This time I validated with "grep . /sys/devices/system/cpu/cpu*/cpuidle/state*/name".

Initial settings:
GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=1"
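With that in effect, the validation should look roughly like this (expected output sketched from the cpuidle sysfs layout; only POLL and C1 should remain):

Code:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
# expected with max_cstate=1:
#   /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
#   /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1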

I started building out a small home lab to get some level of load on the nodes:

Node1:

Node2:

For some reason, node 2 seems to crash/hang every few hours in this configuration. To keep configs in sync, I haven't let node 1 run for a long period, so I don't know whether it is stable or also crashes after a while.

I tried removing max_cstate, setting it to 2, and adding amd_pstate=active to see if that might do anything, all with no luck.
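On the amd_pstate attempt, one thing worth checking is which driver actually bound (a sketch; my understanding is that amd_pstate needs CPPC, which Zen 1 parts like the 2400GE may lack, so the kernel can silently fall back to acpi-cpufreq):

Code:
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver   # e.g. acpi-cpufreq vs. amd-pstate
dmesg | grep -i pstate                                    # any driver init/fallback messages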

I set up systemd-journal-remote and sent logs to a collector, and they are… uninspiring.
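For anyone wanting to replicate the collector, roughly the shape of the setup (a sketch; the collector hostname and plain-HTTP transport are assumptions, and the stock systemd-journal-remote.service may need --listen-https swapped for --listen-http):

Code:
# on the collector
apt install systemd-journal-remote
systemctl enable --now systemd-journal-remote.socket   # listens on port 19532 by default

# on each node (the same package also ships systemd-journal-upload)
apt install systemd-journal-remote
# /etc/systemd/journal-upload.conf:
#   [Upload]
#   URL=http://collector.lan:19532
systemctl enable --now systemd-journal-upload.service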

Below are logs for a crash when GRUB_CMDLINE_LINUX_DEFAULT="quiet processor.max_cstate=2 amd_pstate=active":

Code:
Dec 05 20:53:53 pve2 pmxcfs[1133]: [status] notice: received log
Dec 05 20:59:32 pve2 chronyd[1048]: Detected falseticker 74.208.25.46 (2.debian.pool.ntp.org)
Dec 05 21:00:04 pve2 sshd-session[6074]: Accepted publickey for root from 192.168.1.11 port 36186 ssh2: RSA SHA256:tiroNEyGcoUokZrlrgiIOXGAG1z0jBctj+08EEAjr9A
Dec 05 21:00:04 pve2 sshd-session[6074]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Dec 05 21:00:04 pve2 systemd-logind[942]: New session 6 of user root.
Dec 05 21:00:04 pve2 systemd[1]: Started session-6.scope - Session 6 of User root.
Dec 05 21:00:05 pve2 sshd-session[6081]: Received disconnect from 192.168.1.11 port 36186:11: disconnected by user
Dec 05 21:00:05 pve2 sshd-session[6081]: Disconnected from user root 192.168.1.11 port 36186
Dec 05 21:00:05 pve2 sshd-session[6074]: pam_unix(sshd:session): session closed for user root
Dec 05 21:00:05 pve2 systemd-logind[942]: Session 6 logged out. Waiting for processes to exit.
Dec 05 21:00:05 pve2 systemd[1]: session-6.scope: Deactivated successfully.
Dec 05 21:00:05 pve2 systemd[1]: session-6.scope: Consumed 671ms CPU time, 99M memory peak.
Dec 05 21:00:05 pve2 systemd-logind[942]: Removed session 6.
Dec 05 21:01:08 pve2 pmxcfs[1133]: [status] notice: received log
Dec 05 21:03:21 pve2 systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Dec 05 21:03:21 pve2 systemd-tmpfiles[7044]: /usr/lib/tmpfiles.d/legacy.conf:14: Duplicate line for path "/run/lock", ignoring.
Dec 05 21:03:21 pve2 systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Dec 05 21:03:21 pve2 systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Dec 05 21:15:06 pve2 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Dec 05 21:15:06 pve2 kernel: Console: switching to colour frame buffer device 160x45
Dec 05 21:15:06 pve2 kernel: amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
Dec 05 21:15:40 pve2 kernel: usb 2-3: new full-speed USB device number 2 using xhci_hcd
Dec 05 21:15:40 pve2 kernel: usb 2-3: New USB device found, idVendor=046d, idProduct=c52b, bcdDevice=24.07
Dec 05 21:15:40 pve2 kernel: usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
Dec 05 21:15:40 pve2 kernel: usb 2-3: Product: USB Receiver
Dec 05 21:15:40 pve2 kernel: usb 2-3: Manufacturer: Logitech
Dec 05 21:15:40 pve2 kernel: hid: raw HID events driver (C) Jiri Kosina
Dec 05 21:15:40 pve2 kernel: usbcore: registered new interface driver usbhid
Dec 05 21:15:40 pve2 kernel: usbhid: USB HID core driver
Dec 05 21:15:40 pve2 kernel: usbcore: registered new interface driver usbmouse
Dec 05 21:15:40 pve2 kernel: usbcore: registered new interface driver usbkbd
Dec 05 21:15:40 pve2 kernel: input: Logitech USB Receiver as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.0/0003:046D:C52B.0001/input/input9
Dec 05 21:15:40 pve2 kernel: hid-generic 0003:046D:C52B.0001: input,hidraw0: USB HID v1.11 Keyboard [Logitech USB Receiver] on usb-0000:03:00.3-3/input0
Dec 05 21:15:40 pve2 kernel: input: Logitech USB Receiver Mouse as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.1/0003:046D:C52B.0002/input/input10
Dec 05 21:15:40 pve2 kernel: input: Logitech USB Receiver Consumer Control as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.1/0003:046D:C52B.0002/input/input11
Dec 05 21:15:40 pve2 kernel: input: Logitech USB Receiver System Control as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.1/0003:046D:C52B.0002/input/input12
Dec 05 21:15:40 pve2 kernel: hid-generic 0003:046D:C52B.0002: input,hiddev0,hidraw1: USB HID v1.11 Mouse [Logitech USB Receiver] on usb-0000:03:00.3-3/input1
Dec 05 21:15:40 pve2 kernel: hid-generic 0003:046D:C52B.0003: hiddev1,hidraw2: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:03:00.3-3/input2
Dec 05 21:15:40 pve2 kernel: logitech-djreceiver 0003:046D:C52B.0003: hiddev0,hidraw0: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:03:00.3-3/input2
Dec 05 21:15:40 pve2 kernel: input: Logitech Wireless Device PID:404d Keyboard as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.2/0003:046D:C52B.0003/0003:046D:404D.0004/input/input14
Dec 05 21:15:40 pve2 kernel: input: Logitech Wireless Device PID:404d Mouse as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.2/0003:046D:C52B.0003/0003:046D:404D.0004/input/input15
Dec 05 21:15:40 pve2 kernel: hid-generic 0003:046D:404D.0004: input,hidraw1: USB HID v1.11 Keyboard [Logitech Wireless Device PID:404d] on usb-0000:03:00.3-3/input2:1
Dec 05 21:15:40 pve2 kernel: input: Logitech K400 Plus as /devices/pci0000:00/0000:00:08.1/0000:03:00.3/usb2/2-3/2-3:1.2/0003:046D:C52B.0003/0003:046D:404D.0004/input/input19
Dec 05 21:15:40 pve2 kernel: logitech-hidpp-device 0003:046D:404D.0004: input,hidraw1: USB HID v1.11 Keyboard [Logitech K400 Plus] on usb-0000:03:00.3-3/input2:1
Dec 05 21:15:40 pve2 systemd-logind[942]: Watching system buttons on /dev/input/event7 (Logitech K400 Plus)
Dec 05 21:16:08 pve2 pmxcfs[1133]: [status] notice: received log
Dec 05 21:17:01 pve2 CRON[11120]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
Dec 05 21:17:01 pve2 CRON[11130]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Dec 05 21:17:01 pve2 CRON[11120]: pam_unix(cron:session): session closed for user root
Dec 05 21:18:27 pve2 smartd[937]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 34 to 35
Dec 05 21:27:58 pve2 pmxcfs[1133]: [dcdb] notice: data verification successful
Dec 05 21:30:05 pve2 sshd-session[14985]: Accepted publickey for root from 192.168.1.11 port 46598 ssh2: RSA SHA256:tiroNEyGcoUokZrlrgiIOXGAG1z0jBctj+08EEAjr9A
Dec 05 21:30:05 pve2 sshd-session[14985]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Dec 05 21:30:05 pve2 systemd-logind[942]: New session 9 of user root.
Dec 05 21:30:05 pve2 systemd[1]: Started session-9.scope - Session 9 of User root.
Dec 05 21:30:05 pve2 sshd-session[14992]: Received disconnect from 192.168.1.11 port 46598:11: disconnected by user
Dec 05 21:30:05 pve2 sshd-session[14992]: Disconnected from user root 192.168.1.11 port 46598
Dec 05 21:30:05 pve2 sshd-session[14985]: pam_unix(sshd:session): session closed for user root
Dec 05 21:30:05 pve2 systemd[1]: session-9.scope: Deactivated successfully.
Dec 05 21:30:05 pve2 systemd[1]: session-9.scope: Consumed 679ms CPU time, 98.9M memory peak.
Dec 05 21:30:05 pve2 systemd-logind[942]: Session 9 logged out. Waiting for processes to exit.
Dec 05 21:30:05 pve2 systemd-logind[942]: Removed session 9.
Dec 05 21:31:09 pve2 pmxcfs[1133]: [status] notice: received log
Dec 05 21:34:04 pve2 chronyd[1048]: Source 74.208.25.46 replaced with 172.234.37.140 (2.debian.pool.ntp.org)
Dec 05 21:46:10 pve2 pmxcfs[1133]: [status] notice: received log
Dec 05 21:48:27 pve2 smartd[937]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 35 to 34
Dec 05 22:00:05 pve2 sshd-session[23790]: Accepted publickey for root from 192.168.1.11 port 50726 ssh2: RSA SHA256:tiroNEyGcoUokZrlrgiIOXGAG1z0jBctj+08EEAjr9A
Dec 05 22:00:05 pve2 sshd-session[23790]: pam_unix(sshd:session): session opened for user root(uid=0) by root(uid=0)
Dec 05 22:00:05 pve2 systemd[1]: Starting dpkg-db-backup.service - Daily dpkg database backup service...

I don't have an exact crash time, and I don't find anything of major interest in there beyond now knowing the last thing the system ran before crashing; I am not really sure how to investigate whether that had an effect yet.

During that crash, I left a monitor and keyboard attached overnight. When I returned this morning, the screen was frozen, but unlike that time long ago, there were no extra lines on the screen about interrupt times...

I am pretty sure my next step is to power down the containers and put an equal load on both systems to see what happens.

Is this just bad hardware? Any way to validate?

(Edited for improved detail 12/6/2025)
 
Try to completely disable C-states in the BIOS.
It could be a storage problem too; try disabling SATA link power management in the BIOS, or, for testing, everything related to power saving.
And check what NIC you have; if it is an Intel one, you may be affected by the e1000e issue.

You can always test another Linux distro to see if the problem persists there, like vanilla Debian or Ubuntu Server 24.04 with HWE.
A memtest would not be a bad idea either.
 
Thanks for the reply and your thoughts!

I was unaware of the e1000e issue, but at a glance it looks like it should not affect me, as the systems have Realtek NICs.

Code:
root@pve2:~# lspci | grep Ethernet
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 0e)
root@pve2:~# lspci | grep Intel   
root@pve2:~#

In the BIOS I could disable C6 again (I forgot to mention that I re-enabled it), but when I boot without the processor.max_cstate parameter, the system only sees C1 and C2.

Code:
root@pve2:~# grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
/sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
/sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1
/sys/devices/system/cpu/cpu0/cpuidle/state2/name:C2
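If it helps in comparing configs, the per-state residency counters show whether those states are actually being entered (as I understand the cpuidle sysfs docs, usage is the entry count and time is microseconds spent in the state):

Code:
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/usage   # times each state was entered
grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/time    # total usec spent in each state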

I should really work on limiting parameter changes... I did re-enable SATA and added a storage drive to run ZFS on. The issue was occurring while the SATA controller was disabled in the BIOS (the root drives are NVMe), so I don't think it is related, but I will see if I can do a controlled isolation.

I had run a memtest previously, but I should give it another go. If that comes back clean, I will try base Debian and see how the system does running Prime95 for a while.
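For the soak test, I will probably run something like this (a sketch using stress-ng rather than Prime95, since it is in the standard repos; the flags are just one reasonable mix):

Code:
apt install stress-ng
stress-ng --cpu $(nproc) --vm 2 --vm-bytes 75% --timeout 24h --metrics-brief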
 
just chiming in here to say I also have a system with an AMD Ryzen 5 PRO 2400GE CPU on a ThinkCentre M715q running 6.8.12-17-pve, and I'm having issues with the device hanging/freezing and requiring a manual power reset. I don't know how to fix it.


I disabled C-states, switched to a USB NIC, checked journalctl, and ran memtest. Still don't know how to solve the issue.
 

Good to know there are more out there! It makes me feel a little better about this potentially being an out-of-the-box config issue that just needs to be found and rectified. Are you running ZFS for your root partition, or at all?

When you say you disabled C-states, do you mean disabling C6 in the BIOS, or did you use a boot parameter?

I won't be able to swap to base Debian until next week. For now, I am going to run the system with zero load beyond the host OS. If that fails, I will remove all ZFS volumes.

We will see how it goes!