[SOLVED] Proxmox keeps crashing

Make sure the kernel is installed, e.g. for the one mentioned above:
apt install pve-kernel-5.15.60-2-pve
Afterwards you can add it to the list of known kernels for the proxmox-boot-tool [0]:
proxmox-boot-tool kernel add 5.15.60-2-pve
And if you want to always boot that kernel, you can pin it:
proxmox-boot-tool kernel pin 5.15.60-2-pve
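
After rebooting, it can help to confirm that the pinned kernel is really the one that booted. A minimal sketch follows; the function name `kernel_matches` is made up for illustration, and it only compares the output of `uname -r` with the version string you pass in:

```shell
#!/bin/sh
# Hypothetical helper (not part of proxmox-boot-tool): compare the running
# kernel release, as reported by `uname -r`, with the version you expect.
kernel_matches() {
    expected="$1"
    running="$(uname -r)"
    if [ "$running" = "$expected" ]; then
        echo "OK: running $running"
    else
        echo "MISMATCH: running $running, expected $expected"
        return 1
    fi
}

# Example, to be run after the reboot:
#   kernel_matches 5.15.60-2-pve
```

If it reports a mismatch, `proxmox-boot-tool kernel list` should show which kernels are registered and which one is pinned.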


[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot
 
Again, if the CPU is an AMD Ryzen/Threadripper/Epyc gen 2 or 3, please try this first: https://forum.proxmox.com/threads/o...ox-ve-7-x-available.115090/page-3#post-507337
A kernel downgrade doesn't help.
If the CPU is an Intel, search the forum for crashes with Intel.

Ryzen and Threadripper processors have a C-state 6 (C6) that needs to be disabled when running Linux. Look in your BIOS/UEFI settings for something like "Power idle control" and set it to "Typical current idle" (or normal, high, anything that is not an equivalent of low). You may find this setting somewhere under Power, Misc, or even the CPU settings. Different brands may name it a little differently. Search for how to disable C-state 6 on your mainboard.
Your crashes should disappear. This has been a known issue since kernel 5.10 or even earlier.
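
To see which idle states Linux currently exposes, you can read the cpuidle entries in sysfs. This is only a sketch: the function name `list_cstates` is made up, and the state names shown are whatever the idle driver reports (often ACPI-level names such as C1/C2/C3), so they may not map one-to-one to the "C6" option in the BIOS.

```shell
#!/bin/sh
# Hypothetical helper: print each idle state of one CPU with its "disable" flag.
# The directory is a parameter (defaulting to cpu0) so it can also be pointed
# at a copy of the sysfs tree.
list_cstates() {
    base="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    for dir in "$base"/state*; do
        # Skip silently if the glob did not match (e.g. no cpuidle support).
        [ -r "$dir/name" ] || continue
        name="$(cat "$dir/name")"
        disabled="$(cat "$dir/disable" 2>/dev/null || echo "?")"
        echo "$(basename "$dir") $name disable=$disabled"
    done
}

# Example:
#   list_cstates
```

An empty result just means the kernel exposes no cpuidle states for that CPU, which by itself says nothing about the BIOS setting.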

Maybe this should be stickied (with a better explanation) and go into the Wiki, for those who have random crashes with recent AMD CPUs and can't find the solution by searching the forums.

If you want to use sensors (temperature), you need at least kernel 5.15.x, better 5.19.x, because 5.13.x and earlier had no implementation for AMD sensors. This concerns all recent AMD CPUs and chipsets, not only X570 and Zen 2/3.
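
A quick way to check whether the running kernel exposes any temperature sensors at all is to look at the hwmon entries in sysfs. This is a rough sketch with a made-up function name (`list_hwmon_temps`); on AMD CPUs the chip name is typically `k10temp`, but that depends on the driver that loaded.

```shell
#!/bin/sh
# Hypothetical helper: print every temperature input found under hwmon.
# tempN_input files hold millidegrees Celsius, so divide by 1000.
list_hwmon_temps() {
    base="${1:-/sys/class/hwmon}"
    for hw in "$base"/hwmon*; do
        [ -r "$hw/name" ] || continue
        chip="$(cat "$hw/name")"
        for f in "$hw"/temp*_input; do
            [ -r "$f" ] || continue
            milli="$(cat "$f")"
            echo "$chip $(basename "$f" _input) $((milli / 1000))C"
        done
    done
}

# Example:
#   list_hwmon_temps
```

If nothing is printed, no hwmon temperature driver is loaded for your hardware on that kernel.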
 

Thanks for your response. I went through all the BIOS settings several times, and the only setting that even remotely resembles what is described there is one for enabling/disabling low power mode. It was already set to disabled, so unfortunately I don't think this is the issue.
 
hm, that sounds strange. i have dozens of ryzen 5950x and a few epyc 72xx/73xx nodes. the only other possible stability issue i have experienced so far is OOM with ryzen (not with epyc)... since ryzen is limited to 128gb ram, it is easy to max that out quickly, considering 16c/32t and all the caches that are not accounted for up front (ceph/zfs + vdisk caches). also, when ballooning is disabled (which it often is, since it is slow), VMs eat more than you expect (caches).
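
To rule OOM in or out, you can look for OOM-killer messages in the kernel log. A minimal sketch, with a made-up function name (`count_oom_events`); it just counts lines on stdin that contain the usual kernel OOM strings:

```shell
#!/bin/sh
# Hypothetical helper: count kernel log lines that mention the OOM killer.
# Reads the log from stdin; `|| true` keeps the exit status at 0 when the
# count is zero (grep -c still prints 0 in that case).
count_oom_events() {
    grep -c -i -E 'out of memory|oom-kill|invoked oom-killer' || true
}

# Example, kernel messages of the previous boot (assumes a persistent journal):
#   journalctl -k -b -1 | count_oom_events
```

Keep in mind that after a hard freeze the last messages may never have reached the disk, so a count of 0 is not proof.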
 
Did some more debugging. I can trigger the freezes by downloading large files from within a VM, or by copying large files over LAN. I thought something might be wrong with the SSD, so I installed a new M.2 SSD, installed a fresh copy of Proxmox and created 1 VM. I started downloading a file, and as soon as the transfer rate exceeded 30 MB/s, the whole PM server froze. CPU and RAM usage were both < 5% at that time.
 
In this case it may be the NIC that's causing the freezes. Do you have a different one available to test?
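
Before swapping hardware, it may also be worth looking at the error counters of the current NIC. A small sketch, assuming the standard counters under `/sys/class/net/<iface>/statistics`; the function name `nic_error_counters` and the interface name in the example are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: print error/drop counters for one network interface.
# The second argument lets you point it at a copy of the sysfs tree.
nic_error_counters() {
    iface="$1"
    base="${2:-/sys/class/net}"
    for c in rx_errors tx_errors rx_dropped tx_dropped; do
        f="$base/$iface/statistics/$c"
        [ -r "$f" ] && echo "$c=$(cat "$f")"
    done
    return 0
}

# Example (replace eno1 with your interface name):
#   nic_error_counters eno1
```

Counters that climb during a large transfer would point toward the NIC or its driver; counters that stay at 0 don't exclude it.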
 
Tried a USB NIC, and also connected the box to a monitor.

The system survived way longer this time. Instead of freezing instantly after heavy write actions (downloading a large file), it kept running for over 30 minutes while downloading at nearly 70 MB/s! In the end, it froze again though. The monitor shows a full green screen and everything is unresponsive. The fans are still blowing at max though.
 
Which CPU are you using? Do you have an integrated GPU?

This could still hint at PSU issues, or RAM issues. Maybe check both to see if there are any issues.
 
Using a Gigabyte GB-BRR7H-4800 with Ryzen 7 4800U. Any suggestion how to check PSU for issues?
 
OP here!

Disabling the C-states in the BIOS worked for me. I haven't had any more crashes for a few days now.
 
ASRock actually removed the C-state 6 toggle in their newest BIOS. I am not sure if it's on or off by default, but luckily I have no issues with my Zen 3 chip in Proxmox.
 
Thanks for the heads up, so I know not to update the BIOS. It would be interesting to find out whether it's now disabled by default or what is going on; otherwise I don't see how this can be made stable on future motherboards when issues like these come up.
 
You can still force it off by globally disabling C-states. When I toggled that, there was barely any difference in idle power consumption.
 
just fyi: this will disable turbo boost and power management for your cpu entirely, so all your cores run permanently at base frequency...
and yes, it is indeed a workaround, better than crashes all the time. if there is no other way to disable c-state 6 on your mobo, it is actually the only way to get your system stable (fuuuu HPE with your DL385 gen10 plus v2!)
 
