[SOLVED] Proxmox hangs: difficulty finding the cause

Singman

Hi,

For months, my Proxmox setup has been hanging very often. Sometimes it is stable for days, sometimes it cannot stay up for more than a couple of minutes.
Setup
Motherboard : ASUS Pro WS X570-ACE
CPU : AMD Ryzen 9 5900X
Memory : 2x Kingston Server Premier 16 GB (DDR4 ECC CL19 DIMM 2Rx8 server memory, Hynix D - KSM26ED8/16HD)
Memory : 2x Kingston Server Premier 8 GB (DDR4 ECC CL19 DIMM 1Rx8 server memory, Hynix D - KSM26ES8/8HD)
Boot : M.2 2280 NVMe SSD, 480 GB
Storage : 4x Crucial MX500 1 TB SSD (CT1000MX500SSD1)
Cooling : Fractal Design Celsius S24 Blackout water cooling

First, I checked the memory. It showed no errors, but I used the warranty anyway and got 2 new RAM modules. I also bought 2 brand-new 8 GB modules to try (and kept them :))
Then I changed the motherboard; it still crashes.

I have absolutely NO MESSAGE in dmesg or journalctl before a crash; the system just hangs and nothing works. There is no warning beforehand. It just happens.
Memtest86+ gives no errors.
I used a Linux stress-test USB key to check that the CPU works properly: no errors, and no temperature limit reached.
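
When a hang leaves nothing in dmesg or journalctl, one thing that sometimes helps is a small heartbeat log that is forced to disk every few seconds, so that after the reset you at least know when the machine stopped and what load and temperature looked like just before. Below is a minimal Python sketch of that idea, not a tested monitoring tool. The log path, the interval and the function names are my own assumptions; the temperature files under `/sys/class/hwmon` are the standard Linux hwmon interface (values in millidegrees Celsius), but which sensors show up depends on the board and drivers.

```python
import os
import time
from pathlib import Path


def read_temps_c(hwmon_root="/sys/class/hwmon"):
    """Return {sensor: degrees C} for every readable hwmon temp*_input file."""
    temps = {}
    root = Path(hwmon_root)
    if not root.is_dir():
        return temps
    for sensor in sorted(root.glob("hwmon*/temp*_input")):
        try:
            # hwmon reports temperatures in millidegrees Celsius
            temps[f"{sensor.parent.name}/{sensor.name}"] = int(sensor.read_text().strip()) / 1000.0
        except (OSError, ValueError):
            continue  # some sensors are not readable; skip them
    return temps


def heartbeat_line(now, load1, temps):
    """Format one log line: local timestamp, 1-minute load, hottest sensor."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    hottest = max(temps.values()) if temps else float("nan")
    return f"{stamp} load1={load1:.2f} max_temp_c={hottest:.1f}"


def append_synced(path, line):
    """Append a line and fsync it so it is not lost in the page cache on a hard hang."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(line + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def run(path="/var/log/heartbeat.log", interval_s=5.0):
    """Write a heartbeat forever (hypothetical path and interval; adjust to taste)."""
    while True:
        append_synced(path, heartbeat_line(time.time(), os.getloadavg()[0], read_temps_c()))
        time.sleep(interval_s)
```

Calling `run()` as root (for example from a systemd unit) and checking the last line of the file after the next hang gives the approximate time of death. It will not say why the machine hung, only when, and whether load or temperature was unusual right before.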

The only other source of the problem I can see is software. This Proxmox install is 5 years old and has been updated many times (current version is 8.4.1). But reinstalling everything, even though I have backups with PBS, would be hard work: it's my homelab and the configuration is not really straightforward.

Do you have any ideas? How can I debug this configuration?
 
I don't have anything similar to your setup, but in general this is what I would try:

I'd probably try updating the BIOS on that MB.

Try pinning a previous kernel to see if that removes the issue.
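
To make the kernel pinning concrete: on recent PVE versions, `proxmox-boot-tool kernel list` shows the installed kernels and `proxmox-boot-tool kernel pin <version>` pins one. The Python sketch below only picks the newest installed kernel that is older than the running one and builds the pin command; it does not run anything. The helper names and the version strings in the usage note are made up for illustration, and the regex assumes names of the form `6.8.12-9-pve`.

```python
import re

# Assumed naming scheme for PVE kernels, e.g. "6.8.12-9-pve"
_PVE_KERNEL = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)-pve$")


def version_key(name):
    """Turn '6.8.12-9-pve' into (6, 8, 12, 9) so versions compare numerically."""
    match = _PVE_KERNEL.match(name)
    if match is None:
        raise ValueError(f"unexpected kernel name: {name!r}")
    return tuple(int(part) for part in match.groups())


def previous_kernel(installed, running):
    """Return the newest installed kernel older than the running one, or None."""
    older = [k for k in installed if version_key(k) < version_key(running)]
    return max(older, key=version_key) if older else None


def pin_command(version):
    """Build (but do not execute) the pin command for the given kernel version."""
    return ["proxmox-boot-tool", "kernel", "pin", version]
```

For example, with the hypothetical list `["6.8.12-9-pve", "6.5.13-6-pve", "6.8.12-4-pve"]` and `6.8.12-9-pve` running, `previous_kernel` returns `6.8.12-4-pve`. The running version can be taken from `uname -r`; review the command before running it and reboot afterwards.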

Another thing: you don't provide any GPU or network details. These can often be a source of issues.

If you are using the on-board LANs/NICs, maybe try using a separate NIC of your own.
(I see the board seems to have 2 NICs, Realtek® RTL8117 & Intel® I211-AT; maybe try changing which one you use for what.)

(I also see your board uses ASUS LAN Guard. I'm not sure about this hardware-implemented feature, but if possible try deactivating it in the BIOS.)
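
To see which on-board NIC is actually in use and which kernel driver is bound to it, the sysfs tree can be read directly. This is a small Python sketch under the assumption that the host exposes the usual `/sys/class/net/<iface>/device/driver` symlink for physical interfaces; the function name is my own. Bridges, taps and other virtual interfaces have no such link and are skipped.

```python
import os
from pathlib import Path


def nic_drivers(sys_net="/sys/class/net"):
    """Map each physical network interface to the kernel driver bound to it."""
    drivers = {}
    for iface in sorted(Path(sys_net).iterdir()):
        link = iface / "device" / "driver"
        if link.is_symlink():
            # The symlink target ends in the driver name, e.g. ".../drivers/igb"
            drivers[iface.name] = os.path.basename(os.readlink(link))
    return drivers
```

Running `print(nic_drivers())` on the host lists the interfaces with their drivers (the Intel I211 normally shows up with `igb`). That makes it easier to move the management bridge from one on-board port to the other and see whether the hangs follow the driver.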

Anyway, good luck. As stated above, I don't use any of this hardware myself.
 
OK, problem solved, and it's really strange.

I replaced my CPU (which had been working fine for 4 years) with another one, a Ryzen 5 5600X.
The strange part is that when I test the old CPU with Linux tools, everything is OK. And it's not an overheating problem. I will try to test that CPU under Windows to see if it shows the same problem.
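
For anyone wanting to poke at a suspect CPU a bit further under Linux, one approach is to pin a deterministic workload to one core at a time and compare the results, since an all-core stress test can hide a single marginal core. The sketch below is only an illustration of that idea with made-up function names; it is not what was used in this thread, and a CPU that hard-hangs the machine may never produce a wrong result at all. It relies on `os.sched_setaffinity`, which is Linux-only.

```python
import hashlib
import os


def checksum(rounds):
    """Deterministic CPU-bound work: hash a value over and over."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


def check_cores(rounds=200_000):
    """Run the same computation pinned to each allowed core; True means it matched."""
    allowed = os.sched_getaffinity(0)   # cores this process may run on
    reference = checksum(rounds)        # computed before any pinning
    results = {}
    try:
        for core in sorted(allowed):
            os.sched_setaffinity(0, {core})  # pin this process to a single core
            print(f"testing core {core}", flush=True)
            results[core] = checksum(rounds) == reference
    finally:
        os.sched_setaffinity(0, allowed)     # restore the original affinity
    return results
```

A `False` entry would point at a core that computes wrong results. If the machine hangs during the run, the last "testing core N" line printed (ideally redirected to a file on disk) hints at which core was busy at the time.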

BTW, Proxmox shows some strange behavior compared to Debian Bookworm: I have a few PCs that run fine under Debian but not under Proxmox. I should try installing PVE on top of Debian to see what happens :)
 
Installing on top of Debian is no different from using the installer (unless something is missing from the guide or from Proxmox, such as a service recently not being enabled). But Proxmox differs from Debian in that it uses a kernel based on the Ubuntu LTS kernel (which is also some versions ahead of Debian's) and therefore also different drivers and firmware. Proxmox also keeps systems busier with logging and graph data than other Debian installations.
 
But I have a few PCs that work like a charm under Debian and keep crashing every hour under Proxmox :)
That's why I should not install them directly from the PVE ISO but do an install on top of Debian.
 
But I have a few PCs that work like a charm under Debian and keep crashing every hour under Proxmox :)
You should not compare Proxmox with Debian in these matters. It makes more sense to compare Proxmox with Ubuntu LTS. I don't understand your reasoning, sorry.
That's why I should not install them directly from the PVE ISO but do an install on top of Debian.
I hoped to explain that you should not expect a difference, but feel free to spend your time on this. If you do, I still suggest that you also compare it with Ubuntu LTS (but you cannot install Proxmox on top of that).
 