Hi,
For months, my Proxmox setup has been hanging very often: sometimes it is stable for days, sometimes it cannot stay up for more than a couple of minutes.
Setup
Motherboard : ASUS Pro WS X570-ACE
CPU : AMD Ryzen 9 5900X
Memory : 2x Kingston Server Premier 16 GB (DDR4 ECC CL19 DIMM 2Rx8, server memory, Hynix D - KSM26ED8/16HD)
Memory : 2x Kingston Server Premier 8 GB (DDR4 ECC CL19 DIMM 1Rx8, server memory, Hynix D - KSM26ES8/8HD)
Boot : SSD M.2 2280 NVMe 480 GB
Storage : 4x SSD Crucial MX500 1 TB (CT1000MX500SSD1)
Cooling : Water cooling, Fractal Design Celsius S24 Blackout
First, I checked the memory: no errors, but I made a warranty claim anyway and got 2 new RAM modules. I also bought 2 brand-new 8 GB modules to try (and kept them).
Then I replaced the motherboard; it still crashes.
I have absolutely NO MESSAGE in dmesg or journalctl before a crash; the system just hangs and nothing works. There is no warning beforehand. It just happens.
Memtest86+ gives no errors.
I used a Linux stress-test USB key to check whether the CPU works correctly: no errors, and no temperature limit was reached.
I can see only one other possible source of the problem: software. This Proxmox installation is 5 years old and has been upgraded many times (current version is 8.4.1). But reinstalling everything, even though I have backups with PBS, would be hard work: it's my homelab and the configuration is not really straightforward.
Do you have any ideas? How can I debug this configuration?