Constant Proxmox errors / crashing

hax0rjax

Dec 4, 2022
Hello all! I've had an issue with Proxmox that started about 2 weeks ago. My Ryzen 9 3950X / Gigabyte X570 UD Proxmox setup (7.3) started crashing, kernel panicking, and being very unstable out of nowhere.

Since then I have replaced, in order: the motherboard with an ASUS Prime X570-Pro (latest BIOS), the 9265-8i RAID card with a 9300-8i, the dual SFP+ Ethernet card with a single SFP+ card, the 64 GB of RAM with 128 GB of ECC RAM from NEMIX, and finally the CPU with a new Ryzen 9 5950X.

The issues I was seeing included USB that would stop working when I tried to back up my VMs to an external HDD, and memory errors (I ran Memtest for 6+ hours and 2 passes, no errors found). I have also tried enabling / disabling IOMMU, Re-BAR, and C-states, and I have hit a wall.

At this point about 99% of the errors are gone: I have had one crash on the all-new hardware, and I am still getting MCE / memory errors even with the replaced RAM, which tests good. I'm looking for thoughts / ideas on anything I may have missed. Attached is a dmesg from the 3950X showing most of the issues on the new ASUS Prime motherboard, and below is the error I am still getting with the new processor. Thank you in advance! _EDIT: Added syslog for the 5950X._

Code:
Dec 04 09:34:16 vmhost rasdaemon[6869]: rasdaemon: mce_record store: 0x55b26aabc488
Dec 04 09:34:16 vmhost rasdaemon[6869]: rasdaemon: register inserted at db
Dec 04 09:34:16 vmhost rasdaemon[6869]:            <...>-3393569 [000]     0.007939: mce_record:           2022-12-04 09:34:16 -0500 Unified Memory Controller (bank=18), status= 9c2040000000011b, Corrected error, no action required., mci=CECC, mca= DRAM ECC error.
Dec 04 09:34:16 vmhost rasdaemon[6869]:  Memory Error 'mem-tx: generic read, tx: generic, level: L3/generic', memory_channel=1,csrow=3, cpu_type= AMD Scalable MCA, cpu= 0, socketid= 0, misc= d01a001101000000, addr= c78fb7a00, synd= 40004000a801203, ipid= 9600150f00, mcgstatus=0, mcgcap= 11c, apicid= 0
Dec 04 09:34:16 vmhost kernel: mce: [Hardware Error]: Machine check events logged
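
For anyone reading the log above, the `status=` value can be unpacked by hand. The following is a minimal Python sketch, not an official tool: the helper names `decode_status` and `status_from_line` are hypothetical, and the bit positions are the ones I understand the Linux kernel to use for the MCA status register (assumed here to apply to this AMD Scalable MCA bank).

```python
import re

# Assumed bit positions for the MCA status register (as in the Linux
# kernel's MCE definitions); the AMD-specific ones are marked.
STATUS_BITS = {
    "VAL": 63,    # register contents are valid
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "EN": 60,     # error reporting enabled
    "MISCV": 59,  # misc register valid
    "ADDRV": 58,  # address register valid
    "PCC": 57,    # processor context corrupt
    "SYNDV": 53,  # syndrome register valid (AMD)
    "CECC": 46,   # corrected ECC error (AMD)
    "UECC": 45,   # uncorrected ECC error (AMD)
}


def decode_status(status: int) -> dict:
    """Return each named flag as a bool, plus the low 16-bit error code."""
    flags = {name: bool((status >> bit) & 1) for name, bit in STATUS_BITS.items()}
    flags["error_code"] = status & 0xFFFF
    return flags


def status_from_line(line: str):
    """Pull the hex value after 'status=' out of a rasdaemon log line.

    The word boundary keeps 'mcgstatus=' from matching.
    """
    match = re.search(r"\bstatus=\s*([0-9a-fA-F]+)", line)
    return int(match.group(1), 16) if match else None


if __name__ == "__main__":
    line = "mce_record: Unified Memory Controller (bank=18), status= 9c2040000000011b, Corrected error"
    decoded = decode_status(status_from_line(line))
    print(sorted(name for name, value in decoded.items() if value is True))
    print(hex(decoded["error_code"]))
```

For the value in this log, the set flags are VAL, EN, MISCV, ADDRV, SYNDV and CECC, with UC, PCC and UECC clear, which agrees with rasdaemon's own "Corrected error, no action required" and `mci=CECC`.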
 

Hi Mira, thank you for the response. Per the logs, I'm already on 5.19. I actually found an option in my BIOS to enable ECC (it was not enabled even though I'm running ECC memory), and several hours later I'm not seeing any errors. I think we can mark this solved.
EDIT: I take this back, I just had a crash. It seems to have been caused by my trying to do a backup onto an external USB HDD, which is brand new. This has happened before as well: the USB just "dies" out of nowhere and then takes out the whole system. Log attached.
 
