Memory upgrade causing boot issues

hhamama

New Member
Apr 23, 2023
First thing I should mention is that I had a power outage a few days ago. When I rebooted, I didn't notice any issues.

I'm trying to add more memory to the server, and it seems to be causing kernel panics or issues with ZFS. When I added the extra sticks of RAM, it caused a kernel panic a few times. I then shuffled the RAM around: I took the new sticks and placed them in the slots used by the old RAM. It booted to the CLI just fine, but once one of my VMs powered up, it froze and crashed. When I restarted, the system was stuck trying to import ZFS pools by device for a good 25 minutes. I forced a restart, then swapped the memory back out, and the server booted like normal.

Specs are as follows:
AMD 5920x
Corsair Vengeance LPX 64GB (2x32GB) DDR4 3600 (PC4-28800) C18 1.35V (I purchased another pack of this to add to the server)
Asus x570 prime
Samsung 970 Evo 1tb
nvidia 1080 ti founders
4x WD red pro 12tb
hifiber BCM57810S
iocrest SY-PEX24086

I've also attached a copy of the kernel log file in case there's something there that could shed light on this issue.
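(For reference, the kernel messages from the crashed boot can usually be pulled with journalctl, assuming the journal is persistent, i.e. /var/log/journal exists or Storage=persistent is set in /etc/systemd/journald.conf:)

journalctl -k -b -1          # kernel messages from the previous boot
journalctl -b -1 -p err      # everything at error priority or worse from the previous boot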
 

Check the memory for errors with memtest86. If there are errors, then your Proxmox installation might already be corrupted.
That's another thing I forgot to mention. When I tried loading memtest86, all I would see was a black screen, then the system would reboot again. Is there a way to repair the installation without having to reinstall from scratch? If not, how do I migrate my VMs and ZFS pools? I have a spare drive to move all the VMs onto, but I'm concerned about the ZFS pool.
 
That's another thing I forgot to mention. When I tried loading memtest86, all I would see was a black screen, then the system would reboot again.
Modern CPUs do several automatic reboots when you change the memory (or the BIOS settings are wiped).
Is there a way to repair the installation without having to reinstall from scratch? If not, how do I migrate my VMs and ZFS pools? I have a spare drive to move all the VMs onto, but I'm concerned about the ZFS pool.
Restoring from known good backups? Note that your Proxmox and VMs and ZFS pools might already contain silent corruption IF the memory has errors...
 
Modern CPUs do several automatic reboots when you change the memory (or the BIOS settings are wiped).

Restoring from known good backups? Note that your Proxmox and VMs and ZFS pools might already contain silent corruption IF the memory has errors...
I honestly never backed up. I don't have anything on it that I'm worried about losing, I just don't want to go through the hassle of reinstalling everything from scratch. The web interface is showing zero errors in all 5 drives though.
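(For reference, the same SMART data the web UI shows can be read directly with smartmontools, which ships with Proxmox; /dev/sda below is just a placeholder for one of the pool disks:)

smartctl -H /dev/sda         # quick overall health verdict
smartctl -a /dev/sda         # full SMART attributes, including reallocated/pending sectors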
 
I honestly never backed up. I don't have anything on it that I'm worried about losing, I just don't want to go through the hassle of reinstalling everything from scratch. The web interface is showing zero errors in all 5 drives though.
If you don't care, just do a scrub of your zpools to check for errors. If you don't notice anything wrong then you don't have to do anything.
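(A minimal sketch of that, assuming the pool is named "tank"; substitute the real name from zpool list:)

zpool scrub tank             # start the scrub; it runs in the background
zpool status -v tank         # shows scrub progress and any read/write/checksum errors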
 
If you don't care, just do a scrub of your zpools to check for errors. If you don't notice anything wrong then you don't have to do anything.
Ok. So if there are no errors, can I migrate the larger ZFS pool without issue? Currently, VM storage is split. Main boot drive for VMs is on the same SSD that proxmox boots off of, while additional storage gets assigned from the ZFS pool (4x WD red 12 TB)
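(For what it's worth, a rough sketch of how such a move usually looks, assuming a hypothetical VM ID 100, a spare drive mounted at /mnt/spare, and a pool named "tank"; none of these names come from the thread:)

vzdump 100 --dumpdir /mnt/spare --mode stop --compress zstd    # back up the VM (boot disk on the local SSD) to the spare drive
zpool export tank                                              # cleanly export the data pool before reinstalling
# ...reinstall Proxmox on the SSD...
qmrestore /mnt/spare/vzdump-qemu-100-<timestamp>.vma.zst 100   # restore the VM; adjust the filename to whatever vzdump produced
zpool import tank                                              # re-import the pool; add -d /dev/disk/by-id if it isn't found

(The imported pool then just needs to be re-added as ZFS storage, ideally with the same storage ID as before, under Datacenter -> Storage so the VM disks on it are visible again.)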
 
If you don't care, just do a scrub of your zpools to check for errors. If you don't notice anything wrong then you don't have to do anything.
The current Proxmox install is version 7.4. I opted to reinstall from scratch after backing up the existing VMs. First I tried 7.4, then 8.0, and both were crashing with the additional RAM. I ran memtest86 from a bootable USB and all 128GB showed zero errors. I did one last Google search before throwing my hands up and opting to build an entirely new system, and found this thread: https://forum.proxmox.com/threads/o...r-proxmox-ve-7-x-available.115090/post-507337

Relevant comment:
Ryzen and Threadripper processors have C-state 6, which needs to be disabled when running Linux. Look in your BIOS/UEFI settings for something like "Power idle control" and set it to "Typical current idle" (or normal, high, anything that is not an equivalent of low). You may find this setting somewhere in Power, Misc, or even in the CPU settings. Different brands might name it a little differently. Search how to disable C-state 6 for your mainboard.
Your crashes will disappear. This has been a known issue since kernel 5.10 or even earlier.

For ASUS motherboards, I found this setting under Advanced -> AMD CBS (common BIOS settings). "Typical current idle" was one of the options, as the quoted comment said.
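(If a board doesn't expose that option, a commonly suggested software-side alternative, not something from this thread, is limiting C-states via the kernel command line. On a default ext4/LVM Proxmox install that boots with GRUB, that looks roughly like this; ZFS-root UEFI installs edit /etc/kernel/cmdline and run proxmox-boot-tool refresh instead:)

# append e.g. idle=nomwait (or processor.max_cstate=1) in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet idle=nomwait"
update-grub
reboot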
 
Edit: nvm. Still crashing, just not as frequently, and I'm not sure why. I was uploading ISO files and it crashed; I added firewall rules and it crashed.
 
That's another thing I forgot to mention. When I tried loading memtest86, all I would see was a black screen, then the system would reboot again.
Which one? memtest86 or memtest86+?

For all these tests, you have to understand that you can only rely on found errors indicating a problem, not on the absence of errors indicating all is well. There is so much interaction between system components that some errors will only appear in combinations that aren't covered by any test.
 
