Proxmox 7.3 reboots randomly / Kernel panic

Hardware is a Ryzen 7 2700X on an X470 board, running Proxmox 7.3. I have tried updating the kernel to 6.1 and 6.2, with no improvement.

It ran fine for me on the same hardware for years. Can you test the memory and maybe swap out some other hardware parts to test? Does an Ubuntu Live DVD run fine?
 

Memtest did not report errors.

I may have found the issue. I removed "amd_iommu=on iommu=pt" from the kernel command line, and the server has been up and stable for several hours now. No CPU lockups or kernel panic messages. So at least I have a starting point for troubleshooting.
 
Adding or removing amd_iommu=on does nothing, because the AMD IOMMU is on by default. Therefore, it must be iommu=pt. That's strange, because iommu=pt only sets up identity mapping for devices that are not passed through, and it is usually used as a workaround when devices can't handle regular IOMMU mappings. The good thing is that you probably don't need iommu=pt anyway.
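A quick way to confirm which IOMMU options the running kernel actually booted with (after changing the bootloader configuration and rebooting) is to read /proc/cmdline. A minimal Python sketch, assuming a Linux host; the helper names are hypothetical, and the simple whitespace split does not handle quoted parameter values:

```python
from pathlib import Path

# Kernel parameters relevant to this thread.
IOMMU_KEYS = ("amd_iommu", "intel_iommu", "iommu")


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None.

    Splits on the first "=" only, so "root=ZFS=rpool/ROOT/pve-1" keeps its
    full value. Quoted values containing spaces are not handled.
    """
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_params(text):
    """Return only the IOMMU-related parameters from a command line string."""
    return {k: v for k, v in parse_cmdline(text).items() if k in IOMMU_KEYS}


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        found = iommu_params(cmdline_file.read_text())
        print(found or "no explicit IOMMU parameters (kernel defaults apply)")
        if found.get("iommu") == "pt":
            print("iommu=pt is active on this boot")
```

Where the parameters live depends on the bootloader: on Proxmox, GRUB-booted hosts use GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub (followed by update-grub), while systemd-boot hosts use /etc/kernel/cmdline (followed by proxmox-boot-tool refresh).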
 
This whole thread mirrors my last two weeks.

Ryzen 5 3600
Gigabyte B450 Aorus Pro
64 GB RAM
1x 2 TB NVMe
2x 8 TB ZFS mirror

Random lockups on everything.

Latest PVE software, upgraded through every version since 6.2.
I’m considering redoing the whole lot…
 
Well… it was better for a minute, but not for long. The errors started with AMD-Vi IO_PAGE_FAULT on the Ethernet controller and then progressed to kernel panics and hard lockups.
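To see which device the AMD-Vi IO_PAGE_FAULT messages point at, and how often, one could tally them from the kernel log, e.g. the previous boot's messages via `journalctl -k -b -1` if the journal is persistent. A minimal Python sketch; the function name is hypothetical, and it assumes each fault is logged on a single line that contains the device's PCI address (either as a "0000:05:00.0:" prefix or as "device=05:00.0"). Adjust the pattern if your kernel formats the message differently:

```python
import re
from collections import Counter

# PCI address: optional domain ("0000:") then bus:device.function, e.g. "05:00.0".
# The lookahead avoids matching the start of a longer number such as a timestamp.
PCI_ADDR = re.compile(r"\b(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7](?![0-9])")


def count_page_faults(lines):
    """Count AMD-Vi IO_PAGE_FAULT events per PCI address (assumed log format)."""
    counts = Counter()
    for line in lines:
        # Only look at lines that mention both markers.
        if "AMD-Vi" not in line or "IO_PAGE_FAULT" not in line:
            continue
        match = PCI_ADDR.search(line)
        counts[match.group(0) if match else "unknown"] += 1
    return counts
```

Comparing the reported address against `lspci` output shows whether the faults really all come from the Ethernet controller.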

In an attempt to fix “something”, I flashed my BIOS, which killed the system. Either the mobo is bricked or it is just no longer compatible with the processor. As good an excuse as any to buy new hardware :)
 
Wow, sorry to hear that. That is not good news :(
But every cloud has a silver lining.

I'm downgrading my BIOS when I get back home.
My colleague has the exact same setup, with the rock-solid stability I used to have!
 
Good evening, I'm coming back to this post. After long discussions with my server provider (OVH), I have obtained a new dedicated server without any problems. I think I'll be able to close the subject at the end of the week.

Have a nice evening
 
For what it's worth, the issue came back on my Proxmox 8 node after I activated Powertop. Not sure whether it was the --auto-tune flag or my other Powertop configuration, but the issues were resolved after I ditched Powertop!
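One of the things Powertop's --auto-tune changes is PCI runtime power management, which the kernel exposes per device in sysfs as power/control ("auto" allows the device to be runtime-suspended when idle, "on" keeps it fully powered). A minimal Python sketch that lists these settings, assuming the standard /sys/bus/pci/devices layout; the function name is hypothetical:

```python
from pathlib import Path


def runtime_pm_states(root="/sys/bus/pci/devices"):
    """Map each PCI device address to its runtime PM setting.

    The kernel exposes this per device in power/control: "auto" lets the
    device be runtime-suspended when idle, "on" keeps it fully powered.
    Devices without a readable power/control file are skipped.
    """
    states = {}
    base = Path(root)
    if not base.is_dir():
        return states
    for dev in sorted(base.iterdir()):
        control = dev / "power" / "control"
        try:
            states[dev.name] = control.read_text().strip()
        except OSError:
            continue  # missing or unreadable on this device
    return states


if __name__ == "__main__":
    for addr, state in runtime_pm_states().items():
        print(f"{addr}\t{state}")
```

Writing `on` back into a device's power/control file (as root) disables runtime PM for that device until the next reboot, which can help narrow down whether a specific device is affected.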


Switched to s-tui, because all I wanted from Powertop was some insight into power usage and CPU frequencies, and s-tui does this much better.