Proxmox 7.3 reboots randomly / kernel panic

Hardware is a Ryzen 7 2700X on an X470 board, running Proxmox 7.3. I have tried updating the kernel to 6.1 and 6.2 with no improvement.
It ran fine for me on the same hardware for years. Can you test the memory and maybe replace some other hardware parts to test? Does an Ubuntu Live DVD run fine?
 
Memtest did not report errors.

I may have found the issue. I removed "amd_iommu=on iommu=pt" from the kernel command line, and the server has been up and stable for several hours now. No CPU lockups or kernel panic messages. So at least I have a starting point for troubleshooting.
 
Adding or removing amd_iommu=on does nothing because the AMD IOMMU is on by default. Therefore, it must be iommu=pt. That's strange, because it only sets up identity mapping for devices that are not passed through, and it is usually used as a workaround when devices can't handle regular IOMMU mappings. The good thing is that you probably don't need iommu=pt anyway.
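
To confirm which IOMMU options the running kernel was actually booted with, something like the following could be used. This is only a minimal sketch: the function name `iommu_params` is made up for illustration, and it simply reads /proc/cmdline and picks out the `iommu`, `amd_iommu` and `intel_iommu` options.

```python
from pathlib import Path

# Option names to look for; this list is an assumption for illustration,
# not an exhaustive list of IOMMU-related kernel parameters.
IOMMU_KEYS = ("iommu", "amd_iommu", "intel_iommu")


def iommu_params(cmdline):
    """Return the IOMMU-related key=value options found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in IOMMU_KEYS:
            found[key] = value
    return found


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        # An empty dict means no IOMMU option is set explicitly.
        print(iommu_params(proc.read_text()))
```

If it still shows iommu=pt after editing, the boot configuration was probably not regenerated. On Proxmox the command line normally lives in /etc/default/grub (followed by update-grub) or in /etc/kernel/cmdline (followed by proxmox-boot-tool refresh), depending on the bootloader in use.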
 
This whole thread sums up my last two weeks.

Ryzen 5 3600
Gigabyte B450 Aorus Pro
64 GB RAM
1x 2 TB NVMe
2x 8 TB ZFS mirror

Random lockups on everything.

Latest PVE software, upgraded with every version since 6.2
I’m considering redoing the whole lot…
 
Well… it was better for a minute, but not for long. The errors started with AMD-Vi IO_PAGE_FAULT on the Ethernet controller and then progressed to kernel panics and hard lockups.

In an attempt to fix “something”, I flashed my BIOS, killing the system. Either the mobo is bricked or it is just no longer compatible with the processor. As good an excuse as any to buy new hardware :)
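
For anyone else chasing the same symptoms, a rough way to see which device the page faults point at is to count IO_PAGE_FAULT lines per device in the kernel log. The sketch below is an assumption-laden example, not a definitive tool: the function name `count_page_faults` is hypothetical, and the regular expression assumes the log lines contain `IO_PAGE_FAULT` followed by a `device=<PCI address>` field.

```python
import re
from collections import Counter

# Assumes each fault line contains "IO_PAGE_FAULT" and a "device=<PCI address>" field.
FAULT_RE = re.compile(r"IO_PAGE_FAULT\b.*?\bdevice=([0-9a-fA-F:.]+)")


def count_page_faults(log_text):
    """Count IO_PAGE_FAULT events per PCI device address in kernel log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAULT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Example use: journalctl -k | python3 this_script.py
    for device, n in count_page_faults(sys.stdin.read()).most_common():
        print(device, n)
```

The reported address can then be compared with the output of lspci to check whether it really is the Ethernet controller or some other device.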
 
Wow sorry to hear, that is not good news :(
But every cloud has a silver lining.

I'm downgrading my BIOS when I get back home.
My colleague has the exact same setup, with the rock-solid stability I used to have!
 
Good evening, I'm coming back to this post. After long discussions with my server provider (OVH), I have obtained a new dedicated server without any problems. I think I'll be able to close the subject at the end of the week.

Have a nice evening
 
For what it's worth, it came back on my Proxmox 8 node after activating Powertop. Not sure if it was the --auto-tune or --configre flag, but the issues were resolved after I ditched Powertop!


Switched to s-tui, because all I wanted from Powertop was some insight into power usage and CPU frequencies, and s-tui does this much better.
 
