A PVE v7 install gets corrupted to the point of hanging regularly (would like some input)

kernull

Apr 11, 2022
A very remote PVE host of mine would go totally unresponsive, sometimes only minutes after a restart. After zero luck troubleshooting remotely, I waited until I could travel and get hands-on, only to discover that I was just as confused troubleshooting locally...

After much cursing and testing, I discovered a single bad address in RAM and removed 2 of the 4 DIMMs (trying to retain DDR4 speeds), then attempted a reboot to see what impact running with this bad address for who knows how long had had on the system...

Everything seems to lock up in a fashion that does not allow for any sort of kernel debugging. I attempted kdump and netconsole, but had no luck.

Is there any way to repair a PVE installation?

Also, before I ask that, I should disclose that I may have made things worse (as usual): to see what things would look like, I installed PVE 9 on another flash drive and attempted to import the zpool from the NVMe drives, and I was quickly reminded that I don't know enough about ZFS to make decisions like this...

Anyone have any informed suggestions?

Thanks for reading.