A very remote PVE host of mine would sometimes go totally unresponsive, occasionally only minutes after a restart. After zero luck troubleshooting remotely, I waited until I could travel and get hands-on, only to discover that I was just as confused troubleshooting locally...
After much cursing and testing, I discovered a single bad address in RAM and removed 2 of the 4 DIMMs (trying to retain DDR4 speeds), then attempted a reboot to see what impact running with this bad address for who knows how long has had...
Everything seems to lock up in a way that does not allow for any sort of kernel debugging. I attempted kdump and netconsole, but no luck.
Is there any way to repair a PVE installation?
Also, before I ask that, I should disclose that I may have made things worse (as usual): I installed PVE 9 on another flash drive to see what things would look like and attempted to import the zpool from the NVMe drives, and I was quickly reminded that I don't know enough about ZFS to make decisions like this...
Anyone have any informed suggestions?
Thanks for reading.