tl;dr:
Does anyone make one of these, or have a guide on making a "shadow boot disk" for their PVE install? A shadow boot disk is basically an external boot disk which has everything needed to boot, access, fix, repair, start, stop, debug, and inspect the real system. If you have large enough media and enough RAM, say a 32 GB flash drive, you can just put the whole OS on it and make a "Live ISO" image of the running machine.
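The closest I have at the moment is the stock installer on a stick, which at least gets you a debug shell. Writing one is just something along these lines (the ISO name and /dev/sdX are placeholders; double-check the device before running dd):

    # assumed: installer ISO already downloaded, /dev/sdX is the spare flash drive
    dd if=proxmox-ve_X.Y-Z.iso of=/dev/sdX bs=4M conv=fsync status=progress

But that is a rescue disk, not a true shadow of the running system, which is what I'm really after.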
-----
So I perma-crashed my first PVE box on day one. I tested what would happen if I assigned one of my PCIe SATA controllers (I have three) to a VM.
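For reference, the sort of passthrough I mean is what `qm set` writes into the VM config (the VM ID and PCI address below are made-up examples):

    # hypothetical example: hand the SATA controller at 0000:03:00.0 to VM 100
    qm set 100 --hostpci0 0000:03:00.0

which ends up as a "hostpci0: 0000:03:00.0" line in /etc/pve/qemu-server/100.conf. My guess, with hindsight, is that the host still needed that controller (or something else in its IOMMU group), so handing it to the VM took the node down.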
Well, what happened, apparently, is that the PVE node itself crashed. Even when I plugged a monitor and keyboard into it, it was dead. When I used the power button it shut down cleanly and powered back up cleanly, right up until it launched that VM, at which point it hung. It was interesting that it could shut down and start up with console output, but as soon as that VM ran the PVE node was completely invisible to everything.
I switched DNS over to 8.8.8.8 for some searching, but couldn't quickly find a way to boot into a recovery or safe mode; no bootloader options that I could see, etc. I did find a post on here suggesting using the installer, which I still had on a USB key, in debug mode, which drops you to a console at each stage.
Being new to ZFS it was a little daunting at first, but I finally got the ROOT dataset mounted and... WTF?!? Panic mode: /etc/pve was empty.
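For anyone following along, the rough sequence from the installer's debug shell is something like this (dataset names are the PVE defaults; importing with an altroot is the tidier route I should have taken instead of fiddling with mountpoints):

    # import the pool under an alternate root so everything mounts beneath /mnt
    zpool import -f -R /mnt rpool
    # the root filesystem is normally rpool/ROOT/pve-1; mount it if it didn't auto-mount
    zfs mount rpool/ROOT/pve-1
    ls /mnt/etc/pve
    # when finished: zpool export rpool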
I hunted high and low and ended up back here to find that /etc/pve is a virtual mount point, materialised by the cluster filesystem (pmxcfs) from a database which lives in /var/lib/pve-cluster.
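For the curious, that database (config.db) is plain SQLite, and as far as I can tell everything under /etc/pve lives in a single table, so you can at least read a VM's config offline (the VM ID is an example; table and column names are per my reading of pmxcfs, so treat this as a sketch):

    # peek at the stored config for VM 100, from the debug shell with root mounted at /mnt
    sqlite3 /mnt/var/lib/pve-cluster/config.db \
      "SELECT data FROM tree WHERE name = '100.conf';"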
So I made a copy of that, edited the binary file to remove the hostpci config line from the offending VM, and rebooted.
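In hindsight the hand-edit is what corrupted it; if I had to try again I'd probably let SQLite do the string surgery rather than touching the bytes directly, something like this completely untested sketch (the hostpci line and VM ID are examples, and pmxcfs keeps version counters, so even this may not survive its consistency checks):

    # keep a copy, then strip the passthrough line (plus its newline) from the stored config
    cp /mnt/var/lib/pve-cluster/config.db /mnt/root/config.db.bak
    sqlite3 /mnt/var/lib/pve-cluster/config.db \
      "UPDATE tree SET data = replace(data, 'hostpci0: 0000:03:00.0' || char(10), '') WHERE name = '100.conf';"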
I kinda figured I wouldn't get away with this, but I was a little desperate, even saying out loud... you are going to have to reinstall it and spend the evening trying to re-attach/restore the VMs. After having to go back and reset the mountpoint on rpool/ROOT again (oops)... indeed: "Database corrupt" on pve-cluster startup.
However, this got me to exactly where I needed to be in the first place: a working, fully initialised shell with the base services running. So I could put the backed-up config.db back in place, start the cluster filesystem alone to materialise the /etc/pve folder, and fix the VM config.
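Roughly, from that shell, it went like this (paths and VM ID are examples, and this is from memory):

    # put the saved database back
    cp /root/config.db.bak /var/lib/pve-cluster/config.db
    # start just the cluster filesystem, in local mode, so /etc/pve materialises
    pmxcfs -l
    # the VM config is a normal text file again; delete the hostpci0 line
    nano /etc/pve/qemu-server/100.conf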
reboot
Back to normal. Phew.