Create Test Scenario for PVE boot failure

mw88

New Member
Oct 8, 2022
I am working on setting up a home server, which will mainly serve as a file server plus run some virtual machines and containers for home automation, routing, media, etc.

For the files, I have a raidz2 with four HDDs planned and ready, and I have played around with restoring files and datasets and with loading the pool on different machines, so I feel I can trust my data to this setup.

However, for the VMs, containers, and main PVE config I am using a raidz2 boot pool consisting of four SSDs. In my reasoning, raidz2 leaves a suitably low risk of having to recover from three disks failing at once. However, as I am a beginner, I always run the risk of misconfiguration. So I would like to create a test scenario where I manually render my PVE unbootable and then restore the rpool to an earlier snapshot, before I finally trust the system with all my config and data.

I imagine that running
Bash:
wipefs -a /dev/sdaX
would be sufficient to render disks unbootable, but what is the procedure to restore it to an earlier snapshot without PVE running?

Also, do you have any suggestions how I would/should test a bootloader failure?
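
For the rollback part, a snapshot to return to would have to exist before the destructive test. A minimal sketch, assuming the default PVE pool name rpool; the snapshot name pre-test is just an example:
Bash:
# take a recursive snapshot of every dataset in the root pool
zfs snapshot -r rpool@pre-test
# verify the snapshots exist
zfs list -t snapshot -o name,creation | grep pre-test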
 
In this case I would do the following:

* Install Proxmox VE in a VM and configure it exactly as you plan it for your physical server
* Then play around by removing (virtual) disks etc. and see the effect
* Make snapshots so that you can go back to a certain state of your experiments
* As soon as you feel safe and familiar with how it works, install your physical server accordingly

Btw: RAIDZ2 needs a lot of performance; it is recommended to check whether storage access speed is sufficient.
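
One way to check the storage access speed is a short random-write benchmark with fio against a file on the pool. A sketch, not a definitive procedure; the path /rpool/data/fio-test is an example:
Bash:
# 4k random writes for 30 s against a 1 GiB test file
fio --name=rand-write --filename=/rpool/data/fio-test --size=1G \
    --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
    --runtime=30 --time_based
# remove the test file afterwards
rm /rpool/data/fio-test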
 
Thank you for the suggestions, however I have concerns:

* Running Proxmox in a VM is something I did before actually deciding on Proxmox about a year ago.
* I successfully understood how to resilver, and did that.
* For most configuration problems, going back to a VM state is fine; however, I want to prepare for the case when Proxmox does not boot anymore. Resetting the VM is not something I can do on a real system. This is the main concern I have and want to test, hence my question.
* I have the physical server anyway, so why should I not test everything on there? Virtualization always introduces or hides some aspects.

Storage access speed should be fine; I'm running this on a TR3960.
 
* For most configuration problems, going back to a VM state is fine; however, I want to prepare for the case when Proxmox does not boot anymore. Resetting the VM is not something I can do on a real system. This is the main concern I have and want to test, hence my question.
Sure, that is why I suggested proving the concept for your setup with a VM first, in order to avoid a situation where you need to "reset" the real system; in other words: get familiar with how to recover a "damaged" RAIDZ2 by trying it out in a VM. The principle is explained in https://docs.oracle.com/cd/E19253-01/819-5461/gcfhw/index.html ; the difficulty is what to do when the RAIDZ2 is needed for booting: you have to boot the system from alternative media, repair it using the alternative OS, and make it bootable again. Once you have gained confidence by practising with a VM, you will also succeed in case of a failure on the bare-metal system.
 
Thank you. Although I was sceptical at first and didn't like your answer, I now realize this is a great way to test what I have had in mind.

My own research shows I should be able to perform the following steps:
  1. destroy the OS (PVE), then
  2. boot from a (virtual) flash drive and
  3. try a rollback
Are these the recommended steps? Afterwards I can experiment with destroying the bootloader.
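
From a live system with ZFS support (e.g. the PVE installer's debug shell), the rollback step could look roughly like this; a sketch, assuming the default PVE root dataset rpool/ROOT/pve-1 and a previously created snapshot named pre-test:
Bash:
# import the pool without mounting any datasets
zpool import -f -N -R /mnt rpool
# roll the root dataset back; -r destroys snapshots newer than pre-test
# (zfs rollback acts on a single dataset, so other datasets with their
# own snapshots would each need a separate rollback)
zfs rollback -r rpool/ROOT/pve-1@pre-test
# cleanly export the pool and reboot into the restored system
zpool export rpool
reboot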

I will try it out during the upcoming weeks and will most probably document my results here (for myself and anyone else interested).
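
For the bootloader experiment: on a ZFS-booted PVE installation the ESPs are managed by proxmox-boot-tool, so a wiped ESP can be re-initialized from the booted (or chrooted) system. A sketch, assuming a hypothetical ESP partition /dev/sda2:
Bash:
# check which ESPs are currently configured and in sync
proxmox-boot-tool status
# re-create the filesystem on the wiped ESP and register it again
proxmox-boot-tool format /dev/sda2
proxmox-boot-tool init /dev/sda2
# copy the current kernels and bootloader onto all configured ESPs
proxmox-boot-tool refresh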
 
