How to recover from failed upgrade...

DanH

Is there a simple way to get a system back to a previous version of Proxmox if the upgrade fails?

Just wondering, because I currently have some issues upgrading from 8.2.2 to 8.3...

I'm only thinking about the Proxmox VE host, not the VMs.

I could run a script that backs up the important files (/etc/network/*, /etc/pve/*, other?), reinstall from scratch with the 8.2.2 media, copy back the files, reboot. Correct?

If yes, does such a backup script exist yet?

Thanks
Dan
 
I don't know of any existing script. I usually assume that if I lose a host, I'll just install a new one and bring it in as a fresh one.

But your list seems nice… Maybe add things in /var, especially if you care about stats and logs, and root's SSH key if you don't want to deploy a new one…
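
A minimal sketch of such a backup script, in Python, in case it helps as a starting point. The path list is just what has been mentioned in this thread so far; `/root/.ssh` is an assumed location for root's SSH key, and the function name and archive label are made up for illustration. It only archives files and does not restore anything.

```python
import tarfile
import time
from pathlib import Path

# Paths mentioned in this thread. /root/.ssh is an assumption for root's SSH key.
DEFAULT_PATHS = ["/etc/network", "/etc/pve", "/root/.ssh"]


def archive_paths(paths, dest_dir, label="pve-host-config"):
    """Write a gzip-compressed tar of every path that exists.

    Returns (archive_path, skipped), where skipped lists the missing paths.
    """
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{label}-{stamp}.tar.gz"
    skipped = []
    with tarfile.open(archive, "w:gz") as tar:
        for raw in paths:
            path = Path(raw)
            if not path.exists():
                skipped.append(str(path))
                continue
            # Store names relative to / so the archive can be unpacked
            # into a scratch directory and inspected before copying back.
            tar.add(path, arcname=str(path).lstrip("/"))
    return archive, skipped
```

Run as root, something like `archive_paths(DEFAULT_PATHS, "/root/host-backups")` would produce one timestamped archive per call. The destination should of course live on a different disk or machine than the one you are worried about losing.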

There was some discussion around here: https://forum.proxmox.com/threads/official-way-to-backup-proxmox-ve-itself.126469/
 
If you don't use your Proxmox host as a general-purpose Debian OS, and if you don't heavily modify it, upgrades are usually pretty straightforward. If they break, it tends to be something simple that can be fixed without having to reinstall from scratch.

Can you give more details on what you did and what error you saw? This might still be something that you can recover from.
 
I already reinstalled 8.2.2 and then, again from scratch, 8.3. This is a very old server that I am just reusing.
Actually, I still have the issue, but it is most likely a hardware problem.

Still, this made me think about recovery of a Proxmox system. I tried to find resources or guides for that, but the most common answer is "just reinstall again".

This is, for my taste, not good enough.
 
I have also asked myself about backing up and restoring the host, whether after hardware problems or after updates.
This is not simple in any virtualization system, and even less so in Proxmox given how much it supports and how flexible it is, so a good host backup and restore would be difficult and slow to implement.
If I remember correctly, host backup has been on the roadmap since I first tested Proxmox about ten years ago, and it still is. It will probably never be done, and if it is, in many cases it risks causing more surprises and wasted time than a clean installation followed by a manual restore of the configuration (with all the appropriate checks).

In practice, you have to take into account system differences, hardware resources, different configurations, customizations, changes made to the VMs after the backup, etc.
For example, at the moment I run a daily scheduled configuration backup, plus additional manual runs whenever I add or change something on the system or the VMs. I use a custom script that backs up both the Proxmox configuration and other custom things (nut, fail2ban, various monitoring tools and scripts, etc.).
From what I've seen, when restoring to a clean system you have to be careful to recreate the basic storage configuration first (the part outside Proxmox's own, adapting it if needed) before restoring the Proxmox configuration. You also have to account for any changes made after the last backup and deal with them, because the VM configurations and/or disks may have changed in ways that cause problems. That is why I also run a manual backup after every change, to reduce this risk.
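
One way to see what changed after the last backup is to compare a checksum manifest of the unpacked backup with one of the live configuration directory. The following Python sketch is only an illustration of that idea; the function names are hypothetical and it is not the custom script described above.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def diff_manifests(old, new):
    """Return the files added, removed, and changed between two manifests."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Calling `diff_manifests(build_manifest(backup_dir), build_manifest(live_dir))` before a restore gives a list of VM configs and other files to review by hand instead of overwriting them blindly. It says nothing about changes to the VM disks themselves.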

I also plan to use btrfs snapshots in the future (I use btrfs for the root filesystem), but some improvements are needed first to manage them well (for example a better subvolume structure and ESP partition management). I will still have to be careful, because the same problems with configuration differences remain (from changes made after the snapshot/backup). So it would be a good time saver only in certain cases: mainly post-upgrade regressions that you notice immediately or almost immediately, when you can't find a workaround right away and want to get back up and running as quickly as possible, and problems with the system disk(s) when everything else is unchanged. I do not recommend trying this if you do not have enough experience.
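
For the pre-upgrade case, taking the snapshot itself is the easy part. A rough Python sketch, assuming the root filesystem is a btrfs subvolume and that a snapshot directory such as `/.snapshots` (a hypothetical location) exists on the same filesystem. It does not cover the ESP, and rolling back is a separate and riskier step that is not shown here.

```python
import subprocess
import time


def snapshot_command(source="/", snapshot_dir="/.snapshots", label="pre-upgrade"):
    """Build the btrfs command for a read-only snapshot of `source`."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = f"{snapshot_dir.rstrip('/')}/{label}-{stamp}"
    # -r makes the snapshot read-only, so it cannot be modified by accident.
    return ["btrfs", "subvolume", "snapshot", "-r", source, target]


def take_snapshot(**kwargs):
    """Run the snapshot command. Needs root and a btrfs subvolume at `source`."""
    cmd = snapshot_command(**kwargs)
    subprocess.run(cmd, check=True)
    return cmd[-1]  # path of the new snapshot
```

Keeping the command builder separate from the function that runs it makes it possible to print and check the command on a test system before running anything as root.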

-------------------------------------------
Important note: always check for hardware issues and deal with them first. As with any other server, restoring right away (which is tempting when restores are simple and fast) without checking for and handling hardware issues can be counterproductive.
 

You bring up exactly the points I am concerned about. Thanks a lot for the write-up. I am totally in agreement with you. You nailed it.

When setting up a new system to replace an old, damaged one, it's crucial to consider storage mapping, naming, special configurations, and other hardware-dependent factors. My replacement system might have different NICs, resulting in different names (like the recent Broadcom issue). I might also have disks of different sizes or from different vendors. Does it make a difference? I'm not sure.
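
For the NIC naming part, a quick check before restoring a network config onto new hardware could look like the Python sketch below. It parses an ifupdown-style `/etc/network/interfaces` and reports which referenced interfaces do not exist on the machine. Treating names that start with `vmbr` or `bond` as virtual is an assumption based on common naming, and the function names are made up for illustration.

```python
import os
import re


def referenced_interfaces(config_text):
    """Collect the physical interface names an interfaces file refers to."""
    names = set()
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()
        match = re.match(r"iface\s+(\S+)", line)
        if match:
            names.add(match.group(1))
        match = re.match(r"(?:bridge[-_]ports|bond[-_]slaves)\s+(.*)", line)
        if match:
            names.update(p for p in match.group(1).split() if p != "none")
    # Drop VLAN suffixes (eno1.100 -> eno1), loopback, and names that by
    # convention (an assumption) belong to bridges and bonds.
    base = {n.split(".")[0] for n in names}
    return {n for n in base if n != "lo" and not n.startswith(("vmbr", "bond"))}


def present_interfaces():
    """Interfaces the running Linux kernel currently knows about."""
    return os.listdir("/sys/class/net")


def missing_interfaces(config_text, present):
    """Names used in the config that are not among the present interfaces."""
    return sorted(referenced_interfaces(config_text) - set(present))
```

If `missing_interfaces(...)` returns something like `["eno1"]` on the replacement box, the old config needs its port names adapted before it is copied into place, otherwise the host may come up without network.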

The fact is, over time, a replacement system will have different hardware. So, bringing back a damaged node with different hardware should be possible. But where are the traps? For example, if I use a cluster, will it easily reintegrate into that cluster? If the new system's disk layout is different, what about syncing VMs, etc.?

I wish there was a guide (not just a bunch of links to community articles with long discussions) to help recover a Proxmox system in various scenarios without having to become "The Guru in Proxmox Management."

Questions like the following might be addressed by such a guide (this list might become huge, but bear with me... I'm just a simple engineer who had to take over a role):
  • How can I get a Proxmox system up and running again if the following happens:
    • A Proxmox upgrade fails because the hardware unexpectedly rebooted during the process? This is what happened to me and triggered these questions.
    • The disk where Proxmox boots from fails.
    • The Proxmox system does not have network connectivity after an upgrade, even though the hardware hasn't changed.
  • How can I get that new system into a cluster, replacing the failed node?
  • What needs to be done to make the cluster work fine again?
  • Just a standard, here-is-all-you-need-to-recover-the-Proxmox-node guide...
    • For example, have a recovery boot stick ready and know how to use it.
  • What means of preparation are necessary to be ready if something happens?
  • What needs to be documented and in what detail?
    • Some very good engineers "do not need stinking documentation," I know... but what happens if that engineer leaves or just falls off a train?
  • What needs to be backed up and how?
    • There are lots of sources that list files and folders. Interestingly, not all of those sources list the same files/folders.
  • How do I get the backup back onto a new system?
    • Seems trivial, but consider name or disk mapping changes...
  • Who knows, one can wish... a Terraform import functionality for a Proxmox system to get the setup code for the existing system.
    • I am just dreaming.
  • Or someone wrote a Terraform/module/code/config which is simple to use, and I would use it for all new Proxmox systems (just, I have a bunch of old systems still having the same issue...)...
Probably I could answer all those (and similar) questions myself if I had the time to become the Proxmox specialist.

I was looking around a bit, starting with the link from @Gilou above, and then digging a bit deeper. Not with the goal of writing something up myself (except for that little script I asked Copilot, or was it ChatGPT, to write for me), but to find something that can also be used by someone like me.

So far, I have found some interesting stuff. I still need to look at it in more detail, but if you have additional ideas and/or could look into the links below and give your feedback...

Is there more? Better?

Dan
 
Creating a complete guide for all cases (or even just most of them) is not simple, and neither is creating a backup system (which requires much more time and is even more complicated).
First of all, although basically anyone can use Proxmox, a minimum of knowledge is needed to manage it well and to solve problems (even ones that may be simple).

If you use it personally, for testing, or for unimportant things, where long downtime and data loss are not a problem even if you cannot handle the issues that come up, then anyone can use it. Otherwise, I would say it should not be used unless you have at least minimal experience as a Linux system administrator.
Without that, you will struggle even to search for solutions, to describe the problem and provide useful data when asking for help (as I have seen many times on the forum), and to apply solutions that are correct and well explained.

Even with good general experience, you need specific experience with the things you actually use, and those can vary a lot. It is important to build experience with what you use so you can manage it better. You may already know certain components but still need to learn how Proxmox integrates them. For example, I have about ten years of experience with KVM virtualization, but managed through libvirt, which Proxmox does not use, so beyond the many basics in common (on the QEMU side) I am still learning the rest.
And that's just one small example; there can be many such parts.

To recap: it is good to have at least some basic experience (the more, the better), but then you need to acquire more, specific to Proxmox and to the components/technologies you use. As I usually do (and I suppose/hope many others do), try everything you use in practice on test systems, including possible failure and recovery scenarios, so that you are reasonably prepared when they happen in production and have found the best ways to handle them.
 
Proxmox-backup-client is surprisingly powerful and really simplifies a lot of these disaster recovery issues. Backing up all the various bits of data is also not really all that difficult.
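
As a rough illustration, a host backup with the client could be wrapped like this in Python. The repository string and the archive names are made-up examples. `/etc/pve` is listed as its own archive because it is a separate mounted filesystem and, as far as I know, the client does not cross mount points by default.

```python
import subprocess

# Hypothetical selection: archive name (without extension) -> source directory.
HOST_ARCHIVES = {"etc": "/etc", "pve": "/etc/pve", "root": "/root"}


def pbs_backup_command(repository, archives):
    """Build a `proxmox-backup-client backup` call with one .pxar per directory."""
    specs = [f"{name}.pxar:{path}" for name, path in archives.items()]
    return ["proxmox-backup-client", "backup", *specs, "--repository", repository]


def run_host_backup(repository, archives=HOST_ARCHIVES):
    """Run the backup. Credentials are expected in the environment (e.g. PBS_PASSWORD)."""
    subprocess.run(pbs_backup_command(repository, archives), check=True)
```

For example, `pbs_backup_command("backup@pbs@pbs.example.com:hoststore", HOST_ARCHIVES)` only builds the argument list, so it can be printed and checked before anything is actually sent to a Proxmox Backup Server.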

But I agree that recovery is very much a case-by-case situation. There are just too many possible scenarios that are often unique to just one particular user. I have had good luck restoring backups, and I am a big fan of PBS. But I honestly have no idea how to offer a one-size-fits-all restoration guide, let alone a tool that would do so automatically.

I would recommend practicing, if you have the opportunity to do so. You might be able to make a customized tool that works for your particular use case.

I have restored containers, Raspberry Pi systems, and ChromeOS/Crostini. All of those worked very easily. I have not had to restore a Proxmox host in anger just yet, but I imagine that with my past experience I shouldn't encounter too many issues.
 
