PVE upgrade crashed, now cannot upgrade anymore

ipreferpie

New Member
Feb 14, 2025
Hi,

I was in the midst of running the latest upgrade, which included the most recent Proxmox headers (6.8.12-9-pve), when a backup I was running caused a crash on a PVE node. After restarting, it booted back into 6.8.12-8 successfully. When I tried to redo the upgrade, it required me to run dpkg --configure -a, which was successful. I then ran the upgrade again via the GUI (apt-get dist-upgrade), and it got stuck at “unpacking proxmox-headers-6.8.12-9-pve”. I let it run for 1-2 hours with no sign of progress. On similar nodes, the same upgrade took only 3-7 minutes. I also tried “apt-get remove” and “apt-get purge”, after which the system displayed:

dpkg: error processing package proxmox-headers-6.8.12-9-pve (--remove):
package is in a very bad inconsistent state; you should
reinstall it before attempting a removal
dpkg: too many errors, stopping
Errors were encountered while processing:
proxmox-headers-6.8.12-9-pve

So I tried “apt-get --reinstall install”, which gets me back to “unpacking proxmox-headers-6.8.12-9-pve”. So I’m totally stuck. Any solutions would be very helpful! Do I need to wait until version 6.8.12-10-pve comes out before I can even attempt upgrading? Many thanks!
 
Check to see if your boot/OS disk is dying. What make/model disk is it?

Proper solution is to reinstall proxmox (which will wipe the target disk!) and restore LXC/VM from backup; but you may need to replace the OS disk.
 
Try cleaning your downloaded and cached packages first by issuing apt clean. Then refresh the package list, just in case: apt update. Next, download the offending package without actually installing it: apt --download-only install proxmox-headers-6.8.12-9-pve. You should now have a pristine copy of the *.deb file in /var/cache/apt/archives. Try installing it over the broken files with dpkg -i /var/cache/apt/archives/proxmox-headers-6.8.12-9-pve_6.8.12-9_amd64.deb. Since we are side-stepping apt, we have a little more control and will hopefully get better error messages.
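The same steps, written out as a small script so the order is unambiguous. This is only a sketch: the RUNNER variable and the reinstall_headers function are names made up for this example, and the .deb file name is simply the one quoted above. By default it only prints the commands; nothing is changed until you run it with RUNNER set to an empty string.

```shell
#!/bin/sh
# Sketch: clean the cache, re-download the package, install it with dpkg.
# RUNNER (hypothetical helper variable) defaults to "echo", so the commands
# are only printed. Run with RUNNER= (empty) as root to execute them.
PKG="proxmox-headers-6.8.12-9-pve"
DEB="/var/cache/apt/archives/${PKG}_6.8.12-9_amd64.deb"   # name as quoted in the post

reinstall_headers() {
    runner=${RUNNER-echo}
    $runner apt clean &&
    $runner apt update &&
    $runner apt --download-only install "$PKG" &&
    $runner dpkg -i "$DEB"
}

reinstall_headers
```

The && chain stops at the first command that fails, which is usually the one whose error message is worth reading.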

Presumably, this won't work on the first try, but there should be more clues now. Look at the error and see if it does a better job explaining what's wrong. It might just be that you have to pass one of the various --force-*** options to dpkg, or it could be that one of the scripts in /var/lib/dpkg/info/proxmox-headers-6.8.12-9-pve.* is horribly confused by the odd state that your system finds itself in after being interrupted so rudely. From the partial error message that you quoted, I suspect the *.prerm or *.postrm scripts failing unexpectedly.
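To see which of those scripts actually exist for the package, something like the following could help. It is a minimal sketch; list_maintainer_scripts is a made-up helper name, and it only looks for the four standard script types in the directory mentioned above.

```shell
#!/bin/sh
# Sketch: list the maintainer scripts dpkg has on file for one package.
# The second argument (optional) overrides the dpkg info directory.
list_maintainer_scripts() {
    pkg=$1
    info_dir=${2:-/var/lib/dpkg/info}
    for kind in preinst postinst prerm postrm; do
        if [ -f "$info_dir/$pkg.$kind" ]; then
            echo "$info_dir/$pkg.$kind"
        fi
    done
}

list_maintainer_scripts proxmox-headers-6.8.12-9-pve
```

If it prints nothing, the package ships none of these scripts and the problem lies elsewhere. dpkg -s proxmox-headers-6.8.12-9-pve additionally shows the state dpkg currently records for the package.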

If that's the root cause of the problem, you can often appease the scripts by temporarily editing them. Even just adding an exit 0 at the beginning of the script can sometimes do the trick. This is obviously quite invasive and not something you should do lightly. Ideally, you should first try to understand what the script attempts to do and why it is failing. By bypassing the entire script, you obviously haven't fixed anything; you have just silenced the error message. But that might be good enough to then reinstall the same package on top of the remnants of the old one, and that should fix things. Recovering from a crash in the middle of an upgrade is generally doable, but it is also something where you are a bit on your own. This is not something that should ever happen, so it's rather unpredictable what broken state you find yourself in. A bit of debugging is required on your part.
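If you do go down the exit 0 route, keep a copy of the original script so you can put it back. A rough sketch of how that could look; bypass_script and restore_script are hypothetical helper names, and the sketch assumes the first line of the script is the #! line:

```shell
#!/bin/sh
# bypass_script SCRIPT BACKUP: save SCRIPT to BACKUP, then insert "exit 0"
# right after the first line (assumed to be the #! line).
bypass_script() {
    script=$1
    backup=$2
    cp -p "$script" "$backup" || return 1
    awk 'NR == 1 { print; print "exit 0  # temporary bypass"; next } { print }' \
        "$backup" > "$script"
}

# restore_script SCRIPT BACKUP: put the original content back.
restore_script() {
    script=$1
    backup=$2
    cat "$backup" > "$script" && rm -f "$backup"
}

# Example (only if that prerm file exists on your system; paths are illustrative):
#   bypass_script /var/lib/dpkg/info/proxmox-headers-6.8.12-9-pve.prerm /root/headers.prerm.orig
#   ...retry the dpkg/apt command...
#   restore_script /var/lib/dpkg/info/proxmox-headers-6.8.12-9-pve.prerm /root/headers.prerm.orig
```

The backup is deliberately kept outside /var/lib/dpkg/info so that no stray files are left in dpkg's own directory.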

After you make it past this step, dpkg --configure -a, apt-get install -f, and apt dist-upgrade usually bring the rest of the system back to a normal state.
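As a sketch, those three follow-up commands can be chained so that a failure stops the sequence instead of scrolling past. RUNNER and finish_upgrade are names invented for this example; with RUNNER left unset the commands are only printed.

```shell
#!/bin/sh
# Sketch: run the follow-up steps in order, stopping at the first failure.
# RUNNER (hypothetical) defaults to "echo"; run with RUNNER= (empty) as root
# to actually execute the commands.
finish_upgrade() {
    runner=${RUNNER-echo}
    $runner dpkg --configure -a &&
    $runner apt-get install -f &&
    $runner apt dist-upgrade
}

finish_upgrade
```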

Good luck and report back on what error messages you see. If we get more details from you, we might be able to make better suggestions. This isn't really something specific to Proxmox VE and it is instead more an issue related to Linux administration. At least, in the case of Proxmox VE, you always have the option of reinstalling the node and restoring its state from backups (which you hopefully have and know how to use). That's an improvement over regular Linux systems that often have a lot of local state which is difficult to restore. But before embarking on this "nuclear option", see if you can't fix your system with the suggestions that I made. It might get you back up and running in a shorter amount of time.
 
Thank you both for guiding me on how to solve this! After using tmux and letting the dist-upgrade run overnight, it managed to complete after dpkg --configure -a. I feel quite dumb now for not letting it run for more than 2 hours. Funny thing is that the nodes are quite homogeneous, so while node1 & 3 took only ~15min, node2 took a whole night. SMART status looks OK, but I will monitor the SSDs closely. It’s running on a ZFS mirror, so hopefully there is some resilience against failure.
 
It shouldn't really take two hours. But who knows what's going on. If the hardware looks OK, then it's really anyone's guess at this point. So many things that could have been subtly wrong and eventually got sorted out somehow.

Just make sure to keep backups up-to-date and double-check that you know how to restore them. If the hardware turns out to be the culprit after all, good backups will make your life so much easier.
 
Thanks for the pointers and totally agree. I’ve been backing up the /etc directory in case anything fails. I might just move the ZFS mirror drive by drive to new NVMe drives, away from the old SSDs, just in case. It’s too bad PBS doesn’t have a host backup function to make things easier.
 
"Backing up" the host is pretty easy. proxmox-backup-client does a pretty good job at that with only minimal shell-scripting. It's the "restoring" part that is difficult. I have successfully restored with proxmox-backup-client, but it requires quite a thorough understanding of Linux. There are just too many different configurations and it's really difficult to come up with a universal solution that will always restore your system. And nothing is worse than a false sense of security that gets shattered when you realize that your particular set of backups can't be restored.

So, I have a lot of sympathy for the Proxmox developers not wanting to offer a backup/restore solution for the host. That's really difficult to get right. And considering that Proxmox VE is designed so that the state of the host shouldn't be important, it makes sense to encourage people to always re-install from fresh installation media.
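For the "backing up" half, a minimal sketch of what such a script might look like. The repository string, host_backup, and RUNNER are placeholders invented for this example; substitute your own PBS user, server, and datastore. With RUNNER unset it only prints the command.

```shell
#!/bin/sh
# Sketch: back up the host's root filesystem as one pxar archive.
# host_backup REPOSITORY, where REPOSITORY looks like user@realm@server:datastore.
# RUNNER (hypothetical) defaults to "echo"; run with RUNNER= (empty) to execute.
host_backup() {
    repo=$1
    runner=${RUNNER-echo}
    $runner proxmox-backup-client backup root.pxar:/ --repository "$repo"
}

# Placeholder repository, not a real server:
host_backup "backup@pbs@pbs.example.com:hostbackups"
```

As far as I know, the client does not descend into other mounted filesystems by default, so check which mount points your archive really contains before relying on it.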
 
Oh yes, that makes a lot of sense, and thx for the pointers on using proxmox-backup-client… didn’t consider that! I guess I should practice restoring a host to get used to it in case of a node failure.