[SOLVED] Upgrade HA without VM Downtime?

proxlion

New Member
Oct 24, 2022
Hello,

I cannot find any resources on how to upgrade a Proxmox HA cluster.
With the requirement of no VM downtime, of course.

As I understand it, VMs cannot be handed over between nodes running different Proxmox versions. Is this still the case?
So migrating all VMs to node 1, upgrading node 2, migrating everything to node 2, and then upgrading nodes 1 and 3 does not seem to be an option?

Is there somewhere I can read about upgrading versions? All the threads I can find about this are from 2011 or older, and I wonder why nobody seems to ask about it. Am I missing something, and this just works automatically without anything to worry about?
 
As I understand it, VMs cannot be handed over between nodes running different Proxmox versions. Is this still the case?
That depends on which version you jump from and to. In general, you can even upgrade the hardware in the same step if you have, e.g., shared storage underneath. PVE (and Linux in general) gives you a lot of flexibility here.

So migrating all VMs to node 1, upgrading node 2, migrating everything to node 2, and then upgrading nodes 1 and 3 does not seem to be an option?
That will work (within bounds), but why don't you want any downtime? I don't get that. Are you not installing updates that require a reboot? Instead of a reboot, you just stop the HA machine and it will be started again automatically (with a new QEMU version and more features).
 
but why don't you want any downtime?
That's the whole point of HA ;).
All services such as webshops, websites, app backends, mail, load balancers, ... must be as close to 100% uptime as possible.

Customers are not happy when they sit in their car and the navigation is not working ^^.
Or when they try to make a credit card payment and get a 404 server error.
Or when your wife hits the switch in the kitchen and the light doesn't turn on...
It ranges from a major issue to a minor inconvenience...


But more important:
Thank you for your reply.
I tested it, and VM transfer seems to work between versions (e.g. 7.2 to 7.3) even though the documentation says it doesn't. Maybe it's just not guaranteed and we need to test it beforehand for each version?
 
If your system requires full HA with 0 downtime, this is a service-level architectural decision, not something you should have at the VM level. Proper database clusters, filesystems, multiple front/back ends, etc. Failure to do so is a failure of design.

That being said, you can migrate between most point releases. It's usually major jumps such as 6.x to 7.x that are the issue.
 
This "etc." is the point.
Examples: You can run the lobby and control server of an online game in a redundant architecture. But each running game instance is a process on one single node/server/VM. If this VM goes down, all players on that shared instance lose the connection and, depending on the game, the complete session. Think of a game server for Counter-Strike or Minecraft, for example.
Or of applications like Home Assistant or openHAB.

And with a lot of software, like Counter-Strike game servers, you have no influence on the programming.
But when your competition is interrupted by such an event, that's not "good".
Or when your Home Assistant is offline for a few minutes, your solar system cannot tell your wallbox how to charge your car.

So you are right:
As a programmer / operator this should be done better.
In critical infrastructure it should of course be done better.
But when you only run software from third-party vendors, as in the examples above, you have no way to do it better and need to keep the VM/server running :).

So back to the Question:
Is there a list of which versions can migrate and which cannot?
Or does everyone need to test this on their own?
 
As I understand it, VMs cannot be handed over between nodes running different Proxmox versions. Is this still the case?
If by "handover" you mean live-migration, then it depends on which version you are upgrading from and to.

For minor upgrades (e.g., any 7.X.Y to any newer 7.X.Y) live-migration is pretty much guaranteed.
If HA is used, simply set the "migrate" shutdown policy (Datacenter -> HA -> Options) and VMs will be live-migrated automatically on node shutdown/reboot.

For major upgrades, like 6 to 7, it's naturally a bigger jump and should be tested more thoroughly on each specific setup, but we try hard to have live-migration of VMs work there too.

In general, the only limitations for live-migration are 1) that live-migration must work in general (i.e., no pass-through of physical hardware that cannot be migrated, and no migration between wildly different CPUs, e.g., different vendors) and 2) that the target must run an equal or newer version, in terms of Proxmox VE and QEMU (the kernel is a bit more likely to be compatible if the target is older than the source, but I'd not bet on that).
Is there a list of which versions can migrate and which cannot?
See above, but in general: older -> equal or newer.
The last actual break of that was Proxmox VE 3.4 to 4.x about 8 years ago, due to the cluster stack switching to a newer, incompatible wire protocol (and one could make it work there too, it just wasn't straightforward).
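
As a rough illustration of the "older -> equal or newer" rule, here is a minimal Python sketch. The function names and the version-string handling are my own assumptions for illustration, not anything Proxmox VE ships, and it only compares the Proxmox VE version string; the QEMU version, CPU compatibility and pass-through devices still have to be checked separately.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a version string such as '7.2-3' or '7.4' into a 3-tuple of ints.

    Missing components are padded with zeros, so '7.4' becomes (7, 4, 0).
    (Hypothetical helper, assumes purely numeric components.)
    """
    parts = [int(p) for p in text.replace("-", ".").split(".")]
    parts = (parts + [0, 0, 0])[:3]
    return (parts[0], parts[1], parts[2])


def migration_direction_ok(source: str, target: str) -> bool:
    """Rule of thumb from this thread: the target must be equal or newer."""
    return parse_version(target) >= parse_version(source)


def is_major_jump(source: str, target: str) -> bool:
    """True if the major version differs, i.e. the case that deserves extra testing."""
    return parse_version(source)[0] != parse_version(target)[0]


if __name__ == "__main__":
    print(migration_direction_ok("7.2-3", "7.4-3"))  # old node -> upgraded node
    print(migration_direction_ok("7.4-3", "7.2-3"))  # upgraded node -> old node
    print(is_major_jump("6.4-15", "7.4-3"))
```

Migrating from an upgraded node back to a not yet upgraded one is the direction this check rejects, which is why the order in which nodes are evacuated matters during a rolling upgrade.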
Or does everyone need to test this on their own?
Testing whether it works in general makes sense before going into production, just like testing things to avoid surprises makes sense in general. But especially with homogeneous hardware, no pass-through, and a target that isn't older than the source, it's quite unlikely to fail.

But it helps if you ask a specific question about your current situation. This isn't something where one can give a general yes/no answer, or a list of (future) versions, that applies 100% to everybody.
 
Great answer, thank you.
Exactly what I meant.

But it helps if you ask a specific question about your current situation. This isn't something where one can give a general yes/no answer, or a list of (future) versions, that applies 100% to everybody.
My current specific situation is 7.2-3 to the latest version (I think it's 7.4-3) in a homogeneous x86 AMD-based cluster.
 
My current specific situation is 7.2-3 to the latest version (I think it's 7.4-3) in a homogeneous x86 AMD-based cluster.
I don't want to jinx it, but I'd be surprised if any reasonably normal VM failed live-migration there.

FWIW, for HA and maintenance I recommend at least skimming over the respective docs:
https://pve.proxmox.com/pve-docs/chapter-ha-manager.html#_node_maintenance

Also, in general we recommend upgrading nodes one by one and checking that everything is still working OK before continuing with the next one.
If there's no kernel upgrade pulled in you normally don't even have to reboot, and if you have to reboot you could migrate the VMs to another node and repeat.
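
To make the "one node at a time" approach concrete, here is a small Python sketch that only produces a plan as text; it does not talk to any Proxmox API. The function name, the node names and the VM IDs are hypothetical. It assumes every VM can be live-migrated and prefers already upgraded nodes as targets, so a VM never has to move from a newer node to an older one.

```python
from typing import Dict, List


def rolling_upgrade_plan(placement: Dict[str, List[int]]) -> List[str]:
    """Return human-readable steps for upgrading a cluster node by node.

    placement maps a node name to the VM IDs currently running on it.
    (Hypothetical helper: it plans only, it executes nothing.)
    """
    if len(placement) < 2:
        raise ValueError("need at least two nodes to evacuate VMs")

    running = {node: list(vms) for node, vms in placement.items()}
    upgraded: List[str] = []
    steps: List[str] = []

    for node in placement:
        others = [n for n in running if n != node]
        # Prefer nodes that are already upgraded (older -> equal or newer);
        # for the first node only same-version nodes exist, which is fine too.
        candidates = upgraded or others
        for vmid in list(running[node]):
            # Pick the candidate that currently hosts the fewest VMs.
            target = min(candidates, key=lambda n: len(running[n]))
            steps.append(f"live-migrate VM {vmid}: {node} -> {target}")
            running[node].remove(vmid)
            running[target].append(vmid)
        steps.append(f"upgrade {node}, reboot if a new kernel was installed, verify")
        upgraded.append(node)

    return steps


if __name__ == "__main__":
    for step in rolling_upgrade_plan({"node1": [100], "node2": [101], "node3": []}):
        print(step)
```

With the HA "migrate" shutdown policy mentioned earlier in the thread, the evacuation part happens automatically on shutdown/reboot; the sketch is mainly useful for thinking through the order and the target capacity beforehand.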
 
That's the whole point of HA ;).
Yes, on a service level, not on a VM level. That was decades ago.

If your system requires full HA with 0 downtime, this is a service-level architectural decision, not something you should have at the VM level. Proper database clusters, filesystems, multiple front/back ends, etc. Failure to do so is a failure of design.
Nicely said and that's what I would also have said.

Or when your Home Assistant is offline for a few minutes, your solar system cannot tell your wallbox how to charge your car.
Yes, I can relate... don't do that in HA (Home Assistant). I just thought about the same problem today... HA, despite the abbreviation, has nothing to do with high availability. Nowadays you need to run critical systems in a k8s cluster. For home use, preferably on a bunch of Pis, with everything wired rather than on any kind of radio interface (Bluetooth, Zigbee or Wi-Fi), so that you can still control everything despite a multitude of possible failures and have a smaller attack surface.
 
We've drifted, and I believe the question was answered, but to add on:

This is a discussion of redundancy vs resiliency.

HA has always been, and always will be, designed for redundancy: the ability for the VM to move, to be available, etc. However, it has nothing to do with service-level resiliency. The ability to have 0 downtime, to spread the service across multiple nodes, to cluster, to scale/descale transparently, etc. is all part of the resilience of a service.
 