Upgrading Cluster / VM Migration between different versions

Kjeld

New Member
Jul 30, 2014
Hello everyone,

The company I work for is currently moving its Proxmox nodes (physically) to a new location, and I have been appointed to plan and perform this task.
I have, however, run into some problems.

Our nodes:
- Proxmox Cluster #1
• Proxmox N4 (v3.2-4)
• Proxmox N5 (v3.1-21)
• Proxmox N6 (v2.3-13)
• Proxmox N7 (v3.0-23)
• Proxmox N8 (v2.3-13)

As you can see, we have some version mismatches scattered across our nodes that piled up over the past 3-4 years (starting company, growing fast).
Up until recently our Proxmox and other hardware was housed in a single rack. With the continuous growth we are expanding and moving the servers to a private room with a couple of racks.
Normally I'd just migrate (LIVE) my VMs to a low-load node, physically move the original host and upgrade it. Next I would migrate (LIVE) the VMs back to their original locations; rinse and repeat for all nodes.
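(For reference: the migrations themselves can be started from the shell as well as from the WebUI. A minimal sketch, with the VM ID and node name as placeholders:)

Code:
# Live-migrate VM 100 to node "proxmox-n5" while it keeps running.
qm migrate 100 proxmox-n5 --online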

I can, however, not always use live migration from certain nodes to others.
In my experiments I have concluded live migration is possible from:
+ 3.1-21 to 3.2-4
+ 3.1-21 or lower to 3.0-23
+ Matching versions

However:
- Never from 3.0-23 to anything
- Only from 3.2-4 to 3.1-21 if, and only if, that specific VM was migrated online from a 3.1-21 node in the past (and not from any other node, and it must have been an online migration)

---
False information led me to believe I did not need to reboot my server before new features / a new kernel were enabled. I have removed the corresponding paragraph / source.

Question: When I upgrade my Proxmox node (no reboot), can I (1) immediately use the new features and (2) perform live migration to my other nodes? Or must I reboot first?
Answer: No, the node must be rebooted before any change in functionality and compatibility occurs.
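(For future readers: a quick way to check whether a node is still waiting on that reboot is to compare the running kernel against what is installed; a rough sketch:)

Code:
# Kernel the node is actually running right now:
uname -r
# Versions of all Proxmox packages, including the installed pve-kernel:
pveversion -v
# If pveversion lists a newer pve-kernel than uname -r reports,
# the upgrade is installed but not active until a reboot.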

---
Question: Can I in any way do live migration other than from / to the aforementioned versions in my cluster? Or, more simply, from 2.x / 3.0 to 3.2?
After a lot of research and some talks on #IRC I believe that this is indeed not possible. But if anyone has ever succeeded, I would happily learn how.
In particular, if I were able to move VMs off Proxmox N7 to any other version, then I would be able to follow the regular upgrade process (offload, upgrade, migrate back).

An idea I had was whether or not it is possible to use qemu/kvm commands to do the migration for me. Basically: does Proxmox prevent me from performing the live migration because it thinks (read: knows) things might go horribly wrong, whereas maybe (?) the live migration is possible when entering the commands manually on the server, with all consequences that come with it, of course? (Hopefully none?)
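To illustrate what I mean (completely untested on my side, and the addresses/ports below are made up), plain QEMU can be pointed at a migration target directly, roughly like this:

Code:
# On the destination host: start a QEMU process with identical
# hardware configuration, waiting for an incoming migration.
qemu-system-x86_64 [same options as the source VM] -incoming tcp:0:60000

# On the source host, inside the QEMU monitor of the running VM,
# stream the VM state over to the destination:
# (qemu) migrate -d tcp:192.168.1.50:60000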

Best regards,
Kjell Teirlynck
www.layer7.be
 
All the info below is from experience.

You NEED a reboot after a major upgrade. Period! That is the short answer in simple form. :)

Upgrading Linux and not rebooting will have unexpected consequences. As you already know, not all upgrades require a reboot. But any new feature that requires a reboot will not be activated at all until you do so. From the information you provided I would suggest you proceed with extreme caution. The logical thing would be not to move anything at all until you bring all nodes to the same version. That will save you days of headache. With a major version difference, when you reboot your nodes there may be an out-of-quorum situation between them. That's the last thing you want. Migrate VMs to one-step-higher version nodes, upgrade/reboot the older nodes, and work your way up.
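You can watch the quorum situation from any node while you do this, for example:

Code:
# Show cluster and quorum state:
pvecm status
# List the nodes the cluster currently sees:
pvecm nodes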

If your environment must be up at all times and cannot have any downtime, this is how I would do it.
Assuming your new room is on the same floor and close by: prepare the new rack with power and network cable/switch and have it ready to accept nodes. On the old rack, migrate all VMs from node Proxmox N8 to any other node (Proxmox N7/N6/N5). Shut down Proxmox N8 and put it in the new rack in the new room. Before you power it up, run a long network cable from the switch in the old rack to the switch in the new rack. Power up the node in the new rack. It should boot up without quorum issues since it can still talk to the cluster in the old room. Migrate all VMs from Proxmox N7 to other nodes, even to N8, then power it down. Move it to the new rack in the new room and repopulate it with VMs. Do the same for the rest of the nodes. This way nobody will notice your entire cluster relocation took place. You get the picture.
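Emptying a node before you pull it out of the rack can be scripted too; a rough sketch (the target node name is an example):

Code:
# Live-migrate every VM on this node to proxmox-n7.
# "qm list" prints a header line, then one line per local VM.
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    qm migrate "$vmid" proxmox-n7 --online
done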

If minimal downtime is possible, it's just a matter of shutting everything down and moving, "only" after all nodes are upgraded.
 
I did however find a source that claims I can upgrade my Proxmox node without rebooting. (http://forum.proxmox.com/threads/16688-Proxmox-2-3-Cluster-to-3-1-Upgrade)

...I can't find anywhere in that thread where someone says this... maybe it's me, or is it another thread?

Anyway, if you can afford a few seconds of downtime for each VM, it's really possible (with disks on shared storage, offline migration is really quick; it's just a matter of moving .conf files from one node to another and restarting the VM, which can also be done manually), the slow part being pressing "stop" (maybe "shutdown" first, on the old node) and "start" (on the new node), and of course the boot time.
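Done manually, it boils down to something like this (VM ID and node names are examples; the disks must be on shared storage):

Code:
# On the old node: cleanly stop the guest.
qm shutdown 100
# In /etc/pve every node has its own qemu-server directory;
# moving the .conf file "moves" the VM (disks stay on shared storage).
mv /etc/pve/nodes/oldnode/qemu-server/100.conf /etc/pve/nodes/newnode/qemu-server/
# On the new node: start it again.
qm start 100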

Marco
 
Thank you both for the replies; I have updated the original post as well for future readers.

Upgrading Linux and not rebooting will have unexpected consequences. As you already know, not all upgrades require a reboot. But any new feature that requires a reboot will not be activated at all until you do so.
Am I then correct to conclude that until I reboot (for example, N7) I will see no changes in migration behaviour, ergo still not be able to migrate (LIVE) from a 3.0-x node to any other version?

I have already made peace with the fact that most VMs will have to be shut down and migrated. And as pointed out by Marco, the expected downtime of an offline migration would be somewhere between a couple of seconds and two minutes tops (depending on the VM). I will be designing the migration plan accordingly.

The logical thing would be not to move anything at all until you bring all nodes to the same version. That will save you days of headache. With a major version difference, when you reboot your nodes there may be an out-of-quorum situation between them. That's the last thing you want. Migrate VMs to one-step-higher version nodes, upgrade/reboot the older nodes, and work your way up.
In our current setup we have nodes ranging from 2.3 to 3.2 and this appears to have worked. I will be upgrading all nodes to 3.2; do you reckon my quorum will be out of sync, or am I safe to assume that "if it works now, it will work then"?
Does this way of working not also add to the downtime, as you now have to factor in the upgrade time?

If your environment must be up...
The method described was very helpful and what I prefer. But in order to do everything live I need to be able to migrate from a 3.0 node to a higher version. All VMs on version 2.3 can be migrated to N5 (3.1) and N7 (3.0), and from N5 to N7 afterwards. But from that point onwards I am forced to do offline migration (no live migration from N7 to any node is possible) if I want to bump my VMs to a node with a higher version.

Related to the above, do you think it is possible to live-migrate using qemu/kvm commands rather than attempting to do so from the WebUI?

I don't think it is possible and it's probably my last idea on the whole "live migration" matter. But other software I use sometimes only prevents you from doing things from the UI because they are potentially dangerous / destructive; I am, however, unsure whether that is the case with Proxmox.

Edit: I realise that when a migration fails the underlying command is also displayed on screen, and therefore I am currently assuming that my suggestion is also not possible.


Best regards,
Kjell Teirlynck
 
Rebooting is recommended (for kernel updates), as, for example, the kvm kernel module can be updated to support new features (like Hyper-V, for example).
If Proxmox uses a new kvm feature and the kernel is not updated, the VM will not start, or you can get blue screens or strange errors.

(You don't need to reboot only if there is no kernel update, or if the kernel has only a minor update.)

In general, it's always possible to migrate from any lower qemu version -> higher qemu version (just do a test with a sample VM to be sure that it works).
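Such a test could be as simple as this sketch (VM ID 999 and the node/bridge names are examples):

Code:
# Create a small throwaway VM, start it, and try the exact
# migration path you plan to use for the real VMs.
qm create 999 --name migtest --memory 256 --net0 virtio,bridge=vmbr0
qm start 999
qm migrate 999 targetnode --online
# Clean up afterwards:
qm stop 999
qm destroy 999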


Simple upgrade procedure:

- Have an empty server node updated to the latest Proxmox version/kernel.
- Live-migrate all VMs from an older node to the new node.
- Then upgrade the older node to the latest Proxmox version.
... and so on...
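The per-node upgrade step is the usual apt route (assuming the repository lists already point at the release you are upgrading to):

Code:
# On the now-empty node: install the new packages and kernel,
# then reboot so the new kernel and KVM are actually used.
apt-get update
apt-get dist-upgrade
reboot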
 
Possibly related to this thread and may be of use to someone:

I had a cluster of 3 nodes, two running 7.0, one running 7.3 (upgraded at different times). We wanted to upgrade all 3 to 7.4-16 preparatory to moving to v8. There was a major VM on the 7.3 node we didn't want to take risks with, so we tried to migrate it to one of the 7.0 nodes. Live migration proved to be a no-go. However, it was replicated between all nodes, so instead of a live migration I shut it down, was then able to migrate it, and fired it up on one of the 7.0 nodes. I then upgraded its original node to 7.4-16 and rebooted it. It was then possible to live-migrate the VM back from 7.0 to 7.4 without any downtime.

However! The VM was then rebooted by accident, and the console came back with "no boot media found". It proved to be completely dead.

Fortunately, as it had been replicating, I was able to recover it by moving the .conf file in /etc/pve/qemu-server to one of the older nodes, now newly upgraded to 7.4-16, where the VM booted normally with a brand-new boot disk after its original failed on the reboot following the upgrade. We've not yet tried moving it back to the node it usually lives on; that will come later.

But basically our findings were:
Off-line migrate 7.3 --> 7.0, worked fine, VM booted.
Live migrate back from 7.0 to 7.4, appeared to work, but VM then turned out to be unbootable.

As we now have multiple other VMs we need to move from 7.0 to 7.4 to enable the last 7.0 host to be upgraded, we will be testing that each one is bootable after the migration, just in case.

Fortunately we have replication and comprehensive backups - I absolutely love the Proxmox backup system - so we are protected if the worst happens and a VM proves unrecoverable or unbootable.

Minor question: if live-migrating between versions, in particular from a lower version to a higher one, does it matter which host the migration is driven from? Should I use the v7.0 web interface to "push" VMs to the 7.4 box, rather than "pulling" them from the newer version? I'm guessing not, but I've not seen this asked anywhere, so I thought it worth asking.
 
Hi,
Possibly related to this thread and may be of use to someone: ...

However! The VM was then rebooted by accident, and the console came back with "no boot media found". Proved to be completely dead.
Are you using the SATA controller by chance? If yes, there was a long-standing rare issue that could cause the MBR to be lost, which will be fixed in an upcoming version of QEMU: https://git.proxmox.com/?p=pve-qemu.git;a=commit;h=816077299c92b2e20b692548c7ec40c9759963cf
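You can see which controller a VM's disks use in its configuration, e.g.:

Code:
# Disks on the SATA controller show up as sata0, sata1, ...
# (VM ID 100 is an example.)
qm config 100 | grep -E '^(sata|ide|scsi|virtio)'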
Fortunately as it had been replicating, I was able to recover it by moving the .conf file in /etc/pve/qemu-server to one of the older nodes... so we are protected if the worst happens and a VM proves unrecoverable or unbootable.
Minor question: If live-migrating between versions, in particular from a lower version to a higher one, does it matter which host the migration is driven from? Should I use the v7.0 web interface to "push" VMs to the 7.4 box, rather than "pulling" them from the newer version? Am guessing not but not seen this asked anywhere so thought worth asking.
Forward is the way to go: migration from an older version to a newer version is always supported. Within the same major release (e.g. 7.x to 7.0) we do try to also keep backwards migration working, but it's not guaranteed. And you should not migrate from a new major version to an older major version (e.g. 8.0 to 7.4). It might work if you are lucky, but it is not supported (in many cases this will fail straight away because of the running QEMU machine version).

So upgrade your nodes one by one, migrating from not-yet-upgraded to already-upgraded, and then you can migrate back after upgrading the original node.