Upgrade 4.1 cluster to 4.2

chrisalavoine

Hey all,

I am due to upgrade our main cluster over the next few days and wanted to gauge whether there are likely to be any problems. I have 4 x Dell R630s with identical hardware, all running:

Code:
proxmox-ve: 4.1-34 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-5 (running version: 4.1-5/f910ef5c)
pve-kernel-4.2.6-1-pve: 4.2.6-34
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-30
qemu-server: 4.0-46
pve-firmware: 1.1-7
libpve-common-perl: 4.0-43
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-21
pve-container: 1.0-37
pve-firewall: 2.0-15
pve-ha-manager: 1.0-18
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
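
(That listing is just the output of pveversion, for anyone who wants to compare:)

Code:
# Print the installed Proxmox VE package versions on a node
pveversion -v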

I plan to live-migrate VMs around to clear out each host in turn.

My main question: is it possible to live-migrate between hosts running 4.1 and 4.2? This will need to happen at some point in my upgrade plan.
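
For reference, I'll be kicking the migrations off from the CLI with something along these lines (the VM ID and target node name are just placeholders):

Code:
# Live-migrate a running VM to another node in the cluster
# 101 and "pve2" are placeholders for the real VM ID and target node name
qm migrate 101 pve2 --online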

Thanks,
Chris.
 
Hi
You can't live migrate between 4.1 and 4.2, as they will have different qemu versions. Offline migration will work though.
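
Offline it would be something along these lines (the VM ID and node name are just placeholders):

Code:
# Shut the guest down, move it, then start it again on the target
# 101 and "pve2" are placeholders for the real VM ID and target node
qm shutdown 101
qm migrate 101 pve2
# then, from the target node (or the web UI):
qm start 101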
 
Hi
You can't live migrate between 4.1 and 4.2, as they will have different qemu versions. Offline migration will work though.

Huh? I've already upgraded a couple of 4.1 clusters to 4.2 with live migration, without any problem. As far as I know, only live migration from 3.x to 4.x isn't possible, because of the corosync 1.x to 2.x change.

The only thing I've done is live-migrate the VMs to another node, upgrade and reboot the empty node, then live-migrate the VMs back. After that I did the same with the other nodes.
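
Per node that boils down to roughly this (sketch only, assuming the apt repositories are already set up for 4.2):

Code:
# After the node has been emptied of VMs:
apt-get update
apt-get dist-upgrade   # pulls in the Proxmox VE 4.2 packages
reboot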
 
Huh? I've already upgraded a couple of 4.1 clusters to 4.2 with live migration, without any problem. As far as I know, only live migration from 3.x to 4.x isn't possible, because of the corosync 1.x to 2.x change.

The only thing I've done is live-migrate the VMs to another node, upgrade and reboot the empty node, then live-migrate the VMs back. After that I did the same with the other nodes.

Hi Wosp,

That's what I figured.

I guess it will simply refuse to live-migrate if it can't, so there's no real danger in testing it...

c:)
 
Hi
You can't live migrate between 4.1 and 4.2, as they will have different qemu versions. Offline migration will work though.

Is this likely to continue with new versions? Previously I could clear the virtual machines off a node, upgrade the node and migrate them back with no downtime; it seems this is no longer possible.
If this is true then I am stuck at 4.1 for the foreseeable future.
 
Hi,

I'm going to push ahead with this as there seems to be a general consensus that live migration will work. I'll post back if I hit any problems.

Thanks all.

Chris.
 
Are there 4.1 to 4.2 upgrade instructions somewhere?
Has anyone successfully upgraded a 4.1 host with GPU pass-through and confirmed that it still works in 4.2?
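
Once I do upgrade, I'll probably sanity-check the pass-through plumbing with something like this first (rough sketch only):

Code:
# Check the IOMMU is still active under the new kernel
dmesg | grep -e DMAR -e IOMMU
# Make sure the kernel command line still carries intel_iommu=on (or amd_iommu=on)
cat /proc/cmdline
# Confirm the vfio modules are loaded
lsmod | grep vfio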

Thanks
FXD
 
This has not gone well...

I live-migrated the guests to another host and upgraded/rebooted the empty one. Then I tried live-migrating from the 4.1 node to the 4.2 node, and this failed. The only way I could get it to work was to upgrade the 4.1 node to 4.2 (with the VMs still running), and then live migrations worked. At least until I hit a beast of a Postgres VM with lots of RAM, which took over an hour to migrate and then came up on the new host with tons of I/O errors and a read-only filesystem.

Things went from bad to worse after that. I got lots of corosync TOTEM errors and also multipath "failing path" errors. I spent 24 hours troubleshooting and eventually found that the kernel on 4.2 (4.4.8-1) was causing problems with my hardware; I'm not sure exactly what, but it could be the Intel X520 10Gb NICs. As soon as I reverted to kernel 4.2.6-1 (the one used on Proxmox 4.1), everything started working nicely.
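
For anyone else hitting this, here is roughly how I checked and pinned the kernel (the GRUB entry string is a placeholder; copy the exact name from /boot/grub/grub.cfg on your own system):

Code:
# What is the node actually running?
uname -r
# Which pve kernels are installed?
dpkg -l | grep pve-kernel
# Pin the older kernel at boot by setting GRUB_DEFAULT in /etc/default/grub, e.g.
#   GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.2.6-1-pve"
# (the exact entry name above is a placeholder; take it from /boot/grub/grub.cfg), then:
update-grub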

I'm pretty sure that without the kernel issues the migrations would have worked, but I need to get to the bottom of my hardware/kernel problems first. These are all subscription hosts, so I'll open an official ticket for this.

Thanks,
Chris.
 
...found that the kernel on 4.2 (4.4.8-1) was causing problems with my hardware; I'm not sure exactly what, but it could be the Intel X520 10Gb NICs. As soon as I reverted to kernel 4.2.6-1 (the one used on Proxmox 4.1), everything started working nicely.

I'm also using the X520(-DA2) 10Gb NICs in one of my clusters (on all 5 nodes) on PVE 4.2-5 without any problem, so I don't think that's the problem here.
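
If you want to compare hardware directly, this is roughly how I'd check the exact variant and the driver in use (the interface name is a placeholder):

Code:
# Exact NIC model and the kernel driver bound to it
lspci -nnk | grep -A3 -i ethernet
# Driver and firmware version for a given interface ("eth0" is a placeholder)
ethtool -i eth0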
 
...this failed. The only way I could get it to work was to upgrade the 4.1 node to 4.2 (with the VMs still running), and then live migrations worked.

I think there was a bug in the initial 4.1 release, with the migration process not passing the qemu -machine argument.

This was fixed by this commit:
https://git.proxmox.com/?p=qemu-server.git;a=commit;h=d1363934b81c5336f59987a9f958f09fcc11d038

So, yes, the source node needs to be updated too (at least the qemu-server package).
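
On the source node that is roughly just the following (sketch only, assuming the repositories are already configured):

Code:
# On the still-4.1 source node, pull in the newer qemu-server and its dependencies only
apt-get update
apt-get install qemu-server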
 
I had a very similar issue trying to live-migrate from 4.0 to 4.2, which was fixed by updating just qemu-server and its dependencies. Whilst 4.0 -> 4.2 is a two-version jump and might not be tested by the devs, the fact that this issue also applied to 4.1 -> 4.2 live migrations is, yet again, a sign that the Proxmox devs aren't even testing live migrations between two consecutive minor releases.

In a cluster setup, live migration is crucial for avoiding downtime during upgrades, and although testing every version combination of live migration is infeasible, there are two combinations that always need testing:

1. Live migration from the last minor version of a major release to the first minor version of the next major release up.
2. Live migration from one minor version to the next minor version up.

Ideally we need a live-migration grid showing every combination (in both directions) with a yes/no/unknown value in each cell, and perhaps workaround tips for combinations that require extra actions (e.g. upgrade qemu-server or whatever).
 
I had a very similar issue trying to live-migrate from 4.0 to 4.2, which was fixed by updating just qemu-server and its dependencies. Whilst 4.0 -> 4.2 is a two-version jump and might not be tested by the devs, the fact that this issue also applied to 4.1 -> 4.2 live migrations is, yet again, a sign that the Proxmox devs aren't even testing live migrations between two consecutive minor releases.

This is simply not true - we test live migrations a lot, but unfortunately there are sometimes bugs or backwards incompatibilities in our code or in Qemu/KVM. Where possible, we try to mitigate these issues so that at least forward migration (from old to new) works; sometimes this requires a partial upgrade for the migration to succeed. If you follow current development, you will see that there are, for example, upcoming changes to migration over SSH where the old method will be kept around to allow upgrading from old to new.

The grid you suggested would be very nice; all the information needed to create it is public ;) Unfortunately, as always, developer time is limited...
 
