Does everything really need to be completely up to date before upgrading to corosync v3?

May 14, 2020
Hello,

We want to upgrade a PVE cluster with several hosts (>10) from various 5.x versions to version 6. All hosts use the enterprise repositories.

The process, according to the docs, is this:
Phase 1: Upgrade ALL packages on all hosts to the latest patch level of PVE 5.4, so that apt list --upgradable is completely empty. In general, this requires node reboots. We think this phase can be stretched out over some days.
Phase 2: Using the transitional stretch/corosync3 repo, upgrade corosync from v2 to v3 everywhere. Do this quickly, because a mixed-version state inevitably causes quorum loss in the smaller partition. Does NOT require reboots.
Phase 3: Upgrade all hosts to Proxmox 6.x (now 6.2) / Debian Buster. Requires node reboots again, but we think this can also be stretched out over some days.
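A quick way to confirm Phase 1 is actually done on a node is to count pending upgrades; a minimal sketch (the helper name is made up):

```shell
# Hypothetical helper: counts pending upgrades in `apt list --upgradable` output.
# On a live node, run `apt update && apt list --upgradable | count_upgradable`;
# the node is ready for Phase 2 only when this prints 0.
count_upgradable() {
    grep -c '\[upgradable from:' || true
}

# Example against captured output (two packages still pending):
printf '%s\n' \
    'Listing... Done' \
    'pve-kernel-4.15/stable 5.4-17 amd64 [upgradable from: 5.4-16]' \
    'pve-manager/stable 5.4-15 amd64 [upgradable from: 5.4-13]' \
    | count_upgradable
```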

We tested this on a production two-node cluster and on a three-node Ceph test cluster (hyper-converged) with some test VMs. It worked well.

But on this larger cluster, we will have trouble getting EVERYTHING up to date EVERYWHERE before the corosync upgrade.

I upgraded one node (node 1) two days earlier, and at the time it had everything up to date.
Today, I upgraded two more nodes (nodes 2 and 4). But node 1 is already no longer completely up to date. Possible upgrades:
pve-kernel (5.4-16 -> 5.4-17)
pve-manager (5.4-13 -> 5.4-15)

(proxmox-ve is 5.4-2 on all 3 nodes)

The 5-to-6 upgrade manual in the wiki states explicitly to make sure NOTHING but corosync and its dependencies will be upgraded when moving to corosync3. It even lists the exact packages expected.
So, no minor upgrade of pve-kernel or pve-manager is allowed; those would need to be upgraded first, again.
But is this really strictly necessary? Or is it just a warning to make sure you don't accidentally upgrade a node to v6.x before upgrading everything to corosync3, and to make sure you are at proxmox-ve 5.4-2?

If it really IS necessary, does that also mean I need to reboot host 1 again before the corosync3 upgrade (in general, or maybe only in this instance, to get the new kernel running)?

Getting all nodes to exactly the same patch level of every installed package prior to the corosync upgrade may be difficult, especially if this requires reboots. In general, we may need to migrate machines, make fresh backups, coordinate with people at short notice before any single upgrade+reboot, and check everything after each upgrade.
 

t.lamprecht

Proxmox Staff Member
Hi,

But is this really strictly necessary? Or is it just a warning to make sure you don't accidentally upgrade a node to v6.x before upgrading everything to corosync3, and to make sure you are at proxmox-ve 5.4-2 ?
It totally depends on how much newer the versions you'd get through an upgrade are. This was much more of a requirement at the time of the initial Proxmox VE 6.0 release, as back then one really had to be on the latest PVE 5.4 to ensure the upgrade worked well.

As you upgraded recently (in the last weeks) you should be fine. The kernel isn't too important. The pve-manager update includes an update for the pve5to6 tool to check whether one is affected by the VM/EFI issue ( https://pve.proxmox.com/wiki/Upgrad..._Virtual_Machines_booting_with_OVMF_.28EFI.29 ), but you could also apply only that update by running:
Bash:
apt update
apt install pve-manager
(executing an install on an already installed package will still pull in the newer version of that package).

If it really IS necessary, does that also mean i need to reboot (host 1) again before the corosync3 upgrade (in general? or maybe only in this instance, to get the new kernel running?)
No, in your specific case, where you already run a relatively new kernel, this is not a hard requirement.

But honestly, with terrible luck either of the kernel versions could cause you an issue, so the best thing to do would be to use the version you tested with, provided you tested on a setup with at least somewhat similar hardware.
 
May 14, 2020
Hello,

Thank you for the info, that is helpful.
We couldn't progress far yet; we ran into some problems, unrelated to the upgrades, that need to be solved in yet another cluster.

To clarify, the part that is hard for us to get everything in sync for is the corosync3-upgrade.

Rebooting nodes twice at a later time, once to apply any upgrades to PVE 5.4 packages (even if reboots are necessary) and again after going to 6.x, is not a problem, since that would just mean an extended downtime for a single node.
 
May 14, 2020
Hello,

We are still not through with the first pass (but soon). A couple of new packages became upgradable on the nodes that were already done (these come directly from Debian):

intel-microcode/oldstable 3.20200609.2~deb9u1 amd64 [upgradable from: 3.20191115.2~deb9u1]
linux-libc-dev/oldstable 4.9.210-1+deb9u1 amd64 [upgradable from: 4.9.210-1]

The microcode, just as the kernel, would require a reboot to be applied.

So, given the situation described above, I'm leaning toward ignoring them both until after we have upgraded corosync, and then upgrading them right before the upgrade to 6.x.

Do you think that will lead to problems?
 

t.lamprecht

Proxmox Staff Member
The microcode, just as the kernel, would require a reboot to be applied.
linux-libc-dev would not require a reboot; it's just the user-space headers for the kernel and in general backward compatible.

So, given the situation described above, I'm leaning toward ignoring them both until after we have upgraded corosync, and then upgrading them right before the upgrade to 6.x.

Do you think that will lead to problems?
No, I do not think that specific pending update will lead to problems regarding the upgrade.
That said, microcode updates can change things at the fundamental level of the CPU, so having at least boot-tested the new one on a single system resembling the CPU models in use would be a good idea, IMO.
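One way to verify a microcode update actually took effect after the reboot is to compare the revision the kernel reports before and after; a minimal sketch (the helper name is made up; it parses the standard /proc/cpuinfo format):

```shell
# Hypothetical check: print the running microcode revision of the first CPU core.
# Record it before the reboot (microcode_rev < /proc/cpuinfo) and compare after;
# a changed value confirms the new microcode was actually loaded.
microcode_rev() {
    awk '/^microcode/ { print $3; exit }'
}

# Example against a captured /proc/cpuinfo fragment:
printf 'model name : Intel(R) Xeon(R)\nmicrocode : 0x2006a08\n' | microcode_rev
```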
 
May 14, 2020
Thank you for your help.
corosync is now v3 everywhere.
(We only had a small issue on two nodes where the MTU on the NIC used by corosync wasn't the same as on the others, which apparently hadn't been a problem for corosync v2.)
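A mismatch like that can be caught beforehand by comparing the MTU on every node; a minimal sketch (the helper name is made up, and the interface name is an assumption; adjust it to whichever NIC corosync uses):

```shell
# Hypothetical check: extract the MTU from `ip link show` output for the
# corosync NIC; run on every node and make sure the values all match.
nic_mtu() {
    sed -n 's/.* mtu \([0-9][0-9]*\) .*/\1/p'
}

# On a live node: ip link show dev eth1 | nic_mtu   (eth1 is an assumption)
# Example against captured output:
printf '2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP\n' | nic_mtu
```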

Process:
# add repo / apt update
apt-mark hold pve-manager pve-kernel-4.15 linux-libc-dev intel-microcode
apt full-upgrade --download-only
apt full-upgrade # check diligently on each node what will be upgraded, removed, installed.
apt-mark unhold pve-manager pve-kernel-4.15 linux-libc-dev intel-microcode
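After running the steps above on all nodes, it is worth confirming that every node actually reports corosync 3 before anyone moves on; a sketch (the parsing helper is made up and matches the `corosync -v` banner format):

```shell
# Hypothetical check: extract the major version from the `corosync -v` banner.
# Run `corosync -v | corosync_major` on every node; all nodes should print 3
# before proceeding with the Proxmox VE 6.x upgrade.
corosync_major() {
    sed -n "s/.*version '\([0-9][0-9]*\)\..*/\1/p"
}

# Example against a captured banner line:
printf "Corosync Cluster Engine, version '3.0.3'\n" | corosync_major
```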

We will now proceed with upgrading to 6.2: probably first upgrading whatever was held back and rebooting, then upgrading to 6.2 and rebooting again.

Thank you for your other tips as well.
 
