Running different PVE versions in the same cluster, e.g. during a PVE 6.4 to 7.4 upgrade

ioo

Renowned Member
Oct 1, 2011
Hi!

First, I apologize because this topic has been discussed a lot already (and I did try to read through the posts). I think everything behaves as expected in my test lab, but I would still like to ask you to confirm my understanding. I have a multi-node (23) PVE 6.4 cluster without HA turned on. It uses shared storage (LVM on iSCSI), networking is done with Open vSwitch, and only KVM virtual machines are used. My intent is to upgrade it to 7.4, and I plan to do it in place, so to say:
  1. select one node
  2. migrate its virtual machines to other nodes
  3. upgrade it in place as per https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0 (without removing the node from the cluster and re-joining it as a new node)
  4. migrate the virtual machines back
and proceed with the next node until all nodes are done.
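
The loop above can be sketched as a small plan generator. This is only an illustration under my own assumptions: the node names and VM IDs are made up, "park the VMs on the next node in the list" is a hypothetical placement rule, and the script only prints steps, it does not run anything. `qm migrate <vmid> <target> --online` is the standard CLI call for a live migration.

```python
from typing import Dict, List


def rolling_upgrade_plan(vms_by_node: Dict[str, List[int]]) -> List[str]:
    """Return the ordered steps for upgrading every node once, in place."""
    nodes = list(vms_by_node)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to evacuate one")
    steps: List[str] = []
    for i, node in enumerate(nodes):
        # Hypothetical placement rule: park the VMs on the next node in the list.
        target = nodes[(i + 1) % len(nodes)]
        for vmid in vms_by_node[node]:
            steps.append(f"[{node}] qm migrate {vmid} {target} --online")
        steps.append(f"[{node}] upgrade in place per Upgrade_from_6.x_to_7.0")
        # After the upgrade, the same VMs are migrated back from the parking node.
        for vmid in vms_by_node[node]:
            steps.append(f"[{target}] qm migrate {vmid} {node} --online")
    return steps


if __name__ == "__main__":
    # Made-up example inventory: node name -> VM IDs currently running there.
    for step in rolling_upgrade_plan({"pve1": [100, 101], "pve2": [102], "pve3": []}):
        print(step)
```

With this placement rule a VM only ever moves between nodes of the same version or from an older node to a newer one, never from a newer node back to an older one.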

I am a bit perplexed about what impact there could be from the cluster running PVE 6.4 and 7.4 nodes simultaneously for some time. As far as I understand, there are six important players in this regard (as many as I can think of):

1. corosync, which among other things passes cluster messages around and detects which nodes are present
2. the pmxcfs filesystem itself at /etc/pve - it is a multi-master filesystem mounted read-write, and many PVE processes read from and write to it
3. /var/lib/pve-cluster/config.db - each node has one and it is used in both directions: (a) at node boot to populate /etc/pve; (b) at run time to store the contents of /etc/pve
4. the qemu version (package pve-qemu-kvm)
5. the PVE web GUI version
6. the configs under /etc/pve - such as the qemu-server virtual machine configs (qm.conf)
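
To see how far these components actually differ between an old and an upgraded node, the output of `pveversion -v` from both can be compared. The sketch below assumes that output consists of `package: version` lines; the list of watched packages is my own choice and any version strings fed to it in an example are placeholders, not measurements from a real node.

```python
from typing import Dict, Optional, Tuple

# Packages behind the components listed above (my own selection).
WATCHED = ("pve-manager", "pve-cluster", "corosync", "pve-qemu-kvm", "qemu-server")


def parse_pveversion(text: str) -> Dict[str, str]:
    """Extract the versions of the watched packages from `pveversion -v` text."""
    versions: Dict[str, str] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and name.strip() in WATCHED:
            # The first token after the colon is the package version.
            versions[name.strip()] = fields[0]
    return versions


def differing_packages(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the watched packages whose versions differ between two nodes."""
    return {p: (a.get(p), b.get(p)) for p in WATCHED if a.get(p) != b.get(p)}
```

One way to use it would be to save `pveversion -v` from each node to a text file and pass the file contents to `parse_pveversion`.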

I looked into them and I think:

1. corosync - both PVE 6.4 and 7.4 ship a corosync version around 3.1.x, so my guess is that there is not much risk in running them simultaneously in one cluster
2. pmxcfs - I don't have a good estimate of how 6.4 and 7.4 behave when running together (I see it comes from the pve-cluster package)
3. /var/lib/pve-cluster/config.db - I looked into this with the sqlitebrowser program and see that 6.4, 7.4 (and even 8.0) all have the same schema

Code:
CREATE TABLE tree (
  inode INTEGER PRIMARY KEY NOT NULL,
  parent INTEGER NOT NULL CHECK(typeof(parent)=='integer'),
  version INTEGER NOT NULL CHECK(typeof(version)=='integer'),
  writer INTEGER NOT NULL CHECK(typeof(writer)=='integer'),
  mtime INTEGER NOT NULL CHECK(typeof(mtime)=='integer'),
  type INTEGER NOT NULL CHECK(typeof(type)=='integer'),
  name TEXT NOT NULL,
  data BLOB
);
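
The same schema comparison can also be scripted instead of done by eye in sqlitebrowser. This is a minimal sketch that assumes you work on copies of config.db taken from each node (the file names in the usage note are hypothetical). It opens the files read-only and compares the `CREATE` statements stored in `sqlite_master`, ignoring whitespace.

```python
import sqlite3
from typing import List


def schema_of(db_path: str) -> List[str]:
    """Return the normalized CREATE statements of an SQLite database file."""
    # Open read-only so the inspection can never modify the copy.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    # Collapse whitespace so formatting differences do not count as changes.
    return [" ".join(row[0].split()) for row in rows]


def same_schema(path_a: str, path_b: str) -> bool:
    """True if both database files contain identical schema definitions."""
    return schema_of(path_a) == schema_of(path_b)
```

For example, `same_schema("config-6.4.db", "config-7.4.db")` would compare two copies saved under those names. It says nothing about whether both versions interpret the stored data the same way, only that the table layout matches.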

4. qemu - the versions are very different (by major number), and I guess you had better not live-migrate a virtual machine from a newer node to an older one
5. PVE web GUI version - the newer web GUI could create PVE configuration that is not acceptable to the older PVE nodes (and their web GUI), so better not to make changes on upgraded nodes until the whole cluster is upgraded
6. configs under /etc/pve - for example the qemu-server qm.conf: it is probably possible to construct qm.conf settings on the new PVE 7.4 that PVE 6.4 cannot understand, but that can be avoided
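
The rule of thumb from point 4 (migrate only between equal versions or from older to newer) can be written down as a tiny check. This is just a sketch of that guess, not an official compatibility rule; the function names are hypothetical and the version strings are assumed to look like "6.4-15" or "7.4".

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Turn a version string such as '7.4-3' or '6.4' into (major, minor)."""
    major, minor = version.split("-")[0].split(".")[:2]
    return int(major), int(minor)


def live_migration_advisable(source_version: str, target_version: str) -> bool:
    """Encode the guess above: never migrate from a newer node to an older one."""
    return major_minor(target_version) >= major_minor(source_version)
```

Used this way, `live_migration_advisable("6.4-15", "7.4-3")` is true, while the reverse direction is false.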

I would be thankful if you could comment on these thoughts and maybe add some aspect I did not pay attention to.

And my practical questions would be:

1. Could it be a source of risk that an upgraded PVE 7.4 node changes something under /etc/pve, the change propagates to all nodes (and also ends up in /var/lib/pve-cluster/config.db), and the old PVE 6.4 nodes can't tolerate what they see in /etc/pve, so that the old nodes (and probably the whole cluster) become defunct? I think it is not a big problem if a qm.conf confuses one or another virtual machine on the old nodes, but it is serious if the cluster as a whole gets confused (for example because /etc/pve/storage.cfg gets content the old nodes cannot interpret).

2. Would it be OK to turn the node selected for upgrade into a dual-boot system? One boot option is the existing 6.4; the other starts out as a copy of the root filesystem and gets upgraded to 7.4. If I am not content with the upgrade, I boot the node back into the old 6.4 soon afterwards, and I expect this node and the whole cluster to be the very same as before.

I tried to make some challenging changes, and even ran 6.4 together with 8.0, but did not manage to confuse the cluster completely (although at times I lost the web GUI, it came back after tens of seconds, which may be because of browser cache etc.); even the logs looked normal to me. I also dual-booted a node in the way described and everything functions okay.

So thanks for your patience in reading this long set of statements, and I would be very thankful if you could comment on them, or direct me to some resource on this (other than source code :)


Best regards,

Imre
 
