Very out-of-date cluster

Oct 28, 2024
Good evening. I need some high-level pointers, and some tough love if necessary. After downsizing and COVID, responsibility has come to me, though I do have access to our last surviving Senior Linux Engineer in a pinch. While I was originally responsible only for managing the provided virtualized resources (basic VM deployment and management) of a small cluster, I now need to look into a software upgrade. I am traditionally a Senior Windows Engineer, though I also support macOS in a simple hybrid-cloud (Azure) environment. Back in the day, I was also the SME on our old 5-node vSphere cluster, backed by NetApp storage, and I have formal training from VMware. When we first deployed the cluster, we had sufficient support resources to maintain it without me, but now, not so much. I hope to be able to figure this out myself, as our other Sys Engineer is responsible for our HPC clusters and high-performance storage assets, as well as AWS resources.

We are currently running a 3-node Proxmox VE cluster on version 5.3-8, and in a perfect world, I'd like to upgrade to the latest. We are also running Ceph and ZFS on the nodes. I am aware I am probably several months of learning away from attempting this. Is a direct upgrade to latest possible, or will I have to jump major versions first? Also, will the upgrade software handle any ZFS and Ceph upgrades, or will I need to tackle those separately? What's the best order of operations here? Thanks in advance.
 
We are currently running a 3-node Proxmox VE cluster on version 5.3-8, and in a perfect world, I'd like to upgrade to the latest.

Is a direct upgrade to latest possible

No.

or will I have to jump major versions first?

Yes, one major version at a time (a rough per-hop sketch follows this list):
  1. https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0
  2. https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
  3. https://pve.proxmox.com/wiki/Upgrade_from_7_to_8
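Not verbatim from those guides, but each hop follows roughly the same routine on every node; a hedged sketch (the wiki page for each hop is authoritative, and the exact repository changes differ per version):

  # 1. Bring the node to the latest point release of its current major version:
  apt update && apt dist-upgrade        # e.g. 5.3-8 -> latest 5.4.x
  # 2. Run the checklist script shipped with that release and resolve every
  #    WARN/FAIL it reports (pve5to6 here; pve6to7 and pve7to8 on later hops):
  pve5to6
  # 3. Switch the APT repositories to the next major version as the guide
  #    describes, then upgrade again:
  apt update && apt dist-upgrade
  # 4. Reboot, verify the cluster and VMs, then move on to the next hop.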

Also, will the upgrade software handle any ZFS and Ceph upgrades, or will I need to tackle those separately?

  • ZFS:
    The ZFS software gets upgraded together with PVE itself.
    The on-disk pool version would need a manual upgrade afterwards, but this is not important at this point, if at all. (Especially if root/boot is on ZFS, a pool upgrade must be evaluated carefully beforehand anyway; see the status-check sketch after this list.)
  • Ceph:
    Separate upgrades are needed:
    https://pve.proxmox.com/wiki/Category:Ceph_Upgrade
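As a minimal pre-flight sketch for both points (not from the thread; the pool name rpool is the PVE default and an assumption here):

  # ZFS: check pool health and version/feature status. Do NOT run
  # 'zpool upgrade' on a root/boot pool until you are sure the bootloader
  # supports the newer feature flags.
  zpool status                 # pool health; notes if supported features are disabled
  zpool get version rpool      # legacy version number, or '-' for feature-flag pools
  # Ceph: all daemons should be healthy and on the same release before and
  # after every hop; the per-version guides above give the exact procedure.
  ceph status
  ceph versions                # per-daemon version summary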

What's the best order of operations here?

Carefully read, understand and follow step-by-step the PVE upgrade guides one after another.
They also include information and steps relating to Ceph.
Different PVE versions support different Ceph versions.

Of course, always have recent backups that you have successfully test-restored!
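For instance, a hedged sketch of a restore test for one VM (VMID 100, the storage names, and the archive path are placeholders):

  # Back up a VM, then prove the backup by restoring it to an unused VMID.
  vzdump 100 --mode snapshot --storage backup-store
  qmrestore /mnt/pve/backup-store/dump/vzdump-qemu-100-*.vma* 9100 --storage local-lvm
  # Boot VMID 9100, confirm the guest actually works, then clean up:
  qm destroy 9100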
 
It is quite well documented, but if you are unsure, you might want to get some professional help.
 
It is quite well documented, but if you are unsure, you might want to get some professional help.
Oh, I definitely need some (mental) professional help! That said, I have access to our own Senior Engineer who did not set this up, but is familiar with, and responsible for, all our other Linux resources. If it became really necessary, I also have text message access to the person who actually built the cluster. They left on good terms and we get along; there is also the possibility for us to rehire in the near future, too.
 
Thank you for this. I appreciate it.

Also, are there any built-in documentation tools for our current configuration? I'd definitely like to have something like that as I do more research.
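A sketch of the built-in options here, assuming stock PVE tooling (pvereport exists in current PVE; availability and output on an old 5.x node may differ, and the file names below are arbitrary):

  pveversion -v        # package versions of the whole PVE stack
  pvereport > /root/pvereport-$(hostname)-$(date +%F).txt   # full node report
  # Cluster-wide configuration lives under /etc/pve and is worth archiving too:
  tar czf /root/etc-pve-backup.tar.gz -C / etc/pve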
 
That's a really great idea. I thought of that later last night.
Would it not be easier to start a new cluster by installing PVE 8.2 on a new system?
Otherwise you might run into light incompatibilities for each major version that you need to upgrade to in between. On a new single-node 8.2, you can test your current VMs one at a time (by restoring them from backups). Once you know how to make them work, you can wipe the nodes of your old cluster and add them to the new one (after installing a fresh PVE 8.2 on them).
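A hedged sketch of that path (the cluster name, paths, and storage names are placeholders, and the Ceph/storage layout would need its own planning):

  # On the freshly installed PVE 8.2 machine:
  pvecm create newcluster                      # start the new cluster
  qmrestore /path/to/vzdump-qemu-100-*.vma* 100 --storage local-zfs
  # Later, on each old node after wiping it and installing PVE 8.2 fresh:
  pvecm add <ip-of-a-new-cluster-node>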
 
Would it not be easier to start a new cluster by installing PVE 8.2 on a new system?
Otherwise you might run into light incompatibilities for each major version that you need to upgrade to in between. On a new single-node 8.2, you can test your current VMs one at a time (by restoring them from backups). Once you know how to make them work, you can wipe the nodes of your old cluster and add them to the new one (after installing a fresh PVE 8.2 on them).

That literally just came to me, as well. Would it be useful to just rebuild one of the current nodes to latest, then migrate to that node, followed by rebuilding the rest of the cluster? I don't have spare hardware, but I guess I could try copying to our production Nutanix cluster in a nested virtualization scenario.
 
Would it be useful to just rebuild one of the current nodes to latest, then migrate to that node, followed by rebuilding the rest of the cluster?
Then you would be running a two-node cluster, which is at risk because neither node has quorum when the other is turned off (or unreachable, or dead): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum (a quick quorum sketch follows below). Also, Ceph needs at least three nodes, normally. But I guess it is possible...
I don't have spare hardware, but I guess I could try copying to our production Nutanix cluster in a nested virtualization scenario.
What will you do when one of your current nodes fails? Maybe it's time to get a spare node?
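For reference, a quick way to see the quorum state mentioned above (the expected-votes override is an emergency measure; see the linked admin guide first):

  pvecm status         # shows 'Quorate: Yes/No' plus membership and vote counts
  # Emergency-only: let a lone surviving node of a two-node cluster act alone:
  pvecm expected 1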
 
I would take it a step further and say it's time to get a whole new cluster, but I work in the nonprofit space, and I used budgeting "tips and tricks" to get this cluster, so it is what it is. That said, this was our first Proxmox deployment. We had migrated from our large VMware cluster to a managed Nutanix cluster, and this was us exploring site services that could fail over (application-wise) to the main Nutanix cluster in a different building. We don't make any money on our web presence, so our VM hosting needs are pretty minimal. We were also trying out SuperMicro servers at the time as an alternative to Dell, and had settled on a standard build so that we could replace them in the event of a node failure.
 
