Very out-of-date cluster

Oct 28, 2024
Good evening. I need some high-level pointers, and some tough love if necessary. After downsizing and COVID, responsibility has come to me, though I do have access to our last surviving Senior Linux Engineer in a pinch. While I was originally responsible only for managing the provided virtualized resources (basic VM deployment and management) of a small cluster, I now need to look into a software upgrade. I am traditionally a Senior Windows Engineer, though I also support macOS in a simple hybrid-cloud (Azure) environment. Back in the day, I was also the SME on our old 5-node vSphere cluster, backed by NetApp storage, and I have formal training from VMware. When we first deployed the cluster, we had sufficient support resources to maintain it without me, but now, not so much. I hope to be able to figure this out myself, as our other Sys Engineer is responsible for our HPC clusters and high-performance storage assets, as well as AWS resources.

We are currently running a 3-node Proxmox VE cluster on version 5.3-8, and in a perfect world, I'd like to upgrade to the latest. We are also running Ceph and ZFS on the nodes. I am aware I am probably several months of learning away from attempting this. Is a direct upgrade to latest possible, or will I have to jump major versions first? Also, will the upgrade software handle any ZFS and Ceph upgrades, or will I need to tackle those separately? What's the best order of operations here? Thanks in advance.
 
We are currently running a 3-node Proxmox VE cluster on version 5.3-8, and in a perfect world, I'd like to upgrade to the latest.

Is a direct upgrade to latest possible

No.

or will I have to jump major versions first?

Yes, one major version at a time (a rough per-hop sketch follows this list):
  1. https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0
  2. https://pve.proxmox.com/wiki/Upgrade_from_6.x_to_7.0
  3. https://pve.proxmox.com/wiki/Upgrade_from_7_to_8
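Not verbatim from those guides, but each hop follows roughly the same routine on every node; a hedged sketch (the wiki page for each hop is authoritative, and the exact repository changes differ per version):

  # 1. Bring the node to the latest point release of its current major version:
  apt update && apt dist-upgrade        # e.g. 5.3-8 -> latest 5.4.x
  # 2. Run the checklist script shipped with that release and resolve every
  #    WARN/FAIL it reports (pve5to6 here; pve6to7 and pve7to8 on later hops):
  pve5to6
  # 3. Switch the APT repositories to the next major version as the guide
  #    describes, then upgrade again:
  apt update && apt dist-upgrade
  # 4. Reboot, verify the cluster and VMs, then move on to the next hop.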

Also, will the upgrade software handle any ZFS and Ceph upgrades, or will I need to tackle those separately?

  • ZFS:
    The ZFS software gets upgraded together with PVE itself.
    The on-disk pool version would need a manual upgrade afterwards, but this is not important at this point, if at all. (Especially if root/boot is on ZFS, a pool upgrade must be evaluated carefully beforehand anyway; see the status-check sketch after this list.)
  • Ceph:
    Separate upgrades are needed:
    https://pve.proxmox.com/wiki/Category:Ceph_Upgrade
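As a minimal pre-flight sketch for both points (not from the thread; the pool name rpool is the PVE default and an assumption here):

  # ZFS: check pool health and version/feature status. Do NOT run
  # 'zpool upgrade' on a root/boot pool until you are sure the bootloader
  # supports the newer feature flags.
  zpool status                 # pool health; notes if supported features are disabled
  zpool get version rpool      # legacy version number, or '-' for feature-flag pools
  # Ceph: all daemons should be healthy and on the same release before and
  # after every hop; the per-version guides above give the exact procedure.
  ceph status
  ceph versions                # per-daemon version summary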

What's the best order of operations here?

Carefully read, understand and follow step-by-step the PVE upgrade guides one after another.
They also include information and steps relating to Ceph.
Different PVE versions support different Ceph versions.

Of course, always have recent backups that you have successfully test-restored!
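For instance, a hedged sketch of a restore test for one VM (VMID 100, the storage names, and the archive path are placeholders):

  # Back up a VM, then prove the backup by restoring it to an unused VMID.
  vzdump 100 --mode snapshot --storage backup-store
  qmrestore /mnt/pve/backup-store/dump/vzdump-qemu-100-*.vma* 9100 --storage local-lvm
  # Boot VMID 9100, confirm the guest actually works, then clean up:
  qm destroy 9100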
 
It is quite well documented, but if you are unsure, you might want to get some professional help.
 
It is quite well documented, but if you are unsure, you might want to get some professional help.
Oh, I definitely need some (mental) professional help! That said, I have access to our own Senior Engineer who did not set this up, but is familiar with, and responsible for, all our other Linux resources. If it became really necessary, I also have text message access to the person who actually built the cluster. They left on good terms and we get along; there is also the possibility for us to rehire in the near future, too.
 
Thank you for this. I appreciate it.

Also, are there any built-in documentation tools for our current configuration? I'd definitely like to have something like that as I do more research.
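A sketch of the built-in options here, assuming stock PVE tooling (pvereport exists in current PVE; availability and output on an old 5.x node may differ, and the file names below are arbitrary):

  pveversion -v        # package versions of the whole PVE stack
  pvereport > /root/pvereport-$(hostname)-$(date +%F).txt   # full node report
  # Cluster-wide configuration lives under /etc/pve and is worth archiving too:
  tar czf /root/etc-pve-backup.tar.gz -C / etc/pve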
 
That's a really great idea. I thought of that later last night.
Would it not be easier to start a new cluster by installing PVE 8.2 on a new system?
Otherwise you might run into light incompatibilities for each major version that you need to upgrade to in between. On a new single-node 8.2, you can test your current VMs one at a time (by restoring them from backups). Once you know how to make them work, you can wipe the nodes of your old cluster and add them to the new one (after installing a fresh PVE 8.2 on them).
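A hedged sketch of that path (the cluster name, paths, and storage names are placeholders, and the Ceph/storage layout would need its own planning):

  # On the freshly installed PVE 8.2 machine:
  pvecm create newcluster                      # start the new cluster
  qmrestore /path/to/vzdump-qemu-100-*.vma* 100 --storage local-zfs
  # Later, on each old node after wiping it and installing PVE 8.2 fresh:
  pvecm add <ip-of-a-new-cluster-node>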
 
Would it not be easier to start a new cluster by installing PVE 8.2 on a new system?
Otherwise you might run into light incompatibilities for each major version that you need to upgrade to in between. On a new single-node 8.2, you can test your current VMs one at a time (by restoring them from backups). Once you know how to make them work, you can wipe the nodes of your old cluster and add them to the new one (after installing a fresh PVE 8.2 on them).

That literally just came to me, as well. Would it be useful to just rebuild one of the current nodes to latest, then migrate to that node, followed by rebuilding the rest of the cluster? I don't have spare hardware, but I guess I could try copying to our production Nutanix cluster in a nested virtualization scenario.
 
Would it be useful to just rebuild one of the current nodes to latest, then migrate to that node, followed by rebuilding the rest of the cluster?
Then you would be running a two-node cluster, which is at risk because neither node has quorum when the other is turned off (or unreachable, or dead): https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum (a quick quorum sketch follows below). Also, Ceph needs at least three nodes, normally. But I guess it is possible...
I don't have spare hardware, but I guess I could try copying to our production Nutanix cluster in a nested virtualization scenario.
What will you do when one of your current nodes fails? Maybe it's time to get a spare node?
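For reference, a quick way to see the quorum state mentioned above (the expected-votes override is an emergency measure; see the linked admin guide first):

  pvecm status         # shows 'Quorate: Yes/No' plus membership and vote counts
  # Emergency-only: let a lone surviving node of a two-node cluster act alone:
  pvecm expected 1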
 
I would take it a step further and say it's time to get a whole new cluster, but I work in the nonprofit space, and I used budgeting "tips and tricks" to get this cluster, so it is what it is. That said, this was our first Proxmox deployment. We had migrated from our large VMware cluster to a managed Nutanix cluster, and this was us exploring site services that could fail over (application-wise) to the main Nutanix cluster in a different building. We don't make any money on our web presence, so our VM hosting needs are pretty minimal. We were also trying out SuperMicro servers at the time as an alternative to Dell, and had settled on a standard build so that we could replace them in the event of a node failure.
 
