patching best practices

Binary Bandit · Apr 16, 2019

Hi All,

I'm looking for some input on best practices for updating / patching our three node cluster.

As a bit of background, we've been running 5.2 for nearly a year in a three node cluster. This was rock solid stable. I would update it once a month from the free repositories with no issues.

Fast forward to an update that installed version 5.3 a few weeks ago. The day after this we began experiencing random cluster node reboots. I've learned a great deal about logs and troubleshooting. Long story short, we're now on 5.4 and the cluster is stable again. We also bought support for all cluster nodes before upgrading to 5.4 so we're now using the enterprise repositories.

Thoughts? Suggestions? How would you recommend we do our patching, what's worked best for you and your organisation?

We were planning to add a fourth node to the cluster purely so that Ceph would have an available OSD to move data to should another node go down. Our thinking now is that this fourth machine should still be duplicate hardware but stand-alone. Specifically, the fourth server would run Proxmox outside of the cluster, receive patches a week before the cluster nodes, and provide backup for the cluster VMs. In the event of a cluster issue we'd have the option to spin up backed-up VMs on it. Basically, the fourth node isn't a node; it's a server used for testing and disaster recovery.

thanks in advance,

James

wolfgang · Apr 17, 2019

Hi,

Binary Bandit said:
Thoughts? Suggestions? How would you recommend we do our patching, what's worked best for you and your organisation?

This is only true for minor updates such as 5.x to 5.y.
For major updates, there is an upgrade how-to such as [1].

In a cluster you should generally upgrade one node at a time.
1.) Shut down or move all running VMs off this node.
2.) Disable HA if it is active (just for safety).
3.) Upgrade the node.
4.) If a new kernel version was installed, reboot the node.
5.) Do the same with the next node.
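
As a rough illustration of the steps above, here is a minimal Python sketch that only builds an ordered checklist; it does not run any Proxmox commands. The function name, the node names (pve1 to pve3), the VM IDs, and the policy of parking VMs on the next node or on an already upgraded node are all assumptions made for the example, not part of the procedure described here.

```python
from typing import Dict, List


def rolling_upgrade_plan(
    vms_by_node: Dict[str, List[int]],
    ha_enabled: bool = True,
) -> List[str]:
    """Build an ordered checklist for upgrading a cluster one node at a time.

    vms_by_node maps a (hypothetical) node name to the VM IDs running on it.
    """
    nodes = sorted(vms_by_node)
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to move VMs between them")

    # Work on a copy so the caller's mapping is not modified.
    placement = {node: list(vms) for node, vms in vms_by_node.items()}
    plan: List[str] = []

    for i, node in enumerate(nodes):
        # Assumed policy: the first node parks its VMs on the next node;
        # later nodes park theirs on the first, already upgraded, node.
        target = nodes[1] if i == 0 else nodes[0]

        # Step 1: shut down or move every VM currently on this node.
        for vmid in list(placement[node]):
            plan.append(f"{node}: move VM {vmid} to {target} (or shut it down)")
            placement[target].append(vmid)
        placement[node].clear()

        # Step 2: disable HA if it is active.
        if ha_enabled:
            plan.append(f"{node}: disable HA (just for safety)")

        # Steps 3 and 4: upgrade, then reboot only if the kernel changed.
        plan.append(f"{node}: upgrade packages")
        plan.append(f"{node}: reboot if a new kernel was installed")
        # Step 5: the loop continues with the next node.

    return plan


if __name__ == "__main__":
    for step in rolling_upgrade_plan({"pve1": [100], "pve2": [101], "pve3": []}):
        print(step)
```

Because the sketch tracks where VMs end up after each move, a VM that was parked on a node is moved off again before that node is upgraded.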

[1] https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
[2] https://pve.proxmox.com/wiki/FAQ

Binary Bandit · Apr 18, 2019

Thanks for the response Wolfgang.

That first wiki page is interesting ... I hope that we go after new hardware at that point. I'd much prefer to add the new node, roll the VMs over to it and retire an old node.
