Hi All,
I'm looking for some input on best practices for updating / patching our three-node cluster.
As a bit of background, we've been running 5.2 for nearly a year in a three-node cluster. This was rock-solid stable. I would update it once a month from the free repositories with no issues.
Fast forward to an update that installed version 5.3 a few weeks ago. The day after this we began experiencing random cluster node reboots. I've learned a great deal about logs and troubleshooting. Long story short, we're now on 5.4 and the cluster is stable again. We also bought support for all cluster nodes before upgrading to 5.4 so we're now using the enterprise repositories.
Thoughts? Suggestions? How would you recommend we do our patching, what's worked best for you and your organisation?
We were planning to add a fourth node to the cluster purely so that Ceph would have an available OSD to move data to should another node go down. Our thinking now is that this fourth node should still be duplicate hardware, but that it should be standalone. Specifically, the fourth server would run Proxmox outside of the cluster, receive patches a week before the cluster nodes, and provide backups for the cluster VMs. In the event of a cluster issue we'd have the option to spin up the backed-up VMs on it. Basically the fourth node isn't a node; it's a server used for testing and disaster recovery.
Thanks in advance,
James