I set up an 8-node cluster (PVE 2.x) in the lab. At some point I needed to reboot all the machines, so I did: I was logged into all of them over ssh and just issued the reboot command.
After they all came back up, I was hit with a loss of quorum. Each time I reboot a node, it informs me that the cluster isn't ready. I can issue pvecm expected 1 to manually restore quorum, but this appears to be only a temporary fix: any time a node is restarted, it comes up without quorum again. As a result, I ended up deleting and rebuilding the cluster.
Is there a proper procedure for restarting the entire cluster that will avoid this situation? Once I've finished my configuration and testing, this cluster will need to be moved into colocation, and I'd hate to have to recreate the cluster after doing so. I also have to consider the possibility that unforeseen events might cause the whole cluster to be shut down at some point in the future (e.g. total power loss in the cabinet). Is there any way to have the cluster recover after such an event?