Hey guys, we did some site failover testing this past weekend.
When bringing the clusters back up at our main site, we found that corosync would fail to start and the nodes wouldn't join the cluster.
After lots of head banging, the only way we could get this corrected was to add an entry for each node to the hosts file on all cluster nodes. We typically only have an entry in the hosts file for that specific node, not all of them.
Here is what we used to have:
10.211.45.4 bunkmiscrit1.ccs.com bunkmiscrit1 pvelocalhost
Here is what we have now:
10.211.45.4 bunkmiscrit1.ccs.com bunkmiscrit1 pvelocalhost
10.211.45.5 bunkmiscrit2.ccs.com bunkmiscrit2 pvelocalhost
10.211.45.6 bunkmiscrit3.ccs.com bunkmiscrit3 pvelocalhost
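In case it helps anyone compare their own nodes, here is a rough Python sketch that reports which cluster node names have no entry in a hosts file. The function name missing_hosts_entries is made up for this post, and it only looks at the hosts-file text given to it. It does not query DNS or corosync, so treat it as a quick sanity check rather than a diagnosis.

```python
def missing_hosts_entries(hosts_text, node_names):
    """Return the node names that have no entry in the given hosts-file text.

    Hypothetical helper for illustration only: it parses the text it is
    handed and knows nothing about DNS or the corosync configuration.
    """
    known = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # fields[0] is the IP address; the rest are hostnames and aliases.
        known.update(name.lower() for name in fields[1:])
    return [name for name in node_names if name.lower() not in known]


# Node names taken from the hosts entries in this post.
NODES = ["bunkmiscrit1", "bunkmiscrit2", "bunkmiscrit3"]

# The single-entry layout we used before.
OLD_HOSTS = "10.211.45.4 bunkmiscrit1.ccs.com bunkmiscrit1 pvelocalhost\n"

# The layout with one entry per node.
NEW_HOSTS = (
    "10.211.45.4 bunkmiscrit1.ccs.com bunkmiscrit1 pvelocalhost\n"
    "10.211.45.5 bunkmiscrit2.ccs.com bunkmiscrit2 pvelocalhost\n"
    "10.211.45.6 bunkmiscrit3.ccs.com bunkmiscrit3 pvelocalhost\n"
)

print(missing_hosts_entries(OLD_HOSTS, NODES))
print(missing_hosts_entries(NEW_HOSTS, NODES))
```

On a real node you could pass in the contents of /etc/hosts instead of the sample strings.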
We have 30 or so clusters out in the field on older Proxmox 5 versions that have just a single hosts file entry. I also have 2 in-house test clusters that only have the single hosts file entry, and they still work.
Any ideas on what I could be missing?