Problem with ceph cluster without quorum

Douglas

Renowned Member
May 31, 2016
Hi,
I am running into problems during some lab tests. I have a cluster of 4 nodes, and when they are all online everything works perfectly. I decided to add 2 more nodes that act only as Ceph monitors, without any OSDs, so the cluster currently has 6 nodes. I am simulating an environment with two zones, each zone with 3 nodes, and testing the failure of 3 nodes at the same time. PVE remains active, but Ceph appears to lose quorum even though 3 nodes are still up. The ceph -s and ceph -w commands return nothing, and the GUI returns "got timeout (500)". Technically, with 3 nodes Ceph should be working normally; all my pools have size = 2 and min_size = 2.

Has anyone experienced a similar problem?
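For reference, when the monitors have lost quorum, ceph -s hangs because it needs a quorate monitor cluster to answer. A minimal sketch of how to query a monitor locally instead, assuming the default admin socket path and a monitor id equal to the short hostname:

    # Query the local monitor daemon directly; this works without quorum:
    ceph daemon mon.$(hostname -s) mon_status
    # Equivalent form, using the admin socket path explicitly:
    ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok mon_status

The mon_status output shows that monitor's view of the quorum and which monitors it can reach.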
 
Has anyone experienced a similar problem?
It's not a problem per se; it's how quorum works.

I am simulating an environment with two zones, each zone with 3 nodes, and testing the failure of 3 nodes at the same time.
This means that you have created a split brain. Neither side has more than 50% of the votes and therefore neither has quorum. The Proxmox VE nodes won't have quorum either.
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum

For Ceph and Proxmox VE (corosync) you will need a third location that provides quorum and is reachable from the other two zones. Aside from that, this setup is not recommended over long distances.
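As a hedged sketch of what a tie-breaker at a third location could look like on the corosync side: Proxmox VE supports an external QDevice (the third-site IP below is a placeholder). Ceph would additionally need a monitor running at that third site, so that one zone plus the tie-breaker still holds a monitor majority.

    # On the third-site host (Debian-based), install the arbiter daemon:
    apt install corosync-qnetd
    # On one cluster node, register the QDevice (10.0.0.100 is a placeholder):
    pvecm qdevice setup 10.0.0.100
    # Check the expected and current votes afterwards:
    pvecm status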
 
And by the way, size=2 is a bad idea! Use at least size=3 and min_size=2!
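A minimal sketch of the change, assuming a replicated pool named mypool (the pool name is a placeholder):

    # Raise the replica count first, then keep the write floor at 2:
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2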
 
I/O is paused when min_size is not met. Since min_size is also 2 here, losing a single copy will halt I/O on the affected pool.
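To check the values currently in effect, the pools can be inspected like this (mypool is again a placeholder):

    ceph osd pool get mypool size
    ceph osd pool get mypool min_size
    # Or list all pools with their replication settings at once:
    ceph osd pool ls detail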
 
Thank you all.
I will increase the pools to size=3 and min_size=2 and try to keep the replicas separated per datacenter by modifying the CRUSH map; let's see how it works.
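A hedged sketch of such a CRUSH layout on a recent Ceph release (the bucket and host names dc1, dc2, node1, node4 are placeholders). Note that with only two datacenter buckets, a rule using datacenter as the failure domain can place at most two of the three replicas, so a custom rule would be needed to guarantee all three copies are spread across both sites:

    # Create datacenter buckets and attach them under the default root:
    ceph osd crush add-bucket dc1 datacenter
    ceph osd crush add-bucket dc2 datacenter
    ceph osd crush move dc1 root=default
    ceph osd crush move dc2 root=default
    # Move the OSD hosts into their datacenters:
    ceph osd crush move node1 datacenter=dc1
    ceph osd crush move node4 datacenter=dc2
    # Create a rule that spreads replicas across datacenters and assign it:
    ceph osd crush rule create-replicated replicated_dc default datacenter
    ceph osd pool set mypool crush_rule replicated_dc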