How can you maintain the cluster to continue to works after some nodes/datacenter down

danielc · Jun 27, 2018

Hello,

We are tring to pretend a disaster in our proxmox environments, so to see what we can do.
We have 4 ceph nodes only run ceph, plus some addition nodes only acts as HV.
We decleared 2 datacenter, 2 ceph server belongs to datacenter A, and the other 2 ceph servers belongs to datacenter B.

We changed the crush maps , declared datacenter, and also changed the replication rules to ensure every copy is at every datacenter every host.

That is all fine, but until we really shut down two ceph server which belongs to one datacenter, everything starts not working, VMs started to freeze.

How can you maintain the cluster to continue to works after some nodes/datacenter down?

We can not even run ceph commands now if we shutdown two ceph servers (e.g ceph3 and ceph4, but ceph1 and ceph2 are still online in this case):

root@ceph1:~# ceph osd tree
2018-06-27 17:54:24.333963 7f572ae99700 0 monclient(hunting): authenticate timed out after 300
2018-06-27 17:54:24.333992 7f572ae99700 0 librados: client.admin authentication error (110) Connection timed out

root@ceph1:~# ceph status
2018-06-27 18:09:29.116223 7fd00104b700 0 monclient(hunting): authenticate timed out after 300
2018-06-27 18:09:29.116245 7fd00104b700 0 librados: client.admin authentication error (110) Connection timed out
[errno 110] error connecting to the cluster

Even under the UI, the status is green with tick,

Cluster: CephCluster, Quorate: Yes

But it actually is not working now.
How can we solve this problem???
Thanks

danielc · Jun 27, 2018

Of crouse, if we turn back everythings on (for example just re-open the lan port of the ceph servers), now it is totaly fine...

root@ceph1:~# ceph status
cluster:
id:
health: HEALTH_OK

services:
mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4
mgr: ceph4(active), standbys: ceph3, ceph2, ceph1
osd: 32 osds: 32 up, 32 in

data:
pools: 2 pools, 1088 pgs
objects: 95964 objects, 351 GB
usage: 1454 GB used, 115 TB / 116 TB avail
pgs: 1087 active+clean
1 active+clean+scrubbing+deep

io:
client: 49630 B/s rd, 6077 B/s wr, 5 op/s rd, 1 op/s wr

David Herselman · Jun 27, 2018

Ceph and Proxmox would need to reach quorum, which means a vote of more than 50%.

ie: Impossible with only 4 nodes, when 2 are unavailable.

Ceph RBD crush map would also need a min_size of 2, with replication count of 4...

mir · Jun 27, 2018

In generel clusters with an even node numbers is a disaster waiting to happen. Always use odd numbers for nodes in a cluster or in Ceph terms odd numbers for monitors.

danielc · Jun 29, 2018

mir said:
In generel clusters with an even node numbers is a disaster waiting to happen. Always use odd numbers for nodes in a cluster or in Ceph terms odd numbers for monitors.

Hello,

Thanks for this information, i dont even know about this. Do you have/know any technical documents talking about this??

danielc · Jun 29, 2018

David Herselman said:
Ceph and Proxmox would need to reach quorum, which means a vote of more than 50%.

ie: Impossible with only 4 nodes, when 2 are unavailable.

Ceph RBD crush map would also need a min_size of 2, with replication count of 4...

I indeed have 7 nodes in the cluster, which some of the nodes acts as hyperviser . But i do only have 4 monitors, as i mentioned i have 4 nodes for ceph only.

Even i shutdown two ceph servers, the quorum is still showing OK

root@ceph1:~# ceph status
2018-06-29 20:03:56.490455 7fc872847700 0 monclient(hunting): authenticate timed out after 300
2018-06-29 20:03:56.490489 7fc872847700 0 librados: client.admin authentication error (110) Connection timed out
[errno 110] error connecting to the cluster
root@ceph1:~# pvecm status
Quorum information
------------------
Date: Fri Jun 29 20:03:58 2018
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000001
Ring ID: 1/1080
Quorate: Yes

Votequorum information
----------------------
Expected votes: 7
Highest expected: 7
Total votes: 4
Quorum: 4
Flags: Quorate

Is it really the number of monitors in ceph causing this problem? I am really confusing here,

mir · Jun 29, 2018

danielc said:
Even i shutdown two ceph servers, the quorum is still showing OK

And if you shut down two monitor servers?

danielc · Jun 29, 2018

mir said:
And if you shut down two monitor servers?

Yes, i shunted down two ceph monitor server. VM freeze, ceph commands timed out.
4 monitors on the 4 ceph servers.
The pvecm status result i posted on here was after this.

danielc · Jun 30, 2018

Dear all,

Thanks, it looks like the number of the ceph monitor is the key of this problem.
After i took one hv to install monitors on it , now it runs as intended:

root@ceph1:~# ceph status
cluster:
id:
health: HEALTH_WARN
1 datacenter (16 osds) down
16 osds down
2 hosts (16 osds) down
Reduced data availability: 32 pgs inactive
Degraded data redundancy: 96227/380501 objects degraded (25.290%), 1088 pgs degraded, 1088 pgs undersized
320 slow requests are blocked > 32 sec
2/5 mons down, quorum ceph1,ceph2,hv1

services:
mon: 5 daemons, quorum ceph1,ceph2,hv1, out of quorum: ceph3, ceph4
mgr: ceph2(active), standbys: ceph1, hv1
osd: 32 osds: 16 up, 32 in

data:
pools: 2 pools, 1088 pgs
objects: 96227 objects, 352 GB
usage: 729 GB used, 58884 GB / 59614 GB avail
pgs: 2.941% pgs not active
96227/380501 objects degraded (25.290%)
1056 active+undersized+degraded
32 undersized+degraded+peered

io:
client: 2341 kB/s wr, 0 op/s rd, 1091 op/s wr

Interestingly, i look back the proxmox ceph install guide, it seems it does not mention anything about the number of clusters or the ceph monitors, i might have missed.
What a big lesson i have learned!
Thanks all, have a nice day!

Search

Search

How can you maintain the cluster to continue to works after some nodes/datacenter down

danielc

Member

danielc

Member

David Herselman

Renowned Member

mir

Famous Member

danielc

Member

danielc

Member

mir

Famous Member

danielc

Member

danielc

Member

We value your privacy