Ceph 75% degraded with only one host down out of 4

Lucian Lazar

Hi all, I am struggling to find the reason why my Ceph cluster goes to 75% degraded (as seen in the attached screenshot) when I reboot just one node.
The 4-node cluster is new, with no VMs or containers, so the used space is 0.
Each node contains an equal number of SSD OSDs (6 x 465 GB), totalling 10 TB, and there is one pool with the default replicated_rule, 3/2 (size/min_size) and 1024 PGs. Ceph is running on a dedicated 10 GbE network in LACP, so advertised at 20 Gbps.
As seen in the screenshot, if I reboot one node I get that degraded warning, but in my opinion it should be only 25% degraded (red color on the graph), as there are still 3 nodes available. Is there something I am missing? Or am I interpreting the graph incorrectly?
Thank you all in advance.
pve-manager/6.1-5/9bf06119 (running kernel: 5.3.13-1-pve)
 

Attachments

  • Screen Shot 2020-01-03 at 15.11.13.png (374 KB)
If you have 4 hosts and a replication rule with 3/2 (size/min_size), then I would expect 75% of the PGs to lose one copy if you shut down one node.
(The 25% that stay active+clean have their 3 copies on the 3 remaining hosts.)
Put differently: with 4 hosts and a size of 3, each host has one copy of 75% of the PGs.
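
A quick way to convince yourself of those numbers is a placement simulation. This is only a minimal sketch (random host selection instead of CRUSH, made-up host names), but it shows why roughly 75% of the PGs lose a copy when one of four hosts reboots:

```python
import random

# Hypothetical 4-host cluster; random placement stands in for CRUSH,
# so the result is approximate rather than exactly 75%.
HOSTS = ["node1", "node2", "node3", "node4"]
SIZE = 3          # copies per PG (pool size)
NUM_PGS = 1024    # as in the pool described above

# Each PG stores its copies on SIZE distinct hosts.
placements = [random.sample(HOSTS, SIZE) for _ in range(NUM_PGS)]

down_host = "node1"  # the rebooted node
degraded = sum(1 for hosts in placements if down_host in hosts)

print(f"{degraded}/{NUM_PGS} PGs lost one copy ({degraded / NUM_PGS:.0%})")
# Expected: roughly 75%, because each PG occupies 3 of the 4 hosts.
```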

However, since all of your PGs are in state active, requests to the pool should be processed without any problem.
See the explanation of the PG states: https://docs.ceph.com/docs/master/rados/operations/pg-states/

I hope this explains it!
 
Thank you very much, it makes sense. My concern is about filling up that storage pool (10 TB / 3 = 3.33 TB usable total): when I have, let's say, the maximum ~3 TB provisioned, what will happen when one node goes down?
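
For a rough idea of the numbers involved, here is a back-of-the-envelope calculation. The 0.85 nearfull ratio used below is Ceph's default warning threshold and is only an assumption about this cluster's settings:

```python
# Back-of-the-envelope usable capacity for a 3-copy replicated pool.
raw_tb = 10.0                      # raw cluster capacity from the post above
size = 3                           # replicas per object
nearfull_ratio = 0.85              # Ceph's default nearfull warning threshold

usable_tb = raw_tb / size                     # ~3.33 TB of logical space
comfortable_tb = usable_tb * nearfull_ratio   # stay below this to avoid nearfull warnings

print(f"usable: {usable_tb:.2f} TB, comfortable limit: ~{comfortable_tb:.2f} TB")
```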
 
a) Try not to fill your pool to 100% (the OSDs will stop accepting requests when they become full).
b) There should be no change in the behavior: if one host goes down, all your PGs remain active and your data stays accessible (read and write); if you run virtual guests on Ceph they will continue to function. If 2 hosts go down, some PGs will not have enough copies online anymore and thus will not be writeable anymore; usually that means the VMs' disks stop responding -> they crash (see the sketch below).

I hope this explains it
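
To make point b) a bit more concrete, here is a minimal sketch of how a replicated PG behaves as copies go offline. The function is purely illustrative; the thresholds match the 3/2 pool in this thread:

```python
def pg_availability(copies_online: int, size: int = 3, min_size: int = 2) -> str:
    """Simplified view of a replicated PG as its replicas go offline."""
    if copies_online >= size:
        return "active+clean: all copies present"
    if copies_online >= min_size:
        return "active+undersized+degraded: still readable and writeable"
    if copies_online >= 1:
        return "below min_size: I/O blocks until another copy is available"
    return "no copies online: data unavailable until an OSD with a copy returns"

for online in (3, 2, 1, 0):
    print(f"{online} copies online -> {pg_availability(online)}")
```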
 
With a setting of 3/2 (size/min_size), yes.
With four nodes you could also go with 4/2; then you should survive 2 nodes going down.

Apart from that, I would suggest testing this with one or two small VMs before taking it into production; this always helps me to get a feeling for how things work.
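
If you do decide on 4/2, the pool settings can be changed at runtime ('mypool' below is just a placeholder for your pool name); expect some recovery traffic while the fourth copies are created:

```
ceph osd pool set mypool size 4
ceph osd pool set mypool min_size 2
ceph osd pool get mypool size      # verify the change
```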
 
This only works when the two nodes don't go down at the same time, right? If you have 3 copies of the data and 2 of the replicas go down, you only have one copy left, which is fewer than min_size, which means read-only, right? That would cause VMs to stop working, right?

If one node goes down, Ceph recovers and a new replica is built on another server; if another server fails after that, you would still have replica count = min_size and everything would be OK?

Or am I missing something?
 
