Ceph 75% degraded with only one host down of 4

Lucian Lazar

Member
Apr 23, 2018
Hi all, I am struggling to find the reason why my Ceph cluster goes 75% degraded (as seen in the attached screenshot) when I reboot just one node.
The 4-node cluster is new, with no VM or container, so the used space is 0.
Each node contains an equal number of SSD OSDs (6 x 465 GB), totalling about 10 TB, and there is one pool with the default replicated_rule, 3/2 (size/min_size) replication and 1024 PGs. Ceph is running on a dedicated 10 GbE network in LACP, so advertised at 20 Gbps.
As seen in the screenshot, if I reboot one node I get that degraded warning, but in my opinion it should be only 25% degraded (red colour on the graph), as there are still 3 nodes available. Is there something I am missing, or am I interpreting the graph incorrectly?
Thank you all in advance.
pve-manager/6.1-5/9bf06119 (running kernel: 5.3.13-1-pve)
 

Attachments

  • Screen Shot 2020-01-03 at 15.11.13.png
If you have 4 hosts and a replication rule with 3/2 (size/min_size), then I would expect 75% of the PGs to lose one copy if you shut down one node.
(The 25% that stay active+clean have their 3 copies on the 3 remaining hosts.)
Put differently: with 4 hosts and a size of 3, each host has one copy of 75% of the PGs.
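
To make the ratio concrete, here is a minimal sketch (uniform-random placement instead of real CRUSH, with hypothetical host names) that counts how many of 1024 PGs keep a copy on any single host when each PG stores 3 copies across 4 hosts:

Code:
import random

HOSTS = ["node1", "node2", "node3", "node4"]  # hypothetical host names
PGS = 1024
SIZE = 3  # replicated pool, size=3

# For each PG, pick 3 distinct hosts out of the 4 (uniform random here,
# unlike real CRUSH, but close enough to show the ratio).
placements = [set(random.sample(HOSTS, SIZE)) for _ in range(PGS)]

down_host = "node4"
affected = sum(1 for copies in placements if down_host in copies)
print(f"PGs with a copy on {down_host}: {affected}/{PGS} ({affected / PGS:.0%})")
# Prints roughly 75%: choosing 3 hosts out of 4 means any single host
# ends up holding a copy of about 3/4 of the PGs.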

However, since all of your PGs are in state active, requests to the pool should still be processed without any problem.
See the explanation of the PG states: https://docs.ceph.com/docs/master/rados/operations/pg-states/

I hope this explains it!
 
Thank you very much, it makes sense. My concern is about filling up that storage pool (10 TB / 3 ≈ 3.33 TB usable in total): when I have, let's say, the maximum ~3 TB provisioned, what will happen when one node goes down?
 
a) Try not to fill your pool to 100% (the OSDs will stop accepting requests when they become full).
b) There should be no change in the behaviour: if one host goes down, all your PGs remain active and your data stays accessible (read and write); if you run virtual guests on Ceph, they will continue to function. If 2 hosts go down, some PGs will not have enough copies online anymore and thus will not be writeable anymore - usually that means the VMs' disks stop responding -> they crash.
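
As a rough mental model of the size/min_size behaviour described in b) (illustration only, with a made-up helper function, not Ceph's actual peering logic):

Code:
def pg_io_state(live_copies: int, size: int = 3, min_size: int = 2) -> str:
    """Rough mental model of a replicated PG as copies go offline."""
    if live_copies >= size:
        return "active+clean (all copies online)"
    if live_copies >= min_size:
        return "active+undersized+degraded (I/O still served)"
    if live_copies >= 1:
        return "inactive (below min_size: I/O to this PG blocks until copies return)"
    return "down (no copy online)"

# With 4 hosts and size=3/min_size=2:
print(pg_io_state(2))  # one host down  -> reads and writes still work
print(pg_io_state(1))  # two hosts down -> this PG blocks, the VM disk hangs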

I hope this explains it
 
Thank you so much, this clarified my doubts. In order to afford 2 nodes down, I should have at least 5 nodes for Ceph, if I understood correctly. Thank you again.
 
With a setting of 3/2 (size/min_size), yes.
With four nodes you could also go with 4/2 - then you should survive 2 nodes going down.

Apart from that, I would suggest testing this with one or 2 small VMs before taking it into production - this always helps me get a feeling for how things work.
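
Using the same simplified model (at most one copy of a PG per host, no recovery yet), the worst case with two of the four hosts down compares like this for 3/2 versus 4/2:

Code:
def copies_left_after(hosts_down: int, size: int) -> int:
    # Worst case: the PG had a copy on every host that failed.
    # Assumes at most one copy per host (host-level failure domain).
    return max(size - hosts_down, 0)

for size, min_size in [(3, 2), (4, 2)]:
    left = copies_left_after(hosts_down=2, size=size)
    state = "still writeable" if left >= min_size else "below min_size, I/O blocked"
    print(f"size={size}/min_size={min_size}: {left} copies left -> {state}")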
 
This only works when the two nodes don't go down at the same time, right? If you have 3 copies of the data and 2 of the replicas go down, you only have one copy left, which means less than min_size, which means read only, right? That would cause VMs to stop working, right?

If one node goes down, Ceph recovers and a new replica is built on another server; if the other server fails after that, you should still have replica count = min_size and everything is OK?

Or am I missing something?
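
For what it's worth, a toy timeline of the two scenarios described above (illustration only, assuming recovery fully completes between the failures and there is enough free capacity to rebuild):

Code:
SIZE, MIN_SIZE = 3, 2

def writeable(copies: int) -> bool:
    return copies >= MIN_SIZE

# Two hosts fail at the same time: a PG that had copies on both drops to 1.
print("simultaneous:", writeable(SIZE - 2))  # False -> I/O blocked for that PG

# Sequential failures with recovery in between: after the first host fails,
# Ceph re-creates the missing copies on the surviving hosts, so the PG is
# back to 3 copies before the second host fails.
copies = SIZE - 1   # first failure
copies = SIZE       # recovery completes on the remaining hosts
copies -= 1         # second failure
print("sequential :", writeable(copies))     # True -> still serving I/O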
 
