Ceph Storage Calculation

starnetwork

Hello everyone,
I have 6 nodes, each with 1x 128GB disk for the Proxmox OS and an additional 2x 960GB Samsung enterprise SSDs
for the Ceph cluster.
So, in total I have:
6 hosts / nodes
2 OSDs per node
1 pool with settings of 3/2 (size / min_size)

My question is: how much free disk space do I have to "work" with?
3.84TB? (all disks / 3)
1.92TB? (all nodes / 3)
Should I stay on a size of 3, or should I change the size from 3 to 2?

Best Regards,
 

Hi,

Regarding the usable free space, you can have a look at this calculator: Ceph: Safely Available Storage Calculator (search for it).
It will give you a good starting point and is quite informative as well.

Regards,

P.S. That URL posting restriction is ridiculous :S
 
Should I stay on size of 3 or should I change the size from 3 to 2?
Stay on size 3/2: on small clusters, the risk of losing another disk that holds the same PG is significantly higher. And as you may already know, only use 3 monitors, not more.

my questions is what is the free disk I have for "work"?
With default settings you can only fill the disks to 80% before the first 'near full' warning appears. But you also need to take into account how many of your nodes can fail before you hit the 80% mark (during recovery).
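To make that concrete, here is a minimal sketch (in Python, not an official Proxmox or Ceph tool) of how much data you could put on this cluster and still lose a node without the surviving OSDs crossing the 80% mark. The host count, OSD size and threshold are simply the numbers from this thread; adjust them for your own setup.

```python
def max_safe_fill(hosts=6, osds_per_host=2, osd_gb=960,
                  hosts_lost=1, nearfull=0.80):
    """How much raw usage keeps the cluster below `nearfull`
    even after `hosts_lost` nodes have failed and been recovered."""
    raw_total = hosts * osds_per_host * osd_gb
    raw_after_loss = (hosts - hosts_lost) * osds_per_host * osd_gb
    safe_raw_usage = raw_after_loss * nearfull
    return safe_raw_usage, safe_raw_usage / raw_total

safe_gb, ratio = max_safe_fill()
print(f"keep raw usage below {safe_gb:.0f} GB (~{ratio:.0%} of raw capacity)")
# -> keep raw usage below 7680 GB (~67% of raw capacity)
```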
 
Hi Alwin,
thanks for that answer!
1. With 8 nodes, could I move to 2/2? What is the logic behind that?
More nodes also means a higher chance of something going wrong, no?

2. It's killing me: from 6 nodes with 2x 960GB each, I get a total of 1.92TB usable for all the nodes together.
It looks like there is a kind of RAID 1 across the two OSDs on each node, and then additional replication to 2 more nodes (3 copies in total).
So wouldn't it make more sense to move to 2/2 replication, since I already have the OSD replication within each node anyway?

Regards,
 
1. With 8 nodes, could I move to 2/2? What is the logic behind that?
More nodes also means a higher chance of something going wrong, no?
I don't quite understand your question. Size 2 means that Ceph only keeps two copies of a PG (which holds the objects). If one disk fails, the risk that another disk with the same PG fails before the second copy is recovered is high enough. Somewhere on the Ceph mailing list a calculation was floating around (AFAIR ~11%). It all depends on the time it takes to recover and how likely it is to lose a disk.
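For readers who want to play with that dependency, here is a rough, illustrative model only (it does not reproduce the ~11% figure from the mailing list): assume independent disk failures with a constant annual failure rate, and count the OSDs that hold the last remaining copy of some PG as "at risk". All numbers below are made-up placeholders.

```python
import math

def second_failure_probability(at_risk_osds=10, annual_failure_rate=0.02,
                               recovery_hours=8):
    # Probability that at least one at-risk OSD also fails before the
    # recovery finishes (simple exponential failure model).
    rate_per_hour = annual_failure_rate / (365 * 24)
    return 1 - math.exp(-at_risk_osds * rate_per_hour * recovery_hours)

print(f"{second_failure_probability():.4%}")
# Longer recovery times or more at-risk OSDs push this number up quickly.
```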

2. It's killing me: from 6 nodes with 2x 960GB each, I get a total of 1.92TB usable for all the nodes together.
It looks like there is a kind of RAID 1 across the two OSDs on each node, and then additional replication to 2 more nodes (3 copies in total).
So wouldn't it make more sense to move to 2/2 replication, since I already have the OSD replication within each node anyway?
No, there is no RAID 1. The PGs are distributed by CRUSH (default setting) on node level, e.g. a PG will be on a disk on node 1 and on a disk on node 6, never twice on the same node.
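As a toy illustration of what "node level" means (this is not real CRUSH, which uses deterministic, weighted hashing; just a simplified sketch with made-up names): each PG first picks `size` different hosts and only then an OSD inside each host, so two copies of a PG never share a node.

```python
import random

# 6 hosts with 2 OSDs each, matching the setup in this thread.
hosts = {f"node{i+1}": [f"osd.{2*i}", f"osd.{2*i+1}"] for i in range(6)}

def place_pg(size=3):
    chosen_hosts = random.sample(list(hosts), k=size)   # distinct hosts
    return [random.choice(hosts[h]) for h in chosen_hosts]

for pg in range(3):
    print(f"pg.{pg}: {place_pg()}")
# Every PG lands on 3 OSDs that all live on different nodes.
```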

=== size 3 ===
960 * 0.8 = 768 GB
768 * 12 = 9216 GB (80% fill of OSD)
9216 / 3 = 3072 GB (space after replication)
3072 / 6 = 512 GB (space per host)
512 * 2 = 1024 GB (data that needs to be moved after 2x nodes fail)
3072 - 1024 = 2048 GB (usable space)
2048 * 0.8 = 1638.4 GB (80% of usable space, 20% for growth in degraded state)

=== size 2 ===
960 * 0.8 = 768 GB
768 * 12 = 9216 GB (80% fill of OSD)
9216 / 2 = 4608 GB (space after replication)
4608 / 6 = 768 GB (space per host)
768 * 2 = 1536 GB (data that needs to be moved after 2x nodes fail)
4608 - 1536 = 3072 GB (usable space)
3072 * 0.8 = 2457.6 GB (80% of usable space, 20% for growth in degraded state)

The difference is roughly one disk, 819.2 GB. The calculation is just an example.

That said, this is a simple calculation and might not take everything into account, but it hopefully illustrates a little of what Ceph is doing with all that space. You have to weigh whether you want more data safety or more space. Also keep in mind that this calculation doesn't consider how fast a recovery can be done, and hence does not represent the risk of a second disk failing.
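The same arithmetic, parametrised so it can be reused with other numbers (a quick sketch in Python, not an official sizing tool; the 80% and "2 failed hosts" values are the assumptions from the example above):

```python
def usable_space_gb(osd_gb=960, osds=12, hosts=6, size=3,
                    hosts_lost=2, fill_ratio=0.8):
    raw = osd_gb * fill_ratio * osds        # 80% fill of all OSDs
    after_repl = raw / size                 # space after replication
    per_host = after_repl / hosts           # share per host
    to_move = per_host * hosts_lost         # data to re-replicate after host failures
    usable = after_repl - to_move
    return usable * fill_ratio              # keep 20% for growth in degraded state

print(usable_space_gb(size=3))  # 1638.4
print(usable_space_gb(size=2))  # 2457.6
```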
 
Wow, thank you for that detailed response and explanation.
It is very helpful for me to understand how it works.
 
So, can you help me demonstrate how to calculate the Ceph capacity if there is an extra 128GB OSD on every node in the above scenario, please? Another issue I encountered in my first lab is that I have 4 nodes:

- CEPH2: sdb 10GB, sdc 40GB
- CEPH3: sdb 10GB, sdc 40GB
- CEPH4: sdb 10GB
- CEPH5: sdb 10GB
And I configured Ceph to replicate 3 times in ceph.conf, min_size=2.
I can't understand why the result when checking the status with "ceph -s" is the summed capacity of the nodes (120GB).
[attached screenshot of the ceph -s output]
 
Ceph shows you the raw available storage, which is 120GB in your case.
For calculating the safe cluster size you should definitely google for "Ceph: Safely Available Storage Calculator", available on florian.ca
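For the 4-node lab above, the back-of-the-envelope numbers look like this (a sketch only; the real usable space is lower still, because CRUSH keeps the copies on different hosts and the uneven 10GB/40GB OSDs mean the small hosts fill up first):

```python
osds_gb = [10, 40, 10, 40, 10, 10]   # CEPH2..CEPH5
size = 3                             # replicas per object

raw = sum(osds_gb)                   # what "ceph -s" reports: 120 GB
naive_usable = raw / size            # at most ~40 GB of data before replication
print(raw, naive_usable)             # 120 40.0
```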
 
I did consult the page, but I still don't know why I'm able to store data to the cluster up to 120GB, even though Ceph has been configured with replicas. Also, the OSD capacities differ, so how can I calculate this manually to understand it?
 