Ceph Storage Calculation

Hello everyone,
I have 6 nodes; each node has 1x 128GB disk for the Proxmox OS and an additional 2x 960GB Samsung enterprise SSDs for the Ceph cluster.
So, in total I have:
6 hosts / nodes
2 OSDs per node
1 pool with settings of 3/2 (size / min_size)

My question is: how much free disk space do I actually have to work with?
3.84TB (all disks / 3)?
1.92TB (all nodes / 3)?
Should I stay at size 3, or should I change the size from 3 to 2?

Best Regards,
 


Hi,

Regarding the usable free space, you can have a look at this calculator: Ceph: Safely Available Storage Calculator (search for it).
It will give you a good starting point and is quite informative as well.

Regards,

P.S. That URL posting restriction is ridiculous :S
 
Should I stay at size 3, or should I change the size from 3 to 2?
Stay at size 3/2. On small clusters, the risk of losing a second disk that holds the same PG is significantly higher. And, as you may already know, use only 3 monitors, not more.

My question is: how much free disk space do I actually have to work with?
With the default settings you can only fill the disks to 80% before the first 'near full' warning appears. You also need to take into account how many of your nodes can be lost before recovery pushes the remaining OSDs to that 80% mark.
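
As a quick back-of-the-envelope sketch (plain Python, not the calculator mentioned above; it only uses the disk counts from the original question and the 80% figure from this post), the naive usable space before any failure headroom works out like this:

# Naive estimate: raw capacity, capped at the near-full threshold,
# divided by the number of replicas. No headroom for failed hosts yet.
osds = 6 * 2            # 6 hosts x 2 OSDs
osd_gb = 960
size = 3                # pool size (number of copies)
nearfull = 0.8          # fill level used in this thread

raw_gb = osds * osd_gb                  # 11520 GB raw
usable_gb = raw_gb * nearfull / size    # ~3072 GB before failure headroom
print(raw_gb, usable_gb)

The more detailed calculation further down in the thread then subtracts the space that has to stay free so the cluster can re-replicate the data after host failures.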
 
Hi Alwin,
Thanks for that answer!
1. With 8 nodes, could I move to 2/2? What is the logic behind that?
More nodes also mean a higher chance of something going wrong, no?

2. It's killing me: from 6 nodes with 2x 960GB each, I end up with a total of only 1.92TB usable for all the nodes together.
It looks like there is a kind of RAID 1 across the two OSDs on each node, and then additional replication to 2 more nodes (3 copies in total).
So wouldn't it make sense to move to 2/2 replication, since I already have the OSD replication within each node anyway?

Regards,
 
1. With 8 nodes, could I move to 2/2? What is the logic behind that?
More nodes also mean a higher chance of something going wrong, no?
I don't quite understand your question. Size 2 means that Ceph keeps only two copies of a PG (which holds the objects). If one disk fails, the risk that another disk holding the same PG fails before the second copy is recovered is high enough to matter. Somewhere on the Ceph mailing list a calculation was floating around (AFAIR ~11%). It all depends on the time it takes to recover and how likely it is to lose a disk.
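
To make the "window of vulnerability" idea concrete, here is a toy Python estimate; the failure rate and recovery time are made-up example values, and this is not the mailing-list calculation referred to above:

# Toy estimate: chance that at least one of the remaining disks fails
# while the lost copies of a size-2 pool are being re-replicated.
# The numbers below are illustrative assumptions, not measured values.
afr = 0.02                # assumed annualized failure rate per disk
recovery_hours = 12       # assumed time to restore full redundancy
other_disks = 11          # remaining OSDs in this 12-OSD cluster

p_one = afr * recovery_hours / (365 * 24)   # one given disk fails in the window
p_any = 1 - (1 - p_one) ** other_disks      # at least one of them fails
print(f"{p_any:.4%}")

The exact number is not the point; the dependency is: the longer the recovery takes and the more disks can hold the affected PGs, the larger the window in which a second failure loses data on a size 2 pool.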

2. It's killing me: from 6 nodes with 2x 960GB each, I end up with a total of only 1.92TB usable for all the nodes together.
It looks like there is a kind of RAID 1 across the two OSDs on each node, and then additional replication to 2 more nodes (3 copies in total).
So wouldn't it make sense to move to 2/2 replication, since I already have the OSD replication within each node anyway?
No, there is no RAID 1. The PGs are distributed by CRUSH (with the default rule) at node level, e.g. one copy of a PG will be on a disk on node 1 and another on a disk on node 6.

=== size 3 ===
960 * 0.8 = 768 GB
768 * 12 = 9216 GB (80% fill of OSD)
9216 / 3 = 3072 GB (space after replication)
3072 / 6 = 512 GB (space per host)
512 * 2 = 1024 GB (data that needs to be moved after 2x nodes fail)
3072 - 1024 = 2048 GB (usable space)
2048 * 0.8 = 1638.4 GB (80% of usable space, 20% for growth in degraded state)

=== size 2 ===
960 * 0.8 = 768 GB
768 * 12 = 9216 GB (80% fill of OSD)
9216 / 2 = 4608 GB (space after replication)
4608 / 6 = 768 GB (space per host)
768 * 2 = 1536 GB (data that needs to be moved after 2x nodes fail)
4608 - 1536 = 3072 GB (usable space)
3072 * 0.8 = 2457.6 GB (80% of usable space, 20% for growth in degraded state)

The difference, 819.2 GB, is roughly one disk's worth. The calculation is just an example.

To be clear, this is a simplified calculation and might not take everything into account, but hopefully it illustrates a little of what Ceph is doing with all that space. You have to weigh whether you want more data safety or more space. Also keep in mind that this calculation doesn't account for how fast a recovery can be done, and hence does not represent the risk of a second disk failing.
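
For anyone who wants to replay the arithmetic above with a different cluster layout, the same worked example can be wrapped in a small Python function (this is only a restatement of the calculation above, not an official formula):

# Restatement of the worked example above as a function.
def safely_usable_gb(hosts, osds_per_host, osd_gb, size,
                     fill_ratio=0.8, failed_hosts=2, degraded_headroom=0.8):
    filled = hosts * osds_per_host * osd_gb * fill_ratio  # 80% fill of the OSDs
    after_repl = filled / size                            # space after replication
    per_host = after_repl / hosts                         # space per host
    moved = per_host * failed_hosts                       # data moved after host failures
    usable = after_repl - moved                           # usable space
    return usable * degraded_headroom                     # keep 20% for growth while degraded

print(safely_usable_gb(6, 2, 960, size=3))   # ~1638.4 GB, the size 3 result above
print(safely_usable_gb(6, 2, 960, size=2))   # ~2457.6 GB, the size 2 result above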
 
Wow, thank you for that detailed response and explanation.
Very helpful for me to understand how it works.
 
So, can you help me demonstrate how to calculate the Ceph capacity if there is an extra 128GB OSD on every node in the above scenario, please? Another issue I encountered in my first lab is that I have 4 nodes:

- CEPH2: sdb 10GB, sdc 40GB
- CEPH3: sdb 10GB, sdc 40GB
- CEPH4: sdb 10GB
- CEPH5: sdb 10GB
I configured Ceph to replicate 3 times in ceph.conf, with min_size = 2.
I can't understand why the outcome when checking the status with "ceph -s" is the summed capacity of the nodes (120GB).
[screenshot of the "ceph -s" output]
 
Ceph shows you the raw available storage, which is 120GB in your case.
For calculating the safe cluster size, you should definitely search for "Ceph: Safely Available Storage Calculator", available on florian.ca.
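
To illustrate the raw-versus-usable difference with the 4-node lab above (a sketch only; with host-level replication and hosts this uneven, the really usable space is limited even further by the small hosts):

# 'ceph -s' reports RAW capacity: the sum of all OSD sizes.
# Usable space with replication is at most raw / size, and uneven
# hosts reduce it further, because each copy must land on a different host.
host_osds_gb = {
    "CEPH2": [10, 40],
    "CEPH3": [10, 40],
    "CEPH4": [10],
    "CEPH5": [10],
}
size = 3

raw_gb = sum(sum(osds) for osds in host_osds_gb.values())
print(raw_gb)          # 120 -> what 'ceph -s' shows
print(raw_gb / size)   # 40.0 -> absolute upper bound for data with 3 copies

In practice the two hosts that only have a 10GB OSD will hit the near-full threshold long before the 40GB OSDs are used up, so the safely usable space in this lab is well below even that 40GB bound.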
 
I did consult the page, but I still don't know why I'm able to store data in the cluster up to 120GB, even though Ceph has been configured with replication. And the OSD capacities differ, so how can I calculate this manually in order to understand it?
 
