Usable space on Ceph Storage

aychprox

Renowned Member
Oct 27, 2015
Hi,

I am trying to understand the usable space shown in Proxmox under Ceph storage. I tried googling but had no luck getting a direct answer. I would appreciate it if a senior member here could guide me on how to calculate the usable space.

I had referred to https://forum.proxmox.com/threads/newbie-need-your-input.24176/page-2 but it seems different from the example given by Q-wulf.

Current setup:

4 nodes, each with 4 x 1TB OSDs, 1 x 120GB SSD for the journal, and 1 x 500GB HDD for the OS

Pool:

Size: 3
Min: 1
Pg_number: 1024

In ceph storage summary:

Type: RBD
Size: 14.55TB

Ceph Configuration:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.50.51.0/24
filestore xattr use omap = true
fsid = bf5d56ae-xxx-4db1-xxx-b11ddxxxcbd6a
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 10.50.51.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
osd max backfills = 1
osd recovery max active = 1
filestore flusher = false

[mon.1]
host = node2
mon addr = 10.50.51.16:6789

[mon.0]
host = node1
mon addr = 10.50.51.15:6789

[mon.2]
host = node3
mon addr = 10.50.51.17:6789



Crush map rules:

# rules
rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
}
 
Let's assume you did a "ceph -w" and got the following output:
2015-12-18 12:57:55.812746 mon.0 [INF] pgmap v455862: 512 pgs: 512 active+clean; 30904 MB data, 116 GB used, 6871 GB / 6988 GB avail; 28511 B/s wr, 10 op/s

That means the following:
available: 6871 GB / 6988 GB --> the total capacity of all your OSDs is 6988 GB. Out of that, 6871 GB is free.
used: 116 GB --> there is 116 GB of space taken up by data in placement groups (of all pools).
data: 30904 MB --> there is about 31 GB of actual data (before replication or erasure coding) residing on all of your pools combined.


Math

For replicated pools it works like this:
example 4/2 (size/min_size) --> each gigabyte of actual data you put into the pool gets multiplied by "size", so 4.

For erasure coded pools it works like this (a worked sketch follows after these examples):
  • example k=3, m=1 --> each gigabyte of actual data you put into the pool gets multiplied by a factor of (1 + m/k), so 1 + 1/3, roughly 1.33.
  • example k=2, m=2 --> each gigabyte of actual data you put into the pool gets multiplied by a factor of (1 + m/k), so 1 + 2/2, so 2.
  • example k=20, m=4 --> each gigabyte of actual data you put into the pool gets multiplied by a factor of (1 + m/k), so 1 + 4/20, so 1.2.
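
To make the arithmetic concrete, here is a minimal Python sketch of that math (nothing Ceph ships; the helper names are mine and it is back-of-the-envelope only). The raw figure for the setup in the first post is 4 nodes x 4 x 1 TB = 16 TB, and 16 decimal TB is about 14.55 TiB, which is exactly the "Size: 14.55TB" Proxmox shows, i.e. that number is raw capacity, not usable space.

Code:
# Back-of-the-envelope usable-space math; real clusters also lose space
# to filesystem overhead, imbalance, and the full/nearfull ratios.

TB = 10**12   # drives are sold in decimal terabytes
TiB = 2**40   # Ceph/Proxmox report binary units

def replicated_overhead(size):
    """Each GB of logical data occupies `size` GB of raw space."""
    return size

def ec_overhead(k, m):
    """Each GB of logical data occupies (k + m) / k GB of raw space."""
    return (k + m) / k

raw = 4 * 4 * 1 * TB                          # 4 nodes x 4 x 1 TB OSDs
print(f"raw capacity: {raw / TiB:.2f} TiB")   # ~14.55 TiB -> the GUI "Size"

for label, factor in [("replicated size=3", replicated_overhead(3)),
                      ("EC k=3 m=1", ec_overhead(3, 1)),
                      ("EC k=20 m=4", ec_overhead(20, 4))]:
    print(f"{label}: x{factor:.2f} overhead, "
          f"roughly {raw / factor / TiB:.2f} TiB usable")

With size=3 that works out to roughly 4.85 TiB you can actually fill, quite a bit less than the 14.55 TB the storage summary suggests.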

Hope that helps (and it should be the same as the example you quoted, unless my math was off).

Just an FYI: the available space figure is mostly meaningless for the operation of your node. It only gives you a general idea, because you can have multiple pools that use different sizes and types (replicated vs. erasure coded) and that make use of different failure domains on a per-pool basis (if you set them, e.g. OSD, host, room, building, datacenter).

You can also end up having different OSD sizes (once your Ceph cluster grows organically) or a different number of OSDs per host.

What you have to do is look at the "Ceph" > "OSD" overview in Proxmox and make sure that none of your OSDs ever gets to 100%. If one does, you can reweight it so that some data is moved off it and onto other OSDs according to your CRUSH rule set (a small helper sketch follows below).
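
Purely for illustration, here is a tiny Python sketch of that check. It assumes you feed it the utilization percentages you read off the OSD overview; the 85% threshold and the function name are my own example choices, not anything Ceph or Proxmox prescribes. The suggested fraction is only a starting point for a manual `ceph osd reweight <id> <weight>`.

Code:
NEARFULL_PCT = 85.0   # example threshold only, pick your own safety margin

def flag_full_osds(utilization):
    """utilization: dict of OSD id -> used %, as read from the OSD overview."""
    avg = sum(utilization.values()) / len(utilization)
    for osd_id, pct in sorted(utilization.items()):
        if pct >= NEARFULL_PCT:
            # crude starting point for a manual `ceph osd reweight <id> <weight>`:
            # shrink the weight in proportion to how far above average this OSD sits
            suggested = round(avg / pct, 2)
            print(f"osd.{osd_id}: {pct:.0f}% used -> consider reweighting to ~{suggested}")

flag_full_osds({0: 62.0, 1: 71.0, 2: 91.0, 3: 68.0})   # made-up utilization values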


Hope that sheds some light onto it.
 
Thanks for the clear example and explanation.
Definitely another meaningful class for me today!
 

Two things I forgot:
  • On EC pools you typically choose a leaf of "OSD", since you want as many OSDs in that EC pool as you can get, to keep the overhead for parity chunks as low as possible (the tiny sketch after this list shows why).
  • On replicated pools you typically go with at least a leaf of "host" (unless it is a single-node Ceph cluster), since you want to make sure you can survive not only failing OSDs but also a failing host. Depending on your requirement(s) you might want to take a leaf with a higher ID (check your crushmap).
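
A minimal sketch of that trade-off, assuming an example value of m = 2 parity chunks (the number is mine, not from the thread): the overhead factor (k + m) / k shrinks as k grows, but every one of the k + m chunks needs its own leaf in the failure domain, which is much easier to satisfy when the leaf is an OSD rather than a whole host.

Code:
m = 2   # parity chunks, example value only
for k in (2, 4, 8, 16):
    overhead = (k + m) / k
    print(f"k={k:2d} m={m}: x{overhead:.2f} overhead, needs {k + m} leaves")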

lemme know how your SSHDs are turning out :)
 

Sorry about this; unfortunately, at this moment I can't bring down the in-operation SSHD servers for a Ceph pool. But I just updated the benchmark for a single SSHD; the results seem weird and much slower compared to those with hardware RAID. https://forum.proxmox.com/threads/newbie-need-your-input.24176/

But for general use, in the above setup, we use 7200rpm HGST drives as OSDs.

I will post an update once the SSHD pool is ready!
 
I am bumping this thread because I still don't understand how the usage is counted:

My 3-node pool has 24 x 136GB disks, so 3.2TB total space.

The images in the lxc pool need 178GB, the images in the vm pool 784GB.
Then why does it say usage: 1.43 TB??

Even if it is counted twice because of replication, it should be 1.9 TB...
Somehow it confuses me...
 

It has been less than 20 months, it's fine :D



Q1:
Replicated pool?

Q2:
8 OSDs per node?

Q3:
Same failure domain? (as in host/node, as opposed to OSD)

Q4 (if Q1, Q2 and Q3 = yes):

Did you set size == 3 and min_size == 1 for said replicated pool?
Is that your only pool? What settings do those other pools use?

Q5:
Can you provide the output of "ceph -w" from the CLI of one of your mons?
 
Well, I set size = 3 and min = 2. I have 2 pools called ceph-lxc and ceph-vm; they have been configured exactly as described in the Proxmox video.

Output:

Code:
cluster c4d0e591-a919-4df0-8627-d2fda956f7ff
    health HEALTH_OK
    monmap e3: 3 mons at {0=172.30.3.21:6789/0,1=172.30.3.22:6789/0,2=172.30.3.23:6789/0}
           election epoch 58, quorum 0,1,2 0,1,2
    osdmap e1964: 24 osds: 24 up, 24 in
           flags sortbitwise,require_jewel_osds
     pgmap v679695: 1024 pgs, 2 pools, 666 GB data, 167 kobjects
           1991 GB used, 1287 GB / 3279 GB avail
               1024 active+clean
 client io 41109 B/s rd, 430 kB/s wr, 5 op/s rd, 63 op/s wr
 
You have around 666 GB of actual (logical) data, which is replicated 3x, for a total of 1991 GB of physically used space on your OSD disks. The 666 GB is rounded, which is why you get a bit less than the expected 1998 GB used. The (logical) usage as seen by Ceph is sometimes higher than what you see from the client side, because Ceph chunks your data into objects and counts those; also, reclaiming via trim does not always recover the full space it would on a physical device.
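
As a quick sanity check of that arithmetic, here is a tiny Python snippet (purely illustrative, using only the numbers from the pgmap line above):

Code:
data_gb = 666    # "666 GB data"  (logical, as reported by ceph)
used_gb = 1991   # "1991 GB used" (physical, across all 24 OSDs)
size = 3         # pool replication size

print(used_gb / data_gb)   # ~2.99, i.e. almost exactly 3x
print(data_gb * size)      # 1998 GB, close to the reported 1991 GB

The small gap between 1998 and 1991 is just the rounding of the 666 GB figure mentioned above.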
 
