[SOLVED] Ceph & cephfs_data Pools

linuxmanru

Member
Jun 29, 2021
Hi everyone,
In my case I have 7 PVE nodes, with these pool defaults:
osd_pool_default_min_size = 2
osd_pool_default_size = 3
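(For reference: these two defaults normally live in the [global] section of /etc/pve/ceph.conf and are only applied when a new pool is created; pools that already exist keep their own size/min_size. A minimal sketch of that section, assuming the standard PVE layout:)

Code:
[global]
    # only used as defaults for NEWLY created pools;
    # existing pools keep the size/min_size they were created with
    osd_pool_default_size     = 3
    osd_pool_default_min_size = 2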
ceph osd pool autoscale-status
POOL                    SIZE    TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
device_health_metrics   15426k               2.0   106.4T        0.0000                                 1.0        1              on
vm.pool                  2731G               3.0   106.4T        0.0752  1.0000        1.0000           1.0      512              on
cephfs_data               1978               2.0   106.4T        0.0000                                 1.0       32              on
cephfs_metadata         23799k               2.0   106.4T        0.0000                                 4.0        2              on

ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
hdd    106 TiB  98 TiB  8.1 TiB  8.1 TiB   7.60
TOTAL  106 TiB  98 TiB  8.1 TiB  8.1 TiB   7.60

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1  15 MiB        22  30 MiB       0     46 TiB
vm.pool                 2  512  2.7 TiB  741.87k  8.0 TiB   8.01     31 TiB
cephfs_data             3   32  1.9 KiB        0  3.9 KiB      0     46 TiB
cephfs_metadata         4    2  23 MiB        28  48 MiB       0     46 TiB
rados df
POOL_NAME              USED     OBJECTS  CLONES   COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS      RD     WR_OPS       WR  USED COMPR  UNDER COMPR
cephfs_data            3.9 KiB        0       0        0                   0        0         0          0     0 B          0      0 B         0 B          0 B
cephfs_metadata        48 MiB        28       0       56                   0        0         0          0     0 B         14   13 KiB         0 B          0 B
device_health_metrics  30 MiB        22       0       44                   0        0         0         22  44 KiB         22  231 KiB         0 B          0 B
vm.pool                8.0 TiB   741870       0  2225610                   0        0         0  113390981  74 TiB  707422564   12 TiB         0 B          0 B

total_objects    741920
total_used       8.1 TiB
total_avail      98 TiB
total_space      106 TiB

The pools cephfs_data & cephfs_metadata were created when I used Create CephFS.
The 7th node was added last.
Right now, checking with the CLI (ceph df): MAX AVAIL for cephfs_data and cephfs_metadata is 46 TiB, but for vm.pool it is 31 TiB.
I don't understand why the pool I created last has less MAX AVAIL space than the pools created with Create CephFS.

Since I don't use the cephfs_data pool yet, I am thinking of destroying it as described in: Destroy CephFS
I need to ask: if I destroy the CephFS pools, will that affect my other pool (vm.pool) or the Ceph storage?
And most importantly, if I destroy them, will the MAX AVAIL space increase for vm.pool?
If I need the cephfs_data pool again, I can recreate it after destroying it.
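(For context, the Destroy CephFS procedure referenced above boils down to something like the following sketch; the FS name "cephfs" is an assumption here, it is the PVE default:)

Code:
# first stop/disable all MDS daemons and remove the CephFS storage entry in PVE, then:
ceph fs rm cephfs --yes-i-really-mean-it
pveceph pool destroy cephfs_data
pveceph pool destroy cephfs_metadata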

Proxmox VE version is:
proxmox-ve: 6.4-1 (running kernel: 5.4.162-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-12
pve-kernel-helper: 6.4-12
pve-kernel-5.4.162-1-pve: 5.4.162-2
pve-kernel-5.4.157-1-pve: 5.4.157-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
ceph: 15.2.15-pve1~bpo10
ceph-fuse: 15.2.15-pve1~bpo10
corosync: 3.1.5-pve2~bpo10+1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve2~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.3-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.7-pve1
 
I need to ask: if I destroy the CephFS pools, will that affect my other pool (vm.pool) or the Ceph storage?
And most importantly, if I destroy them, will the MAX AVAIL space increase for vm.pool?
Looking at the output of ceph df, I don't think you will gain any noteworthy space if you destroy the Ceph FS.

If you run pveceph pool ls you should see which size/min_size are set.
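(As a sketch, the same values can also be read directly from Ceph, e.g.:)

Code:
# list every pool with its replicated size, min_size, pg_num, crush rule, ...
ceph osd pool ls detail

# or query a single pool, for example the vm.pool from above
ceph osd pool get vm.pool size
ceph osd pool get vm.pool min_size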
 
Looking at the output of ceph df, I don't think you will gain any noteworthy space if you destroy the Ceph FS.

If you run pveceph pool ls you should see which size/min_size are set.
If I won't gain more space by destroying the CephFS pools, then I think there is no need to destroy them.
The output:
pveceph pool ls
┌───────────────────────┬──────┬──────────┬────────┬─────────────┬────────────────┬───────────────────┬──────────────────────────┬───────────────────────────┬─────────────────┬──────────────────────┬───────────────┐
│ Name                  │ Size │ Min Size │ PG Num │ min. PG Num │ Optimal PG Num │ PG Autoscale Mode │ PG Autoscale Target Size │ PG Autoscale Target Ratio │ Crush Rule Name │               %-Used │          Used │
╞═══════════════════════╪══════╪══════════╪════════╪═════════════╪════════════════╪═══════════════════╪══════════════════════════╪═══════════════════════════╪═════════════════╪══════════════════════╪═══════════════╡
│ cephfs_data           │    2 │        2 │     32 │             │             32 │ on                │                          │                           │ replicated_rule │                    0 │             0 │
├───────────────────────┼──────┼──────────┼────────┼─────────────┼────────────────┼───────────────────┼──────────────────────────┼───────────────────────────┼─────────────────┼──────────────────────┼───────────────┤
│ cephfs_metadata       │    2 │        2 │      2 │           2 │              2 │ on                │                          │                           │ replicated_rule │ 5.03807939367107e-07 │      49749196 │
├───────────────────────┼──────┼──────────┼────────┼─────────────┼────────────────┼───────────────────┼──────────────────────────┼───────────────────────────┼─────────────────┼──────────────────────┼───────────────┤
│ device_health_metrics │    2 │        2 │      1 │           1 │              1 │ on                │                          │                           │ replicated_rule │ 3.77427340936265e-07 │      37269565 │
├───────────────────────┼──────┼──────────┼────────┼─────────────┼────────────────┼───────────────────┼──────────────────────────┼───────────────────────────┼─────────────────┼──────────────────────┼───────────────┤
│ vm.pool               │    3 │        2 │    512 │             │            512 │ on                │                          │                         1 │ replicated_rule │   0.0823325961828232 │ 8859461644192 │
└───────────────────────┴──────┴──────────┴────────┴─────────────┴────────────────┴───────────────────┴──────────────────────────┴───────────────────────────┴─────────────────┴──────────────────────┴───────────────┘
 
Your other pools are only size 2, which is not recommended at all. This explains the difference in available space.

You will not gain any more space by deleting the other pools. These only hold a couple of megabytes.
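(Rough back-of-the-envelope math, assuming all pools share the same OSDs and Ceph's default full ratio of about 0.95: MAX AVAIL is roughly the free raw capacity, scaled by the full ratio, divided by the pool's replica count.)

Code:
98 TiB avail * 0.95 / 2 replicas ≈ 46 TiB   -> cephfs_data, cephfs_metadata, device_health_metrics
98 TiB avail * 0.95 / 3 replicas ≈ 31 TiB   -> vm.pool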
 
Your other pools are only size 2, which is not recommended at all. This explains the difference in available space.

You will not gain any more space by deleting the other pools. These only hold a couple of megabytes.
If I understand correctly: if I set the other pools to size 3 / min_size 2, then the total available space can change?
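(A minimal sketch of how that could be done per pool from the CLI, using the pool names from the table above:)

Code:
# min_size is already 2 on these pools, so only the size needs raising
ceph osd pool set cephfs_data size 3
ceph osd pool set cephfs_metadata size 3
ceph osd pool set device_health_metrics size 3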
 
Now I have set 3/2 on the other pools and ceph df shows:

ceph df
--- RAW STORAGE ---
CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
hdd    106 TiB  98 TiB  8.1 TiB  8.1 TiB   7.62
TOTAL  106 TiB  98 TiB  8.1 TiB  8.1 TiB   7.62

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1  17 MiB        22  51 MiB       0     30 TiB
vm.pool                 2  512  2.7 TiB  743.78k  8.1 TiB   8.23     30 TiB
cephfs_data             3   32  0 B            0  0 B          0     30 TiB
cephfs_metadata         4    2  22 MiB        28  69 MiB       0     30 TiB

Is this normal when set to 3/2?
 
Yes, you have 98 TiB raw available; divided by 3, that is roughly 30 TiB. As all pools share the same OSDs, each pool reports the same max avail space. If one pool starts to fill up, the other pools will show less max avail space.
Thanks. Now I understand.
Please tell me, what would you expect if I set the vm.pool replication size to 4?
 
Yes, you have 98 TiB raw available; divided by 3, that is roughly 30 TiB. As all pools share the same OSDs, each pool reports the same max avail space. If one pool starts to fill up, the other pools will show less max avail space.
One question about available space.
If I try these commands:
set a fixed expected pool size
# ceph osd pool set MY_POOL_NAME target_size_bytes 60T (or 50T)

or a relative pool size (fraction of the full space)

# ceph osd pool set MY_POOL_NAME target_size_ratio .9

Will this change the avail space on vm.pool?
 
any target_size / target_ratio is only there for the autoscaler to have an idea how large the pool will be. It will not have any effect on how much space the pool has / sees.
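(A quick way to convince yourself of that, as a sketch using the vm.pool name from this thread: set a target ratio, then compare the autoscaler view with ceph df.)

Code:
ceph osd pool set vm.pool target_size_ratio 0.9

# TARGET RATIO / PG_NUM may change here ...
ceph osd pool autoscale-status

# ... but MAX AVAIL in here stays the same
ceph df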

As @gurubert already mentioned, take the raw capacity of the cluster and divide it by the size of the pool, and you get roughly how large the pool can get.

BUT: do not fill the pool all the way up. Ceph needs free space in case one of your nodes fails. Also, if the OSDs are used very differently, it can happen that an OSD is getting quite a bit fuller than most others. This will also reduce the available space. If that happens, you can check if the pg_num could be optimized or the balancer activated.
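(A sketch of how the per-OSD fill level and the balancer can be checked on an Octopus cluster like this one:)

Code:
# show how full each OSD is -- watch the spread of the %USE column
ceph osd df tree

# check whether the balancer is active and what it is doing
ceph balancer status

# if needed, enable it (upmap mode needs all clients to be Luminous or newer)
ceph balancer mode upmap
ceph balancer on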

If you require more space, add more / larger OSDs or more nodes to the cluster. Do NOT change the size/min_size value to anything lower than 3/2 as it is a recipe for data loss.

Ceph can deal and recover from a lot of situations quite well. Running out of space is one of the few things you want to avoid at all costs though!
 
any target_size / target_ratio is only there for the autoscaler to have an idea how large the pool will be. It will not have any effect on how much space the pool has / sees.

As @gurubert already mentioned, take the raw capacity of the cluster and divide it by the size of the pool, and you get roughly how large the pool can get.

BUT: do not fill the pool all the way up. Ceph needs free space in case one of your nodes fails. Also, if the OSDs are used very differently, it can happen that an OSD is getting quite a bit fuller than most others. This will also reduce the available space. If that happens, you can check if the pg_num could be optimized or the balancer activated.

If you require more space, add more / larger OSDs or more nodes to the cluster. Do NOT change the size/min_size value to anything lower than 3/2 as it is a recipe for data loss.

Ceph can deal and recover from a lot of situations quite well. Running out of space is one of the few things you want to avoid at all costs though!
Thank you aaron, now I understand everything.
 
