Is this normal or is there an issue going on that I need to resolve?
On which Ceph version are you (ceph versions)?
Yes, ofc. It's the total used. Also when I move VMs off the Ceph storage, the total amount decreases for some reason.
Sorry, I meant it increases, not decreases haha.
root@roc-server01:~# ceph versions
{
    "mon": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 6
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 12
    }
}
What does a 'ceph df detail' show? And with the current Ceph Nautilus packages the on-disk format changed a little, as the storage calculation was changed to per pool and better reflects the storage rules. With the 14.2.2 version, a warning will show that the OSD format is old. That means it's best to re-create all the OSDs, and the calculation should then give more sane values.
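A quick way to check whether OSDs are still on the old format, once the node is on 14.2.2 or later (a sketch; the warning name is taken from upstream Nautilus and is an assumption here):
# lists a 'legacy statfs reporting' (BLUESTORE_LEGACY_STATFS) warning for OSDs still on the old per-OSD accounting
ceph health detail | grep -i legacy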
Yes, in libpve-storage-perl: 6.0-8.@Alwin did an update or two ago make changes to the calculation of available storage for ceph, as it's showing usage as expected for me now.
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-4.15: 5.4-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-9-pve: 4.15.18-30
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
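To check just the package mentioned above (a trivial sketch; any version >= 6.0-8 should already contain the changed calculation):
# print only the storage library version from the full pveversion listing
pveversion -v | grep libpve-storage-perl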
What does a 'ceph df detail' and a 'ceph -s' show?
root@pve01:~# ceph df detail
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    ssd       31 TiB     13 TiB     18 TiB       18 TiB         57.82
    TOTAL     31 TiB     13 TiB     18 TiB       18 TiB         57.82

POOLS:
    POOL         ID     STORED     OBJECTS     USED       %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
    cephpool      1     13 TiB       3.58M     18 TiB     63.86       3.4 TiB               N/A             N/A     3.58M            0 B             0 B
root@pve01:~# ceph -s
  cluster:
    id:     7ced7402-a929-461a-bd40-53f863fa46ab
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 6h)
    mgr: pve03(active, since 5h), standbys: pve02, pve01
    osd: 9 osds: 9 up (since 5h), 9 in (since 4d)

  data:
    pools:   1 pools, 256 pgs
    objects: 3.58M objects, 5.9 TiB
    usage:   18 TiB used, 13 TiB / 31 TiB avail
    pgs:     256 active+clean

  io:
    client: 4.4 MiB/s rd, 15 MiB/s wr, 101 op/s rd, 826 op/s wr
Did you run a recent upgrade? And about what are you concerned?
Yes, I upgraded on 29-10-2019 from 5.4 and Luminous to 6 and Nautilus, following the guides.
POOLS:
    POOL         ID     STORED     OBJECTS     USED       %USED     MAX AVAIL
    cephpool      1     13 TiB       3.58M     18 TiB     63.86       3.4 TiB
Did you upgrade all the OSDs to the new on-disk format yet?
service ceph-osd@X stop && ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-X && service ceph-osd@X start
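A minimal sketch of running that for each OSD on a node, one at a time (the OSD ids are placeholders; it waits for the cluster to report HEALTH_OK again before touching the next OSD):
for id in 0 1 2; do                                             # placeholder OSD ids on this node
    systemctl stop ceph-osd@${id}.service
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-${id}
    systemctl start ceph-osd@${id}.service
    while ! ceph health | grep -q HEALTH_OK; do sleep 10; done  # wait until the cluster is healthy again
done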
The usage on the storage tab is calculated from 'stored' + 'max avail' (if the stored field is available). The 'max avail' is without replica, so 3 * 3.4 TiB = 10.2 TiB is the raw avail (give or take rounding errors, as Ceph calculates in bytes).
The used (18 TiB) is the raw used for this storage class (ssd).
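Plugging in the numbers from the 'ceph df detail' above (a sanity-check sketch; treating the storage tab as used = stored and total = stored + max avail is an assumption based on the description above):
# percentage shown on the PVE storage tab, and the raw avail derived from 'max avail'
awk 'BEGIN { printf "used: %.0f%%\n", 13/(13+3.4)*100 }'     # ≈ 79%
awk 'BEGIN { printf "raw avail: %.1f TiB\n", 3.4*3 }'        # ≈ 10.2 TiB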
What does a 'ceph osd pool ls detail' say? As the 3.58M objects * 4 MiB per object are ~13 TiB, yet the status above shows only 5.9 TiB of data, which is strange.
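The object math behind that (a short sketch; 4 MiB is the default RBD object size and is assumed here):
# 3.58M objects at 4 MiB each, converted to TiB
awk 'BEGIN { printf "%.1f TiB\n", 3580000*4/1024/1024 }'     # ≈ 13.7 TiB, close to the 13 TiB 'stored', not the 5.9 TiB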
root@pve01:~# ceph osd pool ls detail
pool 1 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12525 lfor 0/0/45 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
removed_snaps [1~3,5~6,d~fe,10c~93c,a49~9a0,13ec~b61,1f4e~6,1f55~1,1f57~29,1f81~b,1f8d~25,1fb3~5,1fb9~1,1fbb~83b,27f7~74d,2f46~1,2f48~3,2f4c~1,2f4f~1,2f52~1,2f57~6,2f5e~1,2f60~1,2f62~1,2f66~1,2f68~12,2f7d~4,2f83~2,2f86~1,2f8d~5,2f93~2,2f96~1,2f98~1,2f9a~1,2f9e~12,2fb3~3,2fb8~2,2fbb~1,2fc2~5,2fc8~2,2fcb~1,2fcd~1,2fcf~1,2fd3~11,2fe7~3,2feb~1,2fee~2,2ff6~6,2ffd~1,2fff~1,3001~1,3005~1,3007~11,301b~3,301f~1,3023~1,3025~1,302a~6,3031~1,3033~1,3035~1,3037~1,303b~11,304f~3,3053~1,3056~1,3059~1,305e~6,3065~1,3067~1,3069~1,306b~1,306f~10,3083~1,3087~3,308c~2,308f~2,3097~4,309c~3,30a0~1,30a2~1,30a4~1,30a8~11,30c0~4,30c6~2,30cc~1,30d0~6,30d7~1,30d9~1,30db~1,30df~1,30e1~6,30e8~6,30ef~1,30f1~1,30f6~1,30f8~1,30fa~2,30fd~1,30ff~1,3103~1,3105~2,310a~3,310e~2,3111~1,3115~1,3117~6,311e~5,3125~1,3129~3,312d~1,312f~1,3135~1,3137~1,3139~5,313f~1,3141~2,3144~1,3148~1,314a~a,3156~1,3158~1,315a~2,315e~2,3166~4,316b~3,316f~2,3172~1,3174~1,3178~a,3185~3,3189~1,318c~1,3190~1,3194~6,319b~2,319e~1,31a0~1,31a2~1,31a6~a,31b2~1,31b4~2,31b7~1,31b9~1,31bc~1,31c2~6,31c9~2,31cc~1,31ce~1,31d0~1,31d4~9,31e1~3,31e6~2,31ec~1,31ef~1,31f1~5,31f7~1,31f9~2,31fc~1,31fe~1,3202~3,3206~1,3208~1,320a~1]
I think the wrong value is the "stored".
Might be, but I am not sure. Could you re-create the OSDs and see if that changes the issue?
Could you describe how I can re-create the OSDs in the right way?
ceph osd out osd.<id>
ceph osd safe-to-destroy osd.<id>
systemctl stop ceph-osd@<id>.service
pveceph osd destroy <id>
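After the destroy, the disk can be added back as a fresh OSD (a sketch; /dev/sdX is a placeholder for the device that backed the destroyed OSD, and re-creating one OSD at a time keeps the impact of the rebalance small):
# re-create the OSD on the freed disk
pveceph osd create /dev/sdX
# watch recovery/rebalance finish before repeating this for the next OSD
watch ceph -s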