Is this normal or is there an issue going on that I need to resolve?
On which Ceph version are you (ceph versions)?
Yes, ofc. It's the total used. Also when I move VMs off the Ceph storage, the total amount decreases for some reason.
Sorry, I meant it increases, not decreases haha.
root@roc-server01:~# ceph versions
{
    "mon": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 6
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 12
    }
}
What does a 'ceph df detail' show? And with the current Ceph Nautilus packages the on-disk format changed a little, as the storage calculation was changed to per pool and better reflects the storage rules. With the 14.2.2 version, a warning will show that the OSD format is old. That means it's best to re-create all the OSDs, and the calculation should then give more sane values.
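A quick way to check whether OSDs are still on the old format, once the node is on 14.2.2 or later (a sketch; the warning name is taken from upstream Nautilus and is an assumption here):
# lists a 'legacy statfs reporting' (BLUESTORE_LEGACY_STATFS) warning for OSDs still on the old per-OSD accounting
ceph health detail | grep -i legacy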
Yes, in libpve-storage-perl: 6.0-8.@Alwin did an update or two ago make changes to the calculation of available storage for ceph, as it's showing usage as expected for me now.
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-4.15: 5.4-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-9-pve: 4.15.18-30
ceph: 14.2.4-pve1
ceph-fuse: 14.2.4-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
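To check just the package mentioned above (a trivial sketch; any version >= 6.0-8 should already contain the changed calculation):
# print only the storage library version from the full pveversion listing
pveversion -v | grep libpve-storage-perl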
What does a 'ceph df detail' and a 'ceph -s' show?
root@pve01:~# ceph df detail
RAW STORAGE:
    CLASS     SIZE       AVAIL      USED       RAW USED     %RAW USED
    ssd       31 TiB     13 TiB     18 TiB       18 TiB         57.82
    TOTAL     31 TiB     13 TiB     18 TiB       18 TiB         57.82

POOLS:
    POOL         ID     STORED     OBJECTS     USED       %USED     MAX AVAIL     QUOTA OBJECTS     QUOTA BYTES     DIRTY     USED COMPR     UNDER COMPR
    cephpool      1     13 TiB       3.58M     18 TiB     63.86       3.4 TiB               N/A             N/A     3.58M            0 B             0 B
root@pve01:~# ceph -s
  cluster:
    id:     7ced7402-a929-461a-bd40-53f863fa46ab
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 6h)
    mgr: pve03(active, since 5h), standbys: pve02, pve01
    osd: 9 osds: 9 up (since 5h), 9 in (since 4d)

  data:
    pools:   1 pools, 256 pgs
    objects: 3.58M objects, 5.9 TiB
    usage:   18 TiB used, 13 TiB / 31 TiB avail
    pgs:     256 active+clean

  io:
    client: 4.4 MiB/s rd, 15 MiB/s wr, 101 op/s rd, 826 op/s wr
Did you run a recent upgrade? And about what are you concerned?
Yes, I upgraded on 29-10-2019 from 5.4 and Luminous to 6 and Nautilus, following the guides.
POOLS:
    POOL         ID     STORED     OBJECTS     USED       %USED     MAX AVAIL
    cephpool      1     13 TiB       3.58M     18 TiB     63.86       3.4 TiB
Did you upgrade all the OSDs to the new on-disk format yet?
service ceph-osd@X stop && ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-X && service ceph-osd@X start
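A minimal sketch of running that for each OSD on a node, one at a time (the OSD ids are placeholders; it waits for the cluster to report HEALTH_OK again before touching the next OSD):
for id in 0 1 2; do                                             # placeholder OSD ids on this node
    systemctl stop ceph-osd@${id}.service
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-${id}
    systemctl start ceph-osd@${id}.service
    while ! ceph health | grep -q HEALTH_OK; do sleep 10; done  # wait until the cluster is healthy again
done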
The usage on the storage tab is calculated from 'stored' + 'max avail' (if the stored field is available). The 'max avail' is without replica, so 3 * 3.4 TiB = 10.2 TiB is the raw avail (give or take rounding errors, as Ceph calculates in bytes).
The used (18 TiB) is the raw used for this storage class (ssd).
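Plugging in the numbers from the 'ceph df detail' above (a sanity-check sketch; treating the storage tab as used = stored and total = stored + max avail is an assumption based on the description above):
# percentage shown on the PVE storage tab, and the raw avail derived from 'max avail'
awk 'BEGIN { printf "used: %.0f%%\n", 13/(13+3.4)*100 }'     # ≈ 79%
awk 'BEGIN { printf "raw avail: %.1f TiB\n", 3.4*3 }'        # ≈ 10.2 TiB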
What does a 'ceph osd pool ls detail' say? As the 3.58M objects * 4 MiB per object are ~13 TiB, yet the status above shows only 5.9 TiB of data, which is strange.
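The object math behind that (a short sketch; 4 MiB is the default RBD object size and is assumed here):
# 3.58M objects at 4 MiB each, converted to TiB
awk 'BEGIN { printf "%.1f TiB\n", 3580000*4/1024/1024 }'     # ≈ 13.7 TiB, close to the 13 TiB 'stored', not the 5.9 TiB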
root@pve01:~# ceph osd pool ls detail
pool 1 'cephpool' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 256 pgp_num 256 autoscale_mode warn last_change 12525 lfor 0/0/45 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
removed_snaps [1~3,5~6,d~fe,10c~93c,a49~9a0,13ec~b61,1f4e~6,1f55~1,1f57~29,1f81~b,1f8d~25,1fb3~5,1fb9~1,1fbb~83b,27f7~74d,2f46~1,2f48~3,2f4c~1,2f4f~1,2f52~1,2f57~6,2f5e~1,2f60~1,2f62~1,2f66~1,2f68~12,2f7d~4,2f83~2,2f86~1,2f8d~5,2f93~2,2f96~1,2f98~1,2f9a~1,2f9e~12,2fb3~3,2fb8~2,2fbb~1,2fc2~5,2fc8~2,2fcb~1,2fcd~1,2fcf~1,2fd3~11,2fe7~3,2feb~1,2fee~2,2ff6~6,2ffd~1,2fff~1,3001~1,3005~1,3007~11,301b~3,301f~1,3023~1,3025~1,302a~6,3031~1,3033~1,3035~1,3037~1,303b~11,304f~3,3053~1,3056~1,3059~1,305e~6,3065~1,3067~1,3069~1,306b~1,306f~10,3083~1,3087~3,308c~2,308f~2,3097~4,309c~3,30a0~1,30a2~1,30a4~1,30a8~11,30c0~4,30c6~2,30cc~1,30d0~6,30d7~1,30d9~1,30db~1,30df~1,30e1~6,30e8~6,30ef~1,30f1~1,30f6~1,30f8~1,30fa~2,30fd~1,30ff~1,3103~1,3105~2,310a~3,310e~2,3111~1,3115~1,3117~6,311e~5,3125~1,3129~3,312d~1,312f~1,3135~1,3137~1,3139~5,313f~1,3141~2,3144~1,3148~1,314a~a,3156~1,3158~1,315a~2,315e~2,3166~4,316b~3,316f~2,3172~1,3174~1,3178~a,3185~3,3189~1,318c~1,3190~1,3194~6,319b~2,319e~1,31a0~1,31a2~1,31a6~a,31b2~1,31b4~2,31b7~1,31b9~1,31bc~1,31c2~6,31c9~2,31cc~1,31ce~1,31d0~1,31d4~9,31e1~3,31e6~2,31ec~1,31ef~1,31f1~5,31f7~1,31f9~2,31fc~1,31fe~1,3202~3,3206~1,3208~1,320a~1]
I think the wrong value is the "stored".
Might be, but I am not sure. Could you re-create the OSDs and see if that changes the issue?
Could you describe how I can re-create the OSDs in the right way?
ceph osd out osd.<id>
ceph osd safe-to-destroy osd.<id>
systemctl stop ceph-osd@<id>.service
pveceph osd destroy <id>
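After the destroy, the disk can be added back as a fresh OSD (a sketch; /dev/sdX is a placeholder for the device that backed the destroyed OSD, and re-creating one OSD at a time keeps the impact of the rebalance small):
# re-create the OSD on the freed disk
pveceph osd create /dev/sdX
# watch recovery/rebalance finish before repeating this for the next OSD
watch ceph -s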