Datacentre summary - Incorrect storage reported

Jun 8, 2016
Viewing available storage in the Datacentre summary screen reports 28% utilisation, which is misleading:
pve5_dc_summary.JPG
Ceph summary view reports 35% utilisation:
pve5_ceph_summary.JPG


It would, in my opinion, be better to forecast Ceph storage utilisation based on the current pool consumption. Whilst I understand that pools may have different replication counts and/or erasure coding, the current display is simply misleading. It would probably be better to show the average OSD utilisation for each group of OSDs that are part of a given pool and its associated CRUSH rule.

Code:
[root@kvm1 ~]# ceph osd crush tree --show-shadow
ID  CLASS WEIGHT  TYPE NAME
-12   ssd 2.72818 root default~ssd
 -9   ssd 0.90939     host kvm1~ssd
  2   ssd 0.45470         osd.2
  3   ssd 0.45470         osd.3
-10   ssd 0.90939     host kvm2~ssd
  5   ssd 0.45470         osd.5
  7   ssd 0.45470         osd.7
-11   ssd 0.90939     host kvm3~ssd
  4   ssd 0.45470         osd.4
 10   ssd 0.45470         osd.10
 -8   hdd 7.09387 root default~hdd
 -5   hdd 2.36462     host kvm1~hdd
  6   hdd 0.27271         osd.6
  8   hdd 0.27271         osd.8
 16   hdd 1.81921         osd.16
 -6   hdd 2.36462     host kvm2~hdd
  0   hdd 0.27271         osd.0
  1   hdd 0.27271         osd.1
 15   hdd 1.81921         osd.15
 -7   hdd 2.36462     host kvm3~hdd
  9   hdd 0.27271         osd.9
 11   hdd 0.27271         osd.11
 14   hdd 1.81921         osd.14

[root@kvm1 ~]# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE   USE   AVAIL %USE  VAR  PGS
 6   hdd 0.27271  1.00000   279G  126G  152G 45.27 1.28 115
 8   hdd 0.27271  1.00000   279G  149G  130G 53.39 1.51 139
16   hdd 1.81921  1.00000  1862G  905G  956G 48.63 1.38 830
 2   ssd 0.45470  1.00000   465G 1550M  464G  0.33 0.01   0
 3   ssd 0.45470  1.00000   465G 1645M  464G  0.35 0.01   0
 0   hdd 0.27271  1.00000   279G  133G  145G 47.77 1.35 121
 1   hdd 0.27271  1.00000   279G  151G  128G 54.14 1.53 147
15   hdd 1.81921  1.00000  1862G  895G  967G 48.07 1.36 816
 5   ssd 0.45470  1.00000   465G 1376M  464G  0.29 0.01   0
 7   ssd 0.45470  1.00000   465G 1462M  464G  0.31 0.01   0
 9   hdd 0.27271  1.00000   279G  157G  121G 56.34 1.60 142
11   hdd 0.27271  1.00000   279G  141G  137G 50.76 1.44 133
14   hdd 1.81921  1.00000  1862G  881G  981G 47.30 1.34 809
 4   ssd 0.45470  1.00000   465G 1603M  464G  0.34 0.01   0
10   ssd 0.45470  1.00000   465G 1471M  464G  0.31 0.01   0
                    TOTAL 10058G 3550G 6507G 35.30
MIN/MAX VAR: 0.01/1.60  STDDEV: 25.09

There are two main pools here: rbd_hdd, which references 'hdd' class OSDs, and rbd_ssd, which references 'ssd' class OSDs...
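
To illustrate what I mean, here is a minimal sketch (my own, not part of Proxmox) that averages OSD utilisation per device class, which in this cluster maps one-to-one onto the pools (rbd_hdd -> 'hdd', rbd_ssd -> 'ssd'). It assumes 'ceph osd df --format json' and the field names it reports on Luminous and later (device_class, kb, kb_used):

Code:
#!/usr/bin/env python3
# Sketch only: average OSD utilisation per device class, as a stand-in for
# per-pool utilisation when each pool's CRUSH rule selects OSDs by class.
# Assumes the JSON field names produced by 'ceph osd df --format json'
# on Luminous and later (device_class, kb, kb_used).
import json
import subprocess
from collections import defaultdict

nodes = json.loads(subprocess.check_output(
    ["ceph", "osd", "df", "--format", "json"]))["nodes"]

kb_used = defaultdict(int)   # KiB used per device class
kb_total = defaultdict(int)  # KiB total per device class
for osd in nodes:
    cls = osd.get("device_class", "unknown")
    kb_used[cls] += osd["kb_used"]
    kb_total[cls] += osd["kb"]

for cls in sorted(kb_total):
    pct = 100.0 * kb_used[cls] / kb_total[cls]
    print(f"{cls}: {kb_used[cls] / 2**30:.2f} TiB used of "
          f"{kb_total[cls] / 2**30:.2f} TiB ({pct:.1f}%)")

On the output above this would report roughly 49% for the 'hdd' class and well under 1% for 'ssd', instead of the blended 35%.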
 
Viewing available storage in the Datacentre summary screen reports 28% utilisation, which is misleading:
This does incorporate all storages (incl. local); you can select which storages are counted with the settings popup under the 'gear' symbol at the top right.
 
Thanks dcsapak,

The reported storage availability information is immensely useful, especially as thin-provisioned LVM2 pools and Ceph will fail catastrophically should they run out of available storage. When changing the settings via the gear symbol I can now show stats exclusively for the Ceph storages.

The current dashboard, however, still indicates that only 34% has been utilised. My argument remains that the display needs to be different when running Ceph, as at most a third of the remaining space is actually available. We have in fact allocated all available space and have only avoided running out of storage because of sparse provisioning.

i.e. an administrator would need to continually and carefully monitor the actual amount of available storage.

raw utilisation: 14.88 TB used of 43.64 TB available (34%)
allocated utilisation: -2 TB available
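
For clarity, here is a minimal sketch of how I arrive at the 'allocated utilisation' figure. The replication factor and the provisioned total are assumptions/placeholders on my part; in practice the provisioned total would come from summing the allocated image sizes in each pool (e.g. with rbd du):

Code:
#!/usr/bin/env python3
# Sketch only: raw vs allocated utilisation. REPLICATION and PROVISIONED_TB
# are placeholders, not measured values from this cluster.

RAW_CAPACITY_TB = 43.64   # raw cluster capacity, as shown on the dashboard
RAW_USED_TB = 14.88       # raw space consumed, as shown on the dashboard
REPLICATION = 3           # assumed pool size (3 copies of every object)
PROVISIONED_TB = 16.5     # hypothetical sum of allocated (sparse) image sizes

usable_tb = RAW_CAPACITY_TB / REPLICATION

print(f"raw utilisation: {RAW_USED_TB:.2f} TB used of {RAW_CAPACITY_TB:.2f} TB "
      f"available ({100 * RAW_USED_TB / RAW_CAPACITY_TB:.0f}%)")
print(f"usable after {REPLICATION}x replication: {usable_tb:.2f} TB")
print(f"allocated utilisation: {usable_tb - PROVISIONED_TB:+.2f} TB available")

With numbers like these the usable capacity is about 14.5 TB, so allocating around 16.5 TB of images leaves roughly -2 TB available even though only a third of the raw space has actually been written.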
 
This does incorporate all storages (incl. local); you can select which storages are counted with the settings popup under the 'gear' symbol at the top right.
Well, that just made that graph useful... I do want to note that a working datacenter will have different storage back ends for different types of storage; adding all of them together has no practical value. I would want to know if any of my specific storage TYPES are nearing full, not all of them collectively.
 