Hello,
in my cluster, which consists of 4 OSD nodes, there is an HDD failure.
Currently 31 disks are affected.
Each node has 48 HDDs of 2 TB each connected.
This results in the following crushmap:
root hdd_strgbox {
id -17 # do not change unnecessarily
id -19 class hdd # do not change unnecessarily
id -21 class nvme # do not change unnecessarily
# weight 312.428
alg straw2
hash 0 # rjenkins1
item ld5505-hdd_strgbox weight 78.107
item ld5506-hdd_strgbox weight 78.107
item ld5507-hdd_strgbox weight 78.107
item ld5508-hdd_strgbox weight 78.107
}
This means there should be roughly 100 TiB of usable disk storage, considering a replication factor of 3 (4 x 78.107 TiB = 312.428 TiB raw, divided by 3 ≈ 104 TiB).
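For completeness, these are the commands I would use to confirm that db_backup really is a 3x replicated pool placed on the hdd_strgbox root (standard Ceph CLI; that this pool maps to this root is my assumption):

    ceph osd pool get db_backup size          # replication factor of the pool
    ceph osd pool get db_backup crush_rule    # CRUSH rule assigned to the pool
    ceph osd crush rule dump                  # shows which root/device class each rule selects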
Checking the disk utilization, however, the relevant pool db_backup is shown with 94% used space, i.e. 54.5 TiB:
root@ld3955:~# ceph df detail
GLOBAL:
    SIZE      AVAIL     RAW USED    %RAW USED    OBJECTS
    398TiB    232TiB    166TiB      41.70        14.47M
POOLS:
    NAME                   ID    QUOTA OBJECTS    QUOTA BYTES    USED       %USED    MAX AVAIL    OBJECTS     DIRTY     READ       WRITE      RAW USED
    backup                 4     N/A              N/A            0B         0        3.33TiB      0           0         0B         0B         0B
    nvme                   6     N/A              N/A            0B         0        7.36TiB      0           0         0B         0B         0B
    db_backup              11    N/A              N/A            54.5TiB    94.23    3.33TiB      14285830    14.29M    213KiB     36.2MiB    163TiB
    pve_cephfs_data        21    N/A              N/A            242GiB     0.68     34.3TiB      64232       64.23k    80.6KiB    69.5KiB    726GiB
    pve_cephfs_metadata    22    N/A              N/A            126MiB     0        34.3TiB      53          53        42B        2.09KiB    378MiB
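If per-OSD numbers are needed, I can attach them as well; this is what I would run (standard Ceph CLI):

    ceph osd df tree    # per-OSD utilization (%USE), grouped by the CRUSH hierarchy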
Even if I take into account that one node could go down, there should still be considerably more storage available on the remaining OSDs.
Can you please explain why Ceph displays 94% used disk space and a health warning?
root@ld3955:~# ceph health
HEALTH_WARN 3 backfillfull osd(s); 8 nearfull osd(s); 2 pool(s) backfillfull
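If it helps, I can also provide the list of flagged OSDs and the thresholds currently in effect (standard Ceph CLI):

    ceph health detail            # names the individual nearfull/backfillfull OSDs
    ceph osd dump | grep ratio    # nearfull_ratio / backfillfull_ratio / full_ratio in use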
THX