Hi Guys,
Long shot, but we have an old 2/1 pool on our hyperconverged Proxmox install and have lost an OSD. I now have 3 stale PGs which, funnily enough, show as being on this OSD.
Is there any way I can try to recover the data from the failed disk and import it back in? The disk still shows up in the node, but the OSD never starts normally and fails with an input/output error. I've put a rough sketch of what I was thinking of trying after the health output below.
ceph -s:
root@vms-ceph121:/home/richard.admin# ceph -s
cluster:
id: 93cc6f61-f2dd-4346-8813-71de1fa1c221
health: HEALTH_WARN
noout,nobackfill,norecover flag(s) set
2 osds down
1 host (1 osds) down
2 nearfull osd(s)
Reduced data availability: 3 pgs stale
Degraded data redundancy: 318198/47393185 objects degraded (0.671%), 555 pgs degraded, 214 pgs undersized
2 pool(s) nearfull
768 slow ops, oldest one blocked for 15390 sec, daemons [osd.1701,osd.1706,osd.509] have slow ops.
services:
mon: 9 daemons, quorum vms-ceph110,vms-ceph112,vms-ceph113,vms-ceph114,vms-ceph117,vms-ceph120,vms-ceph119,vms-ceph106,vms-ceph121 (age 29m)
mgr: vms-ceph113(active, since 7w), standbys: vms-ceph106, vms-ceph110, vms-ceph114, vms-ceph117, vms-ceph119, vms-ceph120, vms-ceph121, vms-ceph112
osd: 107 osds: 104 up (since 28m), 106 in (since 88m); 235 remapped pgs
flags noout,nobackfill,norecover
data:
pools: 5 pools, 2625 pgs
objects: 16.84M objects, 63 TiB
usage: 179 TiB used, 130 TiB / 309 TiB avail
pgs: 318198/47393185 objects degraded (0.671%)
179854/47393185 objects misplaced (0.379%)
1900 active+clean
321 active+recovery_wait+degraded
169 active+recovery_wait+undersized+degraded+remapped
99 active+recovery_wait
60 active+clean+remapped
37 active+undersized+degraded
19 active+recovering+degraded
11 active+recovering
5 active+recovering+undersized+degraded+remapped
3 stale+active+undersized+degraded
1 active+recovering+degraded+remapped
io:
client: 59 MiB/s rd, 26 MiB/s wr, 698 op/s rd, 548 op/s wr
recovery: 679 B/s, 0 objects/s
progress:
Global Recovery Event (5h)
[====================........] (remaining: 112m)
root@vms-ceph121:/home/richard.admin# ceph health detail
HEALTH_WARN noout,nobackfill,norecover flag(s) set; 2 osds down; 1 host (1 osds) down; 2 nearfull osd(s); Reduced data availability: 3 pgs stale; Degraded data redundancy: 318175/47393245 objects degraded (0.671%), 542 pgs degraded, 214 pgs undersized; 2 pool(s) nearfull; 768 slow ops, oldest one blocked for 15441 sec, daemons [osd.1701,osd.1706,osd.509] have slow ops.
[WRN] OSDMAP_FLAGS: noout,nobackfill,norecover flag(s) set
[WRN] OSD_DOWN: 2 osds down
osd.1203 (root=youtrack-general-root,host=vms-ceph112_youtrack-general_rows) is down
osd.2112 (root=ha-ssd-root,host=vms-ceph121_ha-ssd) is down
[WRN] OSD_HOST_DOWN: 1 host (1 osds) down
host vms-ceph121_ha-ssd (root=ha-ssd-root) (1 osds) is down
[WRN] OSD_NEARFULL: 2 nearfull osd(s)
osd.1406 is near full
osd.2005 is near full
[WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs stale
pg 8.11 is stuck stale for 5h, current state stale+active+undersized+degraded, last acting [1203]
pg 8.1e0 is stuck stale for 5h, current state stale+active+undersized+degraded, last acting [1203]
pg 8.351 is stuck stale for 5h, current state stale+active+undersized+degraded, last acting [1203]
[WRN] PG_DEGRADED: Degraded data redundancy: 318175/47393245 objects degraded (0.671%), 542 pgs degraded, 214 pgs undersized
pg 58.380 is active+recovery_wait+degraded, acting [1202,704,604]
pg 58.386 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1601,1904]
pg 58.388 is active+recovery_wait+degraded, acting [1208,1501,1901]
pg 58.393 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1007,1107]
pg 58.394 is active+recovery_wait+degraded, acting [1208,801,1604]
pg 58.395 is active+recovery_wait+degraded, acting [1201,801,1104]
pg 58.397 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [801,604]
pg 58.398 is active+recovery_wait+degraded, acting [1201,704,1708]
pg 58.39c is active+recovery_wait+degraded, acting [1201,1607,1708]
pg 58.39d is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [607,1104]
pg 58.3a1 is active+recovery_wait+degraded, acting [1904,1205,601]
pg 58.3a4 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1604,2101]
pg 58.3a5 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1901,1007]
pg 58.3a6 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [804,1607]
pg 58.3a9 is active+recovery_wait+degraded, acting [601,401,1208]
pg 58.3ad is active+recovery_wait+degraded, acting [2107,1202,1001]
pg 58.3ae is active+recovery_wait+degraded, acting [1104,1708,1204]
pg 58.3af is active+recovery_wait+degraded, acting [2104,1702,1004]
pg 58.3c0 is active+recovery_wait+degraded, acting [2104,1202,407]
pg 58.3c1 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [707,1705]
pg 58.3c2 is active+recovery_wait+degraded, acting [1204,404,1708]
pg 58.3c3 is active+recovery_wait+degraded, acting [1101,807,1208]
pg 58.3c7 is active+recovery_wait+degraded, acting [1004,1507,1207]
pg 58.3c8 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1504,2101]
pg 58.3c9 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1702,1504]
pg 58.3ca is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1604,804]
pg 58.3cb is active+recovery_wait+degraded, acting [804,1901,1205]
pg 58.3cc is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1507,1607]
pg 58.3cd is active+recovery_wait+degraded, acting [1208,1004,1104]
pg 58.3ce is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1507,1104]
pg 58.3cf is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [804,1708]
pg 58.3d2 is active+recovery_wait+degraded, acting [2107,1702,1107]
pg 58.3d6 is active+recovery_wait+degraded, acting [404,1208,2104]
pg 58.3d8 is active+recovery_wait+degraded, acting [2104,804,707]
pg 58.3db is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1705,1601]
pg 58.3de is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [701,1501]
pg 58.3df is active+recovery_wait+degraded, acting [1504,2104,607]
pg 58.3e0 is active+recovery_wait+degraded, acting [701,1204,1904]
pg 58.3e7 is active+recovery_wait+degraded, acting [1607,1202,1901]
pg 58.3e9 is active+recovery_wait+degraded, acting [1208,1104,1507]
pg 58.3eb is active+recovery_wait+degraded, acting [607,401,1205]
pg 58.3ee is active+recovery_wait+degraded, acting [1705,807,1201]
pg 58.3f1 is active+recovery_wait+degraded, acting [1201,1507,1107]
pg 58.3f3 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [801,1901]
pg 58.3f5 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [704,2101]
pg 58.3f6 is stuck undersized for 28m, current state active+recovery_wait+undersized+degraded+remapped, last acting [401,607]
pg 58.3f8 is stuck undersized for 57m, current state active+recovery_wait+undersized+degraded+remapped, last acting [1507,701]
pg 58.3f9 is active+recovery_wait+degraded, acting [1205,607,1901]
pg 58.3fa is active+recovery_wait+degraded, acting [1202,1104,707]
pg 58.3fd is active+recovery_wait+degraded, acting [1208,1705,404]
pg 58.3ff is active+recovery_wait+degraded, acting [1705,807,1207]
[WRN] POOL_NEARFULL: 2 pool(s) nearfull
pool 'youtrack-general-pool' is nearfull
pool 'device_health_metrics' is nearfull
[WRN] SLOW_OPS: 768 slow ops, oldest one blocked for 15441 sec, daemons [osd.1701,osd.1706,osd.509] have slow ops.
root@vms-ceph121:/home/richard.admin#
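What I was hoping to attempt, if the disk is readable at all, is roughly the following (untested, so please correct me; the device names, target OSD and file paths are only placeholders, the data path is just the usual /var/lib/ceph/osd/... location, and the PG ID is one of the three stale ones from the health detail above):

# 1) clone the failing disk block-for-block first, so everything else runs against the copy
ddrescue -d /dev/sdX /dev/sdY /root/osd1203-rescue.map

# 2) with the dead OSD's daemon stopped, list the PGs still on it and export the stale ones
#    (pointing --data-path at the OSD directory backed by the rescued copy)
systemctl stop ceph-osd@1203
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1203 --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1203 --pgid 8.11 --op export --file /root/pg-8.11.export

# 3) import each export onto a healthy (stopped) OSD in the same pool, then start it again
systemctl stop ceph-osd@<target-osd>
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target-osd> --op import --file /root/pg-8.11.export
systemctl start ceph-osd@<target-osd>

Does that sound like the right direction, or is there a better way to get those three PGs back?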