Ceph replacing drive - no notification of data loss?

CodeBreaker

I'm replacing the HDDs in my cluster with higher-capacity drives. Some not-that-important video data is on a size 2, min_size 1 CephFS pool (yes, I know that can lead to data loss).

The process I was following was (roughly the CLI equivalent is sketched after the list):
1. set the global noout flag
2. stop and out an OSD, then destroy it (from the GUI)
3. pull out the disk and insert the new one
4. create the new OSD
5. unset the global noout flag
6. wait for recovery
7. go back to step 1 for the next drive.
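For reference, this is roughly what those steps look like on the command line (just a sketch; osd.13 and /dev/sdX are placeholders, and the destroy/create steps can of course be done from the GUI as I did):
Code:
# rough CLI equivalent of the steps above (osd.13 and /dev/sdX are placeholders)
ceph osd set noout              # 1. prevent automatic out/rebalance
ceph osd out 13                 # 2. out the OSD ...
systemctl stop ceph-osd@13      #    ... stop it ...
pveceph osd destroy 13          #    ... and destroy it
                                # 3. swap the physical disk
pveceph osd create /dev/sdX     # 4. create the new OSD on the new disk
ceph osd unset noout            # 5. allow recovery again
watch ceph -s                   # 6. wait until the cluster is HEALTH_OK again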

This was going slowly, so I did a bit of googling on how to speed it up and found that you can remove multiple drives at once as long as no PG has all of its copies on those drives. So I did my analysis, found two drives that looked safe to remove, and did the steps above for both drives at the same time. I don't know where I made the mistake (it could have been in writing down the PGs), but I lost some PGs and some video files got corrupted.
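In hindsight, scripting the check would have been safer than writing PGs down by hand. Something like this (my own sketch, not what I actually ran; 2 and 7 stand in for the two candidate OSD ids) lists every PG that has copies on both drives, i.e. exactly the PGs that would lose all replicas on a size 2 pool:
Code:
# print PGs whose copies are on BOTH candidate OSDs (ids 2 and 7 are examples);
# any output here means pulling both drives together loses both copies
comm -12 \
  <(ceph pg ls-by-osd 2 | grep -Eo '^[0-9]+\.[0-9a-f]+' | sort) \
  <(ceph pg ls-by-osd 7 | grep -Eo '^[0-9]+\.[0-9a-f]+' | sort)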

During this process there was no error saying that data was lost, only the warning about degraded data redundancy (the same as with a single drive). After recovery/rebalance the warning was gone. I (z)grepped the Ceph logs for "lost" and "corrupt" and got nothing back. I could not find any indication that data was lost and only found out by inspecting the data itself.
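For completeness, the log search was basically this (assuming the default log location under /var/log/ceph):
Code:
# search all Ceph logs, including rotated/compressed ones, for any hint of lost data
zgrep -iE 'lost|corrupt' /var/log/ceph/*.log*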

Is it possible to find out if data was lost and what data was lost (CephFS and RBD)?
 
Hi,
have you removed the two OSDs from different servers, or from the same server?

Are all PGs active and clean? I.e. does the following command show no PGs?
Code:
ceph pg dump | grep -v active+clean
If all are clean, you can start a deep-scrub on all active+clean PGs:
Code:
ceph pg dump | grep active+clean | cut -d' ' -f1 | while read i; do ceph pg deep-scrub ${i}; done
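Once the deep-scrubs have finished, any mismatching replicas should show up as inconsistent PGs. One way to list them afterwards (pool name and PG id below are only examples, adjust them to your setup):
Code:
ceph health detail                                        # inconsistent PGs show up here
rados list-inconsistent-pg cephfs_data                    # inconsistent PGs of one pool
rados list-inconsistent-obj 12.1a --format=json-pretty    # affected objects of one PG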
Udo
 
I've removed them from different servers.

This is what ceph pg dump | grep -v active+clean shows:
Code:
version 68465
stamp 2022-07-25T22:14:46.049009+0200
last_osdmap_epoch 0
last_pg_scan 0
PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG    DISK_LOG  STATE                        STATE_STAMP                      VERSION         REPORTED        UP        UP_PRIMARY  ACTING    ACTING_PRIMARY  LAST_SCRUB      SCRUB_STAMP                      LAST_DEEP_SCRUB  DEEP_SCRUB_STAMP                 SNAPTRIMQ_LEN
dumped all
                                                                      
12  850846  0  0  0  0  3047691325691         0       0  494972  494972
7    26125  0  0  0  0      183964140  81153907  165287   43871   43871
6   136126  0  0  0  0              0         0       0   88345   88345
3     7896  0  0  0  0    32751819000       218      20  132609  132609
2   120701  0  0  0  0   501845192256      5281     462  710127  710127
1       22  0  0  0  0              0  77919062    3617    3111    3111
                                                                            
sum  1141716  0  0  0  0  3582472301087  159078468  169386  1473035  1473035
OSD_STAT  USED     AVAIL    USED_RAW  TOTAL    HB_PEERS                   PG_SUM  PRIMARY_PG_SUM
2         1.5 TiB  1.3 TiB   1.5 TiB  2.7 TiB   [1,3,4,5,6,7,8,10,11,12]     159              48
13        162 GiB  211 GiB   162 GiB  373 GiB   [0,1,3,4,5,8,9,10,11,12]     104              27
0         197 GiB  176 GiB   197 GiB  373 GiB   [1,3,4,5,6,8,9,10,12,13]     124              40
1         168 GiB  205 GiB   168 GiB  373 GiB   [0,2,4,5,6,8,9,10,12,13]     106              41
3         1.3 TiB  1.4 TiB   1.3 TiB  2.7 TiB   [2,4,5,6,7,8,9,10,11,13]     161              50
4         173 GiB  199 GiB   173 GiB  373 GiB   [0,1,3,5,6,8,9,10,12,13]     110              34
5         178 GiB  194 GiB   178 GiB  373 GiB   [0,1,3,4,6,8,9,10,12,13]     113              39
6         1.3 TiB  1.4 TiB   1.3 TiB  2.7 TiB  [2,3,5,7,8,9,10,11,12,13]     148              52
7         1.5 TiB  1.3 TiB   1.5 TiB  2.7 TiB  [1,2,3,6,8,9,10,11,12,13]     172              59
8         152 GiB  221 GiB   152 GiB  373 GiB    [0,1,3,4,5,6,7,9,12,13]      97              41
9         204 GiB  168 GiB   204 GiB  373 GiB  [0,1,4,5,6,8,10,11,12,13]     131              52
10        1.5 TiB  1.3 TiB   1.5 TiB  2.7 TiB   [2,3,4,5,6,7,9,11,12,13]     167              49
11        1.4 TiB  1.4 TiB   1.4 TiB  2.7 TiB   [2,3,4,5,6,7,8,10,12,13]     153              62
12        158 GiB  215 GiB   158 GiB  373 GiB    [0,1,3,4,5,6,8,9,11,13]      98              31
sum       9.8 TiB  9.5 TiB   9.8 TiB   19 TiB

What will deep scrub do for me in my case?
 
Thanks. Will do that.

But isn't the manager supposed to know that there are missing PGs? If I removed the disks that were housing all (two) copies of a PG, then there is no way to recover them and the system should show an error somewhere, shouldn't it?
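For what it's worth, these are the places where I would have expected Ceph to surface it (the PG id 12.3f is only a placeholder):
Code:
ceph health detail              # should show OBJECT_UNFOUND / damaged PG warnings
ceph pg dump_stuck unclean      # PGs that never got back to active+clean
ceph pg 12.3f list_unfound      # unfound objects of a specific PG (placeholder id)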
 
Are you sure that you have only two copies? That would be dangerous…

What is the output of the following command?
Code:
for i in `ceph osd lspools | tr -d ",[0-9]"`
  do
     ceph osd dump | grep \'$i\'
done
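Alternatively, a single command that shows size/min_size (and more) for every pool:
Code:
ceph osd pool ls detail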
Udo
 
