I'm replacing the HDDs in my cluster with higher-capacity drives. Some not-that-important video data sits on a CephFS pool with size=2, min_size=1 (yes, I know that can lead to data loss).
The process I was following was:
1. set the global noout flag
2. stop and out an OSD, then destroy it (from the GUI)
3. pull out the disk and insert the new one
4. create the OSD
5. unset the global noout flag
6. wait for recovery
7. go back to step 1 for the next drive.
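On the CLI, the per-drive loop above looks roughly like this (a sketch, not exactly what the GUI runs; the OSD id and device path are made-up examples, and the script only echoes the commands by default so nothing is executed accidentally):

```shell
#!/bin/sh
# Sketch of one iteration of the replacement loop.
# DRY RUN by default: every command is echoed, not executed.
# Set RUN="" (empty) to run against a real cluster.
RUN="${RUN:-echo}"
OSD_ID="${1:-3}"                                # hypothetical OSD id

$RUN ceph osd set noout                         # 1. suppress auto-marking out
$RUN systemctl stop "ceph-osd@$OSD_ID"          # 2. stop, out, destroy
$RUN ceph osd out "$OSD_ID"
$RUN ceph osd destroy "$OSD_ID" --yes-i-really-mean-it
#                                                 3. physically swap the disk
$RUN ceph-volume lvm create --data /dev/sdX     # 4. /dev/sdX is a placeholder
$RUN ceph osd unset noout                       # 5.
$RUN ceph -s                                    # 6. watch recovery progress
```

With the dry-run default, the script just prints the command sequence for review before you run it for real.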
This was going slowly, so I did a bit of googling on how to speed it up and found that you can remove multiple drives at once as long as no PG has all of its copies on those drives. I did my analysis, found two drives that looked safe to remove, and ran the steps above for both at the same time. I don't know where I made a mistake (possibly in writing down the PGs), but I lost some PGs and some video files were corrupted.
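The overlap check I did by hand can be done mechanically. A sketch, assuming each OSD's PG list was saved one PG id per line (the file names and the `awk` column are assumptions; the exact `ceph pg ls-by-osd` output layout varies between releases), with sample data standing in for the real lists:

```shell
#!/bin/sh
# Sketch: verify two OSDs share no placement group before pulling both drives.
# Real lists would come from something like:
#   ceph pg ls-by-osd 3 | awk 'NR>1 {print $1}' > pgs_osd_a.txt
# Sample data for illustration (PG ids are made up):
printf '1.1a\n1.2b\n1.3c\n' > pgs_osd_a.txt
printf '1.2b\n1.4d\n1.5e\n' > pgs_osd_b.txt

sort -o pgs_osd_a.txt pgs_osd_a.txt
sort -o pgs_osd_b.txt pgs_osd_b.txt

# comm -12 prints only the lines common to both sorted files
overlap=$(comm -12 pgs_osd_a.txt pgs_osd_b.txt)

if [ -n "$overlap" ]; then
    echo "UNSAFE: PGs with copies on both OSDs: $overlap"
else
    echo "no shared PGs between the two OSDs"
fi
```

On a size=2 pool a shared PG means both of its replicas live on the two drives being pulled, so any non-empty overlap is a stop sign. Newer Ceph releases also have `ceph osd safe-to-destroy <id>`, which does a similar safety check server-side.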
During the process there was no error saying data had been lost, only the warning about degraded data redundancy (the same as with a single drive). After recovery/rebalance the warning was gone. I zgrep'ed the Ceph logs for "lost" and "corrupt" and got nothing back. I could find no indication that data had been lost and only discovered it by inspecting the data itself.
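For reference, the log search amounted to roughly the following; I've also included a deep scrub, which is the one further check I'm aware of that goes beyond the logs (a sketch, dry-run by default; log paths are the usual defaults and may differ on your setup):

```shell
#!/bin/sh
# DRY RUN by default: commands are echoed, not executed.
# Set RUN="" (empty) to actually run them.
RUN="${RUN:-echo}"

# Search current and rotated logs for signs of loss/corruption:
$RUN zgrep -i -e lost -e corrupt /var/log/ceph/*.log /var/log/ceph/*.gz

# A deep scrub re-reads and re-checksums every object, so it can surface
# damage that never produced a log line:
$RUN ceph osd deep-scrub all        # per-OSD id on older releases
$RUN ceph health detail             # look for inconsistent/unfound PGs
```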
Is it possible to find out whether data was lost, and which data (for both CephFS and RBD)?