> pg 11.45 is stuck undersized for 220401.107415, current state active+undersized+degraded, last acting [5,4]
What size/min_size does the pool have? And are those OSDs online?
> My version is Virtual Environment 5.4-15
And as an aside, 5.4 is EoL.
> What size/min_size does the pool have? And are those OSDs online?
osd pool default min size = 2
> And as an aside, 5.4 is EoL.
I know. I will upgrade it after the replacement from HDD to SSD.
> The number XXXXX.XXXXX is always changing every time. Does it mean the recovery is running?
These are seconds. As long as Ceph is not able to recreate the third copy, the message will stay. What does the ceph osd tree output look like?
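Since the value is in seconds, it is easy to turn into something readable; a minimal sketch (the 220401.107415 figure is the one from the health warning quoted above):

```python
# The "stuck undersized for N" value in the Ceph health warning is in seconds.
stuck_seconds = 220401.107415  # taken from the warning quoted above

days = stuck_seconds / 86400  # 86400 seconds per day
print(f"pg 11.45 has been undersized for about {days:.1f} days")  # about 2.6 days
```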
> What does the ceph osd tree output look like?
# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 39.87958 root default
-3 10.77039 host vgpm01
3 hdd 7.27730 osd.3 up 0 1.00000
0 ssd 3.49309 osd.0 up 1.00000 1.00000
-5 14.55460 host vgpm02
1 hdd 7.27730 osd.1 up 1.00000 1.00000
4 hdd 7.27730 osd.4 up 1.00000 1.00000
-7 14.55460 host vgpm03
2 hdd 7.27730 osd.2 up 1.00000 1.00000
5 hdd 7.27730 osd.5 up 1.00000 1.00000
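One thing stands out in this output: osd.3 is up but has REWEIGHT 0, i.e. it is marked out, so CRUSH places no data on it. A minimal sketch that picks the out OSDs from lines copied out of the tree above (the column positions are an assumption based on this particular output format, not a Ceph API):

```python
# OSD lines copied from the `ceph osd tree` output above
# (columns: ID CLASS WEIGHT NAME STATUS REWEIGHT PRI-AFF).
tree_lines = [
    "3 hdd 7.27730 osd.3 up 0 1.00000",
    "0 ssd 3.49309 osd.0 up 1.00000 1.00000",
    "1 hdd 7.27730 osd.1 up 1.00000 1.00000",
]

# An OSD with REWEIGHT 0 is "out": it stays up, but receives no data.
out_osds = []
for line in tree_lines:
    fields = line.split()
    name, reweight = fields[3], float(fields[5])
    if reweight == 0:
        out_osds.append(name)

print(out_osds)  # ['osd.3'] - likely why only [5,4] are acting for pg 11.45
```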
And ceph osd dump?

> And ceph osd dump?
# ceph osd dump
epoch 676
fsid 0caf72c1-b05d-4f73-88da-ca4a2b89225f
created 2017-11-29 08:33:35.211810
modified 2021-03-01 18:29:29.970358
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 19
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 10 'vgpool01' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 503 flags hashpspool stripe_width 0 application rbd
removed_snaps [1~25,28~2,2d~2]
pool 11 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 355 flags hashpspool stripe_width 0 application cephfs
pool 12 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 355 flags hashpspool stripe_width 0 application cephfs
max_osd 6
osd.0 up in weight 1 up_from 529 up_thru 654 down_at 0 last_clean_interval [0,0) 172.20.0.11:6805/1760366 172.20.0.11:6806/1760366 172.20.0.11:6807/1760366 172.20.0.11:6808/1760366 exists,up fd4702b5-605e-4079-8fd5-8a38e75b82d1
osd.1 up in weight 1 up_from 393 up_thru 670 down_at 391 last_clean_interval [373,390) 172.20.0.12:6801/3418 172.20.0.12:6802/3418 172.20.0.12:6803/3418 172.20.0.12:6804/3418 exists,up 7ac6976a-bee6-4b89-b2af-db2e1094d153
osd.2 up in weight 1 up_from 402 up_thru 658 down_at 398 last_clean_interval [378,397) 172.20.0.13:6805/3446 172.20.0.13:6806/3446 172.20.0.13:6807/3446 172.20.0.13:6808/3446 exists,up b7c24889-a023-4ca6-9d4b-e5fda00be3e4
osd.3 up out weight 0 up_from 676 up_thru 532 down_at 675 last_clean_interval [674,674) 172.20.0.11:6801/1693427 172.20.0.11:6802/1693427 172.20.0.11:6803/1693427 172.20.0.11:6804/1693427 exists,up dfd6c311-1fd7-474e-b737-afb23941de3c
osd.4 up in weight 1 up_from 412 up_thru 656 down_at 410 last_clean_interval [395,411) 172.20.0.12:6805/3578 172.20.0.12:6809/1003578 172.20.0.12:6810/1003578 172.20.0.12:6811/1003578 exists,up c3db964e-282c-4330-93a3-e5d300e496a4
osd.5 up in weight 1 up_from 401 up_thru 672 down_at 398 last_clean_interval [380,397) 172.20.0.13:6801/3217 172.20.0.13:6802/3217 172.20.0.13:6803/3217 172.20.0.13:6804/3217 exists,up 9a9ede50-ecef-44d2-a0b0-2bcad894ee05
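The pool lines in this dump also answer the size/min_size question from earlier in the thread; a minimal sketch that extracts both values from one of the lines above (the regex is just an assumption about this one output format, not a Ceph API):

```python
import re

# Pool line copied from the `ceph osd dump` output above.
pool_line = (
    "pool 11 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 "
    "object_hash rjenkins pg_num 128 pgp_num 128 last_change 355 "
    "flags hashpspool stripe_width 0 application cephfs"
)

# \b keeps "size 3" from also matching inside "min_size 2".
size = int(re.search(r"\bsize (\d+)", pool_line).group(1))
min_size = int(re.search(r"min_size (\d+)", pool_line).group(1))

print(size, min_size)  # 3 2: three replicas wanted, I/O continues with two
```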
ceph pg 11.45 mark_unfound_lost delete
> Besides that one PG you have three copies of your data, each on one node. The PG is from the cephfs_data pool. Just replace the OSD; the recovery should take care of the PG.
Do you mean just ignore the warning and continue with the replacement process?
> Do you mean just ignore the warning and continue with the replacement process?
Yes.
> Will the recovery process start when osd.3 is destroyed? It is very strange that only one PG is degraded, though.
Yes, that should happen. Probably Ceph can't place any more data on that one PG. The CephFS has too many PGs; check out the PGCalc from Ceph.
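The PGCalc remark can be sanity-checked with the numbers from the dump above: 64 + 128 + 32 PGs at size 3 across 6 OSDs already puts the per-OSD PG count above the commonly cited target of roughly 100, and removing osd.3 makes it worse. A minimal sketch of the arithmetic:

```python
# pg_num values and replica size taken from the `ceph osd dump` output above.
pg_nums = {"vgpool01": 64, "cephfs_data": 128, "cephfs_metadata": 32}
size = 3

pg_copies = sum(pg_nums.values()) * size  # 672 PG copies in total
print(pg_copies / 6)  # 112.0 PG copies per OSD with all 6 OSDs in
print(pg_copies / 5)  # 134.4 once osd.3 is removed
```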
> I have one more question. If I mark the PG as lost with the 'mark_unfound_lost delete' command that I mentioned, is it meaningless?
This will drop the reference to that PG, and with it the PG's data that would still be on the other OSDs.
> This will drop the reference to that PG, and with it the PG's data that would still be on the other OSDs.
So, will the command recover the lost PG from the other OSDs to keep 3 replicas?