[SOLVED] Ceph: HEALTH_WARN never ends after osd out

stats

Hello,

I'm trying to replace an HDD with an SSD.
My understanding is that I mark the target OSD out, wait for the cluster to become HEALTH_OK, and then destroy the OSD so I can remove the HDD physically.
But after the osd out operation, the HEALTH_WARN never ends. How can I fix it?
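
For reference, the steps I took were roughly the following (shown here as the CLI equivalent; in my case osd.3 is the HDD being replaced):
Code:
# mark the HDD OSD out so its data is recreated on the remaining OSDs
ceph osd out osd.3
# wait for HEALTH_OK before destroying the OSD and pulling the disk
watch ceph -s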

My version is Virtual Environment 5.4-15

Satoshi

degraded_status.png
 
I found the following messages. Is it stuck?

Degraded data redundancy: 46/1454715 objects degraded (0.003%), 1 pg degraded, 1 pg undersized
pg 11.45 is stuck undersized for 220401.107415, current state active+undersized+degraded, last acting [5,4]
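
In case it helps, more detail on that PG can be pulled with standard status commands (the PG id 11.45 is taken from the message above):
Code:
# overall health with per-PG detail
ceph health detail
# full state of the stuck PG
ceph pg 11.45 query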
 
What size/min_size does the pool have? And are those OSDs online?
osd pool default min size = 2
osd pool default size = 3

Yes, the OSDs are online.
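
Those are the defaults from ceph.conf; the actual per-pool values can be double-checked with something like this (using cephfs_data, the pool that pg 11.45 belongs to, as an example):
Code:
# per-pool replication settings, not just the ceph.conf defaults
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size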

As an aside, 5.4 is EoL.
I know. I will upgrade it after the HDD-to-SSD replacement.

pg 11.45 is stuck undersized for XXXXX.XXXXX, current state active+undersized+degraded, last acting [5,4]

The number XXXXX.XXXXX changes every time I check. Does that mean recovery is running?
 
Those are seconds. As long as Ceph is unable to recreate the third copy, the message will stay.

What does the ceph osd tree output look like?
Code:
# ceph osd tree
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       39.87958 root default                           
-3       10.77039     host vgpm01                         
 3   hdd  7.27730         osd.3       up        0 1.00000
 0   ssd  3.49309         osd.0       up  1.00000 1.00000
-5       14.55460     host vgpm02                         
 1   hdd  7.27730         osd.1       up  1.00000 1.00000
 4   hdd  7.27730         osd.4       up  1.00000 1.00000
-7       14.55460     host vgpm03                         
 2   hdd  7.27730         osd.2       up  1.00000 1.00000
 5   hdd  7.27730         osd.5       up  1.00000 1.00000
 
And do you have any special crush rules (ceph osd dump)?
Also, is there enough space on the cluster, since the SSDs are only half the size of the HDDs?

Since there are only two OSDs on one host, the OSD with reweight 1 will need to hold the data of the OSD with reweight 0. If there isn't enough space to do that, the recovery can't continue. But since you still have two copies of your data, the replacement of the HDD can continue, as long as there is enough space on the new SSDs.
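
Both points can be checked with standard commands, for example:
Code:
# crush rules currently defined
ceph osd crush rule dump
# overall and per-OSD usage, including the OSD that is out
ceph df
ceph osd df tree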
 
Code:
# ceph osd dump
epoch 676
fsid 0caf72c1-b05d-4f73-88da-ca4a2b89225f
created 2017-11-29 08:33:35.211810
modified 2021-03-01 18:29:29.970358
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 19
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 10 'vgpool01' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 503 flags hashpspool stripe_width 0 application rbd
    removed_snaps [1~25,28~2,2d~2]
pool 11 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 355 flags hashpspool stripe_width 0 application cephfs
pool 12 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 355 flags hashpspool stripe_width 0 application cephfs
max_osd 6
osd.0 up   in  weight 1 up_from 529 up_thru 654 down_at 0 last_clean_interval [0,0) 172.20.0.11:6805/1760366 172.20.0.11:6806/1760366 172.20.0.11:6807/1760366 172.20.0.11:6808/1760366 exists,up fd4702b5-605e-4079-8fd5-8a38e75b82d1
osd.1 up   in  weight 1 up_from 393 up_thru 670 down_at 391 last_clean_interval [373,390) 172.20.0.12:6801/3418 172.20.0.12:6802/3418 172.20.0.12:6803/3418 172.20.0.12:6804/3418 exists,up 7ac6976a-bee6-4b89-b2af-db2e1094d153
osd.2 up   in  weight 1 up_from 402 up_thru 658 down_at 398 last_clean_interval [378,397) 172.20.0.13:6805/3446 172.20.0.13:6806/3446 172.20.0.13:6807/3446 172.20.0.13:6808/3446 exists,up b7c24889-a023-4ca6-9d4b-e5fda00be3e4
osd.3 up   out weight 0 up_from 676 up_thru 532 down_at 675 last_clean_interval [674,674) 172.20.0.11:6801/1693427 172.20.0.11:6802/1693427 172.20.0.11:6803/1693427 172.20.0.11:6804/1693427 exists,up dfd6c311-1fd7-474e-b737-afb23941de3c
osd.4 up   in  weight 1 up_from 412 up_thru 656 down_at 410 last_clean_interval [395,411) 172.20.0.12:6805/3578 172.20.0.12:6809/1003578 172.20.0.12:6810/1003578 172.20.0.12:6811/1003578 exists,up c3db964e-282c-4330-93a3-e5d300e496a4
osd.5 up   in  weight 1 up_from 401 up_thru 672 down_at 398 last_clean_interval [380,397) 172.20.0.13:6801/3217 172.20.0.13:6802/3217 172.20.0.13:6803/3217 172.20.0.13:6804/3217 exists,up 9a9ede50-ecef-44d2-a0b0-2bcad894ee05

I think there is enough space.
osd_status_20210302.png
 
Besides that one PG, you have three copies of your data, one on each node. The PG is from the cephfs_data pool. Just replace the OSD; the recovery should take care of the PG.
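
A rough sketch of the replacement, assuming osd.3 and the PVE 5.x command names (the device path is only an example):
Code:
# stop the OSD daemon for osd.3 (it is already marked out)
systemctl stop ceph-osd@3
# remove it from the cluster
pveceph destroyosd 3
# after swapping the HDD for the SSD, create the new OSD on it
pveceph createosd /dev/sdX
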
Do you mean I should just ignore the warning and continue with the replacement process?
Will the recovery process start when osd.3 is destroyed? It is still strange to me that only one PG is degraded.
 
Thank you very much. I will try it.
I have one more question. If I mark the PG as lost with the 'mark_unfound_lost delete' command I mentioned, would that be meaningless?
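
From the docs it looks like mark_unfound_lost only applies to unfound objects, so maybe a quick check like this would show whether the PG has any at all:
Code:
# a merely degraded PG normally has no unfound objects
ceph health detail | grep -i unfound
ceph pg 11.45 query | grep -i unfound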
 
