[SOLVED] Ceph: HEALTH_WARN never ends after osd out

stats

Hello,

I'm trying to replace an HDD with an SSD.
My understanding is that I mark the target OSD out, wait for the cluster to become HEALTH_OK, and then destroy the OSD so I can remove the HDD physically.
But after the osd out operation, the HEALTH_WARN never ends. How can I fix it?
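
For reference, the steps I took were roughly the following (shown here as the CLI equivalent; in my case osd.3 is the HDD being replaced):
Code:
# mark the HDD OSD out so its data is recreated on the remaining OSDs
ceph osd out osd.3
# wait for HEALTH_OK before destroying the OSD and pulling the disk
watch ceph -s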

My version is Virtual Environment 5.4-15

Satoshi

degraded_status.png
 
I found the following messages. Is it stuck?

Degraded data redundancy: 46/1454715 objects degraded (0.003%), 1 pg degraded, 1 pg undersized
pg 11.45 is stuck undersized for 220401.107415, current state active+undersized+degraded, last acting [5,4]
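
In case it helps, more detail on that PG can be pulled with standard status commands (the PG id 11.45 is taken from the message above):
Code:
# overall health with per-PG detail
ceph health detail
# full state of the stuck PG
ceph pg 11.45 query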
 
What size/min_size does the pool have? And are those OSDs online?
osd pool default min size = 2
osd pool default size = 3

Yes, the OSDs are online.
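
Those are the defaults from ceph.conf; the actual per-pool values can be double-checked with something like this (using cephfs_data, the pool that pg 11.45 belongs to, as an example):
Code:
# per-pool replication settings, not just the ceph.conf defaults
ceph osd pool get cephfs_data size
ceph osd pool get cephfs_data min_size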

As an aside, 5.4 is EoL.
I know. I will upgrade it after the HDD-to-SSD replacement.

pg 11.45 is stuck undersized for XXXXX.XXXXX, current state active+undersized+degraded, last acting [5,4]

The number XXXXX.XXXXX changes every time I check. Does that mean recovery is running?
 
Those are seconds. As long as Ceph is unable to recreate the third copy, the message will stay.

What does the ceph osd tree output look like?
Code:
# ceph osd tree
ID CLASS WEIGHT   TYPE NAME       STATUS REWEIGHT PRI-AFF
-1       39.87958 root default                           
-3       10.77039     host vgpm01                         
 3   hdd  7.27730         osd.3       up        0 1.00000
 0   ssd  3.49309         osd.0       up  1.00000 1.00000
-5       14.55460     host vgpm02                         
 1   hdd  7.27730         osd.1       up  1.00000 1.00000
 4   hdd  7.27730         osd.4       up  1.00000 1.00000
-7       14.55460     host vgpm03                         
 2   hdd  7.27730         osd.2       up  1.00000 1.00000
 5   hdd  7.27730         osd.5       up  1.00000 1.00000
 
And do you have any special crush rules (ceph osd dump)?
Also, is there enough space on the cluster, since the SSDs are only half the size of the HDDs?

Since there are only two OSDs on one host, the OSD with reweight 1 will need to hold the data of the OSD with reweight 0. If there isn't enough space to do that, the recovery can't continue. But since you still have two copies of your data, the replacement of the HDD can continue, as long as there is enough space on the new SSDs.
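
Both points can be checked with standard commands, for example:
Code:
# crush rules currently defined
ceph osd crush rule dump
# overall and per-OSD usage, including the OSD that is out
ceph df
ceph osd df tree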
 
Code:
# ceph osd dump
epoch 676
fsid 0caf72c1-b05d-4f73-88da-ca4a2b89225f
created 2017-11-29 08:33:35.211810
modified 2021-03-01 18:29:29.970358
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 19
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client jewel
require_osd_release luminous
pool 10 'vgpool01' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 64 pgp_num 64 last_change 503 flags hashpspool stripe_width 0 application rbd
    removed_snaps [1~25,28~2,2d~2]
pool 11 'cephfs_data' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 355 flags hashpspool stripe_width 0 application cephfs
pool 12 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 last_change 355 flags hashpspool stripe_width 0 application cephfs
max_osd 6
osd.0 up   in  weight 1 up_from 529 up_thru 654 down_at 0 last_clean_interval [0,0) 172.20.0.11:6805/1760366 172.20.0.11:6806/1760366 172.20.0.11:6807/1760366 172.20.0.11:6808/1760366 exists,up fd4702b5-605e-4079-8fd5-8a38e75b82d1
osd.1 up   in  weight 1 up_from 393 up_thru 670 down_at 391 last_clean_interval [373,390) 172.20.0.12:6801/3418 172.20.0.12:6802/3418 172.20.0.12:6803/3418 172.20.0.12:6804/3418 exists,up 7ac6976a-bee6-4b89-b2af-db2e1094d153
osd.2 up   in  weight 1 up_from 402 up_thru 658 down_at 398 last_clean_interval [378,397) 172.20.0.13:6805/3446 172.20.0.13:6806/3446 172.20.0.13:6807/3446 172.20.0.13:6808/3446 exists,up b7c24889-a023-4ca6-9d4b-e5fda00be3e4
osd.3 up   out weight 0 up_from 676 up_thru 532 down_at 675 last_clean_interval [674,674) 172.20.0.11:6801/1693427 172.20.0.11:6802/1693427 172.20.0.11:6803/1693427 172.20.0.11:6804/1693427 exists,up dfd6c311-1fd7-474e-b737-afb23941de3c
osd.4 up   in  weight 1 up_from 412 up_thru 656 down_at 410 last_clean_interval [395,411) 172.20.0.12:6805/3578 172.20.0.12:6809/1003578 172.20.0.12:6810/1003578 172.20.0.12:6811/1003578 exists,up c3db964e-282c-4330-93a3-e5d300e496a4
osd.5 up   in  weight 1 up_from 401 up_thru 672 down_at 398 last_clean_interval [380,397) 172.20.0.13:6801/3217 172.20.0.13:6802/3217 172.20.0.13:6803/3217 172.20.0.13:6804/3217 exists,up 9a9ede50-ecef-44d2-a0b0-2bcad894ee05

I think there is enough space.
osd_status_20210302.png
 
Besides that one PG, you have three copies of your data, one on each node. The PG is from the cephfs_data pool. Just replace the OSD; the recovery should take care of the PG.
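
A rough sketch of the replacement, assuming osd.3 and the PVE 5.x command names (the device path is only an example):
Code:
# stop the OSD daemon for osd.3 (it is already marked out)
systemctl stop ceph-osd@3
# remove it from the cluster
pveceph destroyosd 3
# after swapping the HDD for the SSD, create the new OSD on it
pveceph createosd /dev/sdX
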
Do you mean I should just ignore the warning and continue with the replacement process?
Will the recovery process start when osd.3 is destroyed? It is still strange to me that only one PG is degraded.
 
Thank you very much. I will try it.
I have one more question. If I mark the PG as lost with the 'mark_unfound_lost delete' command I mentioned, would that be meaningless?
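
From the docs it looks like mark_unfound_lost only applies to unfound objects, so maybe a quick check like this would show whether the PG has any at all:
Code:
# a merely degraded PG normally has no unfound objects
ceph health detail | grep -i unfound
ceph pg 11.45 query | grep -i unfound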
 
