[SOLVED] misplaced objects after removing OSD

walter.egosson · Jan 27, 2022

Hi!
We have 3 identical servers running Proxmox + Ceph with 2 HDDs per server as OSDs:
- OS: Debian Buster
- Proxmox version 6.4-1
- Ceph version 14.2.22-pve1 (Nautilus)

One OSD went down, so we decided to remove it following the Ceph documentation here.

Now we have 5 OSDs left:

Code:

$ sudo ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE  DATA    OMAP    META AVAIL   %USE  VAR  PGS STATUS
 1   hdd 0.79999  0.85004 2.7 TiB  2.2 TiB 587 GiB  92 MiB  0 B 587 GiB 79.00 1.65 494     up
 2   hdd 2.72800  1.00000 2.7 TiB  1.2 TiB 1.5 TiB 120 MiB  0 B 1.5 TiB 44.12 0.92 255     up
 3   hdd 2.73000  1.00000 2.7 TiB 1021 GiB 1.7 TiB 152 MiB  0 B 1.7 TiB 36.53 0.77 250     up
 4   hdd 2.73000  1.00000 2.7 TiB  1.1 TiB 1.7 TiB 200 MiB  0 B 1.7 TiB 38.85 0.81 243     up
 5   hdd 2.73000  1.00000 2.7 TiB  1.1 TiB 1.6 TiB 181 MiB  0 B 1.6 TiB 40.27 0.84 258     up
                    TOTAL  14 TiB  6.5 TiB 7.1 TiB 744 MiB  0 B 7.1 TiB 47.75

The issue is that, after the OSD was removed, we got a lot of "objects misplaced":

Code:

$ sudo ceph status
  cluster:
    id:     c8e950e4-1cc2-48b6-a2ba-c99594cf7d35
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum 0,1,2 (age 64m)
    mgr: srv-virt-1(active, since 8d), standbys: srv-virt-3, srv-virt-2
    mds: cephfs:1 {0=srv-virt-3=up:active}
    osd: 5 osds: 5 up (since 5w), 5 in (since 5w); 92 remapped pgs
 
  data:
    pools:   3 pools, 500 pgs
    objects: 578.74k objects, 2.2 TiB
    usage:   6.5 TiB used, 7.1 TiB / 14 TiB avail
    pgs:     111286/1736223 objects misplaced (6.410%)
             411 active+clean
             89  active+clean+remapped
 
  io:
    client:   0 B/s rd, 5.9 MiB/s wr, 0 op/s rd, 45 op/s wr
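
As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes the misplaced percentage from the `ceph status` output. The variable names are only illustrative, and the object count is the rounded "578.74k" value, so the copies-per-object figure is approximate. It suggests that the denominator counts object copies rather than objects, which would be consistent with 3 replicas per object (an inference from the numbers, since the pool settings are not shown in this thread).

```python
# Figures copied from the `ceph status` output above.
misplaced_copies = 111286
total_copies = 1736223
objects_reported = 578.74e3  # "578.74k objects" (rounded by ceph)

# Share of object copies that are not where CRUSH wants them.
misplaced_pct = misplaced_copies / total_copies * 100

# Rough number of copies per object implied by the two totals.
copies_per_object = total_copies / objects_reported

print(f"misplaced: {misplaced_pct:.3f}%")                # misplaced: 6.410%
print(f"copies per object: {copies_per_object:.2f}")     # copies per object: 3.00
```

Misplaced copies still exist and are readable; they are just stored on a different OSD than the one CRUSH currently maps them to, which is why the PGs are reported as active+clean+remapped rather than degraded.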

One OSD was over 80 percent full while the others were less than 50 percent used, so we thought a reweight would solve the issue, but the "objects misplaced" remained after the reweight.

Questions:

Since the health says OK, does that misplaced objects warning have any impact? Should we ignore it?
Is there a way to fix this "misplacement"? How?

Thank you all!

RokaKen · Jan 27, 2022

walter.egosson said:
Questions:

Since the health says OK, does that misplaced objects warning have any impact? Should we ignore it?

Well, warnings have a purpose -- ignoring them is up to you. But, OBJECT_MISPLACED [0] doesn't indicate an immediate problem unless you lose another OSD.

[0] https://docs.ceph.com/en/nautilus/rados/operations/health-checks/#object-misplaced

walter.egosson said:
Is there a way to fix this "misplacement" ? How?

Assuming you have a default 3/2 size for your pool(s) and a default CRUSH with failure domain of 'host', replace the failed OSD and Ceph will recover.
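
The size-3 assumption can be cross-checked against the `ceph osd df` output posted earlier. The following is a rough Python sketch using only the PGS column and the "500 pgs" figure from `ceph status`; the dictionary and variable names are hypothetical. The reading that OSD 1 is the only OSD left on its host is an inference from the numbers, not something shown in the thread (no `ceph osd tree` output was posted).

```python
# PGS column from the `ceph osd df` output in the first post (OSD id -> PG count).
pgs_per_osd = {1: 494, 2: 255, 3: 250, 4: 243, 5: 258}
total_pgs = 500  # "3 pools, 500 pgs" from `ceph status`

# Every PG replica is counted once on the OSD that holds it.
pg_copies = sum(pgs_per_osd.values())
replicas_per_pg = pg_copies / total_pgs

# Share of all PGs that have a replica on OSD 1.
osd1_share = pgs_per_osd[1] / total_pgs * 100

print(pg_copies)                    # 1500
print(replicas_per_pg)              # 3.0
print(f"{osd1_share:.1f}%")         # 98.8%
```

With 3 replicas, 3 hosts, and a 'host' failure domain, each host has to hold one replica of every PG. An OSD carrying nearly all 500 PGs is what one would expect if it is alone on its host, and that OSD also has a reduced WEIGHT and REWEIGHT in the output above. Under that assumption, restoring a second OSD on the same host gives CRUSH somewhere else to place those replicas.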

walter.egosson · Feb 17, 2022

For some reason, adding back the removed OSD brought the cluster back to a healthy status. It looks like we missed something, but we cannot tell what.
