[SOLVED] Ceph OSD recovery

starblazer

New Member
May 17, 2020
Hi! So, I lost two server boot drives and I need to recreate my cluster and get Ceph started again.

I apparently need to recreate the /var/lib/ceph/osd/ceph-* directories and get them mounted... however, for the life of me (and Google) I cannot figure out how to get them mounted.

I see the LVs/VGs.
I activate them...
They still don't mount.

If I didn't have reduced data availability, I'd just blow the other OSDs away and recreate.

Ideas?
 
Hi! So, I lost two server boot drives and I need to recreate my cluster and get Ceph started again.
From the same node? And did you have more than one MON? What does ceph -s & ceph osd tree show?

I apparently need to recreate the /var/lib/ceph/osd/ceph-* directories and get them mounted... however, for the life of me (and Google) I cannot figure out how to get them mounted.
If the Ceph packages are installed, the base directories will be created.
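As a first check, ceph-volume should still be able to see the OSDs on the LVM volumes; something like this lists them (the output is per node, your IDs and devices will differ):
Code:
# list all OSDs that ceph-volume can find on this node's LVM volumes
ceph-volume lvm list

# the underlying volume groups / logical volumes
vgs
lvs

If the OSDs show up there, the data is still intact and only the activation step (tmpfs mount plus systemd unit) is missing.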
 
Two separate servers in a three-server cluster.

From the only machine that lived:
Code:
root@supermicro:~# ceph -s
  cluster:
    id:     d62464d5-4e1f-4167-8177-c82896881270
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            1 MDSs report slow metadata IOs
            7 osds down
            1 host (7 osds) down
            2 pool(s) have non-power-of-two pg_num
            Reduced data availability: 250 pgs inactive
            Degraded data redundancy: 3481906/5222859 objects degraded (66.667%), 215 pgs degraded, 250 pgs undersized
            73 pgs not deep-scrubbed in time
            1 daemons have recently crashed
            too few PGs per OSD (20 < min 30)

  services:
    mon: 2 daemons, quorum supermicro,pve2 (age 92m)
    mgr: supermicro(active, since 92m), standbys: pve2
    mds: cephfs:1/1 {0=supermicro=up:replay}
    osd: 18 osds: 5 up (since 96m), 12 in (since 46m)

  data:
    pools:   2 pools, 250 pgs
    objects: 1.74M objects, 6.6 TiB
    usage:   6.6 TiB used, 21 TiB / 27 TiB avail
    pgs:     100.000% pgs not active
             3481906/5222859 objects degraded (66.667%)
             215 undersized+degraded+peered
             35  undersized+peered

root@supermicro:~#

Code:
root@supermicro:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       62.76027 root default
-3       19.09940     host pve2
 0   hdd  2.72849         osd.0         down  1.00000 1.00000
 1   hdd  2.72849         osd.1         down  1.00000 1.00000
 2   hdd  2.72849         osd.2         down  1.00000 1.00000
 3   hdd  2.72849         osd.3         down  1.00000 1.00000
 9   hdd  2.72849         osd.9         down  1.00000 1.00000
10   hdd  2.72849         osd.10        down  1.00000 1.00000
11   hdd  2.72849         osd.11        down  1.00000 1.00000
-7       16.37091     host pve3
12   hdd  2.72849         osd.12        down        0 1.00000
13   hdd  2.72849         osd.13        down        0 1.00000
14   hdd  2.72849         osd.14        down        0 1.00000
15   hdd  2.72849         osd.15        down        0 1.00000
16   hdd  2.72849         osd.16        down        0 1.00000
17   hdd  2.72849         osd.17        down        0 1.00000
-5       27.28996     host supermicro
 4   hdd  5.45799         osd.4           up  1.00000 1.00000
 5   hdd  5.45799         osd.5           up  1.00000 1.00000
 6   hdd  5.45799         osd.6           up  1.00000 1.00000
 7   hdd  5.45799         osd.7           up  1.00000 1.00000
 8   hdd  5.45799         osd.8           up  1.00000 1.00000
root@supermicro:~#

If the Ceph packages are installed, the base directories will be created.
The base directory is created, but none of the per-OSD directories (/var/lib/ceph/osd/ceph-*) were created from the disks.
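For what it's worth, with BlueStore/LVM OSDs those ceph-* directories are only small tmpfs mounts that get created when an OSD is activated, so something along these lines shows whether anything has been mounted at all (standard paths, nothing node-specific):
Code:
# empty output here just means the OSDs have not been activated yet
mount | grep /var/lib/ceph/osd
ls /var/lib/ceph/osd/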
 
Okay, after the 123rd time switching up my Google search terms... "mount ceph lvm recovery" isn't useful, lol.

Code:
ceph-volume lvm activate --all

fixed it.
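After that, the OSDs came back; roughly these commands can be used to double-check (nothing here is specific to my setup):
Code:
# the OSD daemons should be running again
systemctl list-units 'ceph-osd@*'

# and the cluster should start peering/recovering
ceph osd tree
ceph -s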
 
A reboot should do this as well.
Well, that didn't happen on either of the two servers, both with a fresh install of 6.2.

I really should test to make sure everything comes up after a reboot. I'm sure it will now that the OSDs have been activated on the new machines.
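If it helps anyone later: as far as I understand it, ceph-volume lvm activate also enables a per-OSD systemd unit (named after the OSD id and fsid), and that unit is what recreates the tmpfs mount and starts the OSD on boot. A rough way to check that the units got enabled (default path on a Debian/Proxmox install):
Code:
# one enabled ceph-volume unit per activated OSD should show up here
ls /etc/systemd/system/multi-user.target.wants/ | grep ceph-volume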
 
Okay, after the 123rd time switching up my Google search terms... "mount ceph lvm recovery" isn't useful, lol.

Code:
ceph-volume lvm activate --all

fixed it.
Like you, I found that rebooting was not enough; only this command brought the OSDs back! Thanks a million, I had been looking for a way to start the Ceph OSDs for three days!