[SOLVED] Ceph OSD recovery

starblazer

New Member
May 17, 2020
Hi! So, I lost two server boot drives and I need to recreate my cluster and get Ceph started again.

I apparently need to recreate the /var/lib/ceph/osd/ceph-* directories and get them mounted... however, I cannot figure out for the life of me (or Google) how to get them mounted.

I can see the LVs/VGs.
I activate them (roughly the commands sketched below)...
They still don't mount.
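
For reference, this is more or less what I have been doing (a rough sketch from memory, not the exact commands/output):

Code:
# list the volume groups / logical volumes that hold the OSD data
vgs
lvs

# activate the volume groups
vgchange -ay

# ...but the per-OSD directories still aren't there
ls /var/lib/ceph/osd/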

If I didn't have reduced data availability, I'd just blow the other OSDs away and recreate.

Ideas?
 
Hi! So, I lost two server boot drives and I need to recreate my cluster and get Ceph started again.
From the same node? And did you have more than one MON? What does ceph -s & ceph osd tree show?

I apparently need to recreate the /var/lib/ceph/osd/ceph-* directories and get them mounted... however, I cannot figure out for the life of me (or Google) how to get them mounted.
If the Ceph packages are installed, the base directories will be created.
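
If the OSD data LVs survived, ceph-volume should still be able to see them. Something like this should list what it finds (a sketch; the output depends on your volumes):

Code:
# show the OSDs that ceph-volume can discover on the local LVM volumes
ceph-volume lvm list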
 
Two separate servers in a three-server cluster.

From the only machine that lived:
Code:
root@supermicro:~# ceph -s
  cluster:
    id:     d62464d5-4e1f-4167-8177-c82896881270
    health: HEALTH_WARN
            1 filesystem is degraded
            insufficient standby MDS daemons available
            1 MDSs report slow metadata IOs
            7 osds down
            1 host (7 osds) down
            2 pool(s) have non-power-of-two pg_num
            Reduced data availability: 250 pgs inactive
            Degraded data redundancy: 3481906/5222859 objects degraded (66.667%), 215 pgs degraded, 250 pgs undersized
            73 pgs not deep-scrubbed in time
            1 daemons have recently crashed
            too few PGs per OSD (20 < min 30)

  services:
    mon: 2 daemons, quorum supermicro,pve2 (age 92m)
    mgr: supermicro(active, since 92m), standbys: pve2
    mds: cephfs:1/1 {0=supermicro=up:replay}
    osd: 18 osds: 5 up (since 96m), 12 in (since 46m)

  data:
    pools:   2 pools, 250 pgs
    objects: 1.74M objects, 6.6 TiB
    usage:   6.6 TiB used, 21 TiB / 27 TiB avail
    pgs:     100.000% pgs not active
             3481906/5222859 objects degraded (66.667%)
             215 undersized+degraded+peered
             35  undersized+peered

root@supermicro:~#

Code:
root@supermicro:~# ceph osd tree
ID CLASS WEIGHT   TYPE NAME           STATUS REWEIGHT PRI-AFF
-1       62.76027 root default
-3       19.09940     host pve2
 0   hdd  2.72849         osd.0         down  1.00000 1.00000
 1   hdd  2.72849         osd.1         down  1.00000 1.00000
 2   hdd  2.72849         osd.2         down  1.00000 1.00000
 3   hdd  2.72849         osd.3         down  1.00000 1.00000
 9   hdd  2.72849         osd.9         down  1.00000 1.00000
10   hdd  2.72849         osd.10        down  1.00000 1.00000
11   hdd  2.72849         osd.11        down  1.00000 1.00000
-7       16.37091     host pve3
12   hdd  2.72849         osd.12        down        0 1.00000
13   hdd  2.72849         osd.13        down        0 1.00000
14   hdd  2.72849         osd.14        down        0 1.00000
15   hdd  2.72849         osd.15        down        0 1.00000
16   hdd  2.72849         osd.16        down        0 1.00000
17   hdd  2.72849         osd.17        down        0 1.00000
-5       27.28996     host supermicro
 4   hdd  5.45799         osd.4           up  1.00000 1.00000
 5   hdd  5.45799         osd.5           up  1.00000 1.00000
 6   hdd  5.45799         osd.6           up  1.00000 1.00000
 7   hdd  5.45799         osd.7           up  1.00000 1.00000
 8   hdd  5.45799         osd.8           up  1.00000 1.00000
root@supermicro:~#

If the Ceph packages are installed, the base directories will be created.
The base directory was created, but none of the per-OSD directories /var/lib/ceph/osd/ceph-* were created from the disks.
 
Okay, after the 123rd time switching up my Google search terms... "mount ceph lvm recovery" isn't useful, lol.

Code:
ceph-volume lvm activate --all

fixed it.
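
For anyone who finds this later, the rough sequence on the rebuilt node was something like the following (a sketch; OSD IDs and fsids will differ):

Code:
# discover the OSDs that still live on the LVM volumes
ceph-volume lvm list

# recreate the tmpfs /var/lib/ceph/osd/ceph-* directories and start the OSDs
ceph-volume lvm activate --all

# confirm they came back up
ceph osd tree
ceph -s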
 
A reboot should do this as well.
Well, that didn't happen on two servers, both with a fresh install of 6.2.

I really should test to make sure it comes up after a reboot. I'm sure it will now that it's been activated on the new machine.
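
If I understand it correctly, activation also enables per-OSD systemd units, so it should persist across reboots. Something like this should show them (unit names here are only illustrative):

Code:
# ceph-volume units created during activation (names include the OSD id and fsid)
systemctl list-units --all 'ceph-volume@*'

# one of the OSD daemons on a rebuilt node (osd.0 lives on pve2 in this cluster)
systemctl status ceph-osd@0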
 
Okay, after the 123rd time switching up my Google search terms... "mount ceph lvm recovery" isn't useful, lol.

Code:
ceph-volume lvm activate --all

fixed it.
Like you, rebooting was not enough; only this command restarted the OSDs! Thanks a million, I'd been looking for a way to start the Ceph OSDs for three days!
 
