Ceph OSD problem

Hany Elsersawy

New Member
Dec 30, 2018
Hi,
I have a 3-node cluster that has been working fine with Ceph storage and all the VMs on it. I recently joined 2 new servers to the cluster successfully, but by mistake I ran ceph purge on cluster server 4. That brought the Ceph storage down on all nodes, and the ceph.conf file no longer exists.
So, can I rebuild Ceph without losing the VMs' data?
Here is the lsblk output from one of the servers:
Code:
sdb 8:16 0 1.1T 0 disk
└─sdb1 8:17 0 1.1T 0 part
sdc 8:32 0 1.1T 0 disk
└─sdc1 8:33 0 1.1T 0 part
sdd 8:48 0 1.1T 0 disk
└─sdd1 8:49 0 1.1T 0 part
Those are the disks the Ceph storage was created on.
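(In case it matters, a quick way to confirm those partitions still carry Ceph OSD data, assuming they were created by ceph-disk, would be something like this:)
Code:
# ceph-disk created data partitions carry the partition label "ceph data",
# which survives a purge of the config files
blkid /dev/sdb1 /dev/sdc1 /dev/sdd1

# ceph-disk's own view: OSD id, journal partition, and whether it is mounted
ceph-disk list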
Thanks
 
Hey,

I have posted some configs here: https://forum.proxmox.com/threads/pveceph-purge-recovery-advice-needed.50002/#post-233126

Check whether you can use them.
I hope you still have your fsid, admin keyring, and other details. Even if you lost your mons, it is possible to rebuild them, because every OSD knows who the mons are.

Please write down all Ceph-related info: how many pools, how many replicas, how many OSDs, whether Ceph was in a healthy state, etc.
All of that information will help you recreate the Ceph setup.
As long as you do not wipe the OSDs, the data will still be there.
If possible, post your shell history here together with the command output, so we can see what happened.
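A rough sketch of how to collect that information, assuming the stock Proxmox/ceph-disk paths (adjust to your setup):
Code:
# Cluster fsid and OSD ids as recorded on the OSDs themselves
# (only readable where the OSD data partitions are still mounted)
cat /var/lib/ceph/osd/ceph-*/ceph_fsid
cat /var/lib/ceph/osd/ceph-*/whoami

# Admin keyring that Proxmox keeps on the clustered /etc/pve filesystem
cat /etc/pve/priv/ceph.client.admin.keyring

# Shell history of the node where the purge was run
history | tail -n 100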
 
Hi sb-jw,
I ran the command pveceph init --network 11.11.11.0/24 and it recreated the [global] section with this content:


Code:
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 11.11.11.0/24
         fsid = 96c4a26a-4415-479d-a8ff-ccf74eb435ae
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         mon_dns_srv_name = ceph-mon
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 11.11.11.0/24

The fsid was created again; I can't remember the pool and replica settings. There are 9 OSDs (3 on each server), and yes, Ceph was healthy and working fine.
Regarding the history: the purge command was run from the newly joined server (number 4), so I guess its history won't help.
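(One thing worth double-checking here: pveceph init generates a brand-new fsid when it writes that [global] section, while the OSDs still carry the fsid of the original cluster, and they will only join a cluster whose fsid matches. A quick comparison, assuming the OSD data partitions are mounted under the stock path, would be:)
Code:
# fsid in the regenerated config
grep fsid /etc/ceph/ceph.conf

# fsid the OSDs were created with - the config has to match this one
cat /var/lib/ceph/osd/ceph-*/ceph_fsid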
 
Somehow I have managed to re-enable the monitors, and ceph -s shows:
Code:
  cluster:
    id:     b7bad4ec-df7c-40c9-b094-8bda16be9823
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive

  services:
    mon: 3 daemons, quorum o-px01,o-px02,o-px03
    mgr: o-px01(active), standbys: o-px03, o-px02
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0B
    usage:   0B used, 0B / 0B avail
    pgs:     100.000% pgs unknown
             128 unknown
It shows that no OSDs are there. I have also reactivated all the disks with:
ceph-disk activate --reactivate /dev/sdb1 (and the same for the other disks).
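(If the OSDs still do not show up after the reactivate, a way to see whether the OSD daemons actually started and what they logged, assuming the standard ceph-osd@<id> service names:)
Code:
# Did the OSD services come up after the reactivate?
systemctl status 'ceph-osd@*'

# Startup errors (a wrong fsid or missing keyring would show up here)
journalctl -u 'ceph-osd@*' -n 50 --no-pager

# What the monitors see
ceph osd tree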
 
