Ceph OSD problem

Hany Elsersawy

New Member
Dec 30, 2018
Hi,
I have a 3-node cluster that has been working fine with Ceph storage and all the VMs on it. I recently joined 2 new servers to the cluster successfully, but by mistake I ran ceph purge on cluster server 4. That brought the Ceph storage down on all nodes, and the ceph.conf file no longer exists.
So, can I rebuild Ceph without losing the VMs' data?
Here is the lsblk output from one of the servers:
Code:
sdb 8:16 0 1.1T 0 disk
└─sdb1 8:17 0 1.1T 0 part
sdc 8:32 0 1.1T 0 disk
└─sdc1 8:33 0 1.1T 0 part
sdd 8:48 0 1.1T 0 disk
└─sdd1 8:49 0 1.1T 0 part
Those are the disks the Ceph storage was created on.
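(In case it matters, a quick way to confirm those partitions still carry Ceph OSD data, assuming they were created by ceph-disk, would be something like this:)
Code:
# ceph-disk created data partitions carry the partition label "ceph data",
# which survives a purge of the config files
blkid /dev/sdb1 /dev/sdc1 /dev/sdd1

# ceph-disk's own view: OSD id, journal partition, and whether it is mounted
ceph-disk list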
Thanks
 
Hey,

I have posted some configs here: https://forum.proxmox.com/threads/pveceph-purge-recovery-advice-needed.50002/#post-233126

Check whether you can use them.
I hope you still have your fsid, admin keyring, and other details. Even if you lost your mons, it is possible to rebuild them, because every OSD knows who the mons are.

Please write down all Ceph-related info: how many pools, how many replicas, how many OSDs, whether Ceph was in a healthy state, etc.
All of that information will help you recreate the Ceph setup.
As long as you do not wipe the OSDs, the data will still be there.
If possible, post your shell history here together with the command output, so we can see what happened.
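A rough sketch of how to collect that information, assuming the stock Proxmox/ceph-disk paths (adjust to your setup):
Code:
# Cluster fsid and OSD ids as recorded on the OSDs themselves
# (only readable where the OSD data partitions are still mounted)
cat /var/lib/ceph/osd/ceph-*/ceph_fsid
cat /var/lib/ceph/osd/ceph-*/whoami

# Admin keyring that Proxmox keeps on the clustered /etc/pve filesystem
cat /etc/pve/priv/ceph.client.admin.keyring

# Shell history of the node where the purge was run
history | tail -n 100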
 
Hi sb-jw,
I ran the command pveceph init --network 11.11.11.0/24 and it recreated the [global] section with this content:


Code:
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 11.11.11.0/24
         fsid = 96c4a26a-4415-479d-a8ff-ccf74eb435ae
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         mon_dns_srv_name = ceph-mon
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 11.11.11.0/24

The fsid was created again; I can't remember the pool and replica settings. There are 9 OSDs (3 on each server), and yes, Ceph was healthy and working fine.
Regarding the history: the purge command was run from the newly joined server (number 4), so I guess its history won't help.
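(One thing worth double-checking here: pveceph init generates a brand-new fsid when it writes that [global] section, while the OSDs still carry the fsid of the original cluster, and they will only join a cluster whose fsid matches. A quick comparison, assuming the OSD data partitions are mounted under the stock path, would be:)
Code:
# fsid in the regenerated config
grep fsid /etc/ceph/ceph.conf

# fsid the OSDs were created with - the config has to match this one
cat /var/lib/ceph/osd/ceph-*/ceph_fsid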
 
Somehow I have managed to re-enable the monitors, and ceph -s shows:
Code:
  cluster:
    id:     b7bad4ec-df7c-40c9-b094-8bda16be9823
    health: HEALTH_WARN
            Reduced data availability: 128 pgs inactive

  services:
    mon: 3 daemons, quorum o-px01,o-px02,o-px03
    mgr: o-px01(active), standbys: o-px03, o-px02
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 128 pgs
    objects: 0 objects, 0B
    usage:   0B used, 0B / 0B avail
    pgs:     100.000% pgs unknown
             128 unknown
It shows that no OSDs are there. I have also reactivated all the disks with:
ceph-disk activate --reactivate /dev/sdb1 (and the same for the other disks).
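(If the OSDs still do not show up after the reactivate, a way to see whether the OSD daemons actually started and what they logged, assuming the standard ceph-osd@<id> service names:)
Code:
# Did the OSD services come up after the reactivate?
systemctl status 'ceph-osd@*'

# Startup errors (a wrong fsid or missing keyring would show up here)
journalctl -u 'ceph-osd@*' -n 50 --no-pager

# What the monitors see
ceph osd tree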
 
