No OSDs in CEPH after recreating monitor

daubner

New Member
Jan 7, 2025
Hello!

I'm trying to create a disaster recovery plan for our PVE cluster, including Ceph. Our current setup has three monitors on our three servers and the standard pool configuration (3 replicas). I'm trying to write a procedure for removing the monitor configuration and running on only one monitor.

I got a monmap from the still-quorate Ceph cluster, removed the two stopped monitors from it, and edited ceph.conf accordingly:
Code:
root@nextclouda:~# ceph mon getmap -o /root/monmap
root@nextclouda:~# monmaptool --rm nextcloudb /root/monmap
root@nextclouda:~# monmaptool --rm nextcloudc /root/monmap
root@nextclouda:~# cat /etc/pve/ceph.conf
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = 10.0.0.1/24
    fsid = cf282c03-77a3-458d-8989-b4a477f121dd
    mon_allow_pool_delete = true
    mon_host = 10.0.1.1
#10.0.1.2 10.0.1.3
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 10.0.1.1/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.nextclouda]
    host = nextclouda
    mds_standby_for_name = pve

[mds.nextcloudb]
    host = nextcloudb
    mds_standby_for_name = pve

[mds.nextcloudc]
    host = nextcloudc
    mds_standby_for_name = pve

[mon.nextclouda]
    public_addr = 10.0.1.1

#[mon.nextcloudb]
#    public_addr = 10.0.1.2
#
#[mon.nextcloudc]
#    public_addr = 10.0.1.3
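
(For completeness: before using the edited map anywhere, it can be sanity-checked with monmaptool's print mode; it should now list only mon.nextclouda at 10.0.1.1.)
Code:
root@nextclouda:~# monmaptool --print /root/monmap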

Using the commands below I was able to remove the monitor configuration and recreate a single monitor that has quorum (the third host, nextcloudc, is fully shut down, and the monitor service on the second host, nextcloudb, was stopped manually):
Code:
root@nextclouda:~# systemctl stop ceph-mon@nextclouda
root@nextclouda:~# rm -rf /var/lib/ceph/mon/ceph-nextclouda
root@nextclouda:~# ceph-mon --monmap /root/monmap --keyring /etc/pve/priv/ceph.mon.keyring --mkfs -i nextclouda -m 10.0.1.1
root@nextclouda:~# chown -R ceph:ceph /var/lib/ceph/mon/ceph-nextclouda
root@nextclouda:~# systemctl start ceph-mon@nextclouda
root@nextclouda:~# ceph -s
  cluster:
    id:     cf282c03-77a3-458d-8989-b4a477f121dd
    health: HEALTH_WARN
            mon is allowing insecure global_id reclaim

  services:
    mon: 1 daemons, quorum nextclouda (age 50s)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     

root@nextclouda:~# ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default

But as you can see, the monitor now doesn't see any OSDs, pools, managers or the CephFS. I'm trying to do this without recreating everything manually, but I'll resort to that if I have to. I'd be very thankful for your help and/or insights on whether what I'm trying to do makes sense.

I have a backup of /var/lib/ceph and original monmap in case it can help.


Code:
package versions:

proxmox-ve: 8.3.0 (running kernel: 6.8.12-9-pve)
pve-manager: 8.3.5 (running version: 8.3.5/dac3aa88bac3f300)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8: 6.8.12-9
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-4-pve-signed: 6.8.12-4
ceph: 19.2.1-pve2
ceph-fuse: 19.2.1-pve2
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
dnsmasq: 2.90-4~deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve1
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.1
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.0
libpve-guest-common-perl: 5.2.0
libpve-http-server-perl: 5.2.0
libpve-network-perl: 0.10.1
libpve-rs-perl: 0.9.3
libpve-storage-perl: 8.3.5
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.3.7-1
proxmox-backup-file-restore: 3.3.7-1
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.8
pve-cluster: 8.1.0
pve-container: 5.2.5
pve-docs: 8.3.1
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.15-3
pve-ha-manager: 4.0.6
pve-i18n: 3.4.1
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-1
qemu-server: 8.3.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2

Thank you very much and have a nice rest of the day!
 
With "# ceph-mon --monmap /root/monmap --keyring /etc/pve/priv/ceph.mon.keyring --mkfs -i nextclouda -m 10.0.1.1" you created a new MON database (--mkfs) and removed all info from the old one, not only the monmap.

You should have just injected the new monmap into the existing store with "ceph-mon -i nextclouda --inject-monmap /root/monmap" (with the monitor stopped).
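
Roughly (untested here, mon id and paths taken from your post), the inject sequence would look like this:
Code:
# stop the monitor before touching its store
systemctl stop ceph-mon@nextclouda
# write the edited monmap into the existing mon store - no --mkfs
ceph-mon -i nextclouda --inject-monmap /root/monmap
# ownership must stay with the ceph user, then start the mon again
chown -R ceph:ceph /var/lib/ceph/mon/ceph-nextclouda
systemctl start ceph-mon@nextclouda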

Your old Ceph cluster may be restored by extracting the cluster map from one of the OSDs:

https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
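
Heavily abbreviated and untested (see the linked page for the full multi-host script and the required keyring caps), the single-node variant of that recovery looks roughly like this:
Code:
ms=/root/mon-store
mkdir $ms
# pull the cluster map out of every stopped OSD on this host
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path $osd --no-mon-config \
        --op update-mon-db --mon-store-path $ms
done
# rebuild a mon store from it, using a keyring that holds the mon. and
# client.admin keys with full caps (path is just an example)
ceph-monstore-tool $ms rebuild -- --keyring /root/admin.keyring --mon-ids nextclouda
# then replace the monitor's store.db with $ms/store.db and fix ownership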
 
Thank you very much for your response!

I was able to restore the original monitor by destroying it and redeploying it (through the Proxmox GUI), but it's good to know we can extract the cluster map from the OSDs.
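
For anyone following along, the CLI equivalent of that GUI redeploy should be roughly the following (I did it through the web UI):
Code:
pveceph mon destroy nextclouda
pveceph mon create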

Unfortunately, I'm still running into problems.
Before injecting the monmap, the monitor process is somehow still running even though the systemctl service shows it as stopped, but I can kill it:
Code:
root@nextclouda:~# lsof -i :3300
COMMAND      PID USER   FD   TYPE   DEVICE SIZE/OFF NODE NAME
ceph-osd    4046 ceph   91u  IPv4 24004284      0t0  TCP 10.0.1.1:43066->10.0.1.1:3300 (ESTABLISHED)
ceph-osd    4052 ceph   92u  IPv4 24057885      0t0  TCP 10.0.1.1:43052->10.0.1.1:3300 (ESTABLISHED)
ceph-mds 1063362 ceph   38u  IPv4 24041674      0t0  TCP 10.0.1.1:43090->10.0.1.1:3300 (ESTABLISHED)
ceph-mds 1277231 ceph   31u  IPv4 24039765      0t0  TCP 10.0.1.1:43076->10.0.1.1:3300 (ESTABLISHED)
ceph-mds 1278172 ceph   30u  IPv4 24036442      0t0  TCP 10.0.1.1:43096->10.0.1.1:3300 (ESTABLISHED)
ceph-mon 1544762 root   29u  IPv4 24003721      0t0  TCP 10.0.1.1:3300 (LISTEN)
ceph-mon 1544762 root   32u  IPv4 24040538      0t0  TCP 10.0.1.1:3300->10.0.1.1:43052 (ESTABLISHED)
ceph-mon 1544762 root   37u  IPv4 24040541      0t0  TCP 10.0.1.1:3300->10.0.1.1:43066 (ESTABLISHED)
ceph-mon 1544762 root   40u  IPv4 24040544      0t0  TCP 10.0.1.1:3300->10.0.1.2:50622 (ESTABLISHED)

root@nextclouda:~# kill -9 1544762
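
(Side note: checking which systemd unit, if any, owns that PID might show where the stray ceph-mon comes from, e.g.:)
Code:
root@nextclouda:~# systemctl status 1544762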

Once it's killed, I can inject the new monmap (without the two other monitors and without --mkfs) and attempt to start the monitor, but it's still not working as expected; the log keeps reporting:
Code:
e11 get_health_metrics reporting 3487 slow ops, oldest is auth(proto 0 41 bytes epoch 0)
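
In case it helps with debugging, I can query the monitor's own view of things locally over its admin socket (socket path assumed from the default layout):
Code:
ceph daemon mon.nextclouda mon_status
# or directly via the socket:
ceph --admin-daemon /var/run/ceph/ceph-mon.nextclouda.asok mon_status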

Is there maybe a config step I'm missing? Do the stopped monitors just need to be removed from /etc/pve/ceph.conf? Does the modified monmap need to be pushed to the managers/OSDs as well?

Thank you!
 
You really need your old cluster map extracted from the OSDs.

By deploying only a new MON you are effectively creating a new, empty Ceph cluster. The existing OSDs will not be able to join it.

The ceph.conf file does not matter here. It only tells the clients and the OSDs where to find the MONs.