PVE 8.4 Ceph Regression: No Recovery Path After All MON DBs Are Lost

We discovered a serious regression in Proxmox VE 8.4 when managing Ceph clusters:


Summary


On PVE ≤ 8.3, if all Ceph MONs were lost (bad upgrade, purge, accidental wipe), you could re-bootstrap the cluster manually using monmaptool + ceph-mon --mkfs and recover OSDs.
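For reference, this is roughly the kind of manual re-bootstrap I mean (the fsid, MON name, and IP below are placeholders; the keyring path assumes the PVE default under /etc/pve/priv). Rebuilding the cluster maps from the OSDs is a separate step; this only brings a monitor process back up:

Bash:
# Sketch only - <fsid-from-ceph.conf>, mynode and 192.0.2.10 are placeholders.
# 1) Create a fresh monmap containing a single monitor.
monmaptool --create --fsid <fsid-from-ceph.conf> \
    --add mynode 192.0.2.10 /tmp/monmap

# 2) Initialize a new, empty MON store from that monmap and the mon keyring.
ceph-mon --mkfs -i mynode --monmap /tmp/monmap \
    --keyring /etc/pve/priv/ceph.mon.keyring

# 3) Fix ownership and start the monitor.
chown -R ceph:ceph /var/lib/ceph/mon/ceph-mynode
systemctl start ceph-mon@mynode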


On PVE 8.4, this path no longer works.
  • The wizard/init flow refuses to create a fresh MON DB if none exist.
  • Systemd’s ceph-mon@.service expects CLUSTER=ceph and pre-seeded keyrings that only exist when the wizard succeeds.
  • Manual bootstrap (monmaptool + mkfs) leaves the MON process crash-looping with a “RADOS object not found” error (see the inspection sketch below).
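For anyone reproducing this, the unit’s expectations and the crash loop are easiest to confirm from systemd itself (assuming the MON id is the short hostname, which is the PVE default):

Bash:
# Show the unit plus PVE's drop-in and the CLUSTER/keyring assumptions in ExecStart.
systemctl cat ceph-mon@$(hostname -s)

# Watch the monitor crash-loop; the "RADOS object not found" error shows up here.
systemctl status ceph-mon@$(hostname -s) --no-pager
journalctl -u ceph-mon@$(hostname -s) -n 50 --no-pager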

Effect

  • If all MON DBs are lost, the Ceph cluster is unrecoverable in place.
  • Even with healthy OSDs, you cannot bring the cluster back online.
  • The only option left is to completely re-init Ceph with a new FSID and restore from backup (roughly sketched below).
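A rough sketch of that last-resort path, assuming example networks and that guest backups exist off-cluster (PBS/vzdump); it discards the old cluster entirely:

Bash:
# Last resort: wipes the existing Ceph setup and starts over with a new fsid.
pveceph purge                        # run on every node
pveceph init --network 192.168.254.0/24 --cluster-network 192.168.253.0/24
pveceph mon create                   # on each intended MON node
pveceph osd create /dev/sdX          # per OSD disk (placeholder device)
# ...then re-create pools and restore guests from backup.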

This raises the risk profile significantly:
  • Previously, a failed upgrade meant downtime, but it was recoverable.
  • Now, a failed upgrade can mean permanent data loss if no backup exists.

Implications

  • Operators can no longer rely on MON bootstrap procedures.
  • The recommended baseline of 3 MONs is insufficient for safe upgrades; we likely need a minimum of 5 MONs to avoid quorum collapse during upgrades (see the sketch after this list).
  • There is currently no documented recovery path for total MON failure.
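Going from 3 to 5 MONs ahead of an upgrade is at least cheap to do; a minimal sketch, assuming two additional nodes are available:

Bash:
# On each of the two additional nodes:
pveceph mon create

# Confirm all five monitors are in quorum before starting the upgrade.
ceph mon stat
ceph quorum_status --format json-pretty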

Ask

  • Can Proxmox clarify if this is an intended change?
  • If so, are there official workarounds (e.g. a pveceph init bootstrap option, or a documented MON DB backup/restore procedure)?
  • If not intended, can we track this as a bug/regression in 8.4 Ceph integration?

This is urgent: production clusters can get trapped in a dead-end state after a failed Ceph upgrade, something that was recoverable in prior releases.




⚠️ Environment:
  • PVE 8.4, Ceph Reef (18.2.7-pve1)
  • Clean purge/reinstall attempts fail with rados_connect failed - No such file or directory.
  • Manual monmap/mkfs leaves MON crashing on startup.



Suggested mitigations for operators

  • Run 5 MONs minimum to reduce risk during rolling upgrades.
  • Back up /var/lib/ceph/mon/*/store.db and /etc/pve/priv/ceph.* before any Ceph upgrade (see the sketch after this list).
  • Do not purge all MONs at once: you may not be able to recover.
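A minimal pre-upgrade backup sketch for the MON store and the PVE-managed keyrings; stopping one MON at a time keeps quorum with 3+ monitors (paths assume the defaults, archive destination is a placeholder):

Bash:
# Run per MON node, one node at a time, so quorum is never lost.
MON_ID=$(hostname -s)
systemctl stop ceph-mon@$MON_ID        # brief stop for a consistent store.db
tar czf /root/ceph-mon-$MON_ID-$(date +%F).tar.gz \
    /var/lib/ceph/mon/ceph-$MON_ID \
    /etc/pve/priv/ceph.mon.keyring \
    /etc/pve/priv/ceph.client.admin.keyring \
    /etc/pve/ceph.conf
systemctl start ceph-mon@$MON_ID
# Copy the archive off-cluster before upgrading.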



This feels like a critical regression that needs dev acknowledgement.
 
Follow-up / Findings:

I replicated this issue in a lab on 3 nodes (all running PVE 8.4).
  • I fully purged Ceph back to a zeroized state on each node (roughly as sketched after this list).
  • Ran the standard Ceph install + wizard via the PVE GUI.
  • The wizard reported “Installation successful”.
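Roughly the per-node purge sequence I mean, with a placeholder device name; obviously do not run this anywhere the data still matters:

Bash:
# Stop all Ceph daemons on the node, then purge the PVE-side config/state.
systemctl stop ceph.target
pveceph purge

# Remove leftover daemon state and zap former OSD disks (placeholder device).
rm -rf /var/lib/ceph/mon/* /var/lib/ceph/mgr/* /var/lib/ceph/osd/*
ceph-volume lvm zap /dev/sdX --destroy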

Result:
  • /etc/pve/ceph.conf is written and consistent across nodes.
  • fsid is present.
  • Keyrings (ceph.client.admin.keyring, ceph.mon.keyring) are created under /etc/pve/priv.
  • But no monitor DB is created under /var/lib/ceph/mon/* on any node.
  • ceph-mon@<node> units remain inactive (dead) with nothing to start from.

This leaves the cluster in a state where configs + keys exist, but no quorum can ever form because the initial MON DB is never initialized.

So it looks like a regression in the 8.4 Ceph wizard: it writes the config and keys, but skips or never finishes the first monitor bootstrap.
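For completeness, the obvious follow-up with the files the wizard did leave behind is to create the first monitor explicitly via the PVE tooling and check whether a MON store appears at all; whether that is meant to be a required extra step after the wizard is part of what I'd like clarified:

Bash:
# Attempt to create the first monitor from the existing ceph.conf + keyrings.
pveceph mon create

# Then check whether a MON store actually appears and the unit comes up.
ls -ld /var/lib/ceph/mon/ceph-$(hostname -s)
systemctl status ceph-mon@$(hostname -s) --no-pager
journalctl -u ceph-mon@$(hostname -s) -n 30 --no-pager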

This is the output after the install, landing on a dead Ceph environment.
Bash:
root@cl2-hci01:~# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring


root@cl2-hci01:~# for node in cl2-qrm1 cl2-hci01 cl2-hci02; do
  echo "--- $node ---"
  ssh root@$node cat /etc/pve/ceph.conf || echo "missing"
done
--- cl2-qrm1 ---
ssh: Could not resolve hostname cl2-qrm1: Name or service not known
missing

--- cl2-hci01 ---
The authenticity of host 'cl2-hci01 (172.16.10.111)' can't be established.
ED25519 key fingerprint is SHA256:hZJ9A7iMwHm/eaenk75KMLXU61xpzVRPUomNRI01lZ4.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'cl2-hci01' (ED25519) to the list of known hosts.
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

--- cl2-hci02 ---
The authenticity of host 'cl2-hci02 (172.16.10.112)' can't be established.
ED25519 key fingerprint is SHA256:XFdNC4kYUU3ZIL3+sdDJ/prHN99/yVEXpyzzlOKk1V4.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'cl2-hci02' (ED25519) to the list of known hosts.
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring


root@cl2-hci01:~# ls -l /etc/pve/priv/ceph.client.admin.keyring
-rw------- 1 root www-data 151 Sep 20 21:52 /etc/pve/priv/ceph.client.admin.keyring

root@cl2-hci01:~# ls -l /etc/pve/priv/ | grep ceph
-rw------- 1 root www-data  151 Sep 20 21:52 ceph.client.admin.keyring
-rw------- 1 root www-data  228 Sep 20 21:52 ceph.mon.keyring

root@cl2-hci01:~# ls -ld /var/lib/ceph/mon/*
ls: cannot access '/var/lib/ceph/mon/*': No such file or directory

root@cl2-hci01:~# grep fsid /etc/pve/ceph.conf
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761

root@cl2-hci01:~# strings /var/lib/ceph/mon/*/store.db/* 2>/dev/null | grep -m1 fsid
# (no output)

root@cl2-hci01:~# systemctl status ceph-mon@$(hostname -s)
○ ceph-mon@cl2-hci01.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead)