PVE 8.4 Ceph Regression: No Recovery Path After All MON DBs Are Lost

We discovered a serious regression in Proxmox VE 8.4 when managing Ceph clusters:


Summary


On PVE ≤ 8.3, if all Ceph MONs were lost (bad upgrade, purge, accidental wipe), you could re-bootstrap the cluster manually using monmaptool + ceph-mon --mkfs and recover OSDs.
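For reference, this is roughly the kind of manual re-bootstrap I mean (the fsid, MON name, and IP below are placeholders; the keyring path assumes the PVE default under /etc/pve/priv). Rebuilding the cluster maps from the OSDs is a separate step; this only brings a monitor process back up:

Bash:
# Sketch only - <fsid-from-ceph.conf>, mynode and 192.0.2.10 are placeholders.
# 1) Create a fresh monmap containing a single monitor.
monmaptool --create --fsid <fsid-from-ceph.conf> \
    --add mynode 192.0.2.10 /tmp/monmap

# 2) Initialize a new, empty MON store from that monmap and the mon keyring.
ceph-mon --mkfs -i mynode --monmap /tmp/monmap \
    --keyring /etc/pve/priv/ceph.mon.keyring

# 3) Fix ownership and start the monitor.
chown -R ceph:ceph /var/lib/ceph/mon/ceph-mynode
systemctl start ceph-mon@mynode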


On PVE 8.4, this path no longer works.
  • The wizard/init flow refuses to create a fresh MON DB if none exist.
  • Systemd’s ceph-mon@.service expects CLUSTER=ceph and pre-seeded keyrings that only exist when the wizard succeeds.
  • Manual bootstrap (monmaptool + mkfs) leaves the MON process crash-looping with a “RADOS object not found” error (see the inspection sketch below).
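For anyone reproducing this, the unit’s expectations and the crash loop are easiest to confirm from systemd itself (assuming the MON id is the short hostname, which is the PVE default):

Bash:
# Show the unit plus PVE's drop-in and the CLUSTER/keyring assumptions in ExecStart.
systemctl cat ceph-mon@$(hostname -s)

# Watch the monitor crash-loop; the "RADOS object not found" error shows up here.
systemctl status ceph-mon@$(hostname -s) --no-pager
journalctl -u ceph-mon@$(hostname -s) -n 50 --no-pager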

Effect

  • If all MON DBs are lost, the Ceph cluster is unrecoverable in place.
  • Even with healthy OSDs, you cannot bring the cluster back online.
  • The only option left is to completely re-init Ceph with a new FSID and restore from backup (roughly sketched below).
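A rough sketch of that last-resort path, assuming example networks and that guest backups exist off-cluster (PBS/vzdump); it discards the old cluster entirely:

Bash:
# Last resort: wipes the existing Ceph setup and starts over with a new fsid.
pveceph purge                        # run on every node
pveceph init --network 192.168.254.0/24 --cluster-network 192.168.253.0/24
pveceph mon create                   # on each intended MON node
pveceph osd create /dev/sdX          # per OSD disk (placeholder device)
# ...then re-create pools and restore guests from backup.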

This raises the risk profile significantly:
  • Previously, a failed upgrade meant downtime, but it was recoverable.
  • Now, a failed upgrade can mean permanent data loss if no backup exists.

Implications

  • Operators can no longer rely on MON bootstrap procedures.
  • The recommended baseline of 3 MONs is insufficient for safe upgrades; we likely need a minimum of 5 MONs to avoid quorum collapse during upgrades (see the sketch after this list).
  • There is currently no documented recovery path for total MON failure.
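Going from 3 to 5 MONs ahead of an upgrade is at least cheap to do; a minimal sketch, assuming two additional nodes are available:

Bash:
# On each of the two additional nodes:
pveceph mon create

# Confirm all five monitors are in quorum before starting the upgrade.
ceph mon stat
ceph quorum_status --format json-pretty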

Ask

  • Can Proxmox clarify if this is an intended change?
  • If so, are there official workarounds (e.g. a pveceph init bootstrap option, or a documented MON DB backup/restore procedure)?
  • If not intended, can we track this as a bug/regression in 8.4 Ceph integration?

This is urgent: production clusters can get trapped in a dead-end state after a failed Ceph upgrade, something that was recoverable in prior releases.




⚠️ Environment:
  • PVE 8.4, Ceph Reef (18.2.7-pve1)
  • Clean purge/reinstall attempts fail with rados_connect failed - No such file or directory.
  • Manual monmap/mkfs leaves MON crashing on startup.



Suggested mitigations for operators

  • Run 5 MONs minimum to reduce risk during rolling upgrades.
  • Back up /var/lib/ceph/mon/*/store.db and /etc/pve/priv/ceph.* before any Ceph upgrade (see the sketch after this list).
  • Do not purge all MONs at once: you may not be able to recover.
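A minimal pre-upgrade backup sketch for the MON store and the PVE-managed keyrings; stopping one MON at a time keeps quorum with 3+ monitors (paths assume the defaults, archive destination is a placeholder):

Bash:
# Run per MON node, one node at a time, so quorum is never lost.
MON_ID=$(hostname -s)
systemctl stop ceph-mon@$MON_ID        # brief stop for a consistent store.db
tar czf /root/ceph-mon-$MON_ID-$(date +%F).tar.gz \
    /var/lib/ceph/mon/ceph-$MON_ID \
    /etc/pve/priv/ceph.mon.keyring \
    /etc/pve/priv/ceph.client.admin.keyring \
    /etc/pve/ceph.conf
systemctl start ceph-mon@$MON_ID
# Copy the archive off-cluster before upgrading.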



This feels like a critical regression that needs dev acknowledgement.
 
Follow-up / Findings:

I replicated this issue in a lab on 3 nodes (all running PVE 8.4).
  • I fully purged Ceph back to a zeroized state on each node (roughly as sketched after this list).
  • Ran the standard Ceph install + wizard via the PVE GUI.
  • The wizard reported “Installation successful”.
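Roughly the per-node purge sequence I mean, with a placeholder device name; obviously do not run this anywhere the data still matters:

Bash:
# Stop all Ceph daemons on the node, then purge the PVE-side config/state.
systemctl stop ceph.target
pveceph purge

# Remove leftover daemon state and zap former OSD disks (placeholder device).
rm -rf /var/lib/ceph/mon/* /var/lib/ceph/mgr/* /var/lib/ceph/osd/*
ceph-volume lvm zap /dev/sdX --destroy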

Result:
  • /etc/pve/ceph.conf is written and consistent across nodes.
  • fsid is present.
  • Keyrings (ceph.client.admin.keyring, ceph.mon.keyring) are created under /etc/pve/priv.
  • But no monitor DB is created under /var/lib/ceph/mon/* on any node.
  • ceph-mon@<node> units remain inactive (dead) with nothing to start from.

This leaves the cluster in a state where configs + keys exist, but no quorum can ever form because the initial MON DB is never initialized.

So it looks like a regression in the 8.4 Ceph wizard: it writes the config and keys, but skips or never finishes the first monitor bootstrap.
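For completeness, the obvious follow-up with the files the wizard did leave behind is to create the first monitor explicitly via the PVE tooling and check whether a MON store appears at all; whether that is meant to be a required extra step after the wizard is part of what I'd like clarified:

Bash:
# Attempt to create the first monitor from the existing ceph.conf + keyrings.
pveceph mon create

# Then check whether a MON store actually appears and the unit comes up.
ls -ld /var/lib/ceph/mon/ceph-$(hostname -s)
systemctl status ceph-mon@$(hostname -s) --no-pager
journalctl -u ceph-mon@$(hostname -s) -n 30 --no-pager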

This is the output after the install, landing on a dead Ceph environment.
Bash:
root@cl2-hci01:~# cat /etc/pve/ceph.conf
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring


root@cl2-hci01:~# for node in cl2-qrm1 cl2-hci01 cl2-hci02; do
  echo "--- $node ---"
  ssh root@$node cat /etc/pve/ceph.conf || echo "missing"
done
--- cl2-qrm1 ---
ssh: Could not resolve hostname cl2-qrm1: Name or service not known
missing

--- cl2-hci01 ---
The authenticity of host 'cl2-hci01 (172.16.10.111)' can't be established.
ED25519 key fingerprint is SHA256:hZJ9A7iMwHm/eaenk75KMLXU61xpzVRPUomNRI01lZ4.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'cl2-hci01' (ED25519) to the list of known hosts.
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

--- cl2-hci02 ---
The authenticity of host 'cl2-hci02 (172.16.10.112)' can't be established.
ED25519 key fingerprint is SHA256:XFdNC4kYUU3ZIL3+sdDJ/prHN99/yVEXpyzzlOKk1V4.
This key is not known by any other names.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'cl2-hci02' (ED25519) to the list of known hosts.
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.253.111/24
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761
        mon_allow_pool_delete = true
        osd_pool_default_min_size = 2
        osd_pool_default_size = 2
        public_network = 192.168.254.111/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring


root@cl2-hci01:~# ls -l /etc/pve/priv/ceph.client.admin.keyring
-rw------- 1 root www-data 151 Sep 20 21:52 /etc/pve/priv/ceph.client.admin.keyring

root@cl2-hci01:~# ls -l /etc/pve/priv/ | grep ceph
-rw------- 1 root www-data  151 Sep 20 21:52 ceph.client.admin.keyring
-rw------- 1 root www-data  228 Sep 20 21:52 ceph.mon.keyring

root@cl2-hci01:~# ls -ld /var/lib/ceph/mon/*
ls: cannot access '/var/lib/ceph/mon/*': No such file or directory

root@cl2-hci01:~# grep fsid /etc/pve/ceph.conf
        fsid = 3420a5f9-4347-40e5-b5f8-dfd71e2ae761

root@cl2-hci01:~# strings /var/lib/ceph/mon/*/store.db/* 2>/dev/null | grep -m1 fsid
# (no output)

root@cl2-hci01:~# systemctl status ceph-mon@$(hostname -s)
○ ceph-mon@cl2-hci01.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: inactive (dead)