We discovered a serious regression in Proxmox VE 8.4 when managing Ceph clusters.
Summary
On PVE ≤ 8.3, if all Ceph MONs were lost (bad upgrade, purge, accidental wipe), you could re-bootstrap the cluster manually using monmaptool + ceph-mon --mkfs and recover OSDs.
On PVE 8.4, this path no longer works.
- The wizard/init flow refuses to create a fresh MON DB if none exist.
- Systemd’s ceph-mon@.service expects CLUSTER=ceph and pre-seeded keyrings that only exist when the wizard succeeds.
- Manual bootstrap (monmaptool + mkfs) leaves the MON process crashing in a loop (RADOS object not found).
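For reference, the manual bootstrap path we used on ≤ 8.3 (and the diagnostics we run on 8.4) looks roughly like the sketch below. The mon ID "pve1" and the monitor IP are placeholders; the keyring and config paths are the PVE defaults.

```
# Pull the existing cluster fsid out of the PVE-managed ceph.conf
FSID=$(sed -n 's/^[[:space:]]*fsid[[:space:]]*=[[:space:]]*//p' /etc/pve/ceph.conf)

# Build a one-monitor monmap and create a fresh MON store from it
monmaptool --create --fsid "$FSID" --add pve1 192.168.1.10:6789 /tmp/monmap
ceph-mon --mkfs -i pve1 --monmap /tmp/monmap \
    --keyring /etc/pve/priv/ceph.mon.keyring
chown -R ceph:ceph /var/lib/ceph/mon/ceph-pve1

# Check what the systemd unit actually expects (CLUSTER= and EnvironmentFile)
systemctl cat ceph-mon@pve1.service | grep -iE 'environment|execstart'

# Start the MON -- on 8.4 this is where the crash loop begins
systemctl reset-failed ceph-mon@pve1
systemctl start ceph-mon@pve1
journalctl -u ceph-mon@pve1 -f    # watch for the "RADOS object not found" loop
```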
Effect
- If all MON DBs are lost, the Ceph cluster is unrecoverable in place.
- Even with healthy OSDs, you cannot bring the cluster back online.
- The only option left is to completely re-init Ceph with a new FSID and restore from backup.
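For completeness, that "only option left" is roughly the sequence below. This is only a sketch of the standard pveceph tooling (the network CIDR is a placeholder); it generates a brand-new FSID and therefore only makes sense together with guest restores from backup.

```
# Last resort: wipe the node-local Ceph config/state and start over
pveceph purge

# Re-create Ceph from scratch -- this generates a NEW fsid
pveceph install
pveceph init --network 10.10.10.0/24
pveceph mon create
pveceph mgr create

# OSDs from the old cluster cannot be re-adopted under the new fsid;
# VMs/CTs have to be restored from backup afterwards.
```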
This raises the risk profile significantly:
- Previously, a failed upgrade meant downtime, but the cluster was recoverable.
- Now, a failed upgrade can mean permanent data loss if no backup exists.
Implications
- Operators can no longer rely on MON bootstrap procedures.
- The recommended “3 MONs” is insufficient for safe upgrades: with three MONs, one taken down for the upgrade plus a single additional failure already breaks quorum, so we likely need 5 MONs minimum to survive rolling upgrades safely.
- There is currently no documented recovery path for total MON failure.
Ask
- Can Proxmox clarify if this is an intended change?
- If so, are there official workarounds (e.g. a pveceph init bootstrap option, or a documented MON DB backup/restore procedure)?
- If not intended, can we track this as a bug/regression in 8.4 Ceph integration?
This is urgent: production clusters can get trapped in a dead-end state after a failed Ceph upgrade, something that was recoverable in prior releases.

Environment
- PVE 8.4 with Ceph 18.2.7-pve1 (Reef), upgraded from Quincy/Pacific
- Clean purge/reinstall attempts fail with "rados_connect failed - No such file or directory".
- Manual monmap/mkfs leaves MON crashing on startup.
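For anyone trying to reproduce, this is how the failure shows up for us; which pveceph call surfaces the rados_connect error may vary, so treat the exact commands as illustrative (mon ID is again a placeholder).

```
# After a purge/reinstall attempt, the PVE-side status call fails:
pveceph status
# -> rados_connect failed - No such file or directory

# And a manually bootstrapped MON never stays up:
systemctl status ceph-mon@pve1 --no-pager
journalctl -u ceph-mon@pve1 -n 30 --no-pager
```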
Suggested mitigations for operators
- Run 5 MONs minimum to reduce risk during rolling upgrades.
- Back up /var/lib/ceph/mon/*/store.db and /etc/pve/priv/ceph.* before any Ceph upgrade (see the sketch after this list).
- Do not purge all MONs at once: you may not be able to recover.
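Minimal sketches for the first two mitigations, assuming the default PVE paths and placeholder node/mon names; adapt before use.

```
# Grow the MON count to five by running this on two additional cluster nodes:
pveceph mon create

# Pre-upgrade backup: one MON at a time, so quorum is never lost
systemctl stop ceph-mon@pve1
tar czf /root/ceph-mon-pve1-$(date +%F).tar.gz /var/lib/ceph/mon/ceph-pve1
systemctl start ceph-mon@pve1
# ...repeat for each remaining MON in turn

# Keyrings and cluster config live on the pmxcfs under /etc/pve
tar czf /root/pve-ceph-conf-$(date +%F).tar.gz \
    /etc/pve/priv/ceph.* /etc/pve/ceph.conf
```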
This feels like a critical regression that needs dev acknowledgement.