Ceph auth=none required after reinstall on Proxmox 9 / Ceph 19 Squid: cephx breaks monitors, mgr, OSDs

n7qnm

Background: After a failed mixed Proxmox 8/9 upgrade broke Ceph, I did a full Ceph reinstall on a 5-node Proxmox 9.1 cluster running Ceph 19.2.3 (Squid). Since the reinstall I have not been able to get cephx working: enabling it causes the monitors to lose quorum, and the OSDs and managers fail to start.

Symptoms:

With cephx enabled, all daemons fail with either "handle_auth_bad_method server allowed_methods [1] but i only support [2]" or the reverse (server allows [2], daemon only supports [1])

Monitors lose quorum immediately when auth_*_required = none is removed from ceph.conf

OSDs fail under systemd with "failed to fetch mon config" but work fine when run in the foreground as root

pvesm list <storage> returns "rbd error: rbd: listing images failed: (95) Operation not supported" with auth=none, or "(13) Permission denied" with cephx
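For anyone reading the handle_auth_bad_method lines above: the bracketed numbers are Ceph's auth method IDs, where 1 is "none" and 2 is "cephx". The helper below is only a rough sketch for decoding a line that lists a single method on each side; the function names are mine, and lines with several methods (such as [2,1]) are not handled.

```shell
# Map a Ceph auth method number to its name (1 = none, 2 = cephx).
auth_method_name() {
    case "$1" in
        1) echo none ;;
        2) echo cephx ;;
        *) echo "unknown($1)" ;;
    esac
}

# Decode one handle_auth_bad_method log line into a readable summary.
# Only the single-method form "[N]" is parsed; anything else prints unknown().
decode_bad_method() {
    line=$1
    server=$(printf '%s\n' "$line" | sed -n 's/.*allowed_methods \[\([0-9]*\)].*/\1/p')
    client=$(printf '%s\n' "$line" | sed -n 's/.*i only support \[\([0-9]*\)].*/\1/p')
    echo "server allows $(auth_method_name "$server"), client supports $(auth_method_name "$client")"
}
```

So "allowed_methods [1] but i only support [2]" means the peer was running with auth none while the connecting daemon insisted on cephx, and the reverse message means the opposite mismatch.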

What I found:

After the reinstall, monitor keyrings were stored as [mon.] instead of [mon.pve0] etc., and were not registered in the cluster auth database. Fixed by using ceph-authtool to create properly named keyrings and ceph auth add to register them

The ceph user was not in the www-data group, so it couldn't read /etc/pve/ceph.conf (owned by root:www-data, mode 640) — fixed with usermod -aG www-data ceph on each node

OSDs under systemd fail auth negotiation even with auth=none in ceph.conf — workaround is --no-mon-config in a systemd override

Proxmox RBD plugin (PVE::CephConfig) sets auth_supported=cephx if the storage keyring file exists, causing rados_connect to fail with error 95 — workaround is removing the keyring file so it falls back to auth_supported=none

Ceph config database auth settings (set via ceph config set) override ceph.conf and affect daemons differently — caused significant confusion during troubleshooting
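Since the ceph.conf values and the config database values can disagree, it helps to print both side by side. This is a minimal sketch, not a tested procedure: the function names are made up, the parser only handles the underscore spelling of the option names, and db_auth_settings assumes the ceph CLI can currently reach the monitors.

```shell
# Print the auth_*_required settings found in a ceph.conf-style file as key=value.
# Only the underscore spelling (auth_cluster_required) is matched.
conf_auth_settings() {
    conf=${1:-/etc/pve/ceph.conf}
    grep -E '^[[:space:]]*auth_(cluster|service|client)_required[[:space:]]*=' "$conf" |
        sed -E 's/^[[:space:]]+//; s/[[:space:]]*=[[:space:]]*/=/; s/[[:space:]]+$//'
}

# Print the same options as the monitors see them in the config database.
# Needs a reachable cluster; "ceph config get" reports the default when nothing is set.
db_auth_settings() {
    for opt in auth_cluster_required auth_service_required auth_client_required; do
        printf '%s=%s\n' "$opt" "$(ceph config get mon "$opt")"
    done
}
```

Running conf_auth_settings and then db_auth_settings on a monitor node shows at a glance whether the file and the database agree. ceph config dump lists everything stored in the database if a wider view is needed.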

Current workarounds in place:

auth_*_required = none in /etc/pve/ceph.conf

--no-mon-config systemd override for all OSDs on all nodes

Storage keyring files removed (renamed to .bak)

usermod -aG www-data ceph on all nodes
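For reference, the OSD override looks roughly like the sketch below. The drop-in file name is arbitrary, and the ExecStart line is my assumption of what the packaged ceph-osd@.service contains; compare it against systemctl cat ceph-osd@.service before copying it.

```shell
# Write a systemd drop-in that appends --no-mon-config to the OSD start command.
# ASSUMPTION: the ExecStart line mirrors the packaged ceph-osd@.service unit;
# verify with `systemctl cat ceph-osd@.service` first.
write_osd_override() {
    dir=${1:-/etc/systemd/system/ceph-osd@.service.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/no-mon-config.conf" <<'EOF'
[Service]
# An empty ExecStart= clears the packaged command before redefining it.
ExecStart=
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph --no-mon-config
EOF
}
```

After calling write_osd_override on a node, systemctl daemon-reload followed by a restart of each ceph-osd@<id> unit is needed for the change to take effect.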

Questions:

Why do monitors fail to form quorum with cephx even after properly registering mon.pve0/pve1/pve4 keys in the auth database?

Is there a supported procedure for re-enabling cephx after a Ceph reinstall on an existing Proxmox cluster?

Is the www-data group membership for the ceph user supposed to be set automatically by Proxmox? If so, why wasn't it set during reinstall?

Should the OSD systemd unit be able to read /etc/pve/ceph.conf via the symlink at /etc/ceph/ceph.conf? The symlink exists but the file permissions prevent the ceph user from reading it without the www-data group.
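To reproduce the permission question on another node, something like the following can be used. It is a small sketch with made-up helper names; stat -c is the GNU coreutils form, which is what Proxmox ships.

```shell
# Succeed if the user belongs to the group (primary or supplementary).
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx -- "$2"
}

# Show owner:group and octal mode of the config file, following symlinks.
conf_perms() {
    stat -L -c '%U:%G %a' "${1:-/etc/ceph/ceph.conf}"
}
```

On an affected node, user_in_group ceph www-data returns non-zero and conf_perms prints root:www-data 640. Running sudo -u ceph test -r /etc/ceph/ceph.conf then confirms whether the ceph user can actually read the file through the symlink.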

Environment:

Proxmox VE 9.1.5

Ceph 19.2.3-pve4 (Squid)

5 nodes, 3 monitors (pve0, pve1, pve4), 6 OSDs across 3 hosts

Fresh Ceph reinstall (not upgrade)