Where does Proxmox store Ceph monitor settings?

nanocosm

Active Member
Sep 20, 2016
I have a 5-node Ceph PVE cluster. On node pve1 I tried to set up a Ceph monitor, which failed at some point. Now pveceph mon create (from both the web UI and the CLI) always throws the message monitor 'pve1' already exists

I already removed the monitor successfully with ceph, double-checked that the entries are removed from /etc/pve/ceph.conf and /etc/pve/storage.cfg, and even checked /var/lib/ceph.
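
For reference, roughly the following checks can be used to confirm that no trace of the old monitor is left (a sketch; the monitor ID pve1 and the paths are taken from this cluster):

Code:
ceph mon dump                      # pve1 should no longer appear in the monmap
grep -n pve1 /etc/pve/ceph.conf    # no [mon.pve1] section or mon_host entry left
grep -n ceph /etc/pve/storage.cfg  # storage entries referencing the old monitor
ls -l /var/lib/ceph/mon/           # no ceph-pve1 directory left over
systemctl status ceph-mon@pve1     # the unit should be inactive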

The last resort would be to re-install PVE on that node, but I'd like to avoid that for now.

Where does Proxmox store this information?
 
Can you check if the systemd unit of the old monitor is still active by running systemctl list-units | grep ceph-mon?
If it is active and running, try stopping and disabling it with systemctl stop ceph-mon@pve1 and systemctl disable ceph-mon@pve1
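
For example (a sketch, assuming the monitor ID is pve1):

Code:
systemctl list-units | grep ceph-mon   # is a ceph-mon unit still loaded/active?
systemctl stop ceph-mon@pve1           # stop the leftover monitor daemon
systemctl disable ceph-mon@pve1        # prevent it from starting again
systemctl status ceph-mon@pve1         # should now report inactive (dead)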
 
I already tried this w/o success...

Code:
root@pve1:~# pveceph mon create
monitor 'pve1' already exists


Code:
root@pve1:~# cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.10.0/24
         filestore_xattr_use_omap = true
         fsid = 430b3502-627b-441d-92bc-b31b381d8cc4
         mon_allow_pool_delete = true
         mon_host = 192.168.10.12 192.168.10.13 192.168.10.14 192.168.10.15
         osd_journal_size = 5120
         osd_pool_default_min_size = 1
         public_network = 192.168.10.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
         osd_max_backfills = 2
         osd_recovery_max_active = 2


[mon.pve2]
         host = pve2
         mon_addr = 192.168.10.12:6789

[mon.pve3]
         host = pve3
         mon_addr = 192.168.10.13:6789

[mon.pve4]
         host = pve4
         mon_addr = 192.168.10.14:6789

[mon.pve5]
         host = pve5
         mon_addr = 192.168.10.15:6789
 
Have you tried resetting all Ceph daemons on the node with systemctl restart ceph.target?
Replicating the issue has given me quite inconsistent results, but I can sometimes reproduce it by deleting /var/lib/ceph/mon/ceph-pve1, removing the Ceph storage entries from /etc/pve/storage.cfg, and removing the references to the monitor from /etc/pve/ceph.conf. In those cases, disabling the monitor's systemd unit with systemctl disable ceph-mon@pve1 and restarting Ceph on the affected node with systemctl restart ceph.target has always fixed the issue.
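
On the affected node, that boils down to something like this (a sketch, assuming the monitor ID is pve1):

Code:
systemctl disable ceph-mon@pve1    # make sure the leftover mon unit stays down
systemctl restart ceph.target      # restart all Ceph daemons on this node
pveceph mon create                 # then recreate the monitor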

Edit:
Just to note: while replicating the issue, I was only making sure to follow every step you took. In practice, as long as another monitor is in place, I don't see any reason to remove the Ceph storage entry before recreating the monitor.
 
