Where does Proxmox store Ceph monitor settings?

nanocosm

Active Member
Sep 20, 2016
I have a 5-node Ceph PVE cluster. On node pve1 I tried to set up a Ceph monitor, which failed at some point. Now pveceph mon create (from both the web UI and the CLI) always throws the message monitor 'pve1' already exists

I already removed the monitor successfully with ceph, double-checked that the entries are removed from /etc/pve/ceph.conf and /etc/pve/storage.cfg, and even checked /var/lib/ceph.
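
For reference, roughly the following checks can be used to confirm that no trace of the old monitor is left (a sketch; the monitor ID pve1 and the paths are taken from this cluster):

Code:
ceph mon dump                      # pve1 should no longer appear in the monmap
grep -n pve1 /etc/pve/ceph.conf    # no [mon.pve1] section or mon_host entry left
grep -n ceph /etc/pve/storage.cfg  # storage entries referencing the old monitor
ls -l /var/lib/ceph/mon/           # no ceph-pve1 directory left over
systemctl status ceph-mon@pve1     # the unit should be inactive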

The last resort would be to re-install PVE on that node, but I'd like to avoid that for now.

Where does Proxmox store this information?
 
Can you check if the systemd unit of the old monitor is still active by running systemctl list-units | grep ceph-mon?
If it is active and running, try stopping and disabling it with systemctl stop ceph-mon@pve1 and systemctl disable ceph-mon@pve1
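
For example (a sketch, assuming the monitor ID is pve1):

Code:
systemctl list-units | grep ceph-mon   # is a ceph-mon unit still loaded/active?
systemctl stop ceph-mon@pve1           # stop the leftover monitor daemon
systemctl disable ceph-mon@pve1        # prevent it from starting again
systemctl status ceph-mon@pve1         # should now report inactive (dead)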
 
I already tried this w/o success...

Code:
root@pve1:~# pveceph mon create
monitor 'pve1' already exists


Code:
root@pve1:~# cat /etc/pve/ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 192.168.10.0/24
         filestore_xattr_use_omap = true
         fsid = 430b3502-627b-441d-92bc-b31b381d8cc4
         mon_allow_pool_delete = true
         mon_host = 192.168.10.12 192.168.10.13 192.168.10.14 192.168.10.15
         osd_journal_size = 5120
         osd_pool_default_min_size = 1
         public_network = 192.168.10.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
         osd_max_backfills = 2
         osd_recovery_max_active = 2


[mon.pve2]
         host = pve2
         mon_addr = 192.168.10.12:6789

[mon.pve3]
         host = pve3
         mon_addr = 192.168.10.13:6789

[mon.pve4]
         host = pve4
         mon_addr = 192.168.10.14:6789

[mon.pve5]
         host = pve5
         mon_addr = 192.168.10.15:6789
 
Have you tried resetting all Ceph daemons on the node with systemctl restart ceph.target?
Replicating the issue has given me quite inconsistent results, but I can sometimes reproduce it by deleting /var/lib/ceph/mon/ceph-pve1, removing the Ceph storage entries from /etc/pve/storage.cfg, and removing the references to the monitor from /etc/pve/ceph.conf. In those cases, disabling the monitor's systemd unit with systemctl disable ceph-mon@pve1 and restarting Ceph on the affected node with systemctl restart ceph.target has always fixed the issue.
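
On the affected node, that boils down to something like this (a sketch, assuming the monitor ID is pve1):

Code:
systemctl disable ceph-mon@pve1    # make sure the leftover mon unit stays down
systemctl restart ceph.target      # restart all Ceph daemons on this node
pveceph mon create                 # then recreate the monitor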

Edit:
Just to note: while replicating the issue, I was only making sure to follow every step you took. In practice, as long as another monitor is in place, I don't see any reason to remove the Ceph storage entry before recreating the monitor.
 
