Proxmox Cluster 3 nodes, Monitors refuse to start

I posted this in the wrong section before, so I am posting it here hoping this is the right place.

Hi all, I am facing a strange issue. After using a Proxmox PC for my self-hosted apps, I decided to play around and build a cluster to dive deeper into the HA topics. I downloaded the latest ISO and set up a cluster from scratch. The cluster works: I can see every node, and my Ceph storage says everything is OK. The managers work on all 3 nodes and the metadata servers are OK on all 3 nodes, but the monitor started only on the first node. When I try to start it on the other nodes, nothing happens.
This is the syslog of the second node:

Oct 28 00:13:47 pve2 ceph-mon[1041]: 2025-10-28T00:13:47.531+0100 7265f2d4c6c0 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror (PID: 2949170) UID: 0
Oct 28 00:13:47 pve2 ceph-mon[1041]: 2025-10-28T00:13:47.531+0100 7265f2d4c6c0 -1 mon.pve2@0(leader) e1 *** Got Signal Hangup ***
Oct 28 00:13:47 pve2 ceph-mon[1041]: 2025-10-28T00:13:47.554+0100 7265f2d4c6c0 -1 received signal: Hangup from (PID: 2949171) UID: 0
Oct 28 00:13:47 pve2 ceph-mon[1041]: 2025-10-28T00:13:47.554+0100 7265f2d4c6c0 -1 mon.pve2@0(leader) e1 *** Got Signal Hangup ***

This is from the third node:

Oct 28 00:48:10 pve3 ceph-mon[1030]: 2025-10-28T00:48:10.850+0100 7f59362b76c0 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror cephfs-mirror (PID: 740342) UID: 0
Oct 28 00:48:10 pve3 ceph-mon[1030]: 2025-10-28T00:48:10.852+0100 7f59362b76c0 -1 mon.pve3@0(leader) e1 *** Got Signal Hangup ***
Oct 28 00:48:10 pve3 ceph-mon[1030]: 2025-10-28T00:48:10.871+0100 7f59362b76c0 -1 received signal: Hangup from (PID: 740343) UID: 0
Oct 28 00:48:10 pve3 ceph-mon[1030]: 2025-10-28T00:48:10.871+0100 7f59362b76c0 -1 mon.pve3@0(leader) e1 *** Got Signal Hangup ***

I am kinda stuck
 

Attachments

  • Screenshot 2025-10-21 103158.png
  • Screenshot 2025-10-28 112840.png
  • Screenshot 2025-10-28 112900.png

What's the output of
  • ceph -s
  • cat /etc/pve/ceph.conf
Please paste the output within [code][/code] tags or use the formatting buttons of the editor </>.
 
Thank you for your reply


ceph -s output

Code:
  cluster:
    id:     b1e9e7bc-2ec5-4838-9702-7a66f1749bc3
    health: HEALTH_WARN
            2 OSD(s) experiencing slow operations in BlueStore
 
  services:
    mon: 1 daemons, quorum pve (age 13h)
    mgr: pve(active, since 13h), standbys: pve2, pve3
    mds: 1/1 daemons up, 2 standby
    osd: 3 osds: 3 up (since 13h), 3 in (since 7w)
 
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 97 pgs
    objects: 23.00k objects, 88 GiB
    usage:   263 GiB used, 1.1 TiB / 1.4 TiB avail
    pgs:     97 active+clean
 
  io:
    client:   49 KiB/s wr, 0 op/s rd, 9 op/s wr

cat /etc/pve/ceph.conf output
Code:
[global]
        auth_client_required = cephx
        auth_cluster_required = cephx
        auth_service_required = cephx
        cluster_network = 192.168.1.210/24
        fsid = b1e9e7bc-2ec5-4838-9702-7a66f1749bc3
        mon_allow_pool_delete = true
        mon_host = 192.168.1.210
        ms_bind_ipv4 = true
        ms_bind_ipv6 = false
        osd_pool_default_min_size = 2
        osd_pool_default_size = 3
        public_network = 192.168.1.210/24

[client]
        keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
        keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
        keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve]
        host = pve
        mds_standby_for_name = pve

[mds.pve2]
        host = pve2
        mds_standby_for_name = pve

[mds.pve3]
        host = pve3
        mds_standby_for_name = pve

[mon.pve]
        public_addr = 192.168.1.210



PVE has the monitor working (192.168.1.210)
PVE2 (192.168.1.209)
PVE3 (192.168.1.208)
 
I don't know if this poses a problem, but the network is not 100% correct. Instead of 192.168.1.210/24, it should be 192.168.1.0/24
 
I modified the file according to your suggestion; when I try to start the monitor, the situation described in my first post doesn't change.
 
I don't know if this poses a problem, but the network is not 100% correct. Instead of 192.168.1.210/24, it should be 192.168.1.0/24
You mean in the ceph.conf file? That is no problem, as the /24 defines the subnet, so the last octet does not matter.
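
If you want to write it the conventional way anyway, only the two network lines in /etc/pve/ceph.conf would change; both forms describe the same /24 subnet:
Code:
[global]
        # 192.168.1.210/24 and 192.168.1.0/24 resolve to the same subnet;
        # writing the network address is simply the usual notation
        cluster_network = 192.168.1.0/24
        public_network = 192.168.1.0/24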

What is interesting is that, according to the ceph -s output and the config file, only one MON is known to the running Ceph cluster.

The other MONs might be shown in the Proxmox VE UI because there are still parts of them around. Try to clean them up on the other two hosts and create them again.
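
To double-check what the Ceph side itself reports, the monmap and quorum can be inspected from any node (this should currently list only pve):
Code:
# show which monitors are actually part of the monmap
ceph mon dump

# the quorum view should tell the same story
ceph quorum_status -f json-pretty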

The question is why they didn't show up for the Ceph cluster itself. Do you still have the task logs of the MON creation? You can navigate to NODE → Tasks and set the Task Type filter to cephcreatemon.
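
If the task history in the UI has already rotated out, the task logs can also be searched on the node directly; on a default Proxmox VE install they live under /var/log/pve/tasks:
Code:
# task UPIDs contain the task type, so this turns up past MON creation tasks
grep -ri cephcreatemon /var/log/pve/tasks/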

Is the network working as expected? Have you configured a large MTU that might not work as expected?
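
A quick way to check both, assuming the Ceph traffic runs over the 192.168.1.x interfaces (adjust the target IP per node):
Code:
# show the MTU configured on each interface
ip -br link

# if jumbo frames (MTU 9000) are configured, verify they actually pass between nodes:
# -M do forbids fragmentation, 8972 = 9000 minus 28 bytes of IP/ICMP headers
ping -c 3 -M do -s 8972 192.168.1.209

# for a standard MTU of 1500 the equivalent payload size is 1472
ping -c 3 -M do -s 1472 192.168.1.209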



If destroying the MON via the web UI doesn't work, try the following on PVE2 and PVE3:
Code:
# stop the leftover monitor and keep it from starting again on boot
systemctl disable --now ceph-mon@$(hostname)

# move the old monitor data directory out of the way
mv /var/lib/ceph/mon/ceph-$(hostname) /root/mon.bkp

You can then later remove the backed-up MON directory with rm -rf /root/mon.bkp
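
Once the old state is out of the way, recreating the monitor on that node should work again, for example via the CLI:
Code:
# recreate the monitor on the cleaned-up node (run on pve2 / pve3 respectively)
pveceph mon create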