Hello All,
We have a Proxmox cluster of 11 nodes, 9 of which have Ceph installed (the other 2 are used only for compute). We have 4 Ceph mons and 4 mgrs, and the Ceph mgr keeps switching from the active to a standby mgr (mostly between the same two mgr nodes).
PVE version: pve-manager/8.3.3/f157a38b211595d6 (running kernel: 6.8.12-6-pve)
Ceph version: 19.2.0
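For reference, this is roughly how I have been checking which mgr is active when the switch happens (standard Ceph CLI commands, nothing cluster-specific):
Code:
# Show the currently active mgr and the list of standbys
ceph mgr stat

# Full mgr map with epoch and enabled modules
ceph mgr dump

# Overall cluster state (health, mon quorum, active mgr)
ceph -s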
Ceph conf:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.17.2.96/27
fsid = abc3a412-6180-4u06-8sa6-71a06366f927
mon_allow_pool_delete = true
mon_host = 172.17.2.110 172.17.2.111 172.17.2.103 172.17.2.104
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = 172.17.2.96/27
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.testfedprx03]
host = testfedprx03
mds_standby_for_name = pve
[mds.testfedprx04]
host = testfedprx04
mds_standby_for_name = pve
[mds.test1fedprx04]
host = test1fedprx04
mds_standby_for_name = pve
[mon.testfedprx03]
public_addr = 172.17.2.110
[mon.testfedprx04]
public_addr = 172.17.2.111
[mon.test1fedprx03]
public_addr = 172.17.2.103
[mon.test1fedprx04]
public_addr = 172.17.2.104
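To rule out monitor quorum flapping with the four mons, I also looked at the quorum state (again just the standard commands):
Code:
# Short monitor summary: quorum members and leader
ceph mon stat

# Detailed quorum status, including election epoch
ceph quorum_status --format json-pretty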
Ceph log:
Code:
Feb 18 14:21:36 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:21:36.046+0100 7d512be006c0 0 [balancer INFO root] prepared 0/10 upmap changes
Feb 18 14:22:50 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:50.680+0100 7d51070006c0 0 [rbd_support INFO root] MirrorSnapshotScheduleHandler: load_schedules
Feb 18 14:22:50 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:50.687+0100 7d51264006c0 0 log_channel(cluster) log [DBG] : pgmap v407: 6177 pgs: 1 active+clean+scrubbing+deep, 6176 active+clean; 73 TiB data, 217 TiB used, 272 >
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 -1 mgr handle_mgr_map I was active but no longer am
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn e: '/usr/bin/ceph-mgr'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 0: '/usr/bin/ceph-mgr'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 1: '-f'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 2: '--cluster'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 3: 'ceph'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 4: '--id'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 5: 'testfedprx04'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 6: '--setuser'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 7: 'ceph'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 8: '--setgroup'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn 9: 'ceph'
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn respawning with exe /usr/bin/ceph-mgr
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.341+0100 7d5141e006c0 1 mgr respawn exe_path /proc/self/exe
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: did not load config file, using default settings.
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: ignoring --setuser ceph since I am not root
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: ignoring --setgroup ceph since I am not root
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.450+0100 72fc8dc7e280 -1 Errors while parsing config file!
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.450+0100 72fc8dc7e280 -1 can't open ceph.conf: (2) No such file or directory
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: unable to get monitor info from DNS SRV with service name: ceph-mon
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.455+0100 72fc8dc7e280 -1 failed for service _ceph-mon._tcp
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: 2025-02-18T14:22:51.455+0100 72fc8dc7e280 -1 monclient: get_monmap_and_config cannot identify monitors to contact
Feb 18 14:22:51 testfedprx04 ceph-mgr[345148]: failed to fetch mon config (--no-mon-config to skip)
Feb 18 14:22:51 testfedprx04 systemd[1]: ceph-mgr@testfedprx04.service: Main process exited, code=exited, status=1/FAILURE
Feb 18 14:22:51 testfedprx04 systemd[1]: ceph-mgr@testfedprx04.service: Failed with result 'exit-code'.
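Since the respawned mgr complains that it can't open ceph.conf, I also checked that the config file is in place on that node (on PVE it should be the usual symlink into /etc/pve):
Code:
# /etc/ceph/ceph.conf should point at the pmxcfs-managed config
ls -l /etc/ceph/ceph.conf /etc/pve/ceph.conf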
Even the journalctl logs don't have much information.
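For completeness, this is how I pulled the mgr logs, plus the debug setting I could raise if more detail is needed (the level of 10 is just an example value; the unit name matches the node above):
Code:
# Full mgr unit log around the time of the respawn
journalctl -u ceph-mgr@testfedprx04 --since "2025-02-18 14:20" --until "2025-02-18 14:25"

# Temporarily raise mgr debug logging (example value)
ceph config set mgr debug_mgr 10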
Thanks
Saran