Hi All,
We have a 3-node Ceph cluster which handles storage for 5 hypervisor nodes.
We are in the process of upgrading to Proxmox 7.4. All of the hypervisors and two of the storage nodes are now running 7.4 (with Ceph 16.2.11); the upgrade was from 7.2 and Ceph 16.2.9.
We have halted the upgrade because on our second storage node (PVEMTS2) the Ceph monitor is displayed as stopped. The other two monitors are working and the cluster is fully operational, but we don't want to keep running with only two monitors.
We tried to destroy the failed monitor but received an error that it doesn't exist.
We then stopped the service on that node, disabled it and removed the directory from /var/lib/ceph/mon; we also removed the mon entry from ceph.conf and removed the IP from the mon_host line in ceph.conf.
This removed the monitor from the cluster.
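For reference, the removal steps above correspond roughly to the commands below. This is a sketch rather than a copy of our shell history: the directory name ceph-PVEMTS2 assumes the default <cluster>-<id> layout under /var/lib/ceph/mon, and the run helper and DRY_RUN switch are hypothetical additions for illustration only, so that the commands are printed instead of executed. The ceph.conf edits were done by hand and are only noted as a comment.

```shell
#!/bin/sh
# Sketch of the manual monitor removal described above.
# MON_ID and the directory layout are assumptions; adjust before use.
MON_ID="PVEMTS2"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands (default), 0 = execute them

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop and disable the monitor service on the affected node.
run systemctl stop "ceph-mon@${MON_ID}"
run systemctl disable "ceph-mon@${MON_ID}"

# Remove the monitor's data directory (name assumed, see above).
run rm -r "/var/lib/ceph/mon/ceph-${MON_ID}"

# Manual step, not scripted here: remove the mon entry for this node from
# ceph.conf and take its IP out of the mon_host line.
```

With DRY_RUN left at its default of 1, the script only prints each command prefixed with "+ ", so it can be reviewed safely before anything is run for real.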
We then created a new monitor using pveceph mon create.
This updated ceph.conf and started the service, so things should have been good.
However, on the Proxmox Ceph monitor screen the monitor is still displayed as stopped.
ceph -s gives us:
  services:
    mon: 2 daemons, quorum PVEMTS1,PVEMTS3 (age 9h)
    mgr: PVEMTS3(active, since 4w), standbys: PVEMTS1, PVEMTS2
    mds: 1/1 daemons up, 1 standby
    osd: 15 osds: 15 up (since 40m), 15 in (since 18h)
systemctl status ceph-mon@PVEMTS2 gives us:
● ceph-mon@PVEMTS2.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Fri 2023-05-12 11:43:09 EDT; 24min ago
   Main PID: 1254978 (ceph-mon)
      Tasks: 27
     Memory: 93.6M
        CPU: 6.864s
     CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@PVEMTS2.service
             └─1254978 /usr/bin/ceph-mon -f --cluster ceph --id PVEMTS2 --setuser ceph --setgroup ceph

May 12 11:43:09 PVEMTS2 systemd[1]: Started Ceph cluster monitor daemon.
So the service appears to be running but something isn't talking.
Any suggestions on where to go from here?