I upgraded a three-node cluster to PVE 6.0 last night, following the excellent upgrade docs for PVE and Ceph Nautilus. Before the upgrade from 5.4 I had three Ceph monitors called 'a', 'b', and 'c'. Mon-a is on host pve01; Mon-b is on host pve02; Mon-c is on host pve03. After the upgrade I'm seeing monitors 'a', 'b', 'c' as well as additional monitors named after each host, i.e. mon-pve01, mon-pve02, mon-pve03. See screenshot attached.
Does anyone know why I'm seeing this and how I can clean it up? Here is my monmap:
Code:
$ sudo monmaptool --print /tmp/monitor_map.bin
monmaptool: monmap file /tmp/monitor_map.bin
epoch 23
fsid 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
last_changed 2019-07-19 20:13:34.461795
created 2019-04-29 11:19:33.486951
min_mon_release 14 (nautilus)
0: [v2:192.168.10.2:3300/0,v1:192.168.10.2:6789/0] mon.c
1: [v2:192.168.10.3:3300/0,v1:192.168.10.3:6789/0] mon.a
2: [v2:192.168.10.9:3300/0,v1:192.168.10.9:6789/0] mon.b
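For anyone comparing names, the monitors actually registered in the monmap can be pulled out of that output with a little awk. This is only a sketch: `monmap_names` is a hypothetical helper, not part of Ceph, and it assumes the `monmaptool --print` layout shown above, where each rank line ends in `mon.<name>`. The sample input is copied from the output above.

```shell
#!/bin/sh
# Hypothetical helper (not a Ceph tool): read `monmaptool --print` output
# on stdin and print one monitor name per line.
monmap_names() {
    # Rank lines look like "0: [v2:...,v1:...] mon.c".
    # Keep the last field and strip the leading "mon." prefix.
    awk '/^[0-9]+: / && $NF ~ /^mon\./ { sub(/^mon\./, "", $NF); print $NF }'
}

# Sample input, copied from the monmap printed above.
sample='epoch 23
min_mon_release 14 (nautilus)
0: [v2:192.168.10.2:3300/0,v1:192.168.10.2:6789/0] mon.c
1: [v2:192.168.10.3:3300/0,v1:192.168.10.3:6789/0] mon.a
2: [v2:192.168.10.9:3300/0,v1:192.168.10.9:6789/0] mon.b'

printf '%s\n' "$sample" | monmap_names
```

On a live node the same function could be fed directly, e.g. `sudo monmaptool --print /tmp/monitor_map.bin | monmap_names`. For the map above it lists only c, a, and b, so none of the hostname-based monitors is part of the monmap.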
Here is my /etc/pve/ceph.conf:
Code:
mihanson@pve02:~$ sudo cat /etc/pve/ceph.conf
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.70.0/24
fsid = 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 192.168.10.0/24
mon_host = 192.168.10.3,192.168.10.9,192.168.10.2
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mds.pve01]
host = pve01
mds standby for name = pve
[mds.pve02]
host = pve02
mds standby for name = pve
[mds.pve03]
host = pve03
mds standby for name = pve
[mon]
mon clock drift warn backoff = 30
#[mon.a]
# host = pve01
# mon addr = 192.168.10.3
#
#[mon.b]
# host = pve02
# mon addr = 192.168.10.9
#
#[mon.c]
# host = pve03
# mon addr = 192.168.10.2
Commenting out (#) the sections for mons a-c doesn't seem to matter. I see the extra monitor entries with them commented out, and if I uncomment them, nothing seems to change.
The syslog entries for the extra monitors (mon.pve01, etc.) all look similar to this:
-- Logs begin at Fri 2019-07-19 21:06:27 PDT, end at Sat 2019-07-20 13:55:00 PDT. --
Jul 19 22:02:32 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 19 22:02:32 pve01 ceph-mon[25732]: 2019-07-19 22:02:32.305 7f9cf518e3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 19 22:02:32 pve01 ceph-mon[25732]: 2019-07-19 22:02:32.305 7f9cf518e3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 19 22:02:32 pve01 ceph-mon[25732]: 2019-07-19 22:02:32.305 7f9cf518e3c0 -1 failed to initialize
Jul 19 22:02:32 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 19 22:02:32 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 1.
Jul 19 22:02:42 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 19 22:02:42 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 19 22:02:42 pve01 ceph-mon[26018]: 2019-07-19 22:02:42.449 7f10651eb3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 19 22:02:42 pve01 ceph-mon[26018]: 2019-07-19 22:02:42.449 7f10651eb3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 19 22:02:42 pve01 ceph-mon[26018]: 2019-07-19 22:02:42.449 7f10651eb3c0 -1 failed to initialize
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:02:52 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 19 22:02:52 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 2.
Jul 19 22:02:52 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 19 22:02:52 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 19 22:02:52 pve01 ceph-mon[26232]: 2019-07-19 22:02:52.697 7f8b085e23c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 19 22:02:52 pve01 ceph-mon[26232]: 2019-07-19 22:02:52.697 7f8b085e23c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 19 22:02:52 pve01 ceph-mon[26232]: 2019-07-19 22:02:52.697 7f8b085e23c0 -1 failed to initialize
Jul 19 22:02:52 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 19 22:02:52 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:03:02 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 19 22:03:02 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 3.
Jul 19 22:03:02 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 19 22:03:02 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 19 22:03:02 pve01 ceph-mon[26451]: 2019-07-19 22:03:02.953 7f70640693c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 19 22:03:02 pve01 ceph-mon[26451]: 2019-07-19 22:03:02.953 7f70640693c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 19 22:03:02 pve01 ceph-mon[26451]: 2019-07-19 22:03:02.953 7f70640693c0 -1 failed to initialize
Jul 19 22:03:02 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 19 22:03:02 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:03:13 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 19 22:03:13 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 4.
Jul 19 22:03:13 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 19 22:03:13 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 19 22:03:13 pve01 ceph-mon[26659]: 2019-07-19 22:03:13.197 7f84dc78b3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 19 22:03:13 pve01 ceph-mon[26659]: 2019-07-19 22:03:13.197 7f84dc78b3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 19 22:03:13 pve01 ceph-mon[26659]: 2019-07-19 22:03:13.197 7f84dc78b3c0 -1 failed to initialize
Jul 19 22:03:13 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 19 22:03:13 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:03:23 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 19 22:03:23 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 5.
Jul 19 22:03:23 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 19 22:03:23 pve01 systemd[1]: ceph-mon@pve01.service: Start request repeated too quickly.
Jul 19 22:03:23 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:03:23 pve01 systemd[1]: Failed to start Ceph cluster monitor daemon.
Jul 20 13:09:44 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:09:44 pve01 ceph-mon[338271]: 2019-07-20 13:09:44.158 7f7b67bbf3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:09:44 pve01 ceph-mon[338271]: 2019-07-20 13:09:44.158 7f7b67bbf3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:09:44 pve01 ceph-mon[338271]: 2019-07-20 13:09:44.158 7f7b67bbf3c0 -1 failed to initialize
Jul 20 13:09:44 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:09:44 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:09:54 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:09:54 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 1.
Jul 20 13:09:54 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:09:54 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:09:54 pve01 ceph-mon[338359]: 2019-07-20 13:09:54.494 7fe73ae2d3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:09:54 pve01 ceph-mon[338359]: 2019-07-20 13:09:54.494 7fe73ae2d3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:09:54 pve01 ceph-mon[338359]: 2019-07-20 13:09:54.494 7fe73ae2d3c0 -1 failed to initialize
Jul 20 13:09:54 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:09:54 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:10:04 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:10:04 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 2.
Jul 20 13:10:04 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:10:04 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:10:04 pve01 ceph-mon[338447]: 2019-07-20 13:10:04.738 7f8984a3e3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:10:04 pve01 ceph-mon[338447]: 2019-07-20 13:10:04.738 7f8984a3e3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:10:04 pve01 ceph-mon[338447]: 2019-07-20 13:10:04.738 7f8984a3e3c0 -1 failed to initialize
Jul 20 13:10:04 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:10:04 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:10:14 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:10:14 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 3.
Jul 20 13:10:14 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:10:14 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:10:14 pve01 ceph-mon[338524]: 2019-07-20 13:10:14.990 7f58f0ae03c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:10:14 pve01 ceph-mon[338524]: 2019-07-20 13:10:14.990 7f58f0ae03c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:10:14 pve01 ceph-mon[338524]: 2019-07-20 13:10:14.990 7f58f0ae03c0 -1 failed to initialize
Jul 20 13:10:15 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:10:15 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:10:25 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:10:25 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 4.
Jul 20 13:10:25 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:10:25 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:10:25 pve01 ceph-mon[338613]: 2019-07-20 13:10:25.242 7f8302f923c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:10:25 pve01 ceph-mon[338613]: 2019-07-20 13:10:25.242 7f8302f923c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:10:25 pve01 ceph-mon[338613]: 2019-07-20 13:10:25.242 7f8302f923c0 -1 failed to initialize
Jul 20 13:10:25 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:10:25 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:10:35 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:10:35 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 5.
Jul 20 13:10:35 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:10:35 pve01 systemd[1]: ceph-mon@pve01.service: Start request repeated too quickly.
Jul 20 13:10:35 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:10:35 pve01 systemd[1]: Failed to start Ceph cluster monitor daemon.
Jul 20 13:37:31 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:37:31 pve01 ceph-mon[346321]: 2019-07-20 13:37:31.854 7f7cf3c533c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:37:31 pve01 ceph-mon[346321]: 2019-07-20 13:37:31.854 7f7cf3c533c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:37:31 pve01 ceph-mon[346321]: 2019-07-20 13:37:31.854 7f7cf3c533c0 -1 failed to initialize
Jul 20 13:37:31 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:37:31 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:37:41 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:37:41 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 1.
Jul 20 13:37:41 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:37:41 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:37:41 pve01 ceph-mon[346414]: 2019-07-20 13:37:41.986 7f5abac613c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:37:41 pve01 ceph-mon[346414]: 2019-07-20 13:37:41.986 7f5abac613c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:37:41 pve01 ceph-mon[346414]: 2019-07-20 13:37:41.986 7f5abac613c0 -1 failed to initialize
Jul 20 13:37:41 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:37:41 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:37:52 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:37:52 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 2.
Jul 20 13:37:52 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:37:52 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:37:52 pve01 ceph-mon[346503]: 2019-07-20 13:37:52.234 7fbb9fe703c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:37:52 pve01 ceph-mon[346503]: 2019-07-20 13:37:52.234 7fbb9fe703c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:37:52 pve01 ceph-mon[346503]: 2019-07-20 13:37:52.234 7fbb9fe703c0 -1 failed to initialize
Jul 20 13:37:52 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:37:52 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:38:02 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:38:02 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 3.
Jul 20 13:38:02 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:38:02 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:38:02 pve01 ceph-mon[346595]: 2019-07-20 13:38:02.506 7f4cd413c3c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:38:02 pve01 ceph-mon[346595]: 2019-07-20 13:38:02.506 7f4cd413c3c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:38:02 pve01 ceph-mon[346595]: 2019-07-20 13:38:02.506 7f4cd413c3c0 -1 failed to initialize
Jul 20 13:38:02 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:38:02 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:38:12 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:38:12 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 4.
Jul 20 13:38:12 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:38:12 pve01 systemd[1]: Started Ceph cluster monitor daemon.
Jul 20 13:38:12 pve01 ceph-mon[346672]: 2019-07-20 13:38:12.738 7feae00213c0 -1 mon.pve01@-1(???) e19 not in monmap and have been in a quorum before; must have been removed
Jul 20 13:38:12 pve01 ceph-mon[346672]: 2019-07-20 13:38:12.738 7feae00213c0 -1 mon.pve01@-1(???) e19 commit suicide!
Jul 20 13:38:12 pve01 ceph-mon[346672]: 2019-07-20 13:38:12.738 7feae00213c0 -1 failed to initialize
Jul 20 13:38:12 pve01 systemd[1]: ceph-mon@pve01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 13:38:12 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:38:22 pve01 systemd[1]: ceph-mon@pve01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 13:38:22 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 5.
Jul 20 13:38:22 pve01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 13:38:22 pve01 systemd[1]: ceph-mon@pve01.service: Start request repeated too quickly.
Jul 20 13:38:22 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 20 13:38:22 pve01 systemd[1]: Failed to start Ceph cluster monitor daemon.
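Because the log repeats the same start/fail cycle, it can help to reduce it to the unit instances that are actually failing. A minimal sketch, assuming journal lines in the format shown above; `failing_mon_units` is a made-up helper name, and the sample lines are copied from the log.

```shell
#!/bin/sh
# Hypothetical helper: read journal lines on stdin and print the id of each
# ceph-mon@<id>.service instance that logged "Failed with result", once each.
failing_mon_units() {
    sed -n 's/.*ceph-mon@\([^.]*\)\.service: Failed with result.*/\1/p' | sort -u
}

# Sample input, copied from the log above.
sample_log="Jul 19 22:02:32 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'.
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Scheduled restart job, restart counter is at 1.
Jul 19 22:02:42 pve01 systemd[1]: ceph-mon@pve01.service: Failed with result 'exit-code'."

printf '%s\n' "$sample_log" | failing_mon_units
```

For the sample this prints `pve01`. Piping the real journal output through the same function on each host would show whether any unit other than the hostname-named one is failing there.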
Once upon a time, my monitor names did correspond to the hostnames, but I had to change my network config following the Ceph docs, and in the process I renamed the monitors to 'a', 'b', 'c'. Is this coming back to bite me? Any help in cleaning up the mess is appreciated. Thank you.
Mike