[SOLVED] Proxmox VE 6.0: Ceph Nautilus Extraneous Monitors?

Discussion in 'Proxmox VE: Installation and configuration' started by mihanson, Jul 20, 2019.

  1. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    Just upgraded a 3-node cluster to PVE 6.0 last night. I followed the excellent upgrade docs for PVE and for Ceph Nautilus. Before the upgrade from 5.4 I had three Ceph monitors named 'a', 'b', 'c': mon.a on host pve01, mon.b on host pve02, mon.c on host pve03. After the upgrade I'm seeing monitors 'a', 'b', 'c' as well as additional monitors named after each host, i.e. mon.pve01, mon.pve02, mon.pve03. See the attached screenshot.

    Anyone know why I'm seeing this and how can I clean this up? Here is my monmap:
    Code:
    $ sudo monmaptool --print /tmp/monitor_map.bin
    monmaptool: monmap file /tmp/monitor_map.bin
    epoch 23
    fsid 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
    last_changed 2019-07-19 20:13:34.461795
    created 2019-04-29 11:19:33.486951
    min_mon_release 14 (nautilus)
    0: [v2:192.168.10.2:3300/0,v1:192.168.10.2:6789/0] mon.c
    1: [v2:192.168.10.3:3300/0,v1:192.168.10.3:6789/0] mon.a
    2: [v2:192.168.10.9:3300/0,v1:192.168.10.9:6789/0] mon.b
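    (For reference, I dumped that monmap with something like this; the output path is just what I happened to use.)
    Code:
    sudo ceph mon getmap -o /tmp/monitor_map.bin
    sudo monmaptool --print /tmp/monitor_map.bin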
    Here is my /etc/pve/ceph.conf:
    Code:
    mihanson@pve02:~$ sudo cat /etc/pve/ceph.conf
    [global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 192.168.70.0/24
         fsid = 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
         mon allow pool delete = true
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 192.168.10.0/24
             mon_host = 192.168.10.3,192.168.10.9,192.168.10.2
    
    [client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
    
    [mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring
    
    [osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
    
    [mds.pve01]
         host = pve01
         mds standby for name = pve
    
    [mds.pve02]
         host = pve02
         mds standby for name = pve
    
    [mds.pve03]
         host = pve03
         mds standby for name = pve
    
    [mon]
        mon clock drift warn backoff = 30
    
    #[mon.a]
    #     host = pve01
    #     mon addr = 192.168.10.3
    #
    #[mon.b]
    #     host = pve02
    #     mon addr = 192.168.10.9
    #
    #[mon.c]
    #     host = pve03
    #     mon addr = 192.168.10.2
    Commenting out (#) the mon.a through mon.c sections doesn't seem to matter: as-is I see the extra monitor entries, and if I uncomment them, nothing changes.
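    (As far as I understand, under Nautilus the monitors are found via the mon_host line in [global], and a monitor's identity comes from its systemd instance and its data directory under /var/lib/ceph/mon, not from ceph.conf, so the per-[mon.X] sections shouldn't matter either way. The relevant bit would just be something like:)
    Code:
    [global]
         # monitor addresses used by clients and daemons to find the mons;
         # the mon names (a/b/c) come from /var/lib/ceph/mon/ceph-<name>
         mon_host = 192.168.10.3,192.168.10.9,192.168.10.2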

    The syslog entries for the extra mon.pve01, etc. all look similar to this:
    Once upon a time, my monitor names did correspond to the hostnames, but I had to change my network config following the Ceph docs, and in the process I renamed them to 'a', 'b', 'c'. Is this coming back to bite me? Any help cleaning up the mess is appreciated. Thank you.

    Mike
     

    Attached Files:

  2. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Maybe you still have the systemd services enabled? If you are sure that you don't need them anymore, you can disable them with:
    Code:
    systemctl disable ceph-mon@pve01.service
    
    e.g. for pve01
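    To see what is actually there on a node first, standard systemd tooling should be enough, roughly:
    Code:
    # list every ceph-mon instance systemd knows about on this node
    systemctl list-units 'ceph-mon@*' --all
    # check whether a particular instance is enabled
    systemctl is-enabled ceph-mon@pve01.service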
     
  3. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    I just double-checked all 3 nodes and only the correct systemd services are enabled for the monitors (ceph-mon@a.service on pve01, ceph-mon@b.service on pve02, ceph-mon@c.service on pve03). Any other ideas as to where the extra monitors could be coming from? I'm only seeing them in the web GUI, so is there a service I can restart that will reset the web GUI?
    Code:
    $ sudo ceph -s
      cluster:
        id:     6f9f5288-2d6f-4ec3-ad8c-eea28685d971
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum c,a,b (age 13h)
        mgr: pve03(active, since 20h), standbys: pve02, pve01
        mds: pve_cephfs:1 {0=pve03=up:active} 2 up:standby
        osd: 12 osds: 12 up (since 13h), 12 in (since 13h)
     
      data:
        pools:   3 pools, 552 pgs
        objects: 2.98M objects, 10 TiB
        usage:   30 TiB used, 32 TiB / 63 TiB avail
        pgs:     550 active+clean
                 2   active+clean+scrubbing+deep
     
      io:
        client:   170 B/s rd, 31 KiB/s wr, 0 op/s rd, 5 op/s wr
     
  4. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    can you please post the output of the following commands (on all of your nodes)

    Code:
    ls -lh /etc/systemd/system/ceph-mon.target.wants/
    ls -lh /var/lib/ceph/mon/
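    # (optional) a quick way to collect both listings from all three nodes at once;
    # just a sketch, adjust the ssh user and hostnames to your setup
    for h in pve01 pve02 pve03; do echo "== $h =="; ssh root@$h 'ls -lh /etc/systemd/system/ceph-mon.target.wants/ /var/lib/ceph/mon/'; done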
    
     
  5. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    Code:
    mihanson@pve01:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 0
    lrwxrwxrwx 1 root root 37 May  3 19:26 ceph-mon@a.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve01:~$ sudo ls -lah /var/lib/ceph/mon/
    [sudo] password for mihanson:
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:49 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-a
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:33 ceph-pve01
    
    mihanson@pve02:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 0
    lrwxrwxrwx 1 root root 37 May  3 19:26 ceph-mon@b.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve02:~$ sudo ls -lah /var/lib/ceph/mon/
    [sudo] password for mihanson:
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:49 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-b
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:33 ceph-pve02
    
    mihanson@pve03:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 8.0K
    drwxr-xr-x  2 root root 4.0K May  3 19:28 .
    drwxr-xr-x 17 root root 4.0K Jul 19 20:04 ..
    lrwxrwxrwx  1 root root   37 May  3 19:28 ceph-mon@c.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve03:~$ sudo ls -lah /var/lib/ceph/mon/
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:51 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-c
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:34 ceph-pve03
    I'm guessing those ceph-pve0X are the issue?
     
  6. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    Maybe this? Remove the dots.

    Your config file, ceph.conf:
    [screenshot: upload_2019-7-23_16-38-46.png]
     
  7. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Yes, it seems the old directories did not get cleaned up.
    If you are sure that those monitors will never come up again in the future, you can remove those directories and they should vanish from the web interface.
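    E.g. for pve01, roughly along these lines (just a sketch; keep a copy if you want to be careful, and repeat with ceph-pve02/ceph-pve03 on the other nodes):
    Code:
    # make sure nothing is actually running from the old directory
    systemctl status ceph-mon@pve01.service
    # then move it out of the way (or rm -rf it once you are sure)
    mv /var/lib/ceph/mon/ceph-pve01 /root/ceph-mon-pve01.bak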
     
  8. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    And you have a mistake in the ceph.conf again:

    Change 70 to 10 and remove the commas (marked in yellow), reboot, and it should work.
    [screenshot: upload_2019-7-23_16-53-14.png]
     
  9. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Why? That can be valid if the cluster network is on a different subnet (see his public network). Also, the commas are correct syntax.
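    For reference, a layout like his is perfectly valid in ceph.conf, e.g.:
    Code:
    [global]
         # OSD replication/heartbeat traffic on its own subnet ...
         cluster network = 192.168.70.0/24
         # ... while clients and monitors use the public network
         public network = 192.168.10.0/24
         # a comma-separated monitor list is valid syntax
         mon_host = 192.168.10.3,192.168.10.9,192.168.10.2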
     
  10. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2

    I had this "issue" a long time ago; with commas it didn't work (PVE 2.x).
    OK, I had assumed that the cluster network (that's how it is in our configuration) is the same network the monitors are on, and that the monitors here are not in the cluster network.
    After I installed Ceph, the Ceph cluster network was the same as the monitors'.
    [screenshot: upload_2019-7-23_17-3-43.png]

    Sorry, of course I can be wrong too.
    I'll fix that, or rather I can configure it like mihanson did, and I will see what happens in the test environment.
     
  11. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    I changed the config, and Ceph doesn't work "normally" anymore.
    [screenshot: upload_2019-7-23_17-9-50.png]
    In my case, 1pve5to6 is the active Ceph node; the first node doesn't work correctly, and a manual start of the OSD fails.
    [screenshot: upload_2019-7-23_17-10-44.png]
     
  12. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    I removed these old directories and my problem has been solved. Thank you!
     