[SOLVED] Proxmox VE 6.0: Ceph Nautilus Extraneous Monitors?

Discussion in 'Proxmox VE: Installation and configuration' started by mihanson, Jul 20, 2019.

  1. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    Just upgraded a 3-node cluster to PVE 6.0 last night. I followed the excellent upgrade docs for PVE and for Ceph Nautilus. Before the upgrade from 5.4 I had three Ceph monitors named 'a', 'b', 'c': mon.a on host pve01, mon.b on host pve02, mon.c on host pve03. After the upgrade I'm seeing monitors 'a', 'b', 'c' as well as additional monitors named after each host, i.e. mon.pve01, mon.pve02, mon.pve03. See the attached screenshot.

    Anyone know why I'm seeing this and how can I clean this up? Here is my monmap:
    Code:
    $ sudo monmaptool --print /tmp/monitor_map.bin
    monmaptool: monmap file /tmp/monitor_map.bin
    epoch 23
    fsid 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
    last_changed 2019-07-19 20:13:34.461795
    created 2019-04-29 11:19:33.486951
    min_mon_release 14 (nautilus)
    0: [v2:192.168.10.2:3300/0,v1:192.168.10.2:6789/0] mon.c
    1: [v2:192.168.10.3:3300/0,v1:192.168.10.3:6789/0] mon.a
    2: [v2:192.168.10.9:3300/0,v1:192.168.10.9:6789/0] mon.b
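    (For reference, I dumped that monmap with something like this; the output path is just what I happened to use.)
    Code:
    sudo ceph mon getmap -o /tmp/monitor_map.bin
    sudo monmaptool --print /tmp/monitor_map.bin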
    Here is my /etc/pve/ceph.conf:
    Code:
    mihanson@pve02:~$ sudo cat /etc/pve/ceph.conf
    [global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 192.168.70.0/24
         fsid = 6f9f5288-2d6f-4ec3-ad8c-eea28685d971
         mon allow pool delete = true
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 192.168.10.0/24
             mon_host = 192.168.10.3,192.168.10.9,192.168.10.2
    
    [client]
         keyring = /etc/pve/priv/$cluster.$name.keyring
    
    [mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring
    
    [osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring
    
    [mds.pve01]
         host = pve01
         mds standby for name = pve
    
    [mds.pve02]
         host = pve02
         mds standby for name = pve
    
    [mds.pve03]
         host = pve03
         mds standby for name = pve
    
    [mon]
        mon clock drift warn backoff = 30
    
    #[mon.a]
    #     host = pve01
    #     mon addr = 192.168.10.3
    #
    #[mon.b]
    #     host = pve02
    #     mon addr = 192.168.10.9
    #
    #[mon.c]
    #     host = pve03
    #     mon addr = 192.168.10.2
    Commenting out (#) the mon.a through mon.c sections doesn't seem to matter: as-is I see the extra monitor entries, and if I uncomment them, nothing changes.
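    (As far as I understand, under Nautilus the monitors are found via the mon_host line in [global], and a monitor's identity comes from its systemd instance and its data directory under /var/lib/ceph/mon, not from ceph.conf, so the per-[mon.X] sections shouldn't matter either way. The relevant bit would just be something like:)
    Code:
    [global]
         # monitor addresses used by clients and daemons to find the mons;
         # the mon names (a/b/c) come from /var/lib/ceph/mon/ceph-<name>
         mon_host = 192.168.10.3,192.168.10.9,192.168.10.2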

    The syslog entries for the extra mon.pve01, etc. all look similar to this:
    Once upon a time, my monitor names did correspond to the hostnames, but I had to change my network config following the Ceph docs, and in the process I renamed them to 'a', 'b', 'c'. Is this coming back to bite me? Any help cleaning up the mess is appreciated. Thank you.

    Mike
     

    Attached Files:

  2. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Maybe you still have the systemd services enabled? If you are sure that you don't need them anymore, you can disable them with:
    Code:
    systemctl disable ceph-mon@pve01.service
    
    e.g. for pve01
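    To see what is actually there on a node first, standard systemd tooling should be enough, roughly:
    Code:
    # list every ceph-mon instance systemd knows about on this node
    systemctl list-units 'ceph-mon@*' --all
    # check whether a particular instance is enabled
    systemctl is-enabled ceph-mon@pve01.service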
     
  3. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    I just double-checked all 3 nodes and only the correct systemd services are enabled for the monitors (ceph-mon@a.service on pve01, ceph-mon@b.service on pve02, ceph-mon@c.service on pve03). Any other ideas as to where the extra monitors could be coming from? I'm only seeing them in the web GUI, so is there a service I can restart that will reset the web GUI?
    Code:
    $ sudo ceph -s
      cluster:
        id:     6f9f5288-2d6f-4ec3-ad8c-eea28685d971
        health: HEALTH_OK
     
      services:
        mon: 3 daemons, quorum c,a,b (age 13h)
        mgr: pve03(active, since 20h), standbys: pve02, pve01
        mds: pve_cephfs:1 {0=pve03=up:active} 2 up:standby
        osd: 12 osds: 12 up (since 13h), 12 in (since 13h)
     
      data:
        pools:   3 pools, 552 pgs
        objects: 2.98M objects, 10 TiB
        usage:   30 TiB used, 32 TiB / 63 TiB avail
        pgs:     550 active+clean
                 2   active+clean+scrubbing+deep
     
      io:
        client:   170 B/s rd, 31 KiB/s wr, 0 op/s rd, 5 op/s wr
     
  4. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    can you please post the output of the following commands (on all of your nodes)

    Code:
    ls -lh /etc/systemd/system/ceph-mon.target.wants/
    ls -lh /var/lib/ceph/mon/
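    # (optional) a quick way to collect both listings from all three nodes at once;
    # just a sketch, adjust the ssh user and hostnames to your setup
    for h in pve01 pve02 pve03; do echo "== $h =="; ssh root@$h 'ls -lh /etc/systemd/system/ceph-mon.target.wants/ /var/lib/ceph/mon/'; done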
    
     
  5. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    Code:
    mihanson@pve01:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 0
    lrwxrwxrwx 1 root root 37 May  3 19:26 ceph-mon@a.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve01:~$ sudo ls -lah /var/lib/ceph/mon/
    [sudo] password for mihanson:
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:49 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-a
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:33 ceph-pve01
    
    mihanson@pve02:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 0
    lrwxrwxrwx 1 root root 37 May  3 19:26 ceph-mon@b.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve02:~$ sudo ls -lah /var/lib/ceph/mon/
    [sudo] password for mihanson:
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:49 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-b
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:33 ceph-pve02
    
    mihanson@pve03:~$ ls -lah /etc/systemd/system/ceph-mon.target.wants/
    total 8.0K
    drwxr-xr-x  2 root root 4.0K May  3 19:28 .
    drwxr-xr-x 17 root root 4.0K Jul 19 20:04 ..
    lrwxrwxrwx  1 root root   37 May  3 19:28 ceph-mon@c.service -> /lib/systemd/system/ceph-mon@.service
    mihanson@pve03:~$ sudo ls -lah /var/lib/ceph/mon/
    total 16K
    drwxr-xr-x  4 ceph ceph 4.0K May  3 18:51 .
    drwxr-x--- 14 ceph ceph 4.0K Jul 19 19:59 ..
    drwxr-xr-x  3 ceph ceph 4.0K Jul 19 19:59 ceph-c
    drwxr-xr-x  3 ceph ceph 4.0K Apr 29 11:34 ceph-pve03
    I'm guessing those ceph-pve0X are the issue?
     
  6. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    Maybe this? Remove the dots.

    Your config file, ceph.conf:
    [screenshot: upload_2019-7-23_16-38-46.png]
     
  7. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Yes, it seems the old directories did not get cleaned up.
    If you are sure that those monitors will never come up again in the future, you can remove those directories and they should vanish from the web interface.
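    E.g. for pve01, roughly along these lines (just a sketch; keep a copy if you want to be careful, and repeat with ceph-pve02/ceph-pve03 on the other nodes):
    Code:
    # make sure nothing is actually running from the old directory
    systemctl status ceph-mon@pve01.service
    # then move it out of the way (or rm -rf it once you are sure)
    mv /var/lib/ceph/mon/ceph-pve01 /root/ceph-mon-pve01.bak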
     
  8. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    And you have a mistake in the ceph.conf again:

    Change 70 to 10 and remove the commas (marked in yellow), reboot, and it should work.
    [screenshot: upload_2019-7-23_16-53-14.png]
     
  9. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,700
    Likes Received:
    338
    Why? That can be valid if the cluster network is on a different subnet (see his public network). Also, the commas are correct syntax.
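    For reference, a layout like his is perfectly valid in ceph.conf, e.g.:
    Code:
    [global]
         # OSD replication/heartbeat traffic on its own subnet ...
         cluster network = 192.168.70.0/24
         # ... while clients and monitors use the public network
         public network = 192.168.10.0/24
         # a comma-separated monitor list is valid syntax
         mon_host = 192.168.10.3,192.168.10.9,192.168.10.2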
     
  10. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2

    I had this "issue" a long time ago; with commas it didn't work (PVE 2.x).
    OK, I had assumed that the cluster network (that's how it is in our configuration) is the same network the monitors are on, and that the monitors here are not in the cluster network.
    After I installed Ceph, the Ceph cluster network was the same as the monitors'.
    [screenshot: upload_2019-7-23_17-3-43.png]

    Sorry, of course I can be wrong too.
    I'll fix that, or rather I can configure it like mihanson did, and I will see what happens in the test environment.
     
  11. Romsch

    Romsch Member
    Proxmox Subscriber

    Joined:
    Feb 14, 2019
    Messages:
    76
    Likes Received:
    2
    I changed the config, and Ceph doesn't work "normally" anymore.
    [screenshot: upload_2019-7-23_17-9-50.png]
    In my case, 1pve5to6 is the active Ceph node; the first node doesn't work correctly, and a manual start of the OSD fails.
    [screenshot: upload_2019-7-23_17-10-44.png]
     
  12. mihanson

    mihanson New Member

    Joined:
    Nov 1, 2018
    Messages:
    19
    Likes Received:
    0
    I removed these old directories and my problem has been solved. Thank you!
     