[SOLVED] PVE 6: Duplicate monitors and managers in ceph overview

Leah

Aug 1, 2019
Hey,

I have a fresh Proxmox 6.0.5 cluster installation with no changes besides cluster creation, installation of Ceph (via the GUI), and the addition of two more monitors and managers (also via the GUI). The list under Ceph -> Monitor looks correct, but in the overview every monitor and manager has a duplicate with an unknown state.

The ms_bind values also seem correct:
Code:
ms_bind_ipv4 = false
ms_bind_ipv6 = true
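
For reference, these values can be cross-checked against what the running monitor actually uses; a quick check, assuming this node's monitor id is node5 (the second command reads from the daemon's admin socket):
Code:
# configured values in the cluster-wide config file
grep ms_bind /etc/pve/ceph.conf

# value the running monitor reports via its admin socket
ceph daemon mon.node5 config get ms_bind_ipv6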

And there is only one manager and monitor service running:
Code:
             ├─system-ceph\x2dmgr.slice
             │ └─ceph-mgr@node5.service
             │   └─5436 /usr/bin/ceph-mgr -f --cluster ceph --id node5 --setuser ceph --setgroup ceph
             ├─system-ceph\x2dmon.slice
             │ └─ceph-mon@node5.service
             │   └─8249 /usr/bin/ceph-mon -f --cluster ceph --id node5 --setuser ceph --setgroup ceph
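
For completeness, the active Ceph units can also be listed directly; a quick check, assuming the standard ceph-mon@/ceph-mgr@ template units:
Code:
# list all active monitor and manager instances on this node
systemctl list-units 'ceph-mon@*' 'ceph-mgr@*' --no-pager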

Maybe someone has an idea what the source of this glitch is.
 

Attachments

  • Screenshot2019-08-01_14-07-54.png
  • Screenshot2019-08-01_14-23-38.png

Some more information that would be helpful to look at: the output of these commands:

Code:
ceph versions
ceph mon dump --format json-pretty
ceph config dump
cat /etc/pve/ceph.conf
 
and also "ls -lh /etc/systemd/system/ceph-*.target.wants/*"
 
ceph versions:
Code:
root@node5:~# ceph versions
{
    "mon": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 3
    },
    "osd": {},
    "mds": {},
    "overall": {
        "ceph version 14.2.1 (9257126ffb439de1652793b3e29f4c0b97a47b47) nautilus (stable)": 6
    }
}

ceph mon dump --format json-pretty:

Code:
dumped monmap epoch 3

{
    "epoch": 3,
    "fsid": "fc23f0e8-3c61-470c-a810-cdf17fac8fd2",
    "modified": "2019-08-01 14:07:11.708406",
    "created": "2019-08-01 14:04:07.187147",
    "min_mon_release": 14,
    "min_mon_release_name": "nautilus",
    "features": {
        "persistent": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus"
        ],
        "optional": []
    },
    "mons": [
        {
            "rank": 0,
            "name": "node5",
            "public_addrs": {
                "addrvec": [
                    {
                        "type": "v2",
                        "addr": "[2a0b:20c0:205:401::1]:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "[2a0b:20c0:205:401::1]:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "[2a0b:20c0:205:401::1]:6789/0",
            "public_addr": "[2a0b:20c0:205:401::1]:6789/0"
        },
        {
            "rank": 1,
            "name": "node6",
            "public_addrs": {
                "addrvec": [
                    {
                        "type": "v2",
                        "addr": "[2a0b:20c0:205:401::2]:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "[2a0b:20c0:205:401::2]:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "[2a0b:20c0:205:401::2]:6789/0",
            "public_addr": "[2a0b:20c0:205:401::2]:6789/0"
        },
        {
            "rank": 2,
            "name": "node7",
            "public_addrs": {
                "addrvec": [
                    {
                        "type": "v2",
                        "addr": "[2a0b:20c0:205:401::3]:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "[2a0b:20c0:205:401::3]:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "[2a0b:20c0:205:401::3]:6789/0",
            "public_addr": "[2a0b:20c0:205:401::3]:6789/0"
        }
    ],
    "quorum": [
        0,
        1,
        2
    ]
}


ceph config dump:

Code:
root@node5:~# ceph config dump
WHO MASK LEVEL OPTION VALUE RO

cat /etc/pve/ceph.conf:

Code:
root@node5:~# cat /etc/pve/ceph.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 2a0b:20c0:205:401::1/64
     fsid = fc23f0e8-3c61-470c-a810-cdf17fac8fd2
     mon_allow_pool_delete = true
     mon_host = 2a0b:20c0:205:401::1 2a0b:20c0:205:401::2 2a0b:20c0:205:401::3
     ms_bind_ipv4 = false
     ms_bind_ipv6 = true
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 2a0b:20c0:205:401::1/64

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

Code:
root@node5:~# ls -lh /etc/systemd/system/ceph-*.target.wants/*
lrwxrwxrwx 1 root root 37 Aug  1 14:04 /etc/systemd/system/ceph-mgr.target.wants/ceph-mgr@node5.service -> /lib/systemd/system/ceph-mgr@.service
lrwxrwxrwx 1 root root 37 Aug  1 14:04 /etc/systemd/system/ceph-mon.target.wants/ceph-mon@node5.service -> /lib/systemd/system/ceph-mon@.service
 
OK, I think I've solved the problem. It is only a display issue, and it happens if the node's hostname is set to the FQDN instead of just the short (local) part. After fixing this and rebooting the host, the duplicate monitors and managers disappeared.
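
For anyone hitting the same thing, a minimal check-and-fix sketch (node5 and example.com are placeholders for the node's short name and domain; the actual fix is just setting the hostname back to the short form and rebooting):
Code:
# the hostname itself should be the short name; the FQDN comes from /etc/hosts
hostname          # expected: node5, not node5.example.com
hostname --fqdn   # expected: node5.example.com

# if "hostname" returns the FQDN, set it back to the short name
hostnamectl set-hostname node5

# /etc/hosts should map the node's address to "FQDN short-name", e.g.:
# 2a0b:20c0:205:401::1 node5.example.com node5

# then reboot so all services pick up the corrected hostname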
 

I'm running into something similar. I seem to have an "unknown monitor" displayed in the GUI for one of my nodes that is NOT actually designated as a monitor, and it also has a duplicate. I'm also having a bit of a hard time understanding the solution: are you saying that if you SSH into the node and run "hostname" from the CLI, and it returns the FQDN instead of the short name, that is the cause?

Mine doesn't show the FQDN.

Thanks
<D>
 
