[SOLVED] Duplicated ceph mon, mgr and mds in ceph overview

fluxX04

Hey,

I have the same problem as described in this thread (except that I'm using IPv4):
https://forum.proxmox.com/threads/pve-6-duplicate-monitors-and-managers-in-ceph-overview.56569

Overview:

overview.PNG

The active entries are the ones shown with hostname plus domain name; the others show only the hostname and have the status unknown.

I already checked the file system: there are no leftover services, directories, or files.

ceph mon dump --format json-pretty:
JSON:
root@PVE04:~# ceph mon dump --format json-pretty
dumped monmap epoch 19
{
    "epoch": 19,
    "fsid": "bddf1bb5-e70c-4bcd-b01c-9ae81790995f",
    "modified": "2021-05-17T08:15:08.733940Z",
    "created": "2021-04-29T12:17:06.755588Z",
    "min_mon_release": 15,
    "min_mon_release_name": "octopus",
    "features": {
        "persistent": [
            "kraken",
            "luminous",
            "mimic",
            "osdmap-prune",
            "nautilus",
            "octopus"
        ],
        "optional": []
    },
    "mons": [{
            "rank": 0,
            "name": "PVE05",
            "public_addrs": {
                "addrvec": [{
                        "type": "v2",
                        "addr": "192.168.4.5:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "192.168.4.5:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "192.168.4.5:6789/0",
            "public_addr": "192.168.4.5:6789/0",
            "priority": 0,
            "weight": 0
        },
        {
            "rank": 1,
            "name": "PVE04",
            "public_addrs": {
                "addrvec": [{
                        "type": "v2",
                        "addr": "192.168.4.4:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "192.168.4.4:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "192.168.4.4:6789/0",
            "public_addr": "192.168.4.4:6789/0",
            "priority": 0,
            "weight": 0
        },
        {
            "rank": 2,
            "name": "PVE06",
            "public_addrs": {
                "addrvec": [{
                        "type": "v2",
                        "addr": "192.168.4.6:3300",
                        "nonce": 0
                    },
                    {
                        "type": "v1",
                        "addr": "192.168.4.6:6789",
                        "nonce": 0
                    }
                ]
            },
            "addr": "192.168.4.6:6789/0",
            "public_addr": "192.168.4.6:6789/0",
            "priority": 0,
            "weight": 0
        }
    ],
    "quorum": [
        0,
        1,
        2
    ]
}

On the Ceph side everything looks normal.

Response from the api (https://pve01.mydomain.my:8006/api2/json/cluster/ceph/metadata):

node section:
JSON:
"node": {
      "PVE05": {
        "buildcommit": "64a4c04e6850c6d9086e4c37f57c4eada541b05e",
        "version": {
          "str": "15.2.11",
          "parts": ["15", "2", "11"]
        }
      },
      "PVE06": {
        "version": {
          "parts": ["15", "2", "11"],
          "str": "15.2.11"
        },
        "buildcommit": "64a4c04e6850c6d9086e4c37f57c4eada541b05e"
      },
      "PVE04": {
        "buildcommit": "64a4c04e6850c6d9086e4c37f57c4eada541b05e",
        "version": {
          "parts": ["15", "2", "11"],
          "str": "15.2.11"
        }
      },
      ....
    }

mon section:

JSON:
"mon": {
  "PVE05@PVE05": {
    "direxists": 1,
    "service": 1
  },
  "PVE06@PVE06": {
    "direxists": 1,
    "service": 1
  },
  "PVE04@PVE04": {
    "direxists": 1,
    "service": 1
  },
  "PVE05@PVE05.mydomain.my": {
    "compression_algorithms": "none, snappy, zlib, zstd, lz4",
    "kernel_description": "#1 SMP PVE 5.4.114-1 (Sun, 09 May 2021 17:13:05 +0200)",
    "hostname": "PVE05.mydomain.my",
    "device_paths": "",
    "mem_swap_kb": "8388604",
    "mem_total_kb": "65727980",
    "os": "Linux",
    "devices": "",
    "distro_description": "Debian GNU/Linux 10 (buster)",
    "name": "PVE05",
    "device_ids": "",
    "distro_version": "10",
    "ceph_version_short": "15.2.11",
    "addrs": "[v2:192.168.4.5:3300/0,v1:192.168.4.5:6789/0]",
    "ceph_version": "ceph version 15.2.11 (64a4c04e6850c6d9086e4c37f57c4eada541b05e) octopus (stable)",
    "distro": "debian",
    "arch": "x86_64",
    "ceph_release": "octopus",
    "cpu": "AMD EPYC 7232P 8-Core Processor",
    "kernel_version": "5.4.114-1-pve"
  },
  "PVE04@PVE04.mydomain.my": {
    "ceph_release": "octopus",
    "cpu": "AMD EPYC 7232P 8-Core Processor",
    "kernel_version": "5.4.114-1-pve",
    "distro": "debian",
    "arch": "x86_64",
    "distro_version": "10",
    "ceph_version_short": "15.2.11",
    "addrs": "[v2:192.168.4.4:3300/0,v1:192.168.4.4:6789/0]",
    "ceph_version": "ceph version 15.2.11 (64a4c04e6850c6d9086e4c37f57c4eada541b05e) octopus (stable)",
    "device_ids": "",
    "distro_description": "Debian GNU/Linux 10 (buster)",
    "name": "PVE04",
    "hostname": "PVE04.mydomain.my",
    "device_paths": "",
    "mem_swap_kb": "8388604",
    "mem_total_kb": "32746996",
    "os": "Linux",
    "devices": "",
    "kernel_description": "#1 SMP PVE 5.4.114-1 (Sun, 09 May 2021 17:13:05 +0200)",
    "compression_algorithms": "none, snappy, zlib, zstd, lz4"
  },
  "PVE06@PVE06.mydomain.my": {
    "distro_description": "Debian GNU/Linux 10 (buster)",
    "name": "PVE06",
    "kernel_description": "#1 SMP PVE 5.4.114-1 (Sun, 09 May 2021 17:13:05 +0200)",
    "compression_algorithms": "none, snappy, zlib, zstd, lz4",
    "hostname": "PVE06.mydomain.my",
    "device_paths": "",
    "mem_total_kb": "65727980",
    "mem_swap_kb": "8388604",
    "os": "Linux",
    "devices": "",
    "distro": "debian",
    "arch": "x86_64",
    "ceph_release": "octopus",
    "cpu": "AMD EPYC 7232P 8-Core Processor",
    "kernel_version": "5.4.114-1-pve",
    "device_ids": "",
    "distro_version": "10",
    "ceph_version_short": "15.2.11",
    "addrs": "[v2:192.168.4.6:3300/0,v1:192.168.4.6:6789/0]",
    "ceph_version": "ceph version 15.2.11 (64a4c04e6850c6d9086e4c37f57c4eada541b05e) octopus (stable)"
  }
}

The same applies to 'mds' and 'mgr'. As you can see in the response, each host is defined twice: once with the short hostname and once with the full FQDN.
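To make the duplication easier to spot, here is a minimal Python sketch that groups the "name@host" keys of one section by daemon name and short hostname. The function name `find_duplicate_daemons` and the trimmed sample data are only for illustration; the key format is taken from the API response above, and the same check should work for the 'mgr' and 'mds' sections if they use the same key format.

```python
from collections import defaultdict


def find_duplicate_daemons(section):
    """Return entries that differ only by the domain suffix of the host.

    `section` maps "name@host" keys to metadata dicts, as in the "mon"
    section of the API response. Keys are grouped by daemon name plus
    short hostname (everything before the first dot).
    """
    groups = defaultdict(list)
    for key in section:
        name, _, host = key.partition("@")
        short_host = host.split(".", 1)[0]
        groups[f"{name}@{short_host}"].append(key)
    # Only groups with more than one key are duplicates.
    return {label: sorted(keys) for label, keys in groups.items() if len(keys) > 1}


# Trimmed copy of the "mon" section from the response (most fields omitted).
mon = {
    "PVE05@PVE05": {"direxists": 1, "service": 1},
    "PVE06@PVE06": {"direxists": 1, "service": 1},
    "PVE04@PVE04": {"direxists": 1, "service": 1},
    "PVE05@PVE05.mydomain.my": {"hostname": "PVE05.mydomain.my", "name": "PVE05"},
    "PVE04@PVE04.mydomain.my": {"hostname": "PVE04.mydomain.my", "name": "PVE04"},
    "PVE06@PVE06.mydomain.my": {"hostname": "PVE06.mydomain.my", "name": "PVE06"},
}

for label, keys in sorted(find_duplicate_daemons(mon).items()):
    print(label, "->", keys)
```

Each monitor shows up under two keys: the short one carries only the `direxists`/`service` flags, while the FQDN one carries the metadata reported by the daemon itself.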

Any ideas how to solve this?

Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-6 (running version: 6.4-6/be2fa32c)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.11-pve1
ceph-fuse: 15.2.11-pve1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-2
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.6-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-4
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-3
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

Thanks!
Greetz
 
Hi,

OK, I found the problem: the full FQDN was set in /etc/hostname. After changing the file to contain only the short hostname and restarting the Ceph mon/mgr/mds services, the duplicates are gone.
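As a rough sketch of that fix, the Python snippet below takes the content of /etc/hostname as a string, derives the short hostname, and lists the restart commands that would follow. It does not change anything on the system. The function name `plan_restart` is hypothetical, and the unit names (`ceph-mon@<id>.service` and so on) assume that the daemon ID equals the short hostname, as it does for the monitors in the mon dump above; check your own unit names before running anything.

```python
def plan_restart(hostname_file_text, daemons=("mon", "mgr", "mds")):
    """Return (short_hostname, commands) for a given /etc/hostname content.

    The command list is empty when the file already holds a short name.
    Nothing is executed here; the commands are returned as strings only.
    """
    current = hostname_file_text.strip()
    short = current.split(".", 1)[0]
    if current == short:
        return short, []
    # Assumption: the Ceph daemon ID on this node equals the short hostname.
    commands = [f"systemctl restart ceph-{d}@{short}.service" for d in daemons]
    return short, commands


short, commands = plan_restart("PVE04.mydomain.my\n")
print("/etc/hostname should contain only:", short)
for command in commands:
    print(command)
```

The hostname file has to be corrected first; the restarts only make the daemons report the new name.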

Greetz
 
Well, indeed one of the servers includes the domain in its hostname (px1.xxx.com), but after I changed it and restarted the mon on px1, the problem was not solved. Now, after a few tries, the mon does not run. The config files are the same as before, but ceph-mon does not start.

Any ideas on how to troubleshoot this? What is the problem with the monitor?

Thanx,
sp

UPDATE:
ceph-mon finally started. I didn't really do anything; I was just restarting the service while trying to figure out what to do... :)
 
Exactly my issue, get yourself a virtual cookie! Thanks :)
 