pveceph monitor problem

Nov 23, 2023
Hi,

I have a problem: I cannot re-add a mon in a 4-node cluster (all nodes have/had a running monitor).

The monitor was deleted because it was in the (overview) list twice, once with a green tick mark and once with a grey question mark.

At first it could not be deleted properly: it was still in the list (Node -> Ceph -> Monitor) but could not be removed from there. After a lot of research I solved that by:

- deleting the mon from ceph.conf
- removing /var/lib/ceph/mon/$HOSTNAME
- disabling the systemd unit
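In shell terms, the steps above amount to roughly the following (a sketch, not the exact commands from this thread; `MONID` is a placeholder for the affected node's mon id, and the `ceph-` directory prefix assumes the default cluster name):

```shell
# Manual cleanup of a half-removed monitor. Normally
# `pveceph mon destroy <monid>` does all of this; these are the fallback
# steps when the regular removal leaves the mon behind.

MONID=$(hostname)   # placeholder: mon id is assumed to equal the hostname

# 1) remove the mon from the monmap (if the cluster still lists it),
#    then delete the matching [mon.$MONID] section / mon_host entry
#    from /etc/pve/ceph.conf by hand
ceph mon remove "$MONID"

# 2) remove the on-disk mon data
rm -rf "/var/lib/ceph/mon/ceph-$MONID"

# 3) stop and disable the systemd unit
systemctl stop "ceph-mon@$MONID.service"
systemctl disable "ceph-mon@$MONID.service"
```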

After that I can add the mon again, but it stays in "stopped" and the systemd unit won't start.

Attached is the log output from one systemd unit start attempt (hostnames redacted); I cannot see an error in it.
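For reference, the kind of commands used to capture that log (a sketch, assuming the mon id equals the hostname):

```shell
# Inspect why the re-created mon stays "stopped"
systemctl status "ceph-mon@$(hostname).service"
journalctl -u "ceph-mon@$(hostname).service" --since "-1 hour"

# the mon's own log file
tail -n 100 "/var/log/ceph/ceph-mon.$(hostname).log"
```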

One more thing: the node that now has the problem was updated in the meantime. All other nodes are on Proxmox 8.0.4, this one is on 8.1.3. Can that be the problem, and do we need to update the other nodes before continuing?

Cheers
Soeren
 

Attachments

  • mon.log (26.2 KB)
I cannot re-add a mon in a 4-node cluster (all nodes have/had a running monitor).
Never run an even number of Mons/MGRs or HA nodes with PVE, always an odd number. With an even count you won't get a clear majority!
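The arithmetic behind this: a monitor quorum needs floor(n/2) + 1 members, so a fourth mon raises the required majority without letting you tolerate any additional failure. A quick illustration:

```shell
# majority = floor(n/2) + 1; tolerated failures = n - majority
for n in 3 4 5; do
  echo "$n mons: majority $(( n / 2 + 1 )), can lose $(( n - (n / 2 + 1) ))"
done
```

With 4 mons you still can only lose one, exactly as with 3, so the extra mon adds failure surface without adding resilience.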

Can you please give us an overall overview of the current status: logs, the PVE interface, the Ceph status, and so on.
 
Below is the pveceph status. I am not sure what you mean by "PVE interface", the network configuration?
And which log files would help? All of /var/log/ceph on the respective node?

PVECEPH STATUS

  cluster:
    id:     CLUSTERID
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum MON-01,MON-02,MON-03 (age 2h)
    mgr: MON-03(active, since 3h), standbys: MGR-01, MGR-02, MGR-04
    osd: 32 osds: 32 up (since 2h), 32 in (since 3M)

  data:
    pools:   2 pools, 513 pgs
    objects: 2.87M objects, 11 TiB
    usage:   30 TiB used, 82 TiB / 112 TiB avail
    pgs:     513 active+clean

  io:
    client: 2.4 MiB/s rd, 26 MiB/s wr, 114 op/s rd, 4.09k op/s wr
 
Below is the pveceph status. I am not sure what you mean by "PVE interface", the network configuration?
You said you've seen the post several times. I would like to know what it looks like in the interface now: how PVE shows you the Ceph status.

And which log files would help? All of /var/log/ceph on the respective node?
We don't know anything about your setup, so help us out and give us any information you have that might be relevant. Otherwise adequate help is not possible.
health: HEALTH_OK

services:
mon: 3 daemons, quorum MON-01,MON-02,MON-03 (age 2h)
mgr: MON-03(active, since 3h), standbys: MGR-01, MGR-02, MGR-04
You should also delete MGR-04; otherwise your Ceph looks good. Please definitely don't add another one. It also makes no sense to have more than 3 MGRs/Mons in your cluster; it can lead to a performance impact.
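Removing the surplus standby manager can be done from the CLI, roughly like this (a sketch; MGR-04 stands for the redacted manager id, and the command should be run on the node that hosts that manager):

```shell
# Destroy the surplus standby manager (id redacted as MGR-04 in this thread)
pveceph mgr destroy MGR-04

# verify only the intended managers remain
ceph -s | grep mgr
```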
 
Network:

All hosts have an Open vSwitch setup with one bridge connecting the LACP bond and the OVS internal port:

xxx.xxx.2.51
xxx.xxx.2.52
xxx.xxx.2.53
xxx.xxx.2.54

The DNS/hosts entries point to these IPs.

Then there is a second bridge interface on each host with an IP address from this range:

xxx.xxx.9.51
xxx.xxx.9.52
xxx.xxx.9.53
xxx.xxx.9.54

This network is used for storage traffic, and the MONs are configured on it.
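For context, the relevant ceph.conf fragment for such a layout would look roughly like this (a sketch: the subnets stay masked as in the post above, the /24 prefix length is an assumption, and MON-01 is one of the redacted mon ids):

```ini
[global]
    # storage network carrying mon traffic (assumed /24)
    public_network = xxx.xxx.9.0/24
    cluster_network = xxx.xxx.9.0/24

[mon.MON-01]
    public_addr = xxx.xxx.9.51
```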
 
You said you've seen the post several times. I would like to know what it looks like in the interface now: how PVE shows you the Ceph status.


We don't know anything about your setup, so help us out and give us any information you have that might be relevant. Otherwise adequate help is not possible.

You should also delete MGR-04; otherwise your Ceph looks good. Please definitely don't add another one. It also makes no sense to have more than 3 MGRs/Mons in your cluster; it can lead to a performance impact.
:) - ok, I was totally on the wrong track with "interface", sorry.

Anyway, if having 4 MONs/MGRs is not recommended, or rather discouraged, then I will just delete the last MGR and leave it at that.
Once I have more free hardware I will set up a similar cluster and test adding and deleting MONs to learn more about the behaviour.

Thanks a lot for the fast answers and the help.
 
