Can't add monitor

Kaboom

Member
Mar 5, 2019
96
9
8
48
Dear All,

When I try to start a new monitor that I just added on node005, I get this error:

Mar 23 14:53:13 node005 systemd[1]: Started Ceph cluster monitor daemon.
Mar 23 14:53:14 node005 ceph-mon[3561]: 2020-03-23 14:53:14.762 7f58ca8e8700 -1 mon.node005@-1(probing) e0 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied

Does this ring a bell?

Thanks in advance!
 

Alwin

Proxmox Staff Member
Staff member
Aug 1, 2017
3,867
361
88
How did you add the MON?
 

Kaboom

I added this monitor with the GUI (running latest Proxmox and Ceph versions).
 

Alwin

Can you please post the output of ceph -s and your ceph.conf?
 

Kaboom


ceph -s
  cluster:
    id:     09935360-cfe7-48d4-ac76-c02e0fdd95de
    health: HEALTH_WARN
            1 daemons have recently crashed

  services:
    mon: 2 daemons, quorum node003,node002 (age 2d)
    mgr: node002(active, since 2d), standbys: node003, node004
    osd: 36 osds: 36 up (since 10h), 36 in (since 4M)

  data:
    pools:   1 pools, 1024 pgs
    objects: 842.82k objects, 3.0 TiB
    usage:   8.2 TiB used, 7.5 TiB / 16 TiB avail
    pgs:     1024 active+clean

  io:
    client:   91 MiB/s rd, 14 MiB/s wr, 2.13k op/s rd, 583 op/s wr
 

Kaboom


cat ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.0.1.0/24
fsid = 09657777360-cfe7-48764-ac76-c02e4566
mon_allow_pool_delete = true
mon_host = 10.0.1.2 10.0.1.3 10.0.1.5
osd_journal_size = 5120
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.0.1.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.node002]
host = node002

[mon.node003]
host = node003

[mon.node005]
host = node005
 

Alwin

node005 exists in the ceph.conf, but wasn't registered by the other MONs. The easiest fix is to try a pveceph mon destroy node005 and afterwards a pveceph mon create. Then hopefully the new MON starts working. If not, the log file /var/log/ceph/ceph-mon.node005.log should give some clues.
 

Kaboom

I found out this node had a different keyring. I don't understand why, but I copied it from another node, double-checked all the files (some had the wrong owner and group), and now it starts.
 

Alwin

Kaboom said: "I found out this node had a different keyring, I don't understand why but I copied this from another node..."
I hope you didn't copy the keyrings of the other services as well. Each new Ceph service creates its own keyring.
 

Kaboom

Talking about this keyring: /var/lib/ceph/mon/ceph-node005/keyring

When I use a unique keyring the monitor doesn't start. When I use the same keyring on all nodes the monitor works. But I get this error:
mon.node005@0(electing) e16 failed to get devid for : fallback method has serial ''but no model
 

Alwin

Kaboom said: "When I use a unique keyring the monitor doesn't start. When I use all the same keyrings the monitor works. But I get this error:
mon.node005@0(electing) e16 failed to get devid for : fallback method has serial ''but no model"
That message comes from a running MON and doesn't prevent it from joining the other MONs.

Kaboom said: "Talking about this keyring: /var/lib/ceph/mon/ceph-node005/keyring"
Yes, even though they are the same for the MONs, they are different for the other services like MGR, OSD, MDS or clients. But anyway you will not need to copy those, since they are created by the MON during bootstrapping.
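
To check whether two MON keyrings really carry the same key without eyeballing the files, something like the following Python sketch could be used. It assumes the keyring files use the usual "[mon.]" section with a "key = ..." line. The key values below are placeholders rather than real secrets, and the names mon_key and same_mon_key are hypothetical helpers, not part of any Ceph tooling.

```python
# Hypothetical keyring contents; the key values are placeholders, not real secrets.
KEYRING_NODE002 = """[mon.]
    key = PLACEHOLDERKEYAAAA==
    caps mon = "allow *"
"""

KEYRING_NODE005_COPIED = """[mon.]
    key = PLACEHOLDERKEYAAAA==
    caps mon = "allow *"
"""

KEYRING_NODE005_UNIQUE = """[mon.]
    key = PLACEHOLDERKEYBBBB==
    caps mon = "allow *"
"""


def mon_key(keyring_text):
    """Return the 'key' value from the [mon.] section, or None if absent."""
    section = None
    for raw_line in keyring_text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "mon." and "=" in line:
            name, value = line.split("=", 1)  # split once: the key itself may end in '='
            if name.strip() == "key":
                return value.strip()
    return None


def same_mon_key(keyring_a, keyring_b):
    """True only if both keyrings contain a [mon.] key and the keys are identical."""
    key_a, key_b = mon_key(keyring_a), mon_key(keyring_b)
    return key_a is not None and key_a == key_b


print(same_mon_key(KEYRING_NODE002, KEYRING_NODE005_COPIED))
print(same_mon_key(KEYRING_NODE002, KEYRING_NODE005_UNIQUE))
```

The first comparison corresponds to the working case from this thread (same MON key everywhere), the second to the case where node005 had its own key and was refused with "Permission denied".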
 

Kaboom

OK, but I found that when the monitors use different keyrings, they don't start.

What is best to do now?
 

Kaboom

I recreated all 3 monitors through the GUI, but they all have the same keyring. Is that correct?

And I still get this error on all 3 monitors. Is this important?
2020-03-27 22:00:05.118 7ff15cea9700 -1 mon.node002@1(electing) e29 failed to get devid for : fallback method has serial ''but no model

=====

This cluster looks healthy:

ceph -s
  cluster:
    id:     09935360-cfe7-48d4-ac76-c02e0fdd95de
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum node003,node002,node004 (age 7m)
    mgr: node003(active, since 12m), standbys: node002, node004
    osd: 36 osds: 36 up (since 36h), 36 in (since 4M)

  data:
    pools:   1 pools, 1024 pgs
    objects: 848.70k objects, 3.0 TiB
    usage:   8.3 TiB used, 7.5 TiB / 16 TiB avail
    pgs:     1024 active+clean

  io:
    client:   519 KiB/s rd, 11 MiB/s wr, 37 op/s rd, 371 op/s wr
 

Alwin

Kaboom said: "I recreated all 3 monitors through the GUI, but they all have the same keyring, is that correct?"
Yes, they do, and that is correct.

Kaboom said: "And I still got this error on all 3 monitors, is this important?
2020-03-27 22:00:05.118 7ff15cea9700 -1 mon.node002@1(electing) e29 failed to get devid for : fallback method has serial ''but no model"
This is more of an informational message.
 

Kaboom

And every night at 00:00 I get this error message, is this something to worry about?

Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.129 7ff1636b6700 -1 Fail to open '/proc/2987489/cmdline' error = (2) No such file or directory
Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.133 7ff1636b6700 -1 received signal: Hangup from <unknown> (PID: 2987489) UID: 0
Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.133 7ff1636b6700 -1 Fail to open '/proc/2987489/cmdline' error = (2) No such file or directory
Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.133 7ff1636b6700 -1 received signal: Hangup from <unknown> (PID: 2987489) UID: 0
Mar 30 00:00:00 node002 ceph-mon[3003622]: 2020-03-30 00:00:00.074 7ff1636b6700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror (PID: 2197612) UID: 0
Mar 30 00:00:00 node002 ceph-mon[3003622]: 2020-03-30 00:00:00.110 7ff1636b6700 -1 received signal: Hangup from (PID: 2197614) UID: 0
 

Alwin

Kaboom said: "And every night at 00:00 I get this error message, is this something to worry about?"
What happens every day at midnight? :) Log rotation.
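
The killall -q -1 entry in the log is signal 1 (SIGHUP, shown as "Hangup"), which fits the log rotation explanation. To confirm that every Hangup in a journal excerpt really falls on midnight, a small Python sketch like the one below could help. It assumes the syslog-style line format seen in this thread; the regular expression and the name find_hangups are illustrative, not an official log format definition.

```python
import re

# Three of the lines posted in this thread, unchanged.
LOG_LINES = [
    "Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.129 7ff1636b6700 -1 Fail to open '/proc/2987489/cmdline' error = (2) No such file or directory",
    "Mar 29 00:00:00 node002 ceph-mon[3003622]: 2020-03-29 00:00:00.133 7ff1636b6700 -1 received signal: Hangup from <unknown> (PID: 2987489) UID: 0",
    "Mar 30 00:00:00 node002 ceph-mon[3003622]: 2020-03-30 00:00:00.074 7ff1636b6700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-mgr ceph-mds ceph-osd ceph-fuse radosgw rbd-mirror (PID: 2197612) UID: 0",
]

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> ceph-mon[<pid>]: ... received signal: Hangup from <sender> (PID: ...)"
HANGUP_RE = re.compile(
    r"^(?P<date>\w{3} +\d+) (?P<time>\d\d:\d\d:\d\d) (?P<host>\S+) "
    r"ceph-mon\[\d+\]: .*received signal: Hangup from (?P<sender>.*?) ?\(PID"
)


def find_hangups(lines):
    """Return one dict (date, time, host, sender) per 'received signal: Hangup' line."""
    hangups = []
    for line in lines:
        match = HANGUP_RE.search(line)
        if match:
            hangups.append(match.groupdict())
    return hangups


hangups = find_hangups(LOG_LINES)
for entry in hangups:
    print(entry["date"], entry["time"], entry["sender"])

# True if every Hangup was logged exactly at midnight.
print(all(entry["time"] == "00:00:00" for entry in hangups))
```

If all Hangups line up with the rotation schedule like this, the messages are expected and not a sign of a problem with the monitor.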
 
