Ceph Error

Armando Ramos Roche

Good morning everyone,
I have a cluster of 3 Proxmox servers running version 6.4-13.
Last Friday I upgraded Ceph from Nautilus to Octopus, since it is one of the requirements for upgrading Proxmox to version 7.
At first everything worked perfectly.
But today when I checked, I found it is giving me the following error:
Error initializing cluster client: InvalidArgumentError ('RADOS invalid argument (error calling conf_read_file)')
I have no idea what could be happening because the configuration file exists on all 3 servers.
I've searched the internet, but can't find anything to help me.
 
Hi,
is /etc/ceph/ceph.conf a symlink to /etc/pve/ceph.conf on all nodes? Please share the content of that file and the output of pveversion -v. Where exactly does the error message appear?
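For example, a quick way to check this on each node, and to recreate the link if it is missing (just a sketch; back up any existing file first):
Code:
ls -l /etc/ceph/ceph.conf
# expected output ends with: /etc/ceph/ceph.conf -> /etc/pve/ceph.conf

# if it is a regular file instead of a symlink, it can be recreated like this:
mv /etc/ceph/ceph.conf /etc/ceph/ceph.conf.bak
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf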
 
Hello @Fabian_E
Thanks for answering.
I checked what you told me on the 3 nodes; below are the images from each of them. Their names are pve, pve1 and pve2.
PVE:
Cluster1.png
Cluster1-1.png

PVE1:
Node1.png
Node1-1.png

PVE2:
Node2.png
Node2-1.png
 

Please also share the contents of /etc/pve/ceph.conf. The file system in /etc/pve is shared among all nodes, so one of them is enough.
 
Please also share the contents of /etc/pve/ceph.conf. The file system in /etc/pve is shared among all nodes, so one of them is enough.
This is the contents:
Code:
root@pve:/etc/ceph# cat ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.12.17.0/24
         fsid = 8bfacf0e-e4e2-4c1e-a4b4-a3978cbc0bc5
         mon_allow_pool_delete = true
         mon_host = 10.12.17.25 10.12.17.22 10.12.17.23
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.12.17.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve]
         public_addr = 10.12.17.25

[mon.pve1]
         public_addr = 10.12.17.23

[mon.pve2]
         public_addr = 10.12.17.22
 
This is the contents:
Code:
root@pve:/etc/ceph# cat ceph.conf
<snip>

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

<snip>
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services
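For example, the Ceph daemons on a node can be restarted through the systemd targets (a generic example, adapt to your setup):
Code:
systemctl restart ceph-mon.target   # restart the monitor(s) on this node
systemctl restart ceph-osd.target   # restart the OSDs on this node
# or everything at once:
systemctl restart ceph.target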

EDIT: fix code blocks
 
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services

EDIT: fix code blocks
Might not be needed (at least I don't have it in my test cluster), but I suppose adding it shouldn't hurt either.

@Armando Ramos Roche Could you also share the storage configuration /etc/pve/storage.cfg and the output of dpkg-query -l | grep rados? Where exactly do you see the error message? Does it work if you manually do something like rbd -p <poolname> ls?
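If a command like that just hangs, adding a connection timeout to the ceph CLI can make the failure visible instead of waiting forever, for example:
Code:
ceph -s --connect-timeout 10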
 
Hey Fabian_E,
Thanks a lot for your support.
The content of /etc/pve/storage.cfg
Code:
root@pve:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,vztmpl,iso
        maxfiles 10
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: VMBackups1
        export /share/1000/PVE
        path /mnt/pve/VMBackups1
        server 10.12.17.28
        content backup,iso,vztmpl,images
        prune-backups keep-last=1
The output of dpkg-query -l | grep rados
Code:
root@pve:~# dpkg-query -l | grep rados
ii  librados2                            15.2.14-pve1~bpo10                      amd64        RADOS distributed object store client library
ii  librados2-perl                       1.1-2                                   amd64        Perl bindings for librados
ii  libradosstriper1                     15.2.14-pve1~bpo10                      amd64        RADOS striping interface
ii  python-rados                         14.2.22-pve1                            amd64        Python 2 libraries for the Ceph librados library
ii  python3-rados                        15.2.14-pve1~bpo10                      amd64        Python 3 libraries for the Ceph librados library
The command rbd -p pve ls does not show me anything at all; it just keeps loading.
I got the error when I tried to execute a ceph command, for example ceph -w or ceph -s.
Right now I'm not getting anything at all.
 
Hey Fabian_E,
Now I see the result of rbd -p pve ls; this is it:
Code:
root@pve:~# rbd -p pve ls
2021-09-20T08:04:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:09:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:14:27.224-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
 
Hey Fabian_E,
Thanks a lot for your support.
The content of /etc/pve/storage.cfg
Code:
root@pve:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,vztmpl,iso
        maxfiles 10
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: VMBackups1
        export /share/1000/PVE
        path /mnt/pve/VMBackups1
        server 10.12.17.28
        content backup,iso,vztmpl,images
        prune-backups keep-last=1
Ok, so the storage is not currently configured here. Was it removed because it didn't work?
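For completeness, an RBD storage entry in /etc/pve/storage.cfg typically looks something like the following (the storage name is arbitrary; the pool name matches the one used above):
Code:
rbd: ceph-vm
        content images,rootdir
        krbd 0
        pool pve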

The output of dpkg-query -l | grep rados
Code:
root@pve:~# dpkg-query -l | grep rados
ii  librados2                            15.2.14-pve1~bpo10                      amd64        RADOS distributed object store client library
ii  librados2-perl                       1.1-2                                   amd64        Perl bindings for librados
ii  libradosstriper1                     15.2.14-pve1~bpo10                      amd64        RADOS striping interface
ii  python-rados                         14.2.22-pve1                            amd64        Python 2 libraries for the Ceph librados library
ii  python3-rados                        15.2.14-pve1~bpo10                      amd64        Python 3 libraries for the Ceph librados library
Nothing suspicious here.

The command rbd -p pve ls does not show me anything at all; it just keeps loading.
I got the error when I tried to execute a ceph command, for example ceph -w or ceph -s.
Right now I'm not getting anything at all.
Hey Fabian_E,
Now I see the result of rbd -p pve ls; this is it:
Code:
root@pve:~# rbd -p pve ls
2021-09-20T08:04:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:09:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:14:27.224-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
That sounds like your monitors are down or unreachable. Please check the status with systemctl status ceph-mon@<nodename>.service on all nodes that have a monitor. If there's an error, you should check /var/log/ceph/ceph-mon.<nodename>.log and journalctl -b0 -u ceph-mon@<nodename>.service.
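For example (assuming the monitor ID matches the hostname, as it does on this cluster):
Code:
systemctl status ceph-mon@$(hostname).service
journalctl -b0 -u ceph-mon@$(hostname).service
tail -n 50 /var/log/ceph/ceph-mon.$(hostname).log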
 
Hey Fabian_E, running systemctl status ceph-mon@<nodename>.service on all nodes gives this:
node PVE:
Code:
root@pve:~# systemctl status ceph-mon@pve.service
● ceph-mon@pve.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Fri 2021-09-17 08:00:31 EDT; 3 days ago
 Main PID: 3906913 (ceph-mon)
    Tasks: 26
   Memory: 205.7M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve.service
           └─3906913 /usr/bin/ceph-mon -f --cluster ceph --id pve --setuser ceph --setgroup ceph

Sep 20 12:17:30 pve ceph-mon[3906913]: 2021-09-20T12:17:30.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:35 pve ceph-mon[3906913]: 2021-09-20T12:17:35.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:40 pve ceph-mon[3906913]: 2021-09-20T12:17:40.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:43 pve ceph-mon[3906913]: 2021-09-20T12:17:43.912-0400 7f4eb6082700 -1 mon.pve@0(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:17:45 pve ceph-mon[3906913]: 2021-09-20T12:17:45.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:50 pve ceph-mon[3906913]: 2021-09-20T12:17:50.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:55 pve ceph-mon[3906913]: 2021-09-20T12:17:55.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:58 pve ceph-mon[3906913]: 2021-09-20T12:17:58.968-0400 7f4eb6082700 -1 mon.pve@0(electing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:18:00 pve ceph-mon[3906913]: 2021-09-20T12:18:00.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:18:13 pve ceph-mon[3906913]: 2021-09-20T12:18:13.984-0400 7f4eb6082700 -1 mon.pve@0(leader) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
On PVE1:
Code:
root@pve1:~# systemctl status ceph-mon@pve1.service
● ceph-mon@pve1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:08 CDT; 5min ago
 Main PID: 2608650 (ceph-mon)
    Tasks: 27
   Memory: 21.3M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve1.service
           └─2608650 /usr/bin/ceph-mon -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
On PVE2:
Code:
root@pve2:~# systemctl status ceph-mon@pve2.service
● ceph-mon@pve2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:45 CDT; 5min ago
 Main PID: 1194228 (ceph-mon)
    Tasks: 27
   Memory: 94.6M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve2.service
           └─1194228 /usr/bin/ceph-mon -f --cluster ceph --id pve2 --setuser ceph --setgroup ceph

Sep 20 12:21:12 pve2 ceph-mon[1194228]: 2021-09-20T12:21:12.294-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:27 pve2 ceph-mon[1194228]: 2021-09-20T12:21:27.310-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:42 pve2 ceph-mon[1194228]: 2021-09-20T12:21:42.329-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:57 pve2 ceph-mon[1194228]: 2021-09-20T12:21:57.345-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:12 pve2 ceph-mon[1194228]: 2021-09-20T12:22:12.361-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:27 pve2 ceph-mon[1194228]: 2021-09-20T12:22:27.377-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:42 pve2 ceph-mon[1194228]: 2021-09-20T12:22:42.393-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:57 pve2 ceph-mon[1194228]: 2021-09-20T12:22:57.408-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:12 pve2 ceph-mon[1194228]: 2021-09-20T12:23:12.424-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:27 pve2 ceph-mon[1194228]: 2021-09-20T12:23:27.440-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
The files /var/log/ceph/ceph-mon.<nodename>.log on nodes pve and pve2 have data... but on pve1 it is empty.
The output of journalctl -b0 -u ceph-mon@<nodename>.service on pve1:
Code:
root@pve1:/var/log/ceph# journalctl -b0 -u ceph-mon@pve1.service
-- Logs begin at Thu 2021-09-09 14:29:02 CDT, end at Mon 2021-09-20 12:26:01 CDT. --
Sep 16 13:05:16 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:16 pve1 ceph-mon[1660588]: global_init: error reading config file.
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 1.
Sep 16 13:05:26 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 ceph-mon[1660625]: global_init: error reading config file.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 2.
Sep 16 13:05:37 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 ceph-mon[1660660]: global_init: error reading config file.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 3.
Sep 16 13:05:47 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 ceph-mon[1660695]: global_init: error reading config file.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 4.
Sep 16 13:05:57 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 ceph-mon[1660731]: global_init: error reading config file.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Sep 16 13:06:07 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
 
Hey Fabian_E, running systemctl status ceph-mon@<nodename>.service on all nodes gives this:
<snip>
It seems like the two monitors on pve and pve2 started complaining once the monitor on pve1 was started. Please check whether the file /var/lib/ceph/mon/ceph-<nodename>/keyring has the same contents on each node (don't post the keys here!).
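One way to compare them without posting the keys is to compare checksums on each node, for example:
Code:
sha256sum /var/lib/ceph/mon/ceph-$(hostname)/keyring
# the hashes should match across the nodes if the keyrings are identical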

Fabian_E, thanks a lot for your support.
Everything seems to be better.
I only have this:
(see attached screenshot)
There are no OSDs shown. Please do the same as you did for the monitors for the OSD services, i.e. systemctl status ceph-osd@<ID>.service, and try starting them if they are simply stopped. Did you already try @RokaKen's suggestion?

I'd go without the monitor on node pve1 for now if it causes problems (simply stop it), and try to get the OSDs up and running.
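For example (a generic sketch; replace <ID> with the actual OSD id on each node):
Code:
# take the problematic pve1 monitor out of the picture for now
systemctl stop ceph-mon@pve1.service

# list the OSD service instances, check them, and start them if they are just stopped
systemctl list-units 'ceph-osd@*'
systemctl status ceph-osd@<ID>.service
systemctl start ceph-osd@<ID>.service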
 
Fabian_E, Good morning again,
You're right, the keys in /var/lib/ceph/mon/ceph-<nodename>/keyring are different. pve and pve2 have the same one, but pve1 does not.
The @RokaKen suggestion doesn't show; it says: This member limits who may view their full profile. :oops:
I have not created any OSDs yet; first I need the monitors working.
 
Fabian_E, Good morning again,
You're right, the keys in /var/lib/ceph/mon/ceph-<nodename>/keyring are different. pve and pve2 have the same one, but pve1 does not.
Maybe the easiest is to stop the monitor on pve1 and try to re-create it (or manually copy the key and restart the service).
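A possible sequence for re-creating it with the Proxmox tooling could look like the following (just a sketch; make sure the other two monitors are healthy and double-check before destroying anything):
Code:
# on pve1
systemctl stop ceph-mon@pve1.service
pveceph mon destroy pve1
pveceph mon create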

The @RokaKen suggestion doesn't show; it says: This member limits who may view their full profile. :oops:
I meant the suggestion posted earlier in this thread:
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services

EDIT: fix code blocks

I have not created any OSDs yet; first I need the monitors working.
 
