Ceph Error

Armando Ramos Roche

Good morning everyone,
I have a cluster of 3 Proxmox servers running version 6.4-13.
Last Friday I upgraded Ceph from Nautilus to Octopus, since that is one of the requirements for upgrading Proxmox to version 7.
At first everything worked perfectly.
But today when I checked, I found it is giving me the following error:
Error initializing cluster client: InvalidArgumentError ('RADOS invalid argument (error calling conf_read_file)')
I have no idea what could be happening, because the configuration file exists on all 3 servers.
I've searched the internet but can't find anything that helps.
 
Hi,
is /etc/ceph/ceph.conf a symlink to /etc/pve/ceph.conf on all nodes? Please share the content of that file and the output of pveversion -v. Where exactly does the error message appear?
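For reference, one way to verify this on a node might look like the following; this is just a sketch, and the ln line only shows how the symlink would normally be restored, assuming /etc/pve/ceph.conf is the real file:
Code:
# show whether /etc/ceph/ceph.conf is a symlink and where it points
ls -l /etc/ceph/ceph.conf
readlink -f /etc/ceph/ceph.conf

# only if the symlink is missing: move any plain file aside and restore the link
# mv /etc/ceph/ceph.conf /etc/ceph/ceph.conf.bak
# ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf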
 
Hello @Fabian_E
Thanks for answering.
I checked what you told me on the 3 nodes; below are the screenshots from each of them. Their names are pve, pve1 and pve2.
PVE:
Cluster1.png
Cluster1-1.png

PVE1:
Node1.png
Node1-1.png

PVE2:
Node2.png
Node2-1.png
 

Please also share the contents of /etc/pve/ceph.conf. The file system in /etc/pve is shared among all nodes, so one of them is enough.
 
These are the contents:
Code:
root@pve:/etc/ceph# cat ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.12.17.0/24
         fsid = 8bfacf0e-e4e2-4c1e-a4b4-a3978cbc0bc5
         mon_allow_pool_delete = true
         mon_host = 10.12.17.25 10.12.17.22 10.12.17.23
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.12.17.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve]
         public_addr = 10.12.17.25

[mon.pve1]
         public_addr = 10.12.17.23

[mon.pve2]
         public_addr = 10.12.17.22
 
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting the Ceph services.
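A minimal sketch of how that could be done, assuming the stanza is added to the cluster-wide /etc/pve/ceph.conf with an editor first; the systemd targets below restart all Ceph daemons of the respective type on the node where the command is run:
Code:
# edit the shared config on any one node (/etc/pve is synchronized by pmxcfs)
nano /etc/pve/ceph.conf

# then restart the Ceph daemons on each node
systemctl restart ceph-mon.target ceph-mgr.target ceph-osd.target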

Might not be needed (at least I don't have it in my test cluster), but I suppose adding it shouldn't hurt either.

@Armando Ramos Roche Could you also share the storage configuration /etc/pve/storage.cfg and the output of dpkg-query -l | grep rados? Where exactly do you see the error message? Does it work if you manually do something like rbd -p <poolname> ls?
 
Hey Fabian_E,
Thanks a lot for your support.
The content of /etc/pve/storage.cfg:
Code:
root@pve:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,vztmpl,iso
        maxfiles 10
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: VMBackups1
        export /share/1000/PVE
        path /mnt/pve/VMBackups1
        server 10.12.17.28
        content backup,iso,vztmpl,images
        prune-backups keep-last=1
The output of dpkg-query -l | grep rados:
Code:
root@pve:~# dpkg-query -l | grep rados
ii  librados2                            15.2.14-pve1~bpo10                      amd64        RADOS distributed object store client library
ii  librados2-perl                       1.1-2                                   amd64        Perl bindings for librados
ii  libradosstriper1                     15.2.14-pve1~bpo10                      amd64        RADOS striping interface
ii  python-rados                         14.2.22-pve1                            amd64        Python 2 libraries for the Ceph librados library
ii  python3-rados                        15.2.14-pve1~bpo10                      amd64        Python 3 libraries for the Ceph librados library
The command rbd -p pve ls does not show me anything at all; it just keeps loading.
I got the error when I tried to execute a ceph command, for example ceph -w or ceph -s.
Right now I'm not getting any output at all.
 
Hey Fabian_E,
Now I see the result of rbd -p pve ls; it shows this:
Code:
root@pve:~# rbd -p pve ls
2021-09-20T08:04:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:09:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:14:27.224-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
 
Ok, so the Ceph storage is not currently configured in /etc/pve/storage.cfg. Was it removed because it didn't work?
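For reference, a PVE-managed RBD storage entry in /etc/pve/storage.cfg usually looks roughly like this; the storage ID ceph-vm is just a placeholder, while the pool name matches the pve pool used with rbd above:
Code:
rbd: ceph-vm
        content images,rootdir
        krbd 0
        pool pve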

Nothing suspicious in the dpkg-query output.

Those authenticate timeouts from rbd -p pve ls sound like your monitors are down or unreachable. Please check the status with systemctl status ceph-mon@<nodename>.service on all nodes which have a monitor. If there's an error, you should check /var/log/ceph/ceph-mon.<nodename>.log and journalctl -b0 -u ceph-mon@<nodename>.service.
 
Hey Fabian_E, executing the command systemctl status ceph-mon@<nodename>.service on all nodes gives this:
On node PVE:
Code:
root@pve:~# systemctl status ceph-mon@pve.service
● ceph-mon@pve.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Fri 2021-09-17 08:00:31 EDT; 3 days ago
 Main PID: 3906913 (ceph-mon)
    Tasks: 26
   Memory: 205.7M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve.service
           └─3906913 /usr/bin/ceph-mon -f --cluster ceph --id pve --setuser ceph --setgroup ceph

Sep 20 12:17:30 pve ceph-mon[3906913]: 2021-09-20T12:17:30.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:35 pve ceph-mon[3906913]: 2021-09-20T12:17:35.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:40 pve ceph-mon[3906913]: 2021-09-20T12:17:40.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:43 pve ceph-mon[3906913]: 2021-09-20T12:17:43.912-0400 7f4eb6082700 -1 mon.pve@0(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:17:45 pve ceph-mon[3906913]: 2021-09-20T12:17:45.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:50 pve ceph-mon[3906913]: 2021-09-20T12:17:50.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:55 pve ceph-mon[3906913]: 2021-09-20T12:17:55.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:58 pve ceph-mon[3906913]: 2021-09-20T12:17:58.968-0400 7f4eb6082700 -1 mon.pve@0(electing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:18:00 pve ceph-mon[3906913]: 2021-09-20T12:18:00.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:18:13 pve ceph-mon[3906913]: 2021-09-20T12:18:13.984-0400 7f4eb6082700 -1 mon.pve@0(leader) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
On PVE1:
Code:
root@pve1:~# systemctl status ceph-mon@pve1.service
● ceph-mon@pve1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:08 CDT; 5min ago
 Main PID: 2608650 (ceph-mon)
    Tasks: 27
   Memory: 21.3M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve1.service
           └─2608650 /usr/bin/ceph-mon -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
On PVE2:
Code:
root@pve2:~# systemctl status ceph-mon@pve2.service
● ceph-mon@pve2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:45 CDT; 5min ago
 Main PID: 1194228 (ceph-mon)
    Tasks: 27
   Memory: 94.6M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve2.service
           └─1194228 /usr/bin/ceph-mon -f --cluster ceph --id pve2 --setuser ceph --setgroup ceph

Sep 20 12:21:12 pve2 ceph-mon[1194228]: 2021-09-20T12:21:12.294-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:27 pve2 ceph-mon[1194228]: 2021-09-20T12:21:27.310-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:42 pve2 ceph-mon[1194228]: 2021-09-20T12:21:42.329-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:57 pve2 ceph-mon[1194228]: 2021-09-20T12:21:57.345-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:12 pve2 ceph-mon[1194228]: 2021-09-20T12:22:12.361-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:27 pve2 ceph-mon[1194228]: 2021-09-20T12:22:27.377-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:42 pve2 ceph-mon[1194228]: 2021-09-20T12:22:42.393-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:57 pve2 ceph-mon[1194228]: 2021-09-20T12:22:57.408-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:12 pve2 ceph-mon[1194228]: 2021-09-20T12:23:12.424-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:27 pve2 ceph-mon[1194228]: 2021-09-20T12:23:27.440-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
The files /var/log/ceph/ceph-mon.<nodename>.log on nodes PVE and PVE2 have data... but on PVE1 it is empty.
The output of journalctl -b0 -u ceph-mon@<nodename>.service on PVE1:
Code:
root@pve1:/var/log/ceph# journalctl -b0 -u ceph-mon@pve1.service
-- Logs begin at Thu 2021-09-09 14:29:02 CDT, end at Mon 2021-09-20 12:26:01 CDT. --
Sep 16 13:05:16 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:16 pve1 ceph-mon[1660588]: global_init: error reading config file.
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 1.
Sep 16 13:05:26 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 ceph-mon[1660625]: global_init: error reading config file.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 2.
Sep 16 13:05:37 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 ceph-mon[1660660]: global_init: error reading config file.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 3.
Sep 16 13:05:47 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 ceph-mon[1660695]: global_init: error reading config file.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 4.
Sep 16 13:05:57 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 ceph-mon[1660731]: global_init: error reading config file.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Sep 16 13:06:07 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
 
It seems like the two monitors on pve and pve2 started complaining once the monitor on pve1 was started. Please check whether the file /var/lib/ceph/mon/ceph-<nodename>/keyring has the same contents on each node (don't post the keys here!).
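One way to compare them without posting the keys is to hash the keyring on each node and compare only the hashes, assuming the monitor ID matches the short hostname (as it does here with pve, pve1 and pve2):
Code:
# run on each monitor node and compare the resulting hashes, not the keys
sha256sum /var/lib/ceph/mon/ceph-$(hostname)/keyring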

Fabian_E, thanks a lot for your support.
Everything seems to be better.
I only have this:
View attachment 29624
There are no OSDs shown. Please do the same as you did for the monitors for the OSD services, i.e. systemctl status ceph-osd@<ID>.service, and try starting them if they are simply stopped. Did you already try @RokaKen's suggestion?

I'd go without the monitor on node pve1 for now if it causes problems (simply stop it), and try to get the OSDs up and running.
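A small sketch of that check; the OSD ID 0 below is only a placeholder, the real IDs appear in the list-units output:
Code:
# list the OSD services systemd knows about on this node
systemctl list-units 'ceph-osd@*' --all

# inspect a specific OSD and start it if it is simply stopped
systemctl status ceph-osd@0.service
systemctl start ceph-osd@0.service

# once monitors and OSDs are reachable, this shows them as up/in
ceph osd tree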
 
Fabian_E, good morning again.
You're right, the keyrings in /var/lib/ceph/mon/ceph-<nodename>/keyring are different: pve and pve2 have the same one, but pve1 does not.
The @RokaKen suggestion doesn't show... it says: This member limits who may view their full profile. :oops:
I have not created any OSDs yet; first I need the monitors working.
 
Since the keyring on pve1 differs, maybe the easiest is to stop the monitor on pve1 and try to re-create it (or manually copy the key from a working node and restart the service).
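A rough sketch of the re-creation route with the Proxmox tooling, assuming the pveceph mon destroy/create subcommands are available as on PVE 6.x; the two remaining monitors keep quorum and the monmap while pve1 is rebuilt:
Code:
# on pve1: stop and remove the broken monitor
systemctl stop ceph-mon@pve1.service
pveceph mon destroy pve1

# re-create the monitor on pve1; it joins the existing cluster again
pveceph mon create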

Regarding @RokaKen's suggestion: I meant the one posted earlier in this thread, i.e. adding the [osd] keyring stanza to ceph.conf and restarting the Ceph services.
