Ceph Error

Armando Ramos Roche

Good morning everyone,
I have a cluster of 3 Proxmox servers running version 6.4-13.
Last Friday I upgraded Ceph from Nautilus to Octopus, since it is one of the requirements for upgrading Proxmox to version 7.
At first everything worked perfectly.
But today when I checked, I found it is giving me the following error:
Error initializing cluster client: InvalidArgumentError ('RADOS invalid argument (error calling conf_read_file)')
I have no idea what could be happening because the configuration file exists on all 3 servers.
I've searched the internet, but can't find anything to help me.
 
Hi,
is /etc/ceph/ceph.conf a symlink to /etc/pve/ceph.conf on all nodes? Please share the content of that file and the output of pveversion -v. Where exactly does the error message appear?
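For example, a quick way to check this on each node, and to recreate the link if it is missing (just a sketch; back up any existing file first):
Code:
ls -l /etc/ceph/ceph.conf
# expected output ends with: /etc/ceph/ceph.conf -> /etc/pve/ceph.conf

# if it is a regular file instead of a symlink, it can be recreated like this:
mv /etc/ceph/ceph.conf /etc/ceph/ceph.conf.bak
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf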
 
Hello @Fabian_E
Thanks for answering.
I checked what you told me on the 3 nodes; below are the images from each of them. Their names are pve, pve1 and pve2.
PVE:
Cluster1.png
Cluster1-1.png

PVE1:
Node1.png
Node1-1.png

PVE2:
Node2.png
Node2-1.png
 

Please also share the contents of /etc/pve/ceph.conf. The file system in /etc/pve is shared among all nodes, so one of them is enough.
 
Please also share the contents of /etc/pve/ceph.conf. The file system in /etc/pve is shared among all nodes, so one of them is enough.
This is the contents:
Code:
root@pve:/etc/ceph# cat ceph.conf
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.12.17.0/24
         fsid = 8bfacf0e-e4e2-4c1e-a4b4-a3978cbc0bc5
         mon_allow_pool_delete = true
         mon_host = 10.12.17.25 10.12.17.22 10.12.17.23
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         public_network = 10.12.17.0/24

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve]
         public_addr = 10.12.17.25

[mon.pve1]
         public_addr = 10.12.17.23

[mon.pve2]
         public_addr = 10.12.17.22
 
This is the contents:
Code:
root@pve:/etc/ceph# cat ceph.conf
<snip>

[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring

<snip>
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services
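For example, the Ceph daemons on a node can be restarted through the systemd targets (a generic example, adapt to your setup):
Code:
systemctl restart ceph-mon.target   # restart the monitor(s) on this node
systemctl restart ceph-osd.target   # restart the OSDs on this node
# or everything at once:
systemctl restart ceph.target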

EDIT: fix code blocks
 
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services

EDIT: fix code blocks
Might not be needed (at least I don't have it in my test cluster), but I suppose adding it shouldn't hurt either.

@Armando Ramos Roche Could you also share the storage configuration /etc/pve/storage.cfg and the output of dpkg-query -l | grep rados? Where exactly do you see the error message? Does it work if you manually do something like rbd -p <poolname> ls?
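If a command like that just hangs, adding a connection timeout to the ceph CLI can make the failure visible instead of waiting forever, for example:
Code:
ceph -s --connect-timeout 10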
 
Hey Fabian_E,
Thanks a lot for your support.
The content of /etc/pve/storage.cfg
Code:
root@pve:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,vztmpl,iso
        maxfiles 10
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: VMBackups1
        export /share/1000/PVE
        path /mnt/pve/VMBackups1
        server 10.12.17.28
        content backup,iso,vztmpl,images
        prune-backups keep-last=1
The output of dpkg-query -l | grep rados
Code:
root@pve:~# dpkg-query -l | grep rados
ii  librados2                            15.2.14-pve1~bpo10                      amd64        RADOS distributed object store client library
ii  librados2-perl                       1.1-2                                   amd64        Perl bindings for librados
ii  libradosstriper1                     15.2.14-pve1~bpo10                      amd64        RADOS striping interface
ii  python-rados                         14.2.22-pve1                            amd64        Python 2 libraries for the Ceph librados library
ii  python3-rados                        15.2.14-pve1~bpo10                      amd64        Python 3 libraries for the Ceph librados library
The command rbd -p pve ls does not show me anything at all; it just keeps loading.
I got the error when I tried to execute a ceph command, for example ceph -w or ceph -s.
Right now I'm not getting anything at all.
 
Hey Fabian_E,
Now I see the result of rbd -p pve ls; this is it:
Code:
root@pve:~# rbd -p pve ls
2021-09-20T08:04:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:09:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:14:27.224-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
 
Hey Fabian_E,
Thanks a lot for your support.
The content of /etc/pve/storage.cfg
Code:
root@pve:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content images,vztmpl,iso
        maxfiles 10
        shared 0

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

nfs: VMBackups1
        export /share/1000/PVE
        path /mnt/pve/VMBackups1
        server 10.12.17.28
        content backup,iso,vztmpl,images
        prune-backups keep-last=1
Ok, so the storage is not currently configured here. Was it removed because it didn't work?
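For completeness, an RBD storage entry in /etc/pve/storage.cfg typically looks something like the following (the storage name is arbitrary; the pool name matches the one used above):
Code:
rbd: ceph-vm
        content images,rootdir
        krbd 0
        pool pve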

The output of dpkg-query -l | grep rados
Code:
root@pve:~# dpkg-query -l | grep rados
ii  librados2                            15.2.14-pve1~bpo10                      amd64        RADOS distributed object store client library
ii  librados2-perl                       1.1-2                                   amd64        Perl bindings for librados
ii  libradosstriper1                     15.2.14-pve1~bpo10                      amd64        RADOS striping interface
ii  python-rados                         14.2.22-pve1                            amd64        Python 2 libraries for the Ceph librados library
ii  python3-rados                        15.2.14-pve1~bpo10                      amd64        Python 3 libraries for the Ceph librados library
Nothing suspicious here.

The command rbd -p pve ls does not show me anything at all; it just keeps loading.
I got the error when I tried to execute a ceph command, for example ceph -w or ceph -s.
Right now I'm not getting anything at all.
Hey Fabian_E,
Now I see the result of rbd -p pve ls; this is it:
Code:
root@pve:~# rbd -p pve ls
2021-09-20T08:04:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:09:27.223-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
2021-09-20T08:14:27.224-0400 7f2b4bd283c0 0 monclient(hunting): authenticate timed out after 300
That sounds like your monitors are down or unreachable. Please check the status with systemctl status ceph-mon@<nodename>.service on all nodes that have a monitor. If there's an error, you should check /var/log/ceph/ceph-mon.<nodename>.log and journalctl -b0 -u ceph-mon@<nodename>.service.
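For example (assuming the monitor ID matches the hostname, as it does on this cluster):
Code:
systemctl status ceph-mon@$(hostname).service
journalctl -b0 -u ceph-mon@$(hostname).service
tail -n 50 /var/log/ceph/ceph-mon.$(hostname).log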
 
Hey Fabian_E, running systemctl status ceph-mon@<nodename>.service on all nodes gives this:
node PVE:
Code:
root@pve:~# systemctl status ceph-mon@pve.service
● ceph-mon@pve.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Fri 2021-09-17 08:00:31 EDT; 3 days ago
 Main PID: 3906913 (ceph-mon)
    Tasks: 26
   Memory: 205.7M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve.service
           └─3906913 /usr/bin/ceph-mon -f --cluster ceph --id pve --setuser ceph --setgroup ceph

Sep 20 12:17:30 pve ceph-mon[3906913]: 2021-09-20T12:17:30.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:35 pve ceph-mon[3906913]: 2021-09-20T12:17:35.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:40 pve ceph-mon[3906913]: 2021-09-20T12:17:40.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:43 pve ceph-mon[3906913]: 2021-09-20T12:17:43.912-0400 7f4eb6082700 -1 mon.pve@0(probing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:17:45 pve ceph-mon[3906913]: 2021-09-20T12:17:45.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:50 pve ceph-mon[3906913]: 2021-09-20T12:17:50.328-0400 7f4eb407e700 -1 mon.pve@0(probing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:55 pve ceph-mon[3906913]: 2021-09-20T12:17:55.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:17:58 pve ceph-mon[3906913]: 2021-09-20T12:17:58.968-0400 7f4eb6082700 -1 mon.pve@0(electing) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:18:00 pve ceph-mon[3906913]: 2021-09-20T12:18:00.328-0400 7f4eb407e700 -1 mon.pve@0(electing) e4 get_health_metrics reporting 1 slow ops, oldest is auth(proto 0 28 bytes epoch 0)
Sep 20 12:18:13 pve ceph-mon[3906913]: 2021-09-20T12:18:13.984-0400 7f4eb6082700 -1 mon.pve@0(leader) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
On PVE1:
Code:
root@pve1:~# systemctl status ceph-mon@pve1.service
● ceph-mon@pve1.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:08 CDT; 5min ago
 Main PID: 2608650 (ceph-mon)
    Tasks: 27
   Memory: 21.3M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve1.service
           └─2608650 /usr/bin/ceph-mon -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
On PVE2:
Code:
root@pve2:~# systemctl status ceph-mon@pve2.service
● ceph-mon@pve2.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: active (running) since Mon 2021-09-20 12:17:45 CDT; 5min ago
 Main PID: 1194228 (ceph-mon)
    Tasks: 27
   Memory: 94.6M
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@pve2.service
           └─1194228 /usr/bin/ceph-mon -f --cluster ceph --id pve2 --setuser ceph --setgroup ceph

Sep 20 12:21:12 pve2 ceph-mon[1194228]: 2021-09-20T12:21:12.294-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:27 pve2 ceph-mon[1194228]: 2021-09-20T12:21:27.310-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:42 pve2 ceph-mon[1194228]: 2021-09-20T12:21:42.329-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:21:57 pve2 ceph-mon[1194228]: 2021-09-20T12:21:57.345-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:12 pve2 ceph-mon[1194228]: 2021-09-20T12:22:12.361-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:27 pve2 ceph-mon[1194228]: 2021-09-20T12:22:27.377-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:42 pve2 ceph-mon[1194228]: 2021-09-20T12:22:42.393-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:22:57 pve2 ceph-mon[1194228]: 2021-09-20T12:22:57.408-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:12 pve2 ceph-mon[1194228]: 2021-09-20T12:23:12.424-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
Sep 20 12:23:27 pve2 ceph-mon[1194228]: 2021-09-20T12:23:27.440-0400 7fee07ce0700 -1 mon.pve2@1(peon) e4 handle_auth_bad_method hmm, they didn't like 2 result (13) Permission denied
The files /var/log/ceph/ceph-mon.<nodename>.log on nodes pve and pve2 have data... but on pve1 it is empty.
The output of journalctl -b0 -u ceph-mon@<nodename>.service on pve1:
Code:
root@pve1:/var/log/ceph# journalctl -b0 -u ceph-mon@pve1.service
-- Logs begin at Thu 2021-09-09 14:29:02 CDT, end at Mon 2021-09-20 12:26:01 CDT. --
Sep 16 13:05:16 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:16 pve1 ceph-mon[1660588]: global_init: error reading config file.
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:16 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 1.
Sep 16 13:05:26 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:26 pve1 ceph-mon[1660625]: global_init: error reading config file.
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:26 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 2.
Sep 16 13:05:37 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:37 pve1 ceph-mon[1660660]: global_init: error reading config file.
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:37 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 3.
Sep 16 13:05:47 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:47 pve1 ceph-mon[1660695]: global_init: error reading config file.
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:47 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 4.
Sep 16 13:05:57 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 systemd[1]: Started Ceph cluster monitor daemon.
Sep 16 13:05:57 pve1 ceph-mon[1660731]: global_init: error reading config file.
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Main process exited, code=exited, status=1/FAILURE
Sep 16 13:05:57 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Service RestartSec=10s expired, scheduling restart.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Sep 16 13:06:07 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Sep 16 13:06:07 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'exit-code'.
Sep 16 13:06:07 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.
Sep 20 12:17:08 pve1 systemd[1]: Started Ceph cluster monitor daemon.
 
Hey Fabian_E, running systemctl status ceph-mon@<nodename>.service on all nodes gives this:
<snip>
It seems like the two monitors on pve and pve2 started complaining once the monitor on pve1 was started. Please check whether the file /var/lib/ceph/mon/ceph-<nodename>/keyring has the same contents on each node (don't post the keys here!).
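One way to compare them without posting the keys is to compare checksums on each node, for example:
Code:
sha256sum /var/lib/ceph/mon/ceph-$(hostname)/keyring
# the hashes should match across the nodes if the keyrings are identical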

Fabian_E, thanks a lot for your support.
Everything seems to be better.
I only have this:
(see attached screenshot)
There are no OSDs shown. Please do the same as you did for the monitors for the OSD services, i.e. systemctl status ceph-osd@<ID>.service, and try starting them if they are simply stopped. Did you already try @RokaKen's suggestion?

I'd go without the monitor on node pve1 for now if it causes problems (simply stop it), and try to get the OSDs up and running.
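For example (a generic sketch; replace <ID> with the actual OSD id on each node):
Code:
# take the problematic pve1 monitor out of the picture for now
systemctl stop ceph-mon@pve1.service

# list the OSD service instances, check them, and start them if they are just stopped
systemctl list-units 'ceph-osd@*'
systemctl status ceph-osd@<ID>.service
systemctl start ceph-osd@<ID>.service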
 
Fabian_E, Good morning again,
You're right, the keys in /var/lib/ceph/mon/ceph-<nodename>/keyring are different. pve and pve2 have the same one, but pve1 does not.
The @RokaKen suggestion doesn't show; it says: This member limits who may view their full profile. :oops:
I have not created any OSDs yet; first I need the monitors working.
 
Fabian_E, Good morning again,
You're right, the keys in /var/lib/ceph/mon/ceph-<nodename>/keyring are different. pve and pve2 have the same one, but pve1 does not.
Maybe the easiest is to stop the monitor on pve1 and try to re-create it (or manually copy the key and restart the service).
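A possible sequence for re-creating it with the Proxmox tooling could look like the following (just a sketch; make sure the other two monitors are healthy and double-check before destroying anything):
Code:
# on pve1
systemctl stop ceph-mon@pve1.service
pveceph mon destroy pve1
pveceph mon create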

The @RokaKen suggestion doesn't show; it says: This member limits who may view their full profile. :oops:
I meant the suggestion posted earlier in this thread:
You seem to be missing the stanza:
Code:
[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

I recommend adding it and restarting Ceph services

EDIT: fix code blocks

I have not created any OSDs yet; first I need the monitors working.
 
