ceph issue after upgrade

akanarya

Hi,
I have 3 nodes.
Today I updated one node: pve-kernel 6.4-15 to 6.4-18 and pve-manager 6.4-14 to 6.4-15.
There was no Ceph update at this time.
I checked Ceph to validate the status before updating the other nodes.
Ceph seems to be working, but the updated node's own monitor and manager failed.
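For the check I ran the usual status commands, roughly:

root@vs5:~# ceph -s
root@vs5:~# ceph health detail
root@vs5:~# pveceph status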

Here are some logs:
Jul 01 16:24:39 vs5 ceph-mon[120912]: 2022-07-01T16:24:39.113+0300 7ff3dca6e5c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-vs5' does not exist: have you run 'mkfs'?
Jul 01 17:17:33 vs5 ceph-mgr[159946]: 2022-07-01T17:17:33.363+0300 7f167ed000c0 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-vs5/keyring: (2) No such file or directory
Jul 01 17:17:33 vs5 ceph-mgr[159946]: 2022-07-01T17:17:33.363+0300 7f167ed000c0 -1 AuthRegistry(0x55e39602c140) no keyring found at /var/lib/ceph/mgr/ceph-vs5/keyring, disabling cephx

root@vs5:~# systemctl status ceph-mon*
ceph-mon@vs5.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Fri 2022-07-01 16:24:49 +03; 1h 31min ago
Process: 120912 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id vs5 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 120912 (code=exited, status=1/FAILURE)

Jul 01 16:24:39 vs5 systemd[1]: ceph-mon@vs5.service: Failed with result 'exit-code'.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Service RestartSec=10s expired, scheduling restart.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Scheduled restart job, restart counter is at 5.
Jul 01 16:24:49 vs5 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Start request repeated too quickly.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Failed with result 'exit-code'.
Jul 01 16:24:49 vs5 systemd[1]: Failed to start Ceph cluster monitor daemon.

● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; vendor preset: enabled)
Active: active since Fri 2022-07-01 11:48:56 +03; 6h ago

Jul 01 11:48:56 vs5 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.


I checked and the following folders are empty on the updated node (vs5):
/var/lib/ceph/mon/
/var/lib/ceph/mgr/
There is no "ceph-vs5" folder inside either of them.

Since they are empty, the PVE manager does not allow me to destroy the monitor of that node so that I could recreate it.
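For what it's worth, the monitor list can still be checked from one of the healthy nodes (vs4 here is just one of my other nodes) with something like:

root@vs4:~# ceph mon dump
root@vs4:~# ceph quorum_status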

I searched for the issue on the forums, but I am confused and don't want to mess anything up.
Any help is appreciated.
Thank you
Ali
 
OK,
Luckily, I found an old backup, copied the "ceph-vs5" folders of "/var/lib/ceph/mon/" & "/var/lib/ceph/mgr/" back to the server, and rebooted it.
After that the manager worked, but the monitor still did not.
Because its folder now exists inside /var/lib/ceph/mon/, I could destroy the monitor and recreate it.
Now the problem is resolved.
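Roughly, the restore and recreate steps were something like this (I did the destroy/recreate through the web UI, so the pveceph commands below are only the CLI equivalents, and /mnt/backup is just a placeholder for wherever the backup was mounted):

root@vs5:~# cp -a /mnt/backup/var/lib/ceph/mon/ceph-vs5 /var/lib/ceph/mon/
root@vs5:~# cp -a /mnt/backup/var/lib/ceph/mgr/ceph-vs5 /var/lib/ceph/mgr/
root@vs5:~# chown -R ceph:ceph /var/lib/ceph/mon/ceph-vs5 /var/lib/ceph/mgr/ceph-vs5
root@vs5:~# reboot
root@vs5:~# pveceph mon destroy vs5
root@vs5:~# pveceph mon create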

But what if I didn't have a backup, how could I have resolved it then?
Any suggestions?
 
