Hi,
I have 3 nodes.
Today I updated one node: pve-kernel 6.4-15 to 6.4-18, pve-manager 6.4-14 to 6.4-15.
There was no ceph update at this time.
I checked Ceph to validate its status before updating the other servers.
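For reference, the status checks I ran were roughly these (standard Ceph/Proxmox commands; output omitted here):

root@vs5:~# ceph -s
root@vs5:~# ceph health detail
root@vs5:~# pveceph status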
Ceph seems to be working overall, but the updated node's own monitor and manager failed.
Here are some logs:
Jul 01 16:24:39 vs5 ceph-mon[120912]: 2022-07-01T16:24:39.113+0300 7ff3dca6e5c0 -1 monitor data directory at '/var/lib/ceph/mon/ceph-vs5' does not exist: have you run 'mkfs'?
Jul 01 17:17:33 vs5 ceph-mgr[159946]: 2022-07-01T17:17:33.363+0300 7f167ed000c0 -1 auth: unable to find a keyring on /var/lib/ceph/mgr/ceph-vs5/keyring: (2) No such file or directory
Jul 01 17:17:33 vs5 ceph-mgr[159946]: 2022-07-01T17:17:33.363+0300 7f167ed000c0 -1 AuthRegistry(0x55e39602c140) no keyring found at /var/lib/ceph/mgr/ceph-vs5/keyring, disabling cephx
root@vs5:~# systemctl status ceph-mon*
● ceph-mon@vs5.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
└─ceph-after-pve-cluster.conf
Active: failed (Result: exit-code) since Fri 2022-07-01 16:24:49 +03; 1h 31min ago
Process: 120912 ExecStart=/usr/bin/ceph-mon -f --cluster ${CLUSTER} --id vs5 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE)
Main PID: 120912 (code=exited, status=1/FAILURE)
Jul 01 16:24:39 vs5 systemd[1]: ceph-mon@vs5.service: Failed with result 'exit-code'.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Service RestartSec=10s expired, scheduling restart.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Scheduled restart job, restart counter is at 5.
Jul 01 16:24:49 vs5 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Start request repeated too quickly.
Jul 01 16:24:49 vs5 systemd[1]: ceph-mon@vs5.service: Failed with result 'exit-code'.
Jul 01 16:24:49 vs5 systemd[1]: Failed to start Ceph cluster monitor daemon.
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; vendor preset: enabled)
Active: active since Fri 2022-07-01 11:48:56 +03; 6h ago
Jul 01 11:48:56 vs5 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
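Before touching anything, I was planning to check whether the rest of the cluster still knows about the vs5 monitor and manager, with read-only commands like these (assuming I have them right):

ceph mon dump          # is vs5 still listed in the monmap?
ceph auth get mgr.vs5  # does the mgr key still exist in the cluster?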
I checked and the following folders are empty on the updated node (vs5):
/var/lib/ceph/mon/
/var/lib/ceph/mgr/
There are no "ceph-vs5" folders inside them; the check I ran is below.
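root@vs5:~# ls -la /var/lib/ceph/mon/ /var/lib/ceph/mgr/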
Since they are empty, the PVE manager doesn't allow me to destroy the monitor on that node so that I could recreate it.
I searched the forums for this issue, but I am confused and don't want to mess things up.
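From what I gathered on the forums, the fix would be to destroy the failed mon/mgr and recreate them, roughly with the commands below, but I have not run any of them yet and I am not sure they are safe in this state, so please correct me:

pveceph mon destroy vs5   # or 'ceph mon remove vs5' if the PVE tooling refuses
pveceph mon create
pveceph mgr create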
Any help is appreciated.
Thank you
Ali