[SOLVED] CEPH MON fail after upgrade

Hi,

on my test cluster I upgraded all nodes from 7.0 to 7.1. Ceph went to Pacific 16.2.7 (it was 16.2.5, I believe).

Now the monitors and managers won't start.

I had a pool and CephFS configured with an MDS.

I've read somewhere that this can happen when a pool is combined with an old CephFS (I originally came from PVE 6), and that a downgrade would be the way to go.


Any idea how to downgrade, or what's the road ahead?
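For what it's worth: this matches the known Pacific 16.2.7 monitor crash on clusters whose CephFS file system was created on an older release. Rather than downgrading, the workaround discussed in the Ceph 16.2.7 release notes was to let the monitors skip the MDSMap sanity check at startup. A sketch, assuming the standard Proxmox config path /etc/pve/ceph.conf (whether it applies to your exact crash depends on the assertion shown in the mon log):

Code:
# /etc/pve/ceph.conf -- add to (or create) the [mon] section.
# Temporary workaround only; remove it again once the mons are up
# and the cluster is healthy.
[mon]
    mon_mds_skip_sanity = true

After adding the option, restart the monitors (e.g. systemctl restart ceph-mon.target on each node), confirm the cluster reaches HEALTH_OK, then take the setting out again.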
Hello,

Same problem here. After the upgrade, Ceph jammed: the monitors and managers won't start anymore.

Proxmox 7.0 to 7.1
Ceph Pacific 16.2.5 to 16.2.7

Code:
root@pve1:/etc/pve# systemctl status ceph\*.service ceph\*.target
● ceph-mon@pve1.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: signal) since Thu 2021-12-16 06:36:35 EET; 15min ago
   Main PID: 1391 (code=killed, signal=ABRT)
        CPU: 61ms

Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Dec 16 06:36:35 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'signal'.
Dec 16 06:36:35 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.

● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.

● ceph-mds@pve1.service - Ceph metadata server daemon
     Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mds@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1007 (ceph-mds)
      Tasks: 8
     Memory: 15.6M
        CPU: 470ms
     CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@pve1.service
             └─1007 /usr/bin/ceph-mds -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Started Ceph metadata server daemon.

● ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1040 (ceph-osd)
      Tasks: 8
     Memory: 35.5M
        CPU: 473ms
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
             └─1040 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Starting Ceph object storage daemon osd.0...
Dec 16 06:35:44 pve1 systemd[1]: Started Ceph object storage daemon osd.0.

● ceph-mgr@pve1.service - Ceph cluster manager daemon
     Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mgr@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1009 (ceph-mgr)
      Tasks: 8 (limit: 18985)
     Memory: 15.0M
        CPU: 439ms
     CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@pve1.service
             └─1009 /usr/bin/ceph-mgr -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Started Ceph cluster manager daemon.

● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mgr@.service instances at once.

● ceph-fuse.target - ceph target allowing to start/stop all ceph-fuse@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-fuse.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:41 EET; 16min ago

Warning: journal has been rotated since unit was started, output may be incomplete.

● ceph-crash.service - Ceph crash dump collector
     Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2021-12-16 06:35:42 EET; 16min ago
   Main PID: 558 (ceph-crash)
      Tasks: 11 (limit: 18985)
     Memory: 37.1M
        CPU: 8.346s
     CGroup: /system.slice/ceph-crash.service
             ├─ 558 /usr/bin/python3.9 /usr/bin/ceph-crash
             ├─5531 timeout 30 ceph -n client.admin crash post -i -
             └─5532 /usr/bin/python3.9 /usr/bin/ceph -n client.admin crash post -i -

Dec 16 06:50:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:37.552042Z_261f1545-41f8-468a-a326-0af076dac0c1 as client.crash failed: b'2021-12-16T06:50:17.524+0200 7f3b0464a700 -1 auth: unable to find a keyring on />
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:37.552042Z_261f1545-41f8-468a-a326-0af076dac0c1 as client.admin failed: b''
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.crash.pve1 failed: b'2021-12-16T06:50:47.612+0200 7fb4639bb700 -1 auth: unable to find a keyring>
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.crash failed: b'2021-12-16T06:50:47.688+0200 7f2c77a2c700 -1 auth: unable to find a keyring on />
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.admin failed: b''
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.crash.pve1 failed: b'2021-12-16T06:51:17.769+0200 7f3e133a7700 -1 auth: unable to find a keyring>
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.crash failed: b'2021-12-16T06:51:17.845+0200 7fc3b28a0700 -1 auth: unable to find a keyring on />
Dec 16 06:51:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.admin failed: b''
Dec 16 06:51:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T00:37:51.938165Z_79a59b2a-51c1-43bb-abf0-6f6f82bcd539 as client.crash.pve1 failed: b'2021-12-16T06:51:47.925+0200 7fdc1d717700 -1 auth: unable to find a keyring>
Dec 16 06:51:48 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T00:37:51.938165Z_79a59b2a-51c1-43bb-abf0-6f6f82bcd539 as client.crash failed: b'2021-12-16T06:51:48.001+0200 7f670a281700 -1 auth: unable to find a keyring on />

● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-osd.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.

● ceph-mds.target - ceph target allowing to start/stop all ceph-mds@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mds.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.

● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
root@pve1:/etc/pve#
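The status output above only shows that ceph-mon was killed with SIGABRT and hit systemd's restart limit; the actual failing assertion is in the monitor's own log. A sketch of how to pull it out (the daemon id `pve1` is taken from the output above; substitute your own node name):

```shell
# Last lines of the monitor's log file, which should contain the
# abort backtrace:
tail -n 100 /var/log/ceph/ceph-mon.pve1.log

# Full unit history for this boot from journald:
journalctl -b -u ceph-mon@pve1.service --no-pager

# Running the monitor in the foreground with debugging prints the
# failing assertion directly to the terminal:
/usr/bin/ceph-mon -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph -d
```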

Code:
root@pve1:/etc/pve# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.0: 6.0-11
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
root@pve1:/etc/pve#