[SOLVED] CEPH MON fail after upgrade

godfather007

Hi,

on my test cluster I upgraded all nodes from 7.0 to 7.1.
Ceph went to Pacific 16.2.7 (was 16.2.5?).

Now the monitors and managers won't start.

I had a pool and CephFS configured with an MDS.

I've read somewhere that this can happen with a pool combined with an old CephFS (I came from PVE 6), and that a downgrade would be the way to go.


Any idea how to downgrade, or what's the road ahead?
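
What I'm looking at so far to figure out why the mon dies (local logs only, since there is no quorum; pve1 stands in for your node name):

Code:
journalctl -u ceph-mon@pve1.service -b --no-pager | tail -n 50
less /var/log/ceph/ceph-mon.pve1.log    # the abort backtrace ends up here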
 
Hello,

Same problem here. After the upgrade, Ceph jammed: the monitors and managers won't start anymore.

Proxmox 7.0 to 7.1
Ceph Pacific 16.2.5 to 16.2.7

Code:
root@pve1:/etc/pve# systemctl status ceph\*.service ceph\*.target
● ceph-mon@pve1.service - Ceph cluster monitor daemon
     Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: signal) since Thu 2021-12-16 06:36:35 EET; 15min ago
   Main PID: 1391 (code=killed, signal=ABRT)
        CPU: 61ms

Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Scheduled restart job, restart counter is at 5.
Dec 16 06:36:35 pve1 systemd[1]: Stopped Ceph cluster monitor daemon.
Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Start request repeated too quickly.
Dec 16 06:36:35 pve1 systemd[1]: ceph-mon@pve1.service: Failed with result 'signal'.
Dec 16 06:36:35 pve1 systemd[1]: Failed to start Ceph cluster monitor daemon.

● ceph.target - ceph target allowing to start/stop all ceph*@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph*@.service instances at once.

● ceph-mds@pve1.service - Ceph metadata server daemon
     Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mds@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1007 (ceph-mds)
      Tasks: 8
     Memory: 15.6M
        CPU: 470ms
     CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@pve1.service
             └─1007 /usr/bin/ceph-mds -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Started Ceph metadata server daemon.

● ceph-osd@0.service - Ceph object storage daemon osd.0
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1040 (ceph-osd)
      Tasks: 8
     Memory: 35.5M
        CPU: 473ms
     CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@0.service
             └─1040 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Starting Ceph object storage daemon osd.0...
Dec 16 06:35:44 pve1 systemd[1]: Started Ceph object storage daemon osd.0.

● ceph-mgr@pve1.service - Ceph cluster manager daemon
     Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-mgr@.service.d
             └─ceph-after-pve-cluster.conf
     Active: active (running) since Thu 2021-12-16 06:35:44 EET; 16min ago
   Main PID: 1009 (ceph-mgr)
      Tasks: 8 (limit: 18985)
     Memory: 15.0M
        CPU: 439ms
     CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@pve1.service
             └─1009 /usr/bin/ceph-mgr -f --cluster ceph --id pve1 --setuser ceph --setgroup ceph

Dec 16 06:35:44 pve1 systemd[1]: Started Ceph cluster manager daemon.

● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mgr@.service instances at once.

● ceph-fuse.target - ceph target allowing to start/stop all ceph-fuse@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-fuse.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:41 EET; 16min ago

Warning: journal has been rotated since unit was started, output may be incomplete.

● ceph-crash.service - Ceph crash dump collector
     Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2021-12-16 06:35:42 EET; 16min ago
   Main PID: 558 (ceph-crash)
      Tasks: 11 (limit: 18985)
     Memory: 37.1M
        CPU: 8.346s
     CGroup: /system.slice/ceph-crash.service
             ├─ 558 /usr/bin/python3.9 /usr/bin/ceph-crash
             ├─5531 timeout 30 ceph -n client.admin crash post -i -
             └─5532 /usr/bin/python3.9 /usr/bin/ceph -n client.admin crash post -i -

Dec 16 06:50:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:37.552042Z_261f1545-41f8-468a-a326-0af076dac0c1 as client.crash failed: b'2021-12-16T06:50:17.524+0200 7f3b0464a700 -1 auth: unable to find a keyring on />
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:37.552042Z_261f1545-41f8-468a-a326-0af076dac0c1 as client.admin failed: b''
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.crash.pve1 failed: b'2021-12-16T06:50:47.612+0200 7fb4639bb700 -1 auth: unable to find a keyring>
Dec 16 06:50:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.crash failed: b'2021-12-16T06:50:47.688+0200 7f2c77a2c700 -1 auth: unable to find a keyring on />
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T02:58:56.716461Z_d5aee8bc-c3ff-4588-9f86-a858feba9f98 as client.admin failed: b''
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.crash.pve1 failed: b'2021-12-16T06:51:17.769+0200 7f3e133a7700 -1 auth: unable to find a keyring>
Dec 16 06:51:17 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.crash failed: b'2021-12-16T06:51:17.845+0200 7fc3b28a0700 -1 auth: unable to find a keyring on />
Dec 16 06:51:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T04:04:16.862386Z_e846de2b-be44-4f78-a41b-42ea9772fba8 as client.admin failed: b''
Dec 16 06:51:47 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T00:37:51.938165Z_79a59b2a-51c1-43bb-abf0-6f6f82bcd539 as client.crash.pve1 failed: b'2021-12-16T06:51:47.925+0200 7fdc1d717700 -1 auth: unable to find a keyring>
Dec 16 06:51:48 pve1 ceph-crash[558]: WARNING:ceph-crash:post /var/lib/ceph/crash/2021-12-16T00:37:51.938165Z_79a59b2a-51c1-43bb-abf0-6f6f82bcd539 as client.crash failed: b'2021-12-16T06:51:48.001+0200 7f670a281700 -1 auth: unable to find a keyring on />

● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-osd.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-osd@.service instances at once.

● ceph-mds.target - ceph target allowing to start/stop all ceph-mds@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mds.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mds@.service instances at once.

● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
     Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; vendor preset: enabled)
     Active: active since Thu 2021-12-16 06:35:44 EET; 16min ago

Dec 16 06:35:44 pve1 systemd[1]: Reached target ceph target allowing to start/stop all ceph-mon@.service instances at once.
root@pve1:/etc/pve#
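
Side note: the ceph-crash warnings at the end ("unable to find a keyring") look like missing crash client keys. Once the mons are reachable again, something like this should recreate them; the caps are the upstream 'profile crash' defaults, but the keyring path is my guess for a PVE cluster:

Code:
# needs a working mon quorum, so this has to wait until the mons are back
ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' \
    -o /etc/pve/ceph.client.crash.keyring   # target path is an assumption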

Code:
root@pve1:/etc/pve# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-2-pve)
pve-manager: 7.1-8 (running version: 7.1-8/5b267f33)
pve-kernel-helper: 7.1-6
pve-kernel-5.13: 7.1-5
pve-kernel-5.11: 7.0-10
pve-kernel-5.0: 6.0-11
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-storage-perl: 7.0-15
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-3
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
root@pve1:/etc/pve#
 
Ahem, sorry... 2 of the 4 machines have upgraded OSDs on 16.2.7.
The other 2 are still on 16.2.5 and still show the upgrade symbol.

Anything to do or just wait?
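
In case it helps anyone compare: this is how I'm checking versions per node while the mons are down (plain package queries, nothing that needs quorum):

Code:
dpkg -l | grep -E '^ii +ceph'   # installed Ceph package versions on this node
ceph-osd --version              # version of the binary actually on disk
ceph-mon --version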
 
Restarting all nodes, monitors, and managers doesn't do the trick.

systemctl restart ceph-osd.target does not fix it either.

Now I'm seeing OSDs going offline.
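
One lead I want to try next: the Pacific 16.2.7 release notes mention a new FSMap sanity check in the mons that can abort on file systems created on much older releases, and name mon_mds_skip_sanity as the temporary escape hatch. This is only the upstream suggestion, not yet confirmed for this cluster:

Code:
# from the 16.2.7 upgrade notes: add under [mon] in /etc/pve/ceph.conf,
# restart the monitors, and remove the option again once they are healthy
[mon]
    mon_mds_skip_sanity = true

After that, a systemctl restart ceph-mon.target on each node should show whether the mons get past the assert.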
 
