[SOLVED] ceph mon not starting - unable to read magic

Oct 13, 2020
Hello Forum,

Since a crash of the root disk on the virtualization host of one of our clusters, we have been experiencing issues with the ceph-mon service on that node:
Code:
Jul 20 15:11:29 delbgpm01 ceph-mon[208969]: 2021-07-20T15:11:29.712+0200 7fd4dbde25c0 -1 unable to read magic from mon data
Jul 20 15:11:29 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 15:11:29 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Failed with result 'exit-code'.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Scheduled restart job, restart counter is at 14.
Jul 20 15:11:39 delbgpm01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Start request repeated too quickly.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Failed with result 'exit-code'.
Jul 20 15:11:39 delbgpm01 systemd[1]: Failed to start Ceph cluster monitor daemon.
We are running the following versions:
Code:
pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.4.124-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
Do you have any idea how to bring back the "magic"? ;-)

Thank you and best regards,

Nico
 
I suggest you verify that the node's ceph.conf is a symlink to the cluster-wide version, and that the keyring and rbdmap exist, like:
Code:
# ls -lh /etc/ceph/
total 12K
-rw------- 1 ceph ceph  159 <date> ceph.client.admin.keyring
lrwxrwxrwx 1 root root   18 <date> ceph.conf -> /etc/pve/ceph.conf
drwxr-xr-x 2 root root 4.0K <date> osd
-rw-r--r-- 1 root root   92 <date> rbdmap

If not, the keyring and rbdmap can be copied from another node in the cluster. Note that /etc/pve/ceph.conf lives on the clustered filesystem (pmxcfs), so it is only available while pve-cluster.service is running.
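A rough sketch of those recovery steps, assuming a healthy peer node reachable as `node2` (a placeholder hostname) and the standard paths shown in the listing above:

```shell
# Re-create the symlink to the clustered config (requires pmxcfs mounted on /etc/pve)
ln -sf /etc/pve/ceph.conf /etc/ceph/ceph.conf

# Copy the admin keyring and rbdmap from a healthy node ("node2" is a placeholder)
scp node2:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
scp node2:/etc/ceph/rbdmap /etc/ceph/

# Match the ownership/permissions from the listing above
chown ceph:ceph /etc/ceph/ceph.client.admin.keyring
chmod 600 /etc/ceph/ceph.client.admin.keyring

# Make sure the cluster filesystem is up before retrying the monitor
systemctl status pve-cluster.service

# Clear systemd's "start request repeated too quickly" state, then retry
systemctl reset-failed ceph-mon@delbgpm01.service
systemctl start ceph-mon@delbgpm01.service
```

If the error persists, "unable to read magic from mon data" can also mean the monitor's own data directory under /var/lib/ceph/mon/ was lost in the disk crash; in that case the usual remedy is to destroy and re-create that monitor (on Proxmox VE, `pveceph mon destroy` followed by `pveceph mon create`) once the rest of the quorum is healthy.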
 
