[SOLVED] ceph mon not starting - unable to read magic

hoffmn01

Well-Known Member
Hello Forum,

Since the root disk crashed on one of the virtualization hosts in our cluster, we have been experiencing issues with the ceph-mon service on that node:
Code:
Jul 20 15:11:29 delbgpm01 ceph-mon[208969]: 2021-07-20T15:11:29.712+0200 7fd4dbde25c0 -1 unable to read magic from mon data
Jul 20 15:11:29 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Main process exited, code=exited, status=1/FAILURE
Jul 20 15:11:29 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Failed with result 'exit-code'.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Service RestartSec=10s expired, scheduling restart.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Scheduled restart job, restart counter is at 14.
Jul 20 15:11:39 delbgpm01 systemd[1]: Stopped Ceph cluster monitor daemon.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Start request repeated too quickly.
Jul 20 15:11:39 delbgpm01 systemd[1]: ceph-mon@delbgpm01.service: Failed with result 'exit-code'.
Jul 20 15:11:39 delbgpm01 systemd[1]: Failed to start Ceph cluster monitor daemon.
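For reference, "unable to read magic from mon data" points at the mon's on-disk store, which by default lives under /var/lib/ceph/mon/<cluster>-<id>. A quick check of whether the store survived the disk crash (a sketch, assuming the default path and the mon id delbgpm01):
Code:
# the mon fails with "unable to read magic" when this directory is empty or unreadable;
# a healthy mon dir contains keyring, kv_backend and a store.db/ subdirectory
ls -l /var/lib/ceph/mon/ceph-delbgpm01/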
We are running the following versions:
Code:
pveversion --verbose
proxmox-ve: 6.4-1 (running kernel: 5.4.124-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-4
pve-kernel-helper: 6.4-4
pve-kernel-5.4.124-1-pve: 5.4.124-1
pve-kernel-5.4.119-1-pve: 5.4.119-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 15.2.13-pve1~bpo10
ceph-fuse: 15.2.13-pve1~bpo10
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.10-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
Do you have any idea how to bring back the "magic"? ;-)

Thank you and best regards,

Nico
 
I suggest you verify that the node's ceph.conf is a symlink to the cluster-wide version and that the keyring and rbdmap exist, like:
Code:
# ls -lh /etc/ceph/
total 12K
-rw------- 1 ceph ceph  159 <date> ceph.client.admin.keyring
lrwxrwxrwx 1 root root   18 <date> ceph.conf -> /etc/pve/ceph.conf
drwxr-xr-x 2 root root 4.0K <date> osd
-rw-r--r-- 1 root root   92 <date> rbdmap

If not, the keyring and rbdmap can be copied from another node in the cluster. Note that /etc/pve/ceph.conf only exists while pve-cluster.service is running and pmxcfs is mounted.
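If the symlink itself is gone, a minimal restore sketch (assuming delbgpm02 is a healthy peer node; adjust hostnames to your setup):
Code:
# pmxcfs must be mounted for /etc/pve/ceph.conf to be visible
systemctl status pve-cluster
# re-create the symlink to the cluster-wide config
ln -s /etc/pve/ceph.conf /etc/ceph/ceph.conf
# copy keyring and rbdmap over from a healthy node
scp root@delbgpm02:/etc/ceph/ceph.client.admin.keyring /etc/ceph/
scp root@delbgpm02:/etc/ceph/rbdmap /etc/ceph/

Since the mon's own store under /var/lib/ceph/mon/ was lost with the root disk, restoring the config files alone won't bring the "magic" back. As long as the remaining mons still have quorum, the usual route is to drop the dead mon and re-create it:
Code:
# remove the failed mon from the monmap, then create a fresh one on this node
pveceph mon destroy delbgpm01
pveceph mon create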