[SOLVED] All OSDs down on one node after cluster restart

Wazaari

New Member
Dec 28, 2023
We had to take down our three-node cluster today for a routine electrical equipment check. Each node runs 4 OSDs. After all nodes were started again, the OSDs on one node did not come back up; the OSDs on the other two servers are fine.

I'm a bit lost on what to do here. It looks like one disk on this node didn't survive the outage (SMART reports FAILED), but a single bad disk shouldn't take down all four OSDs. Any idea how to fix this? Thanks!
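For reference, this is roughly how I checked the drives (/dev/sda is just a placeholder; run it per disk):

Code:
# overall SMART health self-assessment
smartctl -H /dev/sda
# full attribute dump, e.g. reallocated/pending sector counts
smartctl -a /dev/sda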

This is the current OSD tree:

Code:
root@proxmox01:~# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME           STATUS  REWEIGHT  PRI-AFF
-1         10.47949  root default                                 
-3          3.49316      host proxmox01                           
 0    ssd   0.87329          osd.0         down         0  1.00000
 1    ssd   0.87329          osd.1         down         0  1.00000
 2    ssd   0.87329          osd.2         down         0  1.00000
 3    ssd   0.87329          osd.3         down         0  1.00000
-5          3.49316      host proxmox02                           
 4    ssd   0.87329          osd.4           up   1.00000  1.00000
 5    ssd   0.87329          osd.5           up   1.00000  1.00000
 6    ssd   0.87329          osd.6           up   1.00000  1.00000
 7    ssd   0.87329          osd.7           up   1.00000  1.00000
-7          3.49316      host proxmox03                           
 8    ssd   0.87329          osd.8           up   1.00000  1.00000
 9    ssd   0.87329          osd.9           up   1.00000  1.00000
10    ssd   0.87329          osd.10          up   1.00000  1.00000
11    ssd   0.87329          osd.11          up   1.00000  1.00000

The logs for all the OSD daemons on this node show the same stack trace.
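I pulled them from the journal per unit, roughly like this (the osd number varies per daemon):

Code:
# boot log for one OSD daemon
journalctl -b -u ceph-osd@1.service --no-pager

Here is the excerpt for osd.1: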

Code:
Dec 28 17:13:51 proxmox01 systemd[1]: Starting ceph-osd@1.service - Ceph object storage daemon osd.1...
Dec 28 17:13:51 proxmox01 systemd[1]: Started ceph-osd@1.service - Ceph object storage daemon osd.1.
Dec 28 17:13:53 proxmox01 ceph-osd[3095]: *** Caught signal (Bus error) **
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  in thread 7c518f8376c0 thread_name:ceph-osd
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  ceph version 18.2.2 (e9fe820e7fffd1b7cde143a9f77653b73fcec748) reef (stable)
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  1: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7c519025b050]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  2: /lib64/ld-linux-x86-64.so.2(+0x22f58) [0x7c51912b6f58]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  3: /lib64/ld-linux-x86-64.so.2(+0x8bd7) [0x7c519129cbd7]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  4: /lib64/ld-linux-x86-64.so.2(+0x90ca) [0x7c519129d0ca]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  5: /lib64/ld-linux-x86-64.so.2(+0x9a48) [0x7c519129da48]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  6: /lib64/ld-linux-x86-64.so.2(+0xe411) [0x7c51912a2411]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  7: /lib64/ld-linux-x86-64.so.2(+0xbc0a) [0x7c519129fc0a]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  8: _dl_catch_exception()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  9: /lib64/ld-linux-x86-64.so.2(+0xb1c6) [0x7c519129f1c6]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  10: _dl_catch_exception()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  11: /lib64/ld-linux-x86-64.so.2(+0xb5b8) [0x7c519129f5b8]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  12: /lib/x86_64-linux-gnu/libc.so.6(+0x85438) [0x7c51902a4438]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  13: _dl_catch_exception()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  14: _dl_catch_error()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  15: /lib/x86_64-linux-gnu/libc.so.6(+0x84f27) [0x7c51902a3f27]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  16: dlopen()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  17: (ceph::ErasureCodePluginRegistry::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::ErasureCodePlugin**, std::ostream*)+0x1f7) [0x633951fc21c7]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  18: (ceph::ErasureCodePluginRegistry::preload(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)+0x9f) [0x633951fc293f]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  19: (global_init_preload_erasure_code(ceph::common::CephContext const*)+0x8a8) [0x633951607848]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  20: main()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  21: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7c519024624a]
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  22: __libc_start_main()
Dec 28 17:13:53 proxmox01 ceph-osd[3095]:  23: _start()
[... the same stack trace is repeated three more times in the crash dump ...]
Dec 28 17:13:53 proxmox01 systemd[1]: ceph-osd@1.service: Main process exited, code=killed, status=7/BUS
Dec 28 17:13:53 proxmox01 systemd[1]: ceph-osd@1.service: Failed with result 'signal'.

The entire log is attached.
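A note on the trace itself: the Bus error (SIGBUS) happens inside dlopen() while the OSD preloads the erasure-code plugin. A SIGBUS at that point usually means a page of the memory-mapped .so could not be read from disk, which would point at the disk holding the Ceph libraries rather than at the OSD disks. One way to test that theory is to force a full read of the plugins and watch the kernel log (the path below is the standard Debian location; adjust if yours differs):

Code:
# read every erasure-code plugin end to end; I/O errors surface here and in dmesg
cat /usr/lib/x86_64-linux-gnu/ceph/erasure-code/*.so > /dev/null
dmesg | tail -n 20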

Versions (pveversion -v):

Code:
pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
dnsmasq: 2.89-1
frr-pythontools: 8.5.2-1+pve1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
I wanted to replace the broken disk today, but didn't make much progress. I basically followed this procedure (rough CLI equivalents below the list):

- Set the global flags norebalance, norecover and nobackfill
- Destroy the OSD from the UI (worked fine)
- Destroy the LVM volumes and wipe the disk
- Physically replace the disk
- Create a new OSD
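
For the record, the CLI equivalents look roughly like this (the OSD id and device name are placeholders):

Code:
# flags to avoid data movement while swapping the disk
ceph osd set norebalance
ceph osd set norecover
ceph osd set nobackfill
# destroy the failed OSD (same as the UI button)
pveceph osd destroy <id> --cleanup
# wipe LVM metadata and data from the disk
ceph-volume lvm zap /dev/sda --destroy
# after the physical swap, create the new OSD
pveceph osd create /dev/sda
# unset the flags once the new OSD is up and in
ceph osd unset norebalance; ceph osd unset norecover; ceph osd unset nobackfill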

I'm stuck at the last step because creating a new OSD fails. I suspect something is more broadly wrong with this node. The logs from Create OSD are attached; here is what stands out to me:

Code:
 stderr: 2024-12-29T09:41:47.045+0100 7361904006c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory

Indeed, there is no such file:

Code:
root@proxmox01:~# ls /etc/pve/priv/
acme         authorized_keys  ceph.client.admin.keyring  ipam.db      lock     metricserver     pve-root-ca.srl  token.cfg
authkey.key  ceph             ceph.mon.keyring           known_hosts  macs.db  pve-root-ca.key  tfa.cfg

This one as well:

Code:
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-12/ --osd-uuid 84fb842b-f678-4b51-accd-ac719496f993 --setuser ceph --setgroup ceph
 stderr: 2024-12-29T09:41:47.485+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12/) _read_fsid unparsable uuid

No idea what this means... The full log is below.

Code:
create OSD on /dev/sda (bluestore)
wiping block device /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.615715 s, 341 MB/s
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 84fb842b-f678-4b51-accd-ac719496f993
Running command: vgcreate --force --yes ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518 /dev/sda
 stdout: Physical volume "/dev/sda" successfully created.
 stdout: Volume group "ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518" successfully created
Running command: lvcreate --yes -l 228928 -n osd-block-84fb842b-f678-4b51-accd-ac719496f993 ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518
 stdout: Logical volume "osd-block-84fb842b-f678-4b51-accd-ac719496f993" created.
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-12
--> Executable selinuxenabled not in PATH: /sbin:/bin:/usr/sbin:/usr/bin
Running command: /bin/chown -h ceph:ceph /dev/ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518/osd-block-84fb842b-f678-4b51-accd-ac719496f993
Running command: /bin/chown -R ceph:ceph /dev/dm-1
Running command: /bin/ln -s /dev/ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518/osd-block-84fb842b-f678-4b51-accd-ac719496f993 /var/lib/ceph/osd/ceph-12/block
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /var/lib/ceph/osd/ceph-12/activate.monmap
 stderr: 2024-12-29T09:41:47.045+0100 7361904006c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2024-12-29T09:41:47.045+0100 7361904006c0 -1 AuthRegistry(0x736188063ec8) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
 stderr: got monmap epoch 3
--> Creating keyring file for osd.12
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12/
Running command: /bin/ceph-osd --cluster ceph --osd-objectstore bluestore --mkfs -i 12 --monmap /var/lib/ceph/osd/ceph-12/activate.monmap --keyfile - --osd-data /var/lib/ceph/osd/ceph-12/ --osd-uuid 84fb842b-f678-4b51-accd-ac719496f993 --setuser ceph --setgroup ceph
 stderr: 2024-12-29T09:41:47.485+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12//block) _read_bdev_label unable to decode label at offset 102: void bluestore_bdev_label_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) decode past end of struct encoding: Malformed input [buffer:3]
 stderr: 2024-12-29T09:41:47.487+0100 78421a9ef6c0 -1 bluestore(/var/lib/ceph/osd/ceph-12/) _read_fsid unparsable uuid
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
Running command: /bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518/osd-block-84fb842b-f678-4b51-accd-ac719496f993 --path /var/lib/ceph/osd/ceph-12 --no-mon-config
Running command: /bin/ln -snf /dev/ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518/osd-block-84fb842b-f678-4b51-accd-ac719496f993 /var/lib/ceph/osd/ceph-12/block
Running command: /bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-12/block
Running command: /bin/chown -R ceph:ceph /dev/dm-1
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-12
Running command: /bin/systemctl enable ceph-volume@lvm-12-84fb842b-f678-4b51-accd-ac719496f993
 stderr: Created symlink /etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-12-84fb842b-f678-4b51-accd-ac719496f993.service -> /lib/systemd/system/ceph-volume@.service.
Running command: /bin/systemctl enable --runtime ceph-osd@12
 stderr: Created symlink /run/systemd/system/ceph-osd.target.wants/ceph-osd@12.service -> /lib/systemd/system/ceph-osd@.service.
Running command: /bin/systemctl start ceph-osd@12
--> ceph-volume lvm activate successful for osd ID: 12
--> ceph-volume lvm create successful for: /dev/sda
TASK OK
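Two notes on this log: the _read_bdev_label / _read_fsid errors are printed while BlueStore probes the freshly wiped device for an existing label during --mkfs, before it writes a new one, so on their own they are usually harmless, and despite the stderr noise the task finishes with TASK OK. To read the label back and confirm it was written (device path taken from the log above):

Code:
ceph-bluestore-tool show-label --dev /dev/ceph-03a70ec4-2d22-412f-ab25-d6d16ea04518/osd-block-84fb842b-f678-4b51-accd-ac719496f993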
 
Hi, I have this fix in my notes for your error:

Code:
 stderr: 2024-12-29T09:41:47.045+0100 7361904006c0 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory

Code:
ceph auth get client.bootstrap-osd > /etc/pve/priv/ceph.client.bootstrap-osd.keyring
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

This exports the existing bootstrap-osd key from the monitors into the two places your log shows it being looked up: /etc/pve/priv/ (checked by the Proxmox tooling) and /var/lib/ceph/bootstrap-osd/ceph.keyring (used by ceph-volume).
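Afterwards you can verify the key is in place:

Code:
# should print the bootstrap-osd key and its caps
ceph auth get client.bootstrap-osd
ls -l /etc/pve/priv/ceph.client.bootstrap-osd.keyring /var/lib/ceph/bootstrap-osd/ceph.keyring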
 
Hi, thanks for your response! It turned out it wasn't only the OSD disk that broke: the system disk also failed and threw I/O errors. It wasn't broken enough to prevent the host from booting, but any and all operations on the host itself failed afterwards. In the end I completely removed the node from Ceph and from the Proxmox cluster, re-installed Proxmox, re-added the host to the cluster, and recreated all OSDs. After the Ceph rebalance, everything is back to green.
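For anyone hitting something similar, a quick way to spot a dying system disk is to scan the kernel log for block-layer errors (the grep patterns are examples):

Code:
journalctl -k | grep -iE 'i/o error|blk_update_request|medium error'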
 
