Ceph: Import OSD to a new node

fcarucci

Hello,

I'm trying to import two OSDs into a brand-new node.
I can see the OSDs in the interface, but when I try to start one of them, I get an error:

Code:
Job for ceph-osd@2.service failed because the control process exited with error code.
See "systemctl status ceph-osd@2.service" and "journalctl -xeu ceph-osd@2.service" for details.
TASK ERROR: command '/bin/systemctl start ceph-osd@2' failed: exit code 1

systemctl status ceph-osd@2.service shows me this:
Code:
ceph-osd@2.service - Ceph object storage daemon osd.2
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: exit-code) since Wed 2024-03-13 21:17:05 PDT; 3min 24s ago
    Process: 2878 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUST>
        CPU: 13ms


Mar 13 21:17:04 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object storag>
Mar 13 21:17:05 pve systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Mar 13 21:17:05 pve systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Mar 13 21:17:05 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object storag>
Mar 13 21:18:25 pve systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Mar 13 21:18:25 pve systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Mar 13 21:18:25 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object stora

journalctl -xeu ceph-osd@2.service also has no useful information.

This is my ceph.conf:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.0.20.4/24
         err_to_syslog = true
         fsid = 5ce42d57-4371-475a-94fb-eac8acefe72e
         mon_allow_pool_delete = true
         mon_allow_pool_size_one = false
#        mon_cluster_log_file_level = info
#        mon_cluster_log_to_file = false
         mon_host = 10.0.20.3 10.0.20.4 10.0.20.1
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_deep_scrub_interval = 1209600
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         osd_scrub_begin_hour = 23
         osd_scrub_end_hour = 7
         osd_scrub_sleep = 0.1
         public_network = 10.0.20.4/24


[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring


[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring


[mds.pve-ceph1-1]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve-ceph1-2]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve-ceph1-3]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve2-1]
         host = pve2
         mds_standby_for_name = pve


[mds.pve3-1]
         host = pve3
         mds_standby_for_name = pve


[mon.pve]
         public_addr = 10.0.20.1


[mon.pve-ceph1]
#        debug_mon = 0/5
         public_addr = 10.0.20.4


[mon.pve3]
#        debug_mon = 0/5
         public_addr = 10.0.20.3

This is the output of ceph-volume inventory /dev/sdb:

Code:
====== Device report /dev/sdb ======


     path                      /dev/sdb
     ceph device               True
     lsm data                  {}
     available                 False
     rejected reasons          LVM detected, Has a FileSystem, Insufficient space (<10 extents) on vgs
     device id                 CT4000MX500SSD1_2339E879E476
     removable                 0
     ro                        0
     vendor                    ATA
     model                     CT4000MX500SSD1
     sas address               
     rotational                0
     actuators                 None
     scheduler mode            mq-deadline
     human readable size       3.64 TB
    --- Logical Volume ---
     name                      osd-block-f1e4ea87-d73e-47ca-8648-c6373110f6ea
     osd id                    2
     cluster name              ceph
     type                      block
     osd fsid                  f1e4ea87-d73e-47ca-8648-c6373110f6ea
     cluster fsid              5ce42d57-4371-475a-94fb-eac8acefe72e
     osdspec affinity         
     block uuid                Ub2tvB-R9Hu-bQHW-92Aa-Hcyq-QP7P-dwZzi0


Where can I find any log that tells me why the OSDs are not starting? What can I possibly be doing wrong? Thanks!
 
Here are my notes from moving OSDs between hosts a couple of years ago. You might try the lvm activate command and see if that helps...

Code:
# On the old host
systemctl stop ceph-osd@$OSD_ID    # stop the OSD daemon
ceph osd out $OSD_ID               # mark the OSD out

lvchange -a n $VG/$LV              # deactivate the OSD's logical volume
vgexport $VG                       # export the volume group

# Move the disk, then on the new host:
lsblk                              # confirm the disk shows up
vgscan                             # rescan for volume groups
vgimport $VG                       # import the volume group
vgchange -a y $VG                  # activate its logical volumes
ceph-volume lvm activate --all     # activate every OSD found on the host
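
If the OSD still won't start after activation, the daemon's own log is usually where the real reason shows up. A quick check, assuming the default Ceph log locations on Proxmox and osd.2 from your output:

Code:
ceph osd tree                          # is osd.2 listed under the new host?
systemctl status ceph-osd@2.service    # current state of the unit
less /var/log/ceph/ceph-osd.2.log      # the OSD's own log, usually has the real error
less /var/log/ceph/ceph-volume.log     # ceph-volume's activation log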
 
Thanks! Unfortunately, after I run ceph-volume lvm activate --all, the OSDs crash with this error:
Code:
2024-03-14T07:39:12.914-0700 7c935b4ed6c0 4 rocksdb: (Original Log Time 2024/03/14-07:39:12.916046) EVENT_LOG_v1 {"time_micros": 1710427152916040, "job": 5, "event": "compaction_finished", "compaction_time_micros": 2265438, "compaction_time_cpu_micros": 459968, "output_level": 1, "num_output_files": 3, "total_output_size": 197791593, "num_input_records": 1591371, "num_output_records": 392106, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [12, 10, 0, 0, 0, 0, 0]}
2024-03-14T07:39:12.914-0700 7c935b4ed6c0 2 rocksdb: [db/db_impl/db_impl_compaction_flush.cc:2986] Waiting after background compaction error: Corruption: block checksum mismatch: stored = 1378299632, computed = 2199829759, type = 4 in db/013715.sst offset 2123208 size 3933, Accumulated background error counts: 2
2024-03-14T07:39:13.234-0700 7c9368faf6c0 0 osd.2 5954 load_pgs
2024-03-14T07:39:13.242-0700 7c93554d16c0 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 2023635806, computed = 1513301452, type = 4 in db/013726.sst offset 46498093 size 4132 code = Rocksdb transaction:
PutCF( prefix = O key = 0x7F800000000000002EF0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 35)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8)
2024-03-14T07:39:13.246-0700 7c93554d16c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7c93554d16c0 time 2024-03-14T07:39:13.246969-0700
./src/os/bluestore/BlueStore.cc: 12887: FAILED ceph_assert(r == 0)
ceph version 18.2.1 (850293cdaae6621945e1191aa8c28ea2918269c3) reef (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x58481f4098c3]
2: /usr/bin/ceph-osd(+0x61aa5e) [0x58481f409a5e]
3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x3f5) [0x58481fa0de75]
4: (BlueStore::_kv_sync_thread()+0xed3) [0x58481fa758e3]
5: (BlueStore::KVSyncThread::entry()+0xd) [0x58481faa0f0d]
6: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7c9369bfa134]
7: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7c9369c7a7dc]
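
For anyone who finds this later: a RocksDB block checksum mismatch inside BlueStore generally means the data on the OSD itself is corrupted, and ceph-bluestore-tool can confirm that. A minimal sketch, assuming the default data path for osd.2 and that the OSD is stopped first:

Code:
systemctl stop ceph-osd@2.service                          # fsck needs the OSD offline
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2   # consistency check of the BlueStore metadata

If fsck confirms the corruption, the usual path is to destroy and re-create the affected OSD and let the remaining replicas backfill it.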