Ceph: Import OSD to a new node

fcarucci

Hello,

I'm trying to import two OSDs into a brand new node.
I can see the OSDs in the interface, but when I start the OSD, I get an error:

Code:
Job for ceph-osd@2.service failed because the control process exited with error code.
See "systemctl status ceph-osd@2.service" and "journalctl -xeu ceph-osd@2.service" for details.
TASK ERROR: command '/bin/systemctl start ceph-osd@2' failed: exit code 1

systemctl status ceph-osd@2.service shows me this:
Code:
ceph-osd@2.service - Ceph object storage daemon osd.2
     Loaded: loaded (/lib/systemd/system/ceph-osd@.service; disabled; preset: enabled)
    Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d
             └─ceph-after-pve-cluster.conf
     Active: failed (Result: exit-code) since Wed 2024-03-13 21:17:05 PDT; 3min 24s ago
    Process: 2878 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUST>
        CPU: 13ms


Mar 13 21:17:04 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object storag>
Mar 13 21:17:05 pve systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Mar 13 21:17:05 pve systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Mar 13 21:17:05 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object storag>
Mar 13 21:18:25 pve systemd[1]: ceph-osd@2.service: Start request repeated too quickly.
Mar 13 21:18:25 pve systemd[1]: ceph-osd@2.service: Failed with result 'exit-code'.
Mar 13 21:18:25 pve systemd[1]: Failed to start ceph-osd@2.service - Ceph object stora

journalctl -xeu ceph-osd@2.service also has no useful information.

This is my ceph.conf:
Code:
[global]
         auth_client_required = cephx
         auth_cluster_required = cephx
         auth_service_required = cephx
         cluster_network = 10.0.20.4/24
         err_to_syslog = true
         fsid = 5ce42d57-4371-475a-94fb-eac8acefe72e
         mon_allow_pool_delete = true
         mon_allow_pool_size_one = false
#        mon_cluster_log_file_level = info
#        mon_cluster_log_to_file = false
         mon_host = 10.0.20.3 10.0.20.4 10.0.20.1
         ms_bind_ipv4 = true
         ms_bind_ipv6 = false
         osd_deep_scrub_interval = 1209600
         osd_pool_default_min_size = 2
         osd_pool_default_size = 3
         osd_scrub_begin_hour = 23
         osd_scrub_end_hour = 7
         osd_scrub_sleep = 0.1
         public_network = 10.0.20.4/24


[client]
         keyring = /etc/pve/priv/$cluster.$name.keyring


[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring


[mds.pve-ceph1-1]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve-ceph1-2]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve-ceph1-3]
         host = pve-ceph1
         mds_standby_for_name = pve


[mds.pve2-1]
         host = pve2
         mds_standby_for_name = pve


[mds.pve3-1]
         host = pve3
         mds_standby_for_name = pve


[mon.pve]
         public_addr = 10.0.20.1


[mon.pve-ceph1]
#        debug_mon = 0/5
         public_addr = 10.0.20.4


[mon.pve3]
#        debug_mon = 0/5
         public_addr = 10.0.20.3

Output of ceph-volume inventory /dev/sdb:

Code:
====== Device report /dev/sdb ======


     path                      /dev/sdb
     ceph device               True
     lsm data                  {}
     available                 False
     rejected reasons          LVM detected, Has a FileSystem, Insufficient space (<10 extents) on vgs
     device id                 CT4000MX500SSD1_2339E879E476
     removable                 0
     ro                        0
     vendor                    ATA
     model                     CT4000MX500SSD1
     sas address               
     rotational                0
     actuators                 None
     scheduler mode            mq-deadline
     human readable size       3.64 TB
    --- Logical Volume ---
     name                      osd-block-f1e4ea87-d73e-47ca-8648-c6373110f6ea
     osd id                    2
     cluster name              ceph
     type                      block
     osd fsid                  f1e4ea87-d73e-47ca-8648-c6373110f6ea
     cluster fsid              5ce42d57-4371-475a-94fb-eac8acefe72e
     osdspec affinity         
     block uuid                Ub2tvB-R9Hu-bQHW-92Aa-Hcyq-QP7P-dwZzi0


Where can I find a log that tells me why the OSDs are not starting? What could I be doing wrong? Thanks!
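Would running the OSD in the foreground, or tailing its own log file, be the right way to get more detail? Something along these lines (assuming the stock Proxmox/Ceph paths):

Code:
# per-OSD log file (default location)
tail -n 100 /var/log/ceph/ceph-osd.2.log

# show the LVM volumes ceph-volume knows about and their OSD metadata
ceph-volume lvm list

# run the OSD in the foreground, logging to stderr, to see why it exits
/usr/bin/ceph-osd -d --cluster ceph --id 2 --setuser ceph --setgroup ceph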
 
Here are my notes from moving OSDs between hosts a couple of years ago; you might try the lvm activate command and see if that helps...

Code:
# On the old host:
systemctl stop ceph-osd@$OSD_ID       # stop the OSD
ceph osd out $OSD_ID                  # mark it OUT

# deactivate and export the volume group backing the OSD
lvchange -a n $VG/$LV
vgexport $VG

# Move the disk, then on the new host:
lsblk                                 # confirm the disk is visible
vgscan                                # rescan for volume groups
vgimport $VG                          # re-import the exported VG
vgchange -a y $VG                     # activate it
ceph-volume lvm activate --all        # bring the OSD(s) back up
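
After the activate step I'd also sanity-check that the imported volumes and the OSD actually show up on the new host, roughly like this:

Code:
ceph-volume lvm list              # the imported OSD should be listed with its osd id and fsid
systemctl start ceph-osd@$OSD_ID  # start it
ceph osd tree                     # once it's up, it should move under the new host (with the default crush-update-on-start behaviour)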
 
Thanks! Unfortunately, after I run ceph-volume lvm activate --all, the OSDs crash with this error:
Code:
2024-03-14T07:39:12.914-0700 7c935b4ed6c0 4 rocksdb: (Original Log Time 2024/03/14-07:39:12.916046) EVENT_LOG_v1 {"time_micros": 1710427152916040, "job": 5, "event": "compaction_finished", "compaction_time_micros": 2265438, "compaction_time_cpu_micros": 459968, "output_level": 1, "num_output_files": 3, "total_output_size": 197791593, "num_input_records": 1591371, "num_output_records": 392106, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [12, 10, 0, 0, 0, 0, 0]}
2024-03-14T07:39:12.914-0700 7c935b4ed6c0 2 rocksdb: [db/db_impl/db_impl_compaction_flush.cc:2986] Waiting after background compaction error: Corruption: block checksum mismatch: stored = 1378299632, computed = 2199829759, type = 4 in db/013715.sst offset 2123208 size 3933, Accumulated background error counts: 2
2024-03-14T07:39:13.234-0700 7c9368faf6c0 0 osd.2 5954 load_pgs
2024-03-14T07:39:13.242-0700 7c93554d16c0 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: stored = 2023635806, computed = 1513301452, type = 4 in db/013726.sst offset 46498093 size 4132 code = Rocksdb transaction:
PutCF( prefix = O key = 0x7F800000000000002EF0000000'!!='0xFFFFFFFFFFFFFFFEFFFFFFFFFFFFFFFF6F value size = 35)
PutCF( prefix = S key = 'nid_max' value size = 8)
PutCF( prefix = S key = 'blobid_max' value size = 8)
2024-03-14T07:39:13.246-0700 7c93554d16c0 -1 ./src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_apply_kv(TransContext*, bool)' thread 7c93554d16c0 time 2024-03-14T07:39:13.246969-0700
./src/os/bluestore/BlueStore.cc: 12887: FAILED ceph_assert(r == 0)
ceph version 18.2.1 (850293cdaae6621945e1191aa8c28ea2918269c3) reef (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12a) [0x58481f4098c3]
2: /usr/bin/ceph-osd(+0x61aa5e) [0x58481f409a5e]
3: (BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x3f5) [0x58481fa0de75]
4: (BlueStore::_kv_sync_thread()+0xed3) [0x58481fa758e3]
5: (BlueStore::KVSyncThread::entry()+0xd) [0x58481faa0f0d]
6: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7c9369bfa134]
7: /lib/x86_64-linux-gnu/libc.so.6(+0x1097dc) [0x7c9369c7a7dc]
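
Given the block checksum mismatches, I'm wondering whether the next step is a BlueStore consistency check on the OSD while it's stopped, something like this (assuming the data dir is the default /var/lib/ceph/osd/ceph-2), or whether it's simpler to just destroy and re-create the OSD and let the cluster backfill:

Code:
systemctl stop ceph-osd@2
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2
# and possibly, if fsck reports repairable errors:
# ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-2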
 
