[SOLVED] Ceph OSD won't start

TheGeka

Member
Dec 2, 2020
Hey,

One of my OSDs won't start anymore.
I already tried switching to the bitmap allocator, but I still get the same error.

Crash dump:
Code:
2022-04-07T23:35:39.127+0200 7f3f1403af00  1 bluefs _allocate unable to allocate 0x400000 on bdev 1, allocator name block, allocator type bitmap, capacity 0x3a38800000, block size 0x1000, free 0x5e0658000, fragmentation 1, allocated 0x320000
    -2> 2022-04-07T23:35:39.127+0200 7f3f1403af00 -1 bluefs _allocate allocation failed, needed 0x400000
    -1> 2022-04-07T23:35:39.143+0200 7f3f1403af00 -1 ./src/os/bluestore/BlueFS.cc: In function 'void BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)' thread 7f3f1403af00 time 2022-04-07T23:35:39.134059+0200
./src/os/bluestore/BlueFS.cc: 2352: FAILED ceph_assert(r == 0)

 ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x124) [0x555844da0ae6]
 2: /usr/bin/ceph-osd(+0xabcc71) [0x555844da0c71]
 3: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x1a13) [0x55584549b243]
 4: (BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x67) [0x55584549b497]
 5: (BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x100) [0x5558454b37d0]
 6: (rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x48) [0x55584597a24e]
 7: (rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x338) [0x555845b54d18]
 8: (rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x5d7) [0x555845b5329b]
 9: (rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0x11d) [0x555845d1d2d7]
 10: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x7d0) [0x555845d1d0be]
 11: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x48) [0x555845d1c8da]
 12: (rocksdb::BlockBasedTableBuilder::Flush()+0x9a) [0x555845d1c88a]
 13: (rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x197) [0x555845d1c3bf]
 14: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0x782) [0x555845c9f732]
 15: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0x5ea) [0x555845a18226]
 16: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1ad1) [0x555845a16e9d]
 17: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0x159e) [0x555845a143d4]
 18: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x677) [0x555845a196cd]
 19: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x52) [0x555845a18aa4]
 20: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10a6) [0x5558459298b6]
 21: (BlueStore::_open_db(bool, bool, bool)+0xa19) [0x5558453a7b19]
 22: (BlueStore::_open_db_and_around(bool, bool)+0x332) [0x5558453ecb92]
 23: (BlueStore::_mount()+0x191) [0x5558453ef531]
 24: (OSD::init()+0x58d) [0x555844e965ed]
 25: main()
 26: __libc_start_main()
 27: _start()

     0> 2022-04-07T23:35:39.159+0200 7f3f1403af00 -1 *** Caught signal (Aborted) **
 in thread 7f3f1403af00 thread_name:ceph-osd

 ceph version 16.2.7 (f9aa029788115b5df5eeee328f584156565ee5b7) pacific (stable)
 1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f3f14692140]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x16e) [0x555844da0b30]
 5: /usr/bin/ceph-osd(+0xabcc71) [0x555844da0c71]
 6: (BlueFS::_compact_log_async(std::unique_lock<std::mutex>&)+0x1a13) [0x55584549b243]
 7: (BlueFS::_flush(BlueFS::FileWriter*, bool, std::unique_lock<std::mutex>&)+0x67) [0x55584549b497]
 8: (BlueRocksWritableFile::Append(rocksdb::Slice const&)+0x100) [0x5558454b37d0]
 9: (rocksdb::LegacyWritableFileWrapper::Append(rocksdb::Slice const&, rocksdb::IOOptions const&, rocksdb::IODebugContext*)+0x48) [0x55584597a24e]
 10: (rocksdb::WritableFileWriter::WriteBuffered(char const*, unsigned long)+0x338) [0x555845b54d18]
 11: (rocksdb::WritableFileWriter::Append(rocksdb::Slice const&)+0x5d7) [0x555845b5329b]
 12: (rocksdb::BlockBasedTableBuilder::WriteRawBlock(rocksdb::Slice const&, rocksdb::CompressionType, rocksdb::BlockHandle*, bool)+0x11d) [0x555845d1d2d7]
 13: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::Slice const&, rocksdb::BlockHandle*, bool)+0x7d0) [0x555845d1d0be]
 14: (rocksdb::BlockBasedTableBuilder::WriteBlock(rocksdb::BlockBuilder*, rocksdb::BlockHandle*, bool)+0x48) [0x555845d1c8da]
 15: (rocksdb::BlockBasedTableBuilder::Flush()+0x9a) [0x555845d1c88a]
 16: (rocksdb::BlockBasedTableBuilder::Add(rocksdb::Slice const&, rocksdb::Slice const&)+0x197) [0x555845d1c3bf]
 17: (rocksdb::BuildTable(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::Env*, rocksdb::FileSystem*, rocksdb::ImmutableCFOptions const&, rocksdb::MutableCFOptions const&, rocksdb::FileOptions const&, rocksdb::TableCache*, rocksdb::InternalIteratorBase<rocksdb::Slice>*, std::vector<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> >, std::allocator<std::unique_ptr<rocksdb::FragmentedRangeTombstoneIterator, std::default_delete<rocksdb::FragmentedRangeTombstoneIterator> > > >, rocksdb::FileMetaData*, rocksdb::InternalKeyComparator const&, std::vector<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> >, std::allocator<std::unique_ptr<rocksdb::IntTblPropCollectorFactory, std::default_delete<rocksdb::IntTblPropCollectorFactory> > > > const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<unsigned long, std::allocator<unsigned long> >, unsigned long, rocksdb::SnapshotChecker*, rocksdb::CompressionType, unsigned long, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned long, rocksdb::Env::WriteLifeTimeHint, unsigned long)+0x782) [0x555845c9f732]
 18: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0x5ea) [0x555845a18226]
 19: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool, bool*)+0x1ad1) [0x555845a16e9d]
 20: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool, unsigned long*)+0x159e) [0x555845a143d4]
 21: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool, bool)+0x677) [0x555845a196cd]
 22: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x52) [0x555845a18aa4]
 23: (RocksDBStore::do_open(std::ostream&, bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x10a6) [0x5558459298b6]
 24: (BlueStore::_open_db(bool, bool, bool)+0xa19) [0x5558453a7b19]
 25: (BlueStore::_open_db_and_around(bool, bool)+0x332) [0x5558453ecb92]
 26: (BlueStore::_mount()+0x191) [0x5558453ef531]
 27: (OSD::init()+0x58d) [0x555844e965ed]
 28: main()
 29: __libc_start_main()
 30: _start()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
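The sizes in the "bluefs _allocate" line are hexadecimal byte counts. The bash sketch below converts them into readable units; the helper names hex_to_mib and hex_to_gib are hypothetical, and the values are copied from the crash dump above. It shows a 4 MiB request failing on a device of roughly 232.9 GiB that still reported about 23.5 GiB (around 10%) free. Assuming the usual reading of the allocator's fragmentation score (0 = unfragmented, 1 = fully fragmented), the remaining free space was apparently split into pieces too small for BlueFS, which fits the picture of a nearly full OSD.

```shell
#!/usr/bin/env bash
# Convert the hexadecimal byte counts from the "bluefs _allocate" log line
# into human-readable units. Values are copied from the crash dump.

# hex_to_mib (hypothetical helper): print a 0x-prefixed byte count in MiB.
hex_to_mib() {
  # $(( )) understands 0x-prefixed hex; awk does the floating-point division.
  awk -v b="$(( $1 ))" 'BEGIN { printf "%.3f\n", b / 1048576 }'
}

# hex_to_gib (hypothetical helper): same, but in GiB.
hex_to_gib() {
  awk -v b="$(( $1 ))" 'BEGIN { printf "%.2f\n", b / 1073741824 }'
}

capacity=0x3a38800000   # "capacity" field
free=0x5e0658000        # "free" field
needed=0x400000         # "allocation failed, needed" field
allocated=0x320000      # "allocated" field

echo "capacity:  $(hex_to_gib "$capacity") GiB"
echo "free:      $(hex_to_gib "$free") GiB"
echo "needed:    $(hex_to_mib "$needed") MiB"
echo "allocated: $(hex_to_mib "$allocated") MiB"

# Share of the device the allocator still reported as free.
awk -v f="$(( free ))" -v c="$(( capacity ))" \
  'BEGIN { printf "free share: %.1f%%\n", 100 * f / c }'
```

With the values above it prints 232.88 GiB capacity, 23.51 GiB free (free share 10.1%), a 4.000 MiB request and 3.125 MiB actually allocated before the failure.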

Smartctl output:
Code:
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       43515
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       403
177 Wear_Leveling_Count     0x0013   001   001   000    Pre-fail  Always       -       3445
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   100   100   010    Pre-fail  Always       -       0
181 Program_Fail_Cnt_Total  0x0032   100   100   010    Old_age   Always       -       0
182 Erase_Fail_Count_Total  0x0032   100   100   010    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
187 Uncorrectable_Error_Cnt 0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0032   072   038   000    Old_age   Always       -       28
195 ECC_Error_Rate          0x001a   200   200   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
235 POR_Recovery_Count      0x0012   099   099   000    Old_age   Always       -       147
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       218298828484

SMART Error Log Version: 1
No Errors Logged
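Two attributes in this output are easier to judge after a little arithmetic. Total_LBAs_Written is a raw count of logical blocks; assuming 512-byte LBAs (common for SATA SSDs, but check the sector size smartctl reports for your drive), the bash sketch below converts it to terabytes. It also prints the normalized VALUE of Wear_Leveling_Count, which here is 001; on drives that report this attribute, a normalized value that low usually indicates the rated write endurance has been used up. The function name smart_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Summarize two attributes from `smartctl -A` style output read on stdin.
# Assumption: one LBA = 512 bytes (verify the drive's logical sector size).

smart_summary() {
  awk '
    # Columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    $2 == "Wear_Leveling_Count" { printf "wear_leveling_value=%d\n", $4 }
    $2 == "Total_LBAs_Written"  { printf "written_tb=%.2f\n", $10 * 512 / 1e12 }
  '
}

# Example input: the two relevant lines from the output above.
smart_summary <<'EOF'
177 Wear_Leveling_Count     0x0013   001   001   000    Pre-fail  Always       -       3445
241 Total_LBAs_Written      0x0032   099   099   000    Old_age   Always       -       218298828484
EOF
```

For these lines it prints wear_leveling_value=1 and written_tb=111.77. On a live node the same function can be fed directly, e.g. smartctl -A /dev/sdX | smart_summary, with /dev/sdX replaced by the OSD's disk.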
 

Edit: I should probably have included the crash dump inline so people don't have to download the log file.

journalctl only shows the same crash dump that also appears at the end of the log.

I don't want to recreate the OSD just yet.
I've already looked through the troubleshooting docs and searched the issue tracker. The only similar issue I found was https://tracker.ceph.com/issues/50656, but the workaround mentioned there (bluestore_allocator = bitmap and bluefs_allocator = bitmap) runs into the same error.
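For reference, that workaround can be applied to a single OSD through the monitors' central config database. The bash sketch below assumes the ceph CLI is available on the node and that you supply the OSD number; the helper names run and set_bitmap_allocators are hypothetical, and DRY_RUN=1 only prints the commands. The same two options can alternatively be set under [osd] in ceph.conf.

```shell
#!/usr/bin/env bash
# Apply bluestore_allocator/bluefs_allocator = bitmap to one OSD.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

set_bitmap_allocators() {
  local osd_id="$1"
  # Per-OSD overrides stored in the monitors' config database.
  run ceph config set "osd.${osd_id}" bluestore_allocator bitmap
  run ceph config set "osd.${osd_id}" bluefs_allocator bitmap
  # The allocators are created when the OSD mounts its store, so restart it.
  run systemctl restart "ceph-osd@${osd_id}"
}
```

For example, DRY_RUN=1 set_bitmap_allocators 7 prints the three commands for a made-up OSD 7 without touching the cluster.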
 
In case someone runs into the same problem in the future, with an OSD too full to boot: I was able to recover the data by copying the underlying disk to a bigger disk.
  1. Copy the old disk to the bigger disk:
    Code:
    dd if=source/drive of=dest/drive bs=32M status=progress
  2. Reboot the node with the source disk detached (dd also copies the LVM metadata, so with both disks attached LVM would see duplicate physical volumes).
  3. Resize the LVM physical volume and logical volume using
    Code:
    pvresize dest/drive
    lvextend -l +100%FREE <LV Path from the output of lvdisplay>
  4. Resize the BlueFS storage using
    Code:
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<OSD number>
  5. Start the OSD:
    Code:
    systemctl restart ceph-osd@<OSD number>
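The steps above can be sketched as two bash functions, one for each side of the reboot in step 2. This is a sketch under assumptions, not a tested tool: the source and destination devices, the LV path (from lvdisplay) and the OSD number are placeholders you supply, and it assumes the OSD is already stopped. Because dd overwrites the destination disk completely, the hypothetical run helper prints the commands for review when DRY_RUN=1 is set.

```shell
#!/usr/bin/env bash
# Recovery steps split around the reboot. All arguments are placeholders
# you must fill in; nothing is detected automatically.
set -euo pipefail

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Phase 1 (before the reboot): clone the stopped OSD's disk onto the bigger disk.
clone_disk() {
  local src="$1" dst="$2"
  run dd if="$src" of="$dst" bs=32M status=progress
}

# Phase 2 (after rebooting with the source disk detached):
# grow the PV, then the LV, then BlueFS, and start the OSD again.
expand_osd() {
  local dst="$1" lv="$2" osd_id="$3"
  run pvresize "$dst"
  run lvextend -l +100%FREE "$lv"
  run ceph-bluestore-tool bluefs-bdev-expand --path "/var/lib/ceph/osd/ceph-${osd_id}"
  run systemctl restart "ceph-osd@${osd_id}"
}
```

For example, DRY_RUN=1 expand_osd /dev/sdc /dev/vg0/osd-block 5 only prints the four phase-2 commands; /dev/sdc, /dev/vg0/osd-block and OSD 5 are made-up values.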