Unable to start OSD - crashes while loading pgs

Feb 7, 2019
Hi

I did a few things to our (Proxmox) Ceph cluster:
  • Added an additional node with three more HDD OSDs (yielding a 3-node cluster with 3 HDDs each)
  • Increased pg_num and pgp_num for one of the pools (from 128 to 256 with size 3 and min_size 1)
I shouldn't have fiddled with pg_num and pgp_num before the cluster had finished rebalancing - this left requests stuck/blocked. I followed the advice in https://forum.proxmox.com/threads/a...m-recovery-going-very-slow.41451/#post-199521 and

  • Set mon_max_pg_per_osd = 1000 to resolve the issue with blocked requests (rough commands below)

This got I/O going again.
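For reference, the pg_num/pgp_num change and the mon setting looked roughly like this - only a sketch: the pool name is a placeholder, and /etc/pve/ceph.conf is where Proxmox keeps the cluster-wide ceph.conf, so adjust for your setup:
Code:
# bump placement groups for the pool (<poolname> = the pool I resized)
ceph osd pool set <poolname> pg_num 256
ceph osd pool set <poolname> pgp_num 256

# in the [global] section of /etc/pve/ceph.conf:
mon_max_pg_per_osd = 1000

# apply it to the running monitors as well (or restart them)
ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'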

But now I have one (large HDD) OSD that will not start - it crashes while loading PGs - see the attached log file; an excerpt is below:
Code:
2019-08-02 10:08:21.021207 7fea86d7be00  0 osd.1 1844 load_pgs
2019-08-02 10:08:39.370112 7fea86d7be00 -1 *** Caught signal (Aborted) **
 in thread 7fea86d7be00 thread_name:ceph-osd

 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable)
 1: (()+0xa59c94) [0x55b835a6dc94]
 2: (()+0x110e0) [0x7fea843800e0]
 3: (gsignal()+0xcf) [0x7fea83347fff]
 4: (abort()+0x16a) [0x7fea8334942a]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7fea83c600ad]
 6: (()+0x8f066) [0x7fea83c5e066]
 7: (()+0x8f0b1) [0x7fea83c5e0b1]
 8: (()+0x8f2c9) [0x7fea83c5e2c9]
 9: (pg_log_entry_t::decode_with_checksum(ceph::buffer::list::iterator&)+0x156) [0x55b8356f57c6]
 10: (void PGLog::read_log_and_missing<pg_missing_set<true> >(ObjectStore*, coll_t, coll_t, ghobject_t, pg_info_t const&, PGLog::IndexedLog&, pg_missing_set<true>&, bool, std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >&, bool, bool*, DoutPrefixProvider const*, std::set<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >*, bool)+0x1ab4) [0x55b8355a6584]
 11: (PG::read_state(ObjectStore*, ceph::buffer::list&)+0x38b) [0x55b83554b7eb]
 12: (OSD::load_pgs()+0x8b8) [0x55b835496678]
 13: (OSD::init()+0x2237) [0x55b8354b75c7]
 14: (main()+0x3092) [0x55b8353bf1c2]
 15: (__libc_start_main()+0xf1) [0x7fea833352e1]
 16: (_start()+0x2a) [0x55b83544b8ca]

Any ideas?

Best regards,
Jesper
 

Attachments

  • ceph-osd.1.log (156.9 KB)

Increased pg_num and pgp_num for one of the pools (from 128 to 256 with size 3 and min_size 1)
min_size 1 is not a good idea, especially in a small cluster. It is the minimum number of replicas needed to still allow write operations. If any subsequent failure occurs while there is only one replica left, that replica may be lost as well.
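For example, raising it back to 2 could look like this (the pool name is a placeholder, adjust to your setup):
Code:
# require at least 2 replicas for I/O on a size-3 replicated pool
# (<poolname> is a placeholder)
ceph osd pool set <poolname> min_size 2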

Set mon_max_pg_per_osd = 1000 to resolve issue with blocked requests
This seems like a very high number of PGs per OSD; what does ceph osd df tree show?

But now I have one (large hdd) OSD that will not start - it crashes while loading pgs - see attached log file - excerpt below:
I suppose the large number of PGs needs extra memory, which the OSD might not be able to allocate. You can also try to repair the OSD (offline) with the bluestore tool. Otherwise, if it is only this one OSD, you could simply re-create it.
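A rough sketch of the offline repair - the OSD ID and data path below assume your osd.1 with the default layout, so double-check them first:
Code:
# stop the OSD before touching its store
systemctl stop ceph-osd@1

# offline consistency check and repair of the BlueStore data
# (path assumes the default location for osd.1)
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-1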
 

I realized - via ceph osd lspools - that all of the PGs affected by the crashed OSD were associated with one pool (aptly named lucky1, as it had size=1 and min_size=1).
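For anyone else in the same spot, this is roughly how the PGs can be matched to a pool (the pool ID in the grep is just an example):
Code:
# list pools with their numeric IDs
ceph osd lspools

# PG IDs are prefixed with the pool ID (e.g. PG 5.1a belongs to pool 5),
# so the PGs named in the OSD log can be matched to a pool by that prefix
ceph pg dump pgs_brief | grep '^5\.'    # "5" is just an example pool ID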

Since I could re-create this data from a backup, I ended up destroying the affected pool, all affected RBDs, and the affected OSD, then re-creating the OSD right away, with the pool to follow sometime in the future (and never with size=1 again).
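The teardown went roughly along these lines - a sketch only: pool name and OSD ID are from my setup, and the exact flags may differ per Ceph/Proxmox version, so double-check before running:
Code:
# allow pool deletion on the monitors (off by default)
ceph tell mon.* injectargs '--mon-allow-pool-delete=true'

# destroy the affected pool (this also removes the RBD images in it)
ceph osd pool delete lucky1 lucky1 --yes-i-really-really-mean-it

# take the broken OSD out, stop it and purge it from the cluster
ceph osd out 1
systemctl stop ceph-osd@1
ceph osd purge 1 --yes-i-really-mean-it

# the OSD itself was then re-created via the Proxmox GUI
# (pveceph createosd /dev/sdX should also work, if I recall correctly)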

I will try to avoid min_size=1 as well.
 
