Ceph ghost OSDs

0920799u

Member
Oct 2, 2018
Paris, France
Hi all,

After an upgrade, Proxmox would not start and I had to reinstall it completely.
I made a backup of the config but presumably missed something: ceph-mon keeps crashing and 4 OSDs appear as ghosts (out/down).

# journalctl -b -u ceph-mon@atlas.service

Jun 04 13:26:21 atlas ceph-mon[19539]: 0> 2022-06-04T13:26:21.167+0200 7f5e5b172700 -1 *** Caught signal (Aborted) **
Jun 04 13:26:21 atlas ceph-mon[19539]: in thread 7f5e5b172700 thread_name:ms_dispatch
Jun 04 13:26:21 atlas ceph-mon[19539]: ceph version 15.2.16 (a6b69e817d6c9e6f02d0a7ac3043ba9cdbda1bdf) octopus (stable)
Jun 04 13:26:21 atlas ceph-mon[19539]: 1: (()+0x14140) [0x7f5e63e54140]
Jun 04 13:26:21 atlas ceph-mon[19539]: 2: (gsignal()+0x141) [0x7f5e63973ce1]
Jun 04 13:26:21 atlas ceph-mon[19539]: 3: (abort()+0x123) [0x7f5e6395d537]
Jun 04 13:26:21 atlas ceph-mon[19539]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17b) [0x7f5e6438a701]
Jun 04 13:26:21 atlas ceph-mon[19539]: 5: (()+0x252842) [0x7f5e6438a842]
Jun 04 13:26:21 atlas ceph-mon[19539]: 6: (OSDTreeFormattingDumper::dump_item_fields(CrushTreeDumper::Item const&, ceph::Formatter*)+0x24a) [0x7f5e647cb28a]
Jun 04 13:26:21 atlas ceph-mon[19539]: 7: (OSDMap::print_tree(ceph::Formatter*, std::ostream*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const+0x2af) [0x7f5e647aab7f]
Jun 04 13:26:21 atlas ceph-mon[19539]: 8: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0xf34) [0x5622bbadd3c4]
Jun 04 13:26:21 atlas ceph-mon[19539]: 9: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x1ac) [0x5622bbb1cccc]
Jun 04 13:26:21 atlas ceph-mon[19539]: 10: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x254) [0x5622bba9d694]
Jun 04 13:26:21 atlas ceph-mon[19539]: 11: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x22f6) [0x5622bb996ab6]
Jun 04 13:26:21 atlas ceph-mon[19539]: 12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x779) [0x5622bb99a5a9]
Jun 04 13:26:21 atlas ceph-mon[19539]: 13: (Monitor::_ms_dispatch(Message*)+0x410) [0x5622bb99b5e0]
Jun 04 13:26:21 atlas ceph-mon[19539]: 14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x5622bb9c9b49]
Jun 04 13:26:21 atlas ceph-mon[19539]: 15: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f5e645a6d68]
Jun 04 13:26:21 atlas ceph-mon[19539]: 16: (DispatchQueue::entry()+0x5ef) [0x7f5e645a446f]
Jun 04 13:26:21 atlas ceph-mon[19539]: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f5e646501fd]
Jun 04 13:26:21 atlas ceph-mon[19539]: 18: (()+0x8ea7) [0x7f5e63e48ea7]
Jun 04 13:26:21 atlas ceph-mon[19539]: 19: (clone()+0x3f) [0x7f5e63a35def]
Jun 04 13:26:21 atlas ceph-mon[19539]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
# ceph osd tree

ID  CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         16.41600  root default
-3         16.41600      host atlas
 0    hdd   3.63899          osd.0     down         0  1.00000
 1    hdd   3.63899          osd.1     down         0  1.00000
 2    hdd   3.63899          osd.2     down         0  1.00000
 3    hdd   3.63899          osd.3     down         0  1.00000
 4    ssd   0.46500          osd.4      DNE         0
 5    ssd   0.46500          osd.5      DNE         0
 6    ssd   0.46500          osd.6      DNE         0
 7    ssd   0.46500          osd.7      DNE         0
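In `ceph osd tree`, `DNE` ("does not exist") marks an OSD that is still listed in the CRUSH map but is absent from the OSD map, while `down` with a REWEIGHT of 0 means the OSD exists but is down and marked out. The Python sketch below sorts the OSD rows of the output above into those two groups. It assumes the plain-text column layout shown here (every OSD row carries a device class), and `classify_osds` is a hypothetical helper, not part of Ceph.

```python
import re

# `ceph osd tree` output copied from the post (whitespace-separated columns).
SAMPLE = """\
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 16.41600 root default
-3 16.41600 host atlas
0 hdd 3.63899 osd.0 down 0 1.00000
1 hdd 3.63899 osd.1 down 0 1.00000
2 hdd 3.63899 osd.2 down 0 1.00000
3 hdd 3.63899 osd.3 down 0 1.00000
4 ssd 0.46500 osd.4 DNE 0
5 ssd 0.46500 osd.5 DNE 0
6 ssd 0.46500 osd.6 DNE 0
7 ssd 0.46500 osd.7 DNE 0
"""

# OSD leaf rows only: non-negative id, class, weight, osd.N, status, reweight.
# Bucket rows (negative ids such as -1 and -3) and the header do not match.
OSD_ROW = re.compile(
    r"^\s*(?P<id>\d+)\s+(?P<cls>\S+)\s+(?P<weight>[\d.]+)\s+"
    r"(?P<name>osd\.\d+)\s+(?P<status>\S+)\s+(?P<reweight>[\d.]+)"
)


def classify_osds(tree_text):
    """Return (dne, down_out): OSDs missing from the OSD map, and OSDs down with REWEIGHT 0."""
    dne, down_out = [], []
    for line in tree_text.splitlines():
        m = OSD_ROW.match(line)
        if not m:
            continue  # header, root and host bucket rows
        if m["status"] == "DNE":
            dne.append(m["name"])
        elif m["status"] == "down" and float(m["reweight"]) == 0.0:
            down_out.append(m["name"])
    return dne, down_out


if __name__ == "__main__":
    print(classify_osds(SAMPLE))
```

On the posted output this prints `(['osd.4', 'osd.5', 'osd.6', 'osd.7'], ['osd.0', 'osd.1', 'osd.2', 'osd.3'])`, which separates the four SSD entries that no longer exist in the OSD map from the four HDD OSDs that exist but are down and out.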
# cat /etc/pve/ceph.conf
[global]
auth_client_required = none
auth_cluster_required = none
auth_service_required = none
cluster_network = 192.168.7.2/24
fsid = d7552c89-a9f4-404c-985b-f0b1421c26c4
mon_allow_pool_delete = true
mon_host = 192.168.7.2
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.7.2/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.atlas]
host = atlas
mds standby for name = pve

Proxmox VE version: 7.2-3
Ceph version: 15.2.16

Any help appreciated!
 
How many monitors do you have?

Also, my ceph.conf lists the monitors explicitly, for example:

...
[mon.<HOSTNAME>]
     public_addr = 192.168.18.13
...
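To check the point raised in this reply, the sketch below lists any `[mon.<name>]` sections in a ceph.conf using Python's standard `configparser`. It assumes the file parses as plain INI (the posted one does). The `[mon.atlas]` section with `public_addr = 192.168.7.2` (the address from `mon_host`) is only a hypothetical example of the suggested section, and `mon_sections` is a hypothetical helper. The script inspects the file only; it does not show whether the missing section is related to the monitor crash.

```python
import configparser

# /etc/pve/ceph.conf as posted in the question.
POSTED_CONF = """\
[global]
auth_client_required = none
auth_cluster_required = none
auth_service_required = none
cluster_network = 192.168.7.2/24
fsid = d7552c89-a9f4-404c-985b-f0b1421c26c4
mon_allow_pool_delete = true
mon_host = 192.168.7.2
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 192.168.7.2/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.atlas]
host = atlas
mds standby for name = pve
"""


def mon_sections(conf_text):
    """Return {section_name: public_addr or None} for every [mon.<name>] section."""
    # interpolation=None: values are returned verbatim, with no '%'-style expansion.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        name: parser[name].get("public_addr")
        for name in parser.sections()
        if name.startswith("mon.")
    }


if __name__ == "__main__":
    print(mon_sections(POSTED_CONF))  # no [mon.*] sections in the posted file
    # Hypothetical section in the style suggested above.
    example = POSTED_CONF + "\n[mon.atlas]\npublic_addr = 192.168.7.2\n"
    print(mon_sections(example))
```

The first call returns an empty dict for the posted file; the second returns `{'mon.atlas': '192.168.7.2'}` once the hypothetical section is appended.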
 
