Hi all,
After an upgrade, Proxmox would not start, so I had to reinstall it completely.
I made a backup of the config but presumably missed something: ceph-mon keeps crashing and 4 OSDs appear as ghosts (out/down).
# journalctl -b -u ceph-mon@atlas.service
Jun 04 13:26:21 atlas ceph-mon[19539]: 0> 2022-06-04T13:26:21.167+0200 7f5e5b172700 -1 *** Caught signal (Aborted) **
Jun 04 13:26:21 atlas ceph-mon[19539]: in thread 7f5e5b172700 thread_name:ms_dispatch
Jun 04 13:26:21 atlas ceph-mon[19539]: ceph version 15.2.16 (a6b69e817d6c9e6f02d0a7ac3043ba9cdbda1bdf) octopus (stable)
Jun 04 13:26:21 atlas ceph-mon[19539]: 1: (()+0x14140) [0x7f5e63e54140]
Jun 04 13:26:21 atlas ceph-mon[19539]: 2: (gsignal()+0x141) [0x7f5e63973ce1]
Jun 04 13:26:21 atlas ceph-mon[19539]: 3: (abort()+0x123) [0x7f5e6395d537]
Jun 04 13:26:21 atlas ceph-mon[19539]: 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x17b) [0x7f5e6438a701]
Jun 04 13:26:21 atlas ceph-mon[19539]: 5: (()+0x252842) [0x7f5e6438a842]
Jun 04 13:26:21 atlas ceph-mon[19539]: 6: (OSDTreeFormattingDumper::dump_item_fields(CrushTreeDumper::Item const&, ceph::Formatter*)+0x24a) [0x7f5e647cb28a]
Jun 04 13:26:21 atlas ceph-mon[19539]: 7: (OSDMap::print_tree(ceph::Formatter*, std::ostream*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) const+0x2af) [0x7f5e647aab7f]
Jun 04 13:26:21 atlas ceph-mon[19539]: 8: (OSDMonitor::preprocess_command(boost::intrusive_ptr<MonOpRequest>)+0xf34) [0x5622bbadd3c4]
Jun 04 13:26:21 atlas ceph-mon[19539]: 9: (OSDMonitor::preprocess_query(boost::intrusive_ptr<MonOpRequest>)+0x1ac) [0x5622bbb1cccc]
Jun 04 13:26:21 atlas ceph-mon[19539]: 10: (PaxosService::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x254) [0x5622bba9d694]
Jun 04 13:26:21 atlas ceph-mon[19539]: 11: (Monitor::handle_command(boost::intrusive_ptr<MonOpRequest>)+0x22f6) [0x5622bb996ab6]
Jun 04 13:26:21 atlas ceph-mon[19539]: 12: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x779) [0x5622bb99a5a9]
Jun 04 13:26:21 atlas ceph-mon[19539]: 13: (Monitor::_ms_dispatch(Message*)+0x410) [0x5622bb99b5e0]
Jun 04 13:26:21 atlas ceph-mon[19539]: 14: (Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x59) [0x5622bb9c9b49]
Jun 04 13:26:21 atlas ceph-mon[19539]: 15: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr<Message> const&)+0x468) [0x7f5e645a6d68]
Jun 04 13:26:21 atlas ceph-mon[19539]: 16: (DispatchQueue::entry()+0x5ef) [0x7f5e645a446f]
Jun 04 13:26:21 atlas ceph-mon[19539]: 17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f5e646501fd]
Jun 04 13:26:21 atlas ceph-mon[19539]: 18: (()+0x8ea7) [0x7f5e63e48ea7]
Jun 04 13:26:21 atlas ceph-mon[19539]: 19: (clone()+0x3f) [0x7f5e63a35def]
Jun 04 13:26:21 atlas ceph-mon[19539]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         16.41600  root default
-3         16.41600      host atlas
 0    hdd   3.63899          osd.0     down         0  1.00000
 1    hdd   3.63899          osd.1     down         0  1.00000
 2    hdd   3.63899          osd.2     down         0  1.00000
 3    hdd   3.63899          osd.3     down         0  1.00000
 4    ssd   0.46500          osd.4      DNE         0
 5    ssd   0.46500          osd.5      DNE         0
 6    ssd   0.46500          osd.6      DNE         0
 7    ssd   0.46500          osd.7      DNE         0
# cat /etc/pve/ceph.conf
[global]
    auth_client_required = none
    auth_cluster_required = none
    auth_service_required = none
    cluster_network = 192.168.7.2/24
    fsid = d7552c89-a9f4-404c-985b-f0b1421c26c4
    mon_allow_pool_delete = true
    mon_host = 192.168.7.2
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = 192.168.7.2/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[mds]
    keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.atlas]
    host = atlas
    mds standby for name = pve

Proxmox version: 7.2-3
Ceph version: 15.2.16
Any help appreciated!