ceph-mon crashed

Baader-IT

Hi,

For the second time in only a few days, we got a message about a crashed ceph-mon process:

root@sv18002:~# ceph crash info 2020-03-09_07:11:48.755203Z_8747897c-939b-4012-9c50-f4e266562f28
{
"os_version_id": "10",
"assert_condition": "z >= signedspan::zero()",
"utsname_release": "5.3.13-1-pve",
"os_name": "Debian GNU/Linux 10 (buster)",
"entity_name": "mon.sv18006",
"assert_file": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h",
"timestamp": "2020-03-09 07:11:48.755203Z",
"process_name": "ceph-mon",
"utsname_machine": "x86_64",
"assert_line": 485,
"utsname_sysname": "Linux",
"os_version": "10 (buster)",
"os_id": "10",
"assert_thread_name": "ms_dispatch",
"utsname_version": "#1 SMP PVE 5.3.13-1 (Thu, 05 Dec 2019 07:18:14 +0100)",
"backtrace": [
"(()+0x12730) [0x7fa12d4e8730]",
"(gsignal()+0x10b) [0x7fa12cfcb7bb]",
"(abort()+0x121) [0x7fa12cfb6535]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7fa12e627e79]",
"(()+0x282000) [0x7fa12e628000]",
"(Paxos::do_refresh()+0x1a4) [0x5632789924b4]",
"(Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x2f2) [0x563278997a62]",
"(Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x223) [0x56327899d213]",
"(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x131c) [0x5632788d2b1c]",
"(Monitor::_ms_dispatch(Message*)+0x4aa) [0x5632788d310a]",
"(Monitor::ms_dispatch(Message*)+0x26) [0x563278902a36]",
"(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x5632788fef66]",
"(DispatchQueue::entry()+0x1a49) [0x7fa12e860e69]",
"(DispatchQueue::DispatchThread::entry()+0xd) [0x7fa12e90e9ed]",
"(()+0x7fa3) [0x7fa12d4ddfa3]",
"(clone()+0x3f) [0x7fa12d08d4cf]"
],
"utsname_hostname": "sv18006",
"assert_msg": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h: In function 'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)' thread 7fa1245a1700 time 2020-03-09 08:11:48.749756\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h: 485: FAILED ceph_assert(z >= signedspan::zero())\n",
"crash_id": "2020-03-09_07:11:48.755203Z_8747897c-939b-4012-9c50-f4e266562f28",
"assert_func": "ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)",
"ceph_version": "14.2.6"
}

Is there a known issue regarding this?

Regards
Frank
 
It seems so. The pull request with the fix was merged into the Ceph master branch 5 days ago, so it isn't in the 14.2.8 release.
https://tracker.ceph.com/issues/43365
https://tracker.ceph.com/issues/44078

In any case, it seems that Ceph is getting a negative time span from the monotonic clock, which should never go backwards; that is what trips the assert `z >= signedspan::zero()` in the crash report. You could try changing the HPET (or other timer) settings in the BIOS and see if that works as a workaround.
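
If you want to check whether the monotonic clock on that host really steps backwards, a rough sketch like the one below might help. It is not taken from Ceph; the function names are made up for illustration, and it relies on Python's `time.monotonic_ns()`, which on Linux reads `CLOCK_MONOTONIC`. Take the result with a grain of salt: it samples from a single thread only, so a clean run does not rule out the problem.

```python
import time


def count_backward_steps(samples):
    """Count how often a reading is smaller than the one right before it."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)


def sample_monotonic(n=100_000):
    """Take n consecutive readings of the monotonic clock, in nanoseconds."""
    return [time.monotonic_ns() for _ in range(n)]


if __name__ == "__main__":
    readings = sample_monotonic()
    # Any value other than 0 means the clock was observed going backwards.
    print("backward steps:", count_backward_steps(readings))
```

Run it on the node where the monitor crashed (sv18006 in the report above), ideally before and after changing the timer settings, and compare the counts.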