ceph-mon crashed

Baader-IT

Oct 29, 2018
Hi,

For the second time in just a few days, we got a message about a crashed ceph-mon process:

root@sv18002:~# ceph crash info 2020-03-09_07:11:48.755203Z_8747897c-939b-4012-9c50-f4e266562f28
{
"os_version_id": "10",
"assert_condition": "z >= signedspan::zero()",
"utsname_release": "5.3.13-1-pve",
"os_name": "Debian GNU/Linux 10 (buster)",
"entity_name": "mon.sv18006",
"assert_file": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h",
"timestamp": "2020-03-09 07:11:48.755203Z",
"process_name": "ceph-mon",
"utsname_machine": "x86_64",
"assert_line": 485,
"utsname_sysname": "Linux",
"os_version": "10 (buster)",
"os_id": "10",
"assert_thread_name": "ms_dispatch",
"utsname_version": "#1 SMP PVE 5.3.13-1 (Thu, 05 Dec 2019 07:18:14 +0100)",
"backtrace": [
"(()+0x12730) [0x7fa12d4e8730]",
"(gsignal()+0x10b) [0x7fa12cfcb7bb]",
"(abort()+0x121) [0x7fa12cfb6535]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a3) [0x7fa12e627e79]",
"(()+0x282000) [0x7fa12e628000]",
"(Paxos::do_refresh()+0x1a4) [0x5632789924b4]",
"(Paxos::handle_commit(boost::intrusive_ptr<MonOpRequest>)+0x2f2) [0x563278997a62]",
"(Paxos::dispatch(boost::intrusive_ptr<MonOpRequest>)+0x223) [0x56327899d213]",
"(Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x131c) [0x5632788d2b1c]",
"(Monitor::_ms_dispatch(Message*)+0x4aa) [0x5632788d310a]",
"(Monitor::ms_dispatch(Message*)+0x26) [0x563278902a36]",
"(Dispatcher::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x26) [0x5632788fef66]",
"(DispatchQueue::entry()+0x1a49) [0x7fa12e860e69]",
"(DispatchQueue::DispatchThread::entry()+0xd) [0x7fa12e90e9ed]",
"(()+0x7fa3) [0x7fa12d4ddfa3]",
"(clone()+0x3f) [0x7fa12d08d4cf]"
],
"utsname_hostname": "sv18006",
"assert_msg": "/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h: In function 'ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)' thread 7fa1245a1700 time 2020-03-09 08:11:48.749756\n/mnt/npool/tlamprecht/pve-ceph/ceph-14.2.6/src/common/ceph_time.h: 485: FAILED ceph_assert(z >= signedspan::zero())\n",
"crash_id": "2020-03-09_07:11:48.755203Z_8747897c-939b-4012-9c50-f4e266562f28",
"assert_func": "ceph::time_detail::timespan ceph::to_timespan(ceph::time_detail::signedspan)",
"ceph_version": "14.2.6"
}
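The assertion in this dump fires inside ceph::to_timespan(), which converts a signed duration (signedspan) into an unsigned one (timespan) and rejects negative input. Below is a minimal C++ sketch of that kind of guarded conversion. It assumes signedspan and timespan are nanosecond std::chrono durations with signed and unsigned representations. The functions to_timespan_checked and try_to_timespan are hypothetical illustrations, not Ceph's actual ceph_time.h code.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>

// Assumption: these aliases mirror the shape of Ceph's types; they are not
// copied from ceph_time.h.
using signedspan = std::chrono::duration<std::int64_t, std::nano>;
using timespan = std::chrono::duration<std::uint64_t, std::nano>;

// Hypothetical equivalent of the failing conversion. A negative signed span
// (e.g. "now - earlier" when the clock appears to step backwards) has no
// unsigned representation, so the function asserts. Note: unlike ceph_assert,
// the standard assert() is compiled out when NDEBUG is defined.
timespan to_timespan_checked(signedspan z) {
  assert(z >= signedspan::zero());
  return timespan(static_cast<std::uint64_t>(z.count()));
}

// Hypothetical non-aborting variant: reports a negative span to the caller
// instead of terminating the process.
std::optional<timespan> try_to_timespan(signedspan z) {
  if (z < signedspan::zero()) {
    return std::nullopt;
  }
  return timespan(static_cast<std::uint64_t>(z.count()));
}
```

In the crash above the input was negative, so the assert-style path aborted the monitor (hence the abort() and __ceph_assert_fail frames in the backtrace).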

Is there a known issue regarding this?

Regards
Frank
 
It seems so. The pull request was merged into the Ceph master branch 5 days ago, so it isn't in the 14.2.8 release.
https://tracker.ceph.com/issues/43365
https://tracker.ceph.com/issues/44078

In any case, it seems that Ceph is seeing negative time differences from the monotonic clock, which should never go backwards. You could try changing the HPET (or other timer) settings in the BIOS and see whether that works around the problem.
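Before changing BIOS settings, it may help to check which clock source the kernel is actually using on each monitor node. Below is a minimal C++17 sketch that reads the Linux sysfs files current_clocksource and available_clocksource. It assumes a Linux host that exposes /sys/devices/system/clocksource/clocksource0/. The helper names read_first_line, split_words and print_clocksources are hypothetical.

```cpp
#include <fstream>
#include <iostream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Assumption: Linux exposes the active and available clock sources here.
const std::string kClocksourceDir =
    "/sys/devices/system/clocksource/clocksource0/";

// Read the first line of a file, or return nothing if it cannot be read.
std::optional<std::string> read_first_line(const std::string& path) {
  std::ifstream in(path);
  std::string line;
  if (!in || !std::getline(in, line)) {
    return std::nullopt;
  }
  return line;
}

// Split a whitespace-separated list such as "tsc hpet acpi_pm".
std::vector<std::string> split_words(const std::string& text) {
  std::istringstream in(text);
  std::vector<std::string> words;
  for (std::string w; in >> w;) {
    words.push_back(w);
  }
  return words;
}

// Hypothetical helper: print the active clock source and the alternatives.
void print_clocksources() {
  auto current = read_first_line(kClocksourceDir + "current_clocksource");
  auto available = read_first_line(kClocksourceDir + "available_clocksource");
  std::cout << "current:   " << current.value_or("(unreadable)") << '\n';
  std::cout << "available:";
  for (const auto& name : split_words(available.value_or(""))) {
    std::cout << ' ' << name;
  }
  std::cout << '\n';
}
```

This is purely diagnostic and changes nothing. Comparing the output across the monitor nodes before and after a BIOS change shows whether the kernel actually switched clock sources.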
 
