Spiros Papageorgiou

    Aug 1, 2017
    Hi all,

    While my Ceph cluster shows as healthy, I can see these messages in the syslog:
    ceph-osd[7215]: 2018-09-05 12:59:55.317306 7f565fbc8700 -1 osd.19 234 heartbeat_check: no reply from osd.13 ever on either front or back
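To see which OSD pairs are affected, messages like the one above can be tallied with a short script. This is only a rough sketch: the regular expression is an assumption based on the single log line quoted here, and the function name `tally_heartbeat_failures` is made up for illustration.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the one log line quoted in this post:
# "<reporting osd> <epoch> heartbeat_check: no reply from <target osd>"
HEARTBEAT_RE = re.compile(
    r"(osd\.\d+) \d+ heartbeat_check: no reply from (osd\.\d+)"
)


def tally_heartbeat_failures(lines):
    """Count (reporting OSD, unresponsive OSD) pairs in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "ceph-osd[7215]: 2018-09-05 12:59:55.317306 7f565fbc8700 -1 "
        "osd.19 234 heartbeat_check: no reply from osd.13 ever on either "
        "front or back",
    ]
    for (reporter, target), n in tally_heartbeat_failures(sample).items():
        print(f"{reporter} got no reply from {target}: {n} time(s)")
```

The same function can be fed an open log file instead of the sample list (for example the system log, whose path depends on the distribution). If the failures are all one-directional between the same two hosts, that would point at the network path between them rather than at a single OSD.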

    The network seems fine. MTUs are ok and I can connect to The above message is between px2 and px3. px3 ( can't connect to px2 (

    Now, px1 also says
    Sep 5 13:07:10 px1 kernel: [1781177.145998] libceph: mon2 session lost, hunting for new mon
    Sep 5 13:07:10 px1 kernel: [1781177.147560] libceph: mon1 session established

    It says the same thing even for the mon running on px1 itself. It keeps connecting to and disconnecting from all 3 mons in my 3-node cluster.

    Connectivity is over a dedicated 10G adapter and is OK. The nodes are doing nothing, so there is no stress on either CPU or network.

    Any ideas where to look for the problem?

