  1. Spiros Papageorgiou

    Hi all,

    While my Ceph cluster shows as healthy, I see these messages in the syslog:
    ceph-osd[7215]: 2018-09-05 12:59:55.317306 7f565fbc8700 -1 osd.19 234 heartbeat_check: no reply from 10.11.20.130:6814 osd.13 ever on either front or back
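To see which OSD pairs are affected, the reporting OSD and the silent peer can be pulled out of each line. The following is a minimal sketch, assuming every line has exactly the format of the message above; `parse_heartbeat_failure` is a hypothetical helper name, not part of Ceph.

```python
import re

# Matches the part of the line from the reporting OSD to the silent peer, e.g.
# "osd.19 234 heartbeat_check: no reply from 10.11.20.130:6814 osd.13".
# Assumption: the format is exactly that of the syslog line quoted above.
HEARTBEAT_RE = re.compile(
    r"(osd\.\d+) \d+ heartbeat_check: no reply from "
    r"(\d+\.\d+\.\d+\.\d+):(\d+) (osd\.\d+)"
)


def parse_heartbeat_failure(line):
    """Return (reporter, peer_ip, peer_port, peer_osd), or None if no match."""
    match = HEARTBEAT_RE.search(line)
    if match is None:
        return None
    reporter, peer_ip, peer_port, peer_osd = match.groups()
    return reporter, peer_ip, int(peer_port), peer_osd
```

Running this over the syslog and counting the resulting tuples would show whether the failures involve one OSD pair or many.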

    The network seems fine: MTUs are OK and I can connect to 10.11.20.130:6814. The above message is between px2 and px3; px3 (10.11.20.131) can't connect to px2 (10.11.20.130).
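A connection test of this kind can be sketched as follows. This is only an illustration: it uses Python's standard `socket` module, and `can_connect` is a hypothetical helper name. A successful TCP connect shows that the port is reachable, but it does not by itself show that full-size frames get through, so it does not replace a separate MTU check.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `can_connect("10.11.20.130", 6814)` on px3 would test the address and port named in the heartbeat_check message, from the node that reports the missing reply.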

    Now, px1 also says
    Sep 5 13:07:10 px1 kernel: [1781177.145998] libceph: mon2 10.11.20.131:6789 session lost, hunting for new mon
    Sep 5 13:07:10 px1 kernel: [1781177.147560] libceph: mon1 10.11.20.130:6789 session established

    It says the same thing even for the mon running on px1 itself. It keeps connecting to and disconnecting from all 3 mons in my 3-node cluster.

    Connectivity is over a dedicated 10G adapter and is OK. The nodes are doing nothing, so there is no stress on either CPU or network.

    Any ideas where to look for the problem?

    Thanx,
    Sp
     