Hi all,
While my Ceph cluster shows as healthy, I can see these messages in the syslog:
ceph-osd[7215]: 2018-09-05 12:59:55.317306 7f565fbc8700 -1 osd.19 234 heartbeat_check: no reply from 10.11.20.130:6814 osd.13 ever on either front or back
The network seems fine: MTUs are OK and I can connect to 10.11.20.130:6814. The message above is between px2 and px3: according to it, px3 (10.11.20.131) can't reach px2 (10.11.20.130).
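For anyone who wants to repeat the connectivity check from each node, here is a minimal sketch, assuming Python 3 is available on the hosts. The names `can_connect` and `check_endpoints` are just illustrative helpers, not part of Ceph; the addresses and ports are the ones from the log lines in this post.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_endpoints(endpoints, timeout=3.0):
    """Map each (host, port) pair to True (reachable) or False."""
    return {ep: can_connect(ep[0], ep[1], timeout) for ep in endpoints}


# Endpoints taken from the log lines in this post (one OSD port, two mon ports).
ENDPOINTS = [
    ("10.11.20.130", 6814),
    ("10.11.20.130", 6789),
    ("10.11.20.131", 6789),
]
```

Running `check_endpoints(ENDPOINTS)` on px1, px2 and px3 in turn shows which direction fails, if any. Note that a successful connect only proves that small packets get through; it does not exercise full-size frames, so an MTU mismatch could still go unnoticed by this check.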
Now, px1 also says:
Sep 5 13:07:10 px1 kernel: [1781177.145998] libceph: mon2 10.11.20.131:6789 session lost, hunting for new mon
Sep 5 13:07:10 px1 kernel: [1781177.147560] libceph: mon1 10.11.20.130:6789 session established
It says the same thing even for the mon running on px1 itself. It keeps connecting to and disconnecting from all 3 mons in my 3-node cluster.
Connectivity is over a dedicated 10G adapter and is OK. The nodes are idle, so there is no stress on either CPU or network.
Any ideas where to look for the problem?
Thanks,
Sp