ceph problems

Aug 1, 2017
Hi all,

While my Ceph cluster shows as healthy, I see these messages in the syslog:
ceph-osd[7215]: 2018-09-05 12:59:55.317306 7f565fbc8700 -1 osd.19 234 heartbeat_check: no reply from 10.11.20.130:6814 osd.13 ever on either front or back

The network seems fine. MTUs are OK and I can connect to 10.11.20.130:6814. The message above is between px2 and px3: according to it, px3 (10.11.20.131) can't connect to px2 (10.11.20.130).
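
A minimal sketch of how such a connectivity check could be scripted and run from each node in turn, since the problem may only show up in one direction. The helper names `can_connect` and `check_targets` are hypothetical; the addresses and ports are the ones from the log lines in this post. It only tests whether a TCP connection can be opened, so a positive result does not prove that Ceph heartbeats are actually answered.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


# Addresses and ports copied from the log lines in this post.
TARGETS = [
    ("10.11.20.130", 6814),  # osd.13 address from the heartbeat_check message (px2)
    ("10.11.20.130", 6789),  # mon1 (px2)
    ("10.11.20.131", 6789),  # mon2 (px3)
]


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map each 'host:port' string to True (reachable) or False (not reachable)."""
    return {
        f"{host}:{port}": can_connect(host, port, timeout)
        for host, port in targets
    }
```

Calling `check_targets(TARGETS)` on px1, px2 and px3 and comparing the three results would show whether any port is unreachable from only one of the nodes.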

px1 also logs:
Sep 5 13:07:10 px1 kernel: [1781177.145998] libceph: mon2 10.11.20.131:6789 session lost, hunting for new mon
Sep 5 13:07:10 px1 kernel: [1781177.147560] libceph: mon1 10.11.20.130:6789 session established

It says the same thing even for the mon running on px1 itself. It keeps connecting to and disconnecting from all 3 mons in my 3-node cluster.

Connectivity is over a dedicated 10G adapter and is OK. The nodes are idle, so there is no load on either the CPU or the network.

Any ideas where to look for the problem?

Thanx,
Sp
 
