"libceph: socket closed" on Proxmox heads connecting to ceph storage

IzakEygelaar

Member
Feb 14, 2019
Hi All,

We are currently seeing a large number of "socket closed" messages for Ceph OSDs on our KVM Proxmox heads.

Current KVM Proxmox infrastructure:
  • DELL R710
  • Dual CPUs
  • Dual 10 Gbps bonded copper connection to the Ceph storage (Ceph Jewel running on CentOS 7 infrastructure, connected via a Juniper EX4550)
  • Both the KVM and Ceph nodes run local firewalls (iptables and firewalld)
Current running PVE kernel: 4.15.18-11-pve
PVE version: pve-manager/5.3-9/ba817b29

Example of messages received on every single KVM Proxmox head:
[Wed Feb 13 03:44:01 2019] libceph: osd9 10.0.1.2:6800 socket closed (con state OPEN)
[Wed Feb 13 04:09:54 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 05:06:24 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:00:01 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:16:58 2019] libceph: osd1 10.0.1.5:6802 socket closed (con state OPEN)

This happens on all Proxmox heads and against random OSDs; it's not always the same ones. On the Ceph storage side, we can't see anything obviously wrong with any of the OSDs (no heartbeat errors on the Ceph side).

Is there any known bug or issue related to Proxmox 5.3.x in combination with Ceph Jewel? We have scoured the logs on both Ceph and Proxmox without any success.

There are also no dropped packets or errors on the dual bonded 10 Gbps NICs connecting to Ceph from the Proxmox side.
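
For what it's worth, this is roughly how we checked the bond and its slaves (bond0 and eno1 are placeholder interface names for our setup):

# per-interface RX/TX error and drop counters
ip -s link show bond0
# bonding mode, slave state and link failure counts
cat /proc/net/bonding/bond0
# driver-level statistics on a slave NIC
ethtool -S eno1 | grep -iE 'drop|err|disc'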

Any advice / pointers would be greatly appreciated!

Thanks
 
[Wed Feb 13 03:44:01 2019] libceph: osd9 10.0.1.2:6800 socket closed (con state OPEN)
[Wed Feb 13 04:09:54 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 05:06:24 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:00:01 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:16:58 2019] libceph: osd1 10.0.1.5:6802 socket closed (con state OPEN)
AFAIK, if a socket is not in use it is closed after 15 min. Are VMs started/shut down (short-lived) often?
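
If it is the idle timeout, that would most likely be the OSD-side 'ms tcp read timeout' (default 900 seconds, i.e. 15 min); in that case the disconnects are just the OSDs closing idle client connections. You can check the current value on one of the Ceph nodes, for example:

# run on the Ceph node hosting osd.7
ceph daemon osd.7 config show | grep ms_tcp_read_timeout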

Is there any known bug or issue related to Proxmox 5.3.x in combination with Ceph Jewel? We have scoured the logs on both Ceph and Proxmox without any success.
Not that I know of. Are you using the stock Debian packages or a different source?

Else you might need to pump up the logging. ;)
http://docs.ceph.com/docs/jewel/rados/troubleshooting/log-and-debug/
 
AFAIK, if a socket is not in use it is closed after 15 min. Are VMs started/shut down (short-lived) often?

All of these VMs are long-lived production VMs and are running for the majority of the time.

Not that I know of. Are you using the stock Debian packages or a different source?

Else you might need to pump up the logging. ;)

The Ceph storage runs on CentOS 7 and uses the packages provided by Ceph. We have also increased the verbosity to level 5 on the following subsystems:
* osd
* rbd
* filestore

Even with the increased verbosity, we could not find anything out of the ordinary.
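
For reference, we raised the levels roughly like this (injected at runtime and mirrored in ceph.conf on the OSD nodes so it survives daemon restarts):

# at runtime, from an admin node:
ceph tell osd.* injectargs '--debug-osd 5 --debug-rbd 5 --debug-filestore 5'
# persisted in /etc/ceph/ceph.conf on the OSD nodes:
[osd]
debug osd = 5
debug rbd = 5
debug filestore = 5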
 
If no other repository is configured, then the stock packages are installed. I would also activate logging on the PVE side.
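
For the libceph messages from the kernel client specifically, one option (assuming the PVE kernel has dynamic debug enabled, which the Ubuntu-based kernels normally do) is to turn on verbose libceph output temporarily:

echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control
# and switch it off again afterwards:
echo 'module libceph -p' > /sys/kernel/debug/dynamic_debug/control

Keep in mind that this can get quite chatty in dmesg.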
 
