Hi All,
We are currently seeing a massive number of "socket closed" messages for Ceph OSDs on our KVM Proxmox heads.
Current KVM Proxmox infrastructure:
- Dell R710
- Dual CPUs
- Dual 10Gbps bonded copper connection to the Ceph storage (current Ceph version: Jewel, running on CentOS 7, connected via a Juniper EX4550)
- Both the KVM and Ceph nodes run local firewalls (iptables and firewalld); a rough sketch of the Ceph-related rules is just below this list
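For completeness, both firewalls allow the standard Ceph ports (6789/tcp for the monitors, 6800-7300/tcp for the OSDs). A minimal sketch of the sort of rules we have in place (simplified; the 10.0.1.0/24 subnet is just the Ceph public network seen in the log lines below):

# On the CentOS 7 Ceph nodes (firewalld)
firewall-cmd --permanent --add-port=6789/tcp
firewall-cmd --permanent --add-port=6800-7300/tcp
firewall-cmd --reload

# On the Proxmox heads (iptables), let established Ceph traffic back in
iptables -A INPUT -s 10.0.1.0/24 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT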
PVE version: pve-manager/5.3-9/ba817b29
Example of messages received on every single KVM Proxmox head:
[Wed Feb 13 03:44:01 2019] libceph: osd9 10.0.1.2:6800 socket closed (con state OPEN)
[Wed Feb 13 04:09:54 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 05:06:24 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:00:01 2019] libceph: osd7 10.0.1.1:6804 socket closed (con state OPEN)
[Wed Feb 13 06:16:58 2019] libceph: osd1 10.0.1.5:6802 socket closed (con state OPEN)
This happens on all Proxmox heads and against random OSDs; it is not always the same ones. On the Ceph storage side we can't see anything obviously wrong with any of the OSDs (no heartbeat errors in the Ceph logs).
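For reference, a rough sketch of the checks we ran on the Ceph side (standard Ceph CLI commands on a Jewel/CentOS 7 install; the osd.7 log path is just an example):

# Overall cluster health and any warnings about slow or flapping OSDs
ceph -s
ceph health detail

# Confirm all OSDs are up/in and none have recently restarted
ceph osd stat
ceph osd tree

# Look for resets or heartbeat complaints in a specific OSD log around the timestamps above
grep -iE "reset|heartbeat" /var/log/ceph/ceph-osd.7.log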
Is there any known bug or issue related to Proxmox 5.3.x in combination with Ceph Jewel? We have scoured the logs on both the Ceph and Proxmox sides without success.
There are also no dropped packets or errors on the dual bonded 10Gbps NICs connecting Proxmox to Ceph.
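Roughly how we verified that (a sketch; bond0 and the slave NIC name are placeholders for our actual interface names):

# Per-interface error and drop counters on the bond
ip -s link show bond0

# Bonding driver status: active slaves, link failure counts, speed
cat /proc/net/bonding/bond0

# NIC-level counters on each slave
ethtool -S eno1 | grep -iE "err|drop"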
Any advice / pointers would be greatly appreciated!
Thanks