Thank you for your help today.

Yes, I agree. I've been on it for a few months with five nodes, just running home apps and learning. It's phenomenal, but a little scary how fragile it is.
 
Some thoughts...
  • According to your configuration, your Ceph public network is shared with the PVE management interface and presumably VM access. Your Ceph networks should be isolated; Ceph is fairly sensitive to latency, so mixing in all of that other traffic is going to contend with high-priority storage traffic, and I doubt you have QoS dialed in to account for this.
  • I wouldn't use Linux bridges for the Ceph interfaces; use the physical interface alone. A bridge seems like an unnecessary abstraction layer in this case.
  • Ensure MTU 9000 (jumbo frames) is supported by your hardware and is set on every hop: physical interfaces, Linux bridges (if you must), and switchports. You may need to reset or restart hardware after enabling jumbo frames. MTU 9000 is especially useful for SAN-style storage traffic, and it would help the Ceph public and cluster networks, but not so much the PVE management interface.
    • I would leave vmbr0 at MTU 1500 and use 9000 only for the Ceph networks.
  • With the way things seem intermittent or unstable, the issue has an L2 smell to it. It may be MTU, but it might also be ARP or STP.
I would remove vmbr1, combine your Ceph public/cluster networks, and put their IP on eno2np1 with MTU 9000; verify the switchports are configured to handle MTU 9000 (including trunk ports, if they're involved). When you add some 10G NICs in the future, break the Ceph public/cluster networks up again so that each gets one or more dedicated 10G links.
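For reference, here's a rough sketch of what the combined Ceph networks could look like in /etc/pve/ceph.conf once both roles share one subnet on eno2np1 (the 10.10.10.0/24 subnet is just a placeholder, substitute whatever your Ceph subnet actually is):

    # /etc/pve/ceph.conf -- both Ceph roles on the same dedicated subnet
    [global]
        public_network = 10.10.10.0/24
        cluster_network = 10.10.10.0/24

The daemons only pick these values up on restart, and the monitors in particular keep the addresses recorded in the monmap, so moving the public network is more involved than just editing this file.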
 
  • According to your configuration, your Ceph public network is shared with the PVE management interface and presumably VM access. Your Ceph networks should be isolated; Ceph is fairly sensitive to latency, so mixing in all of that other traffic is going to contend with high-priority storage traffic, and I doubt you have QoS dialed in to account for this.
Precisely. This is what all the documentation says, which is why I gave isolating Ceph's traffic a try.
I wouldn't use Linux bridges for the Ceph interfaces; use the physical interface alone. A bridge seems like an unnecessary abstraction layer in this case.
So SR-IOV enabled on the Ceph private interfaces?
Ensure MTU 9000 (jumbo frames) is supported by your hardware and is set on every hop: physical interfaces, Linux bridges (if you must), and switchports. You may need to reset or restart hardware after enabling jumbo frames. MTU 9000 is especially useful for SAN-style storage traffic, and it would help the Ceph public and cluster networks, but not so much the PVE management interface.
The hardware is four R740s and one R730; they are all using identical Mellanox ConnectX-4 LX Dual Port 25GbE SFP+ NICs.
 
The Linux bridge is vmbr1. Move the IP address from vmbr1 to eno2np1 and remove vmbr1. Linux bridges are very nice and performant, but they're unnecessary here and add complication; for instance, the bridge may be sitting at MTU 1500 even though the parent interface and the rest of the stack are set to 9000. You can check whether that is the case right now with: ip link | grep vmbr1: | grep --color -E 'mtu\s[0-9]+'
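In case it helps, a minimal sketch of the /etc/network/interfaces change I mean (the address is a placeholder, keep whatever vmbr1 holds today, and the vmbr1 stanza itself goes away):

    auto eno2np1
    iface eno2np1 inet static
            address 10.10.10.11/24
            mtu 9000

Apply it with ifreload -a (ifupdown2 is the default on current PVE) or a reboot, one node at a time so the Ceph cluster keeps quorum.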

If you're going to set MTU 9000, make sure you set it on the servers' eno2np1 interfaces as well as on the switchports that the servers are connected to.
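Once everything reports 9000, you can prove the whole path actually passes jumbo frames with a don't-fragment ping between two nodes (8972 bytes of payload plus 28 bytes of IP/ICMP headers comes out to exactly 9000; the peer address is a placeholder):

    ping -M do -s 8972 -c 3 10.10.10.12

If that fails while a normal ping works, something along the path (NIC, bridge, or switchport) is still at 1500.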
 
