We are using Open vSwitch (OVS) for networking in our Proxmox VE cluster. After restarting a node we see a network issue: Ceph is not ready immediately, and it takes several minutes to reach HEALTH_OK instead of just a few seconds.
Observed behavior:
• Ping with MTU 8900 from some nodes to the restarted host does not work immediately.
• One node can usually ping the restarted host with MTU 8900 right away.
• Two other nodes regain MTU 8900 connectivity only after several minutes.
• Ceph reaches HEALTH_OK status only after several minutes, even though OSDs and network should be available immediately.
• The issue seems random: restarting one node sometimes causes the problem, while restarting another does not.
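For anyone who wants to reproduce the jumbo-frame test, here is a minimal Python sketch of how such a ping can be built. It assumes Linux iputils `ping`, IPv4 without IP options, and that "MTU 8900" means the total IP packet size (if 8900 is instead the value passed to `-s`, the packet on the wire is 8928 bytes, which still fits in MTU 8996). The function names and the host argument are illustrative only and are not taken from our setup.

```python
import subprocess
import sys

IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header


def icmp_payload_for_mtu(mtu: int) -> int:
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    if mtu < IP_HEADER + ICMP_HEADER:
        raise ValueError("MTU too small for an ICMP echo packet")
    return mtu - IP_HEADER - ICMP_HEADER


def ping_command(host: str, mtu: int, count: int = 3) -> list:
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    return ["ping", "-M", "do", "-c", str(count),
            "-s", str(icmp_payload_for_mtu(mtu)), host]


def can_ping(host: str, mtu: int) -> bool:
    """Return True if the host answers unfragmented pings of the given size."""
    result = subprocess.run(ping_command(host, mtu),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python3 jumbo_ping.py <address-of-restarted-node>
    target = sys.argv[1]
    for size in (1500, 8900, 8996):
        print(size, "ok" if can_ping(target, size) else "FAILED")
```

Running it from each of the other nodes right after a restart shows whether only large packets are affected (1500 works, 8900 fails) or whether all traffic to the restarted host is delayed.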
Network setup:
• MTU 8996 for Ceph network
• OVS + Bonding (active-backup) on 10G interfaces
• Previously, we used Linux Bridge + Bonding, and this issue did not occur. We had to switch away from Linux Bridge + Bonding because failover to the other interface (when one stopped working) came with very high latency. With OVS, failover is instant.
Has anyone experienced similar behavior with OVS in Proxmox? What could be the possible causes?
Any suggestions would be greatly appreciated!
I have attached the network configuration.