Open vSwitch across physical servers

chrispage1

Sep 1, 2021
Hi there!

We have three physical servers, each with 2 x dual-port 10G NICs and 1 x dual-port 1G NIC.

1 x 10G NIC will be bonded, active-active, for Ceph
1 x 10G NIC will be bonded, active-active, for public/private networking
1 x 1G NIC will be bonded, active-active, for management & Corosync

Each NIC will be split across two 25G switches, with Ceph, public/private & Corosync all routing through the same switches.

My concern is that if we had a switching issue (which we've had before) and the bond failed for whatever reason, Corosync would see all of the nodes as down, fence them all and reboot them. In the case of a switching issue, this won't fix anything and could delay the recovery of the cluster.

I was wondering if it's possible to use Open vSwitch to directly connect the 1G NICs so that, in the case of a switch failure, Corosync would still be able to heartbeat between the servers. Or am I completely misunderstanding the possible uses of Open vSwitch?
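The failure mode described above comes down to majority quorum. The following is a minimal sketch of that arithmetic, assuming the default of one vote per node; the function name is made up for illustration and is not part of Corosync or Proxmox.

```python
def has_quorum(visible_votes: int, total_votes: int) -> bool:
    """Majority quorum: a partition needs more than half of all votes."""
    return visible_votes >= total_votes // 2 + 1


TOTAL_NODES = 3  # assumption: one vote per node, as in a default cluster

# Healthy cluster: every node sees all three votes.
healthy = has_quorum(3, TOTAL_NODES)
# One node isolated: the remaining two still see each other.
two_left = has_quorum(2, TOTAL_NODES)
# Switch outage with no other Corosync path: each node sees only itself.
isolated = has_quorum(1, TOTAL_NODES)

print(healthy, two_left, isolated)  # True True False
```

If every node ends up in the last case at the same time, none of them has quorum, which is why a second, independent Corosync path matters more here than the bonding mode.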

Additionally, does the above network configuration look like a good setup?

Thanks,
Chris.
 
"and the bond failed for whatever reason"
Do you mean only one link, or both links?

Because with only one link down, the bond shouldn't fail.

Of course, if you lose all links in the bond and have no network access, the node will be fenced if HA is enabled.
(This is the same with Open vSwitch, no magic here ;)

If you really have a full network problem on the 25G switches at the same time, you could configure a second ring in the Corosync config, on the gigabit network.
https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol
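As a rough illustration of what a second ring looks like, the sketch below renders a `nodelist` section in which each node has both a `ring0_addr` and a `ring1_addr`. The node names and the 10.10.0.x / 10.20.0.x addresses are hypothetical, and the helper function is not a Proxmox tool. The `totem` section is left out on purpose because it differs between Corosync versions, so follow the linked wiki for that part.

```python
def render_nodelist(nodes):
    """Render a corosync.conf nodelist with two ring addresses per node.

    `nodes` is a list of (name, ring0_addr, ring1_addr) tuples.
    """
    lines = ["nodelist {"]
    for nodeid, (name, ring0, ring1) in enumerate(nodes, start=1):
        lines += [
            "  node {",
            f"    name: {name}",
            f"    nodeid: {nodeid}",
            "    quorum_votes: 1",
            f"    ring0_addr: {ring0}",  # primary ring (hypothetical address)
            f"    ring1_addr: {ring1}",  # second ring on the gigabit network
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical names and addresses, for illustration only.
NODES = [
    ("node1", "10.10.0.1", "10.20.0.1"),
    ("node2", "10.10.0.2", "10.20.0.2"),
    ("node3", "10.10.0.3", "10.20.0.3"),
]

print(render_nodelist(NODES))
```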
 
Hi Spirit,

According to our provider, the active switch went into a semi-down state and the standby failed to kick in. It caused enough issues with our Xen cluster, so I know it'd definitely cause problems with fencing once we've moved over to Proxmox. Personally I don't see how it could have happened, so I'm getting an RFO from them to find out!

Thanks for the details, it looks like a redundant ring may be the way to go.

Chris.
 
Thanks Spirit - I believe we did have LACP but quite why the failover didn't work I'm not sure. I'm pretty sure there's something we're not being told by our infrastructure provider...
 
Do you think it's valid to recommend the "Meshed network" approach as documented here? https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server

I've been thinking that, using this technique, I could set things up as below. It'd be greatly appreciated if you could cast your eye over it and let me know your thoughts. Perhaps this could be of use to someone else in the future...

If we were to run Ceph on a meshed network, I presume writes from node1 would be replicated over node1 -> node2 & node1 -> node3 respectively, thus theoretically increasing the overall throughput and preventing saturation of our 10GbE links? And if one link were to fail, say between node1 & node2, node1 would then take the next available path (node1 -> node3 -> node2)?

We could use our NICs as below, removing a single point of failure:
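The rerouting question can be checked on paper with a small reachability model. This sketch only shows that an alternative path exists in a three-node mesh; whether the real network actually takes it depends on which mesh variant from the wiki is configured, which is an assumption here. All names are illustrative.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


MESH = {("node1", "node2"), ("node2", "node3"), ("node3", "node1")}

# All links up: node1 reaches node2 directly.
print(shortest_path(MESH, "node1", "node2"))
# node1 <-> node2 link down: the only remaining path goes via node3.
print(shortest_path(MESH - {("node1", "node2")}, "node1", "node2"))
```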

Each machine has:
10GbE dual NIC (ens18 & ens19)
10GbE dual NIC (ens20 & ens21)
1GbE dual NIC (ens22 & ens23)

Each network is split across two NICs, protecting against a single NIC failure:
ens18 & ens20 for Ceph traffic (meshed)
ens19 & ens21 for WAN & VLAN private (LACP across two 25GbE switches)
ens22 & ens23 for management & heartbeat (LACP across two 1GbE switches)

Meshed network configured:
Node1/ens18 - Node2/ens20
Node2/ens18 - Node3/ens20
Node3/ens18 - Node1/ens20
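A cabling plan like the one above is easy to get wrong, so here is a small sanity check, written as a sketch. It confirms that no port is used twice and that every pair of nodes has a direct cable. The data structure and function name are made up for this example.

```python
from itertools import combinations

# The cabling plan from the post: (node, port) <-> (node, port).
CABLES = [
    (("node1", "ens18"), ("node2", "ens20")),
    (("node2", "ens18"), ("node3", "ens20")),
    (("node3", "ens18"), ("node1", "ens20")),
]


def check_full_mesh(cables):
    """True if every port is used once and every node pair is directly linked."""
    ports = [end for cable in cables for end in cable]
    if len(ports) != len(set(ports)):
        return False  # some port appears in more than one cable
    nodes = {node for node, _ in ports}
    linked = {frozenset((a[0], b[0])) for a, b in cables}
    wanted = {frozenset(pair) for pair in combinations(nodes, 2)}
    return linked == wanted


print(check_full_mesh(CABLES))  # True
```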

I could then use the meshed network as a failover Corosync ring, so in the case of a total switch failure, WAN & private networking would be compromised but storage & heartbeat would continue to function, preventing Proxmox from being pulled offline. In my mind this would drastically reduce the chances of a total failure and help with Ceph throughput.

Chris.
 
If we were to run Ceph on a meshed network, I presume writes from node 1 would be replicated over node1 -> node2 & node1 -> node3 respectively, thus theoretically increasing the overall throughput and preventing saturation of our 10GbE links?

I could then use the meshed network as a failover Corosync ring, so in the case of a total switch failure, WAN & private networking would be compromised but storage & heartbeat would continue to function, preventing Proxmox from being pulled offline. In my mind this would drastically reduce the chances of a total failure and help with Ceph throughput.
node1->node2 & node1->node3 vs. dual 10G LACP give, in theory, the same throughput.
You can move PVE management to the ens19 & ens21 bond and use the 2 x 1Gbps links, unbonded, for Corosync.

You can use a mesh with 3 nodes, but it will limit you when you want to add more nodes in the future.
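Both points in this reply are simple arithmetic, sketched below. A full mesh needs one dedicated port per peer, so two Ceph ports per node cap the mesh at three nodes, and two 10G links add up to the same nominal aggregate whether they go to two peers or into one LACP bond. The function names are illustrative, and the figures are nominal link speeds, not measured throughput.

```python
def mesh_ports_per_node(nodes: int) -> int:
    """A full mesh needs one dedicated port per peer."""
    return nodes - 1


def mesh_cables(nodes: int) -> int:
    """Number of point-to-point cables in a full mesh."""
    return nodes * (nodes - 1) // 2


def max_mesh_nodes(ports_per_node: int) -> int:
    """Largest full mesh that fits in the given number of ports per node."""
    return ports_per_node + 1


def aggregate_gbps(links: int, link_speed_gbps: int) -> int:
    """Nominal aggregate bandwidth of several equal links."""
    return links * link_speed_gbps


for n in (3, 4, 5):
    print(n, "nodes:", mesh_ports_per_node(n), "ports each,", mesh_cables(n), "cables")

print("max nodes with 2 Ceph ports:", max_mesh_nodes(2))
print("mesh:", aggregate_gbps(2, 10), "Gbps, LACP:", aggregate_gbps(2, 10), "Gbps")
```

Note that with LACP a single flow is normally hashed onto one member link, so the aggregate figure only applies across multiple flows.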
 
Thanks very much for your input - I'll go with switching rather than a mesh, for redundancy and the ability to scale.

- 'Public' networking (public, private & management) separated by VLAN on 10Gbps LACP
- Ceph on 10Gbps LACP
- Corosync with redundant ring, one 1Gbps network on one switch & one 1Gbps on the other

Networking will rarely be pushed, so I'll configure Proxmox to use a migration network across the public networking, as per the docs (https://pve.proxmox.com/wiki/Manual:_datacenter.cfg)
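For reference, the migration network is a single `migration:` line in datacenter.cfg. The sketch below builds that line and first checks that every node has an address inside the chosen network. The CIDR and node addresses are hypothetical, and the helper is not a Proxmox tool.

```python
import ipaddress


def migration_line(cidr, node_addrs):
    """Return a datacenter.cfg migration line after checking node addresses."""
    net = ipaddress.ip_network(cidr)
    outside = [a for a in node_addrs if ipaddress.ip_address(a) not in net]
    if outside:
        raise ValueError(f"addresses outside {net}: {outside}")
    return f"migration: secure,network={net}"


# Hypothetical public-network addresses for the three nodes.
print(migration_line("10.1.2.0/24", ["10.1.2.11", "10.1.2.12", "10.1.2.13"]))
```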

Chris.
 
