Open vSwitch across physical servers

chrispage1 · Sep 10, 2021

Hi there!

We have three physical servers, each with 2 x dual-port 10G NICs and 1 x dual-port 1G NIC.

1 x 10G NIC will be bonded, active-active, for Ceph
1 x 10G NIC will be bonded, active-active, for public/private networking
1 x 1G NIC will be bonded, active-active, for management & Corosync

Each NIC will be across two 25G switches, with Ceph, public/private & Corosync all routing through the same switch.

My concern is that if we were to have a switching issue (which we've had before) and the bond failed for whatever reason, Corosync would see all of the nodes as down, fence them all off and perform a reboot. In the case of a switching issue, this won't fix anything, and would potentially delay the recovery of the cluster.
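A rough way to see why a full switch outage would hit every node at once. This is only a sketch of the majority arithmetic, assuming one vote per node and a simple majority rule; the function names are made up for illustration and are not part of Corosync or Proxmox.

```python
def votes_needed(total_votes):
    """Smallest number of votes that is more than half of all votes."""
    return total_votes // 2 + 1


def has_quorum(visible_votes, total_votes):
    """True if a partition that sees `visible_votes` votes holds a majority."""
    return visible_votes >= votes_needed(total_votes)


TOTAL_VOTES = 3  # three servers, assumed one vote each

# Every node cut off from the others: each one only sees its own vote.
print("isolated node has quorum:", has_quorum(1, TOTAL_VOTES))

# One node lost, the other two still see each other.
print("two-node partition has quorum:", has_quorum(2, TOTAL_VOTES))
```

With all cluster links down, each server is a one-vote partition, so none of them holds a majority. That is the situation a second, independent heartbeat path is meant to avoid.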

I was wondering if it's possible to use Open vSwitch to directly connect the 1G NICs so that in the case of a switch failure, Corosync would still be able to heartbeat between the servers, or am I completely misunderstanding the possible uses of Open vSwitch?

Additionally, does the above network configuration look like a good setup?

Thanks,
Chris.

spirit · Sep 10, 2021

chrispage1 said:
and the bond failed for whatever reason

Do you mean only one link, or both links?

With only one link down, the bond shouldn't fail.

Of course, if you lose all links in the bond and have no network access, the node will be fenced if HA is enabled.
(It's the same with Open vSwitch; no magic here.)

If you really have a full network problem on the 25G switch at the same time, you could configure a second ring in the Corosync config, on the gigabit network.
https://pve.proxmox.com/wiki/Separate_Cluster_Network#Redundant_Ring_Protocol
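To make the second-ring idea a bit more concrete, here is a small sketch that prints a `nodelist` fragment where every node carries two addresses, one per network. It assumes the corosync.conf style in which each node entry has `ring0_addr` and `ring1_addr`; the node names and all IP addresses are invented for illustration, so check the generated text against the linked wiki page and your own config before using any of it.

```python
from textwrap import indent

# Hypothetical addressing: ring 0 on the bonded network, ring 1 on the gigabit network.
NODES = [
    {"name": "node1", "nodeid": 1, "ring0_addr": "10.10.10.1", "ring1_addr": "10.10.20.1"},
    {"name": "node2", "nodeid": 2, "ring0_addr": "10.10.10.2", "ring1_addr": "10.10.20.2"},
    {"name": "node3", "nodeid": 3, "ring0_addr": "10.10.10.3", "ring1_addr": "10.10.20.3"},
]


def render_nodelist(nodes):
    """Return a corosync-style nodelist block with two ring addresses per node."""
    blocks = []
    for node in nodes:
        body = "\n".join([
            f"name: {node['name']}",
            f"nodeid: {node['nodeid']}",
            "quorum_votes: 1",
            f"ring0_addr: {node['ring0_addr']}",
            f"ring1_addr: {node['ring1_addr']}",
        ])
        blocks.append("node {\n" + indent(body, "  ") + "\n}")
    return "nodelist {\n" + indent("\n".join(blocks), "  ") + "\n}"


print(render_nodelist(NODES))
```

The point of the two addresses is that they sit on networks that do not share a failure domain, so losing one path still leaves the heartbeat a way through.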

chrispage1 · Sep 10, 2021

Hi Spirit,

According to our provider, the active switch went into a semi-down state and the standby failed to kick in. It caused enough issues with our Xen cluster, so I know it'd definitely cause problems with fencing once we've moved over to Proxmox. Personally I don't see how it could have happened, so I'm getting an RFO from them to find out!

Thanks for the details, it looks like a redundant ring may be the way to go.

Chris.

spirit · Sep 10, 2021

BTW, try to use LACP for bonding. It's 100% safe and has a monitoring protocol for fast detection of link failure (failover takes less than 100 ms).

chrispage1 · Sep 13, 2021

Thanks Spirit - I believe we did have LACP but quite why the failover didn't work I'm not sure. I'm pretty sure there's something we're not being told by our infrastructure provider...

chrispage1 · Sep 13, 2021

Do you think it's valid to recommend the "Meshed network" approach as documented here? https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server

I've been thinking that, using this technique, I could set things up as below. It'd be greatly appreciated if you wouldn't mind casting your eye over it and letting me know your thoughts. Perhaps this could be of use to someone else in the future...

If we were to run Ceph on a meshed network, I presume writes from node 1 would be replicated over node1 -> node2 & node1 -> node3 respectively, thus theoretically increasing the overall throughput and preventing saturation of our 10GbE links? And if one link was to fail, say between node1 & node2, node1 would then take the next available path (node1 -> node3 -> node2)

We could use our NICs as below, removing a single point of failure:

Each machine has:
10GbE dual NIC (ens18 & ens19)
10GbE dual NIC (ens20 & ens21)
1GbE dual NIC (ens22 & ens23)

Each network is split across two NICs to protect against a single NIC failure:
ens18 & ens20 for Ceph traffic (meshed)
ens19 & ens21 for WAN & VLAN private (LACP across two 25GbE switches)
ens22 & ens23 for management & heartbeat (LACP across two 1GbE switches)

Meshed network configured:
Node1/ens18 - Node2/ens20
Node2/ens18 - Node3/ens20
Node3/ens18 - Node1/ens20
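The cabling above forms a three-node ring, which is what makes the detour path possible. Below is a minimal sketch of that reasoning using a plain breadth-first search over the three links. It only models reachability; whether traffic really reroutes this way depends on how the mesh is configured, and the function name is made up for illustration.

```python
from collections import deque

# The three direct links from the cabling plan, one per node pair.
LINKS = {("node1", "node2"), ("node2", "node3"), ("node3", "node1")}


def shortest_path(links, src, dst):
    """Return the shortest hop-by-hop path from src to dst, or None if unreachable."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# All links healthy: node1 reaches node2 directly.
print(shortest_path(LINKS, "node1", "node2"))

# Link node1 <-> node2 has failed: the only route left goes via node3.
print(shortest_path(LINKS - {("node1", "node2")}, "node1", "node2"))
```

A single failed cable leaves every node reachable, but a node that loses both of its mesh links is cut off, so the mesh tolerates one link failure per node pair rather than arbitrary failures.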

I could then use the meshed network as a failover Corosync ring, so in the case of a total switch failure, WAN & private networking would be compromised but storage & heartbeat would continue to function, preventing Proxmox from being pulled offline. In my mind this would drastically reduce the chances of a total failure and help with Ceph throughput.

Chris.

czechsys · Sep 15, 2021

chrispage1 said:
If we were to run Ceph on a meshed network, I presume writes from node 1 would be replicated over node1 -> node2 & node1 -> node3 respectively, thus theoretically increasing the overall throughput and preventing saturation of our 10GbE links?

I could then use the meshed network as a failover Corosync ring, so in the case of a total switch failure, WAN & private networking would be compromised but storage & heartbeat would continue to function, preventing Proxmox from being pulled offline. In my mind this would drastically reduce the chances of a total failure and help with Ceph throughput.

node1->node2 & node1->node3 vs dual 10G LACP give, in theory, the same throughput.
You can move PVE management to the ens19&20 bond and use the 2x1Gbps links unbonded for Corosync.

You can use a mesh with 3 nodes. But if you want to add more nodes in the future, it will limit you.
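Both points can be put into numbers. This is back-of-the-envelope arithmetic only, assuming 10 Gbps per link and a full mesh where every node pair gets its own cable; the helper name is hypothetical.

```python
LINK_GBPS = 10  # speed of each 10GbE port


def mesh_requirements(node_count):
    """Ports per node and total cables needed for a full mesh of node_count nodes."""
    return {
        "ports_per_node": node_count - 1,
        "cables": node_count * (node_count - 1) // 2,
    }


# Theoretical egress from one node: two mesh links vs. a two-port LACP bond.
mesh_egress_gbps = 2 * LINK_GBPS
lacp_egress_gbps = 2 * LINK_GBPS
print("mesh:", mesh_egress_gbps, "Gbps, LACP:", lacp_egress_gbps, "Gbps")

# How the mesh grows as nodes are added.
for nodes in (3, 4, 5):
    print(nodes, "nodes ->", mesh_requirements(nodes))
```

With three nodes the two dedicated Ceph ports per server are exactly enough. A fourth node would already need three mesh ports on every server, which is where the scaling limit comes from.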

chrispage1 · Sep 16, 2021

czechsys said:
node1->node2 & node1->node3 vs dual 10G LACP give, in theory, the same throughput.
You can move PVE management to the ens19&20 bond and use the 2x1Gbps links unbonded for Corosync.

You can use a mesh with 3 nodes. But if you want to add more nodes in the future, it will limit you.

Thanks very much for your input - I'll go with switching rather than mesh nodes for redundancy and ability to scale.

- 'Public' networking (public, private & management) separated by VLAN on 10Gbps LACP
- Ceph on 10Gbps LACP
- Corosync with redundant ring, one 1Gbps network on one switch & one 1Gbps on the other

The network will rarely be pushed, so I'll configure Proxmox to use a migration network across public networking as per the docs (https://pve.proxmox.com/wiki/Manual:_datacenter.cfg)

Chris.
