Hi,
After realising that NFS trunking/multipath does not offer improved redundancy, I'm looking at getting this done at the network level. Specifically, I would like to use a pair of switches without MLAG-like features and still have redundancy for the storage network, ideally by giving each node (including the NAS/SAN) two separate uplinks and sending traffic out the link that has end-to-end connectivity to the other side.
Basic link monitoring is insufficient, as switches can easily fail while their links stay up. Also, if the cross-connect between the switches were to fail, all links to the hosts would remain up (so no link switching happens), yet hosts with their primary link on switch #1 won't be able to see hosts with their primary link on switch #2. So we need something 'smarter'.
Ideally we would use LACP from the nodes to two switches, with MLAG between the switches. That gives high availability and is the most flexible option - but most likely also not within budget for our small deployments (2-4 PVE nodes). Another (cheaper) option would be to stack the switches, but then the switches can no longer be managed individually - one config error or firmware update takes down everything.
In this thread I was spitballing some ideas on how to do this at the OS level: use arp-ping to determine which of the two interfaces can reach the destination, and then activate/change a network route to force outgoing traffic onto that interface - for instance via a regularly executed script, as in the rough sketch below.
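Something like the following - just a sketch to illustrate the idea. The interface names and addresses are placeholders, and it assumes both NICs carry an address in the storage subnet so arping can source from them:
Code:
#!/bin/sh
# Rough sketch only: interface names and addresses are placeholders.
# Assumes both NICs have an address in the storage subnet so arping can use them.
TARGET=10.200.0.201    # NAS/SAN on the far side
PRIMARY=eno7
BACKUP=eno8

# Prefer the primary link if the target answers ARP on it, otherwise fall back
if arping -c 2 -I "$PRIMARY" "$TARGET" >/dev/null 2>&1; then
    DEV="$PRIMARY"
else
    DEV="$BACKUP"
fi

# Pin the host route for the storage target to whichever interface still works
ip route replace "$TARGET"/32 dev "$DEV"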
The idea for this mechanism basically comes from the NIC-teaming options that used to be available; NIC teaming itself is no longer used, I believe. However, looking through the man pages for bonding, something similar is still available: bonding with an ARP IP target. This lets an active-backup bond decide which link to use based on an ARP ping rather than link status - basically exactly what I'm looking for.
So I'm trying this on a PVE host. The first thing I realised is that, in order for the bond to be able to send ARP requests, the bond needs to have an IP address. This isn't great, as it means I won't be able to attach a vmbr to the bond. But OK, for the sake of testing, this is the config I came up with:
Code:
auto eno7
iface eno7 inet manual

auto eno8
iface eno8 inet manual

auto bond0
iface bond0 inet static
    address 10.200.0.101/24
    bond-slaves eno7 eno8
    bond-mode active-backup
    bond-primary eno7
    bond-arp-interval 100
    bond-arp-ip-target 10.200.0.201
    bond-arp-validate filter
The bond comes up fine. The IP is reachable. It looks fine... but it is not.
Code:
root@pve-gen10:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.5.13-3-pve
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eno7
MII Status: up
MII Polling Interval (ms): 0
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
Something potentially suspicious: the ARP polling interval does not show up in the bond status at all, while the MII status does. Running a network capture on bond0 (or eno7/eno8) does not show any ARP requests towards 10.200.0.201. Doing an arping manually works fine:
Code:
root@pve-gen10:~# arping -I bond0 10.200.0.201
ARPING 10.200.0.201
60 bytes from 00:11:32:91:4e:af (10.200.0.201): index=0 time=182.914 usec
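To see whether the options actually reached the driver, I believe the live values can also be read from the bonding sysfs interface (paths taken from the kernel bonding documentation):
Code:
# should show 100, the target address and eno7 if ifupdown2 applied the bond-arp-*/bond-primary options
cat /sys/class/net/bond0/bonding/arp_interval
cat /sys/class/net/bond0/bonding/arp_ip_target
cat /sys/class/net/bond0/bonding/primary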
I'm not sure where to look for the error. Going through the Debian manpages I can find all the bonding options described for ifupdown-ng, but the bookworm manpages for ifupdown2 don't mention bonding at all - which is a little odd, since the basics are definitely supported.
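In the meantime, a workaround I'm considering (untested) is to push the settings straight to the bonding driver via sysfs in post-up hooks, instead of relying on the bond-arp-* options being translated:
Code:
auto bond0
iface bond0 inet static
    address 10.200.0.101/24
    bond-slaves eno7 eno8
    bond-mode active-backup
    # Untested workaround: configure the ARP monitor directly in the driver.
    # miimon is set to 0 first, since MII and ARP monitoring are mutually exclusive.
    post-up echo 0 > /sys/class/net/bond0/bonding/miimon
    post-up echo +10.200.0.201 > /sys/class/net/bond0/bonding/arp_ip_target
    post-up echo 100 > /sys/class/net/bond0/bonding/arp_interval
    post-up echo filter > /sys/class/net/bond0/bonding/arp_validate
    post-up echo eno7 > /sys/class/net/bond0/bonding/primary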
Is anybody familiar with this topic who could give me some hints as to where I could look to make this work?
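For completeness, I'm also planning to recreate the bond by hand with iproute2 to rule out the kernel side and narrow it down to ifupdown2 - again just a sketch, using the same names and addresses as above:
Code:
# after removing the ifupdown2-managed bond0, create it directly with ARP monitoring
ip link add bond0 type bond mode active-backup arp_interval 100 arp_ip_target 10.200.0.201 arp_validate filter
ip link set eno7 down
ip link set eno8 down
ip link set eno7 master bond0
ip link set eno8 master bond0
ip link set bond0 type bond primary eno7
ip link set bond0 up
ip addr add 10.200.0.101/24 dev bond0
# if /proc/net/bonding/bond0 now shows the ARP target, the driver side is fine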