Hey all, I've been managing a cluster of four nodes. Each node has a set of Mellanox ConnectX-3 NICs in a bond. The first three nodes were built much longer ago and use OVS IntPorts on top of that bond. The fourth node uses plain Linux bonds and bridges. We weren't aware of it until I was testing something for an upgrade, but the Mellanox ConnectX-3 NICs only allow 128 unique VLANs to be enabled. Most of the VLANs that matter are below 118, but one in particular (1306) gave me trouble, which led me to this discovery. We were using VLAN Aware mode on the Linux bridge, whose only slave port is the bond with the Mellanox NICs as its members.
Checking the /etc/network/interfaces file, I see that ticking VLAN Aware explicitly sets VLANs 2-4094 by default, which is far more than the maximum number of VLANs these cards support. I was able to get around this by manually setting the VLAN IDs in the config file. I tested this and it works fine.
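For reference, the workaround looks roughly like the stanza below. This is a sketch, not a copy of my exact config: the interface names (vmbr0, bond0) and the VLAN list are illustrative assumptions, and it assumes ifupdown2-style bridge options. The point is only that the bridge-vids line lists the VLANs actually in use instead of the default 2-4094, which keeps the total under the 128-VLAN limit.

```
# Illustrative only: vmbr0, bond0 and the VLAN list are assumed names/values
auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    # default written by the GUI was: bridge-vids 2-4094
    bridge-vids 2-117 1306
```

After changing it, `bridge vlan show` should list which VLAN IDs are actually programmed on the bond port.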
What I'm curious about is why this isn't an issue on the nodes using OVS instead of Linux bridges. What exactly is the underlying difference?