Broadcom BCM57504 - VLAN issues over Bond

FingerlessGloves

Well-Known Member
Oct 22, 2019
I've got a BCM57504 connected to two upstream switches using an 802.3ad bond, with vmbr0 on top of that bond. vmbr0.108 is the management interface, and I have SDN configured to create bridges on top of vmbr0 for each network I want to use in VMs.

I have 3 identical servers. I did something to get it working on one of them, and I thought it was `ethtool -K ethhere rx-vlan-offload off`, but I've applied the same to the others and they still don't want to work. VLANs can't be completely broken, because I can reach the management interface, which requires a VLAN.

Comparing with `ethtool -k` there are no differences, and the same goes for the `interfaces` file, because I've been ansibling that out across the nodes.
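For anyone repeating this comparison, here is a minimal sketch of one way to diff saved `ethtool -k` dumps from two nodes. The function name `feature_diff` and the file names are hypothetical, and it assumes each dump was saved with something like `ethtool -k eno1 > node1-eno1.txt` (where `eno1` stands in for the real interface name). Offload features are reported per interface, so it may be worth dumping each bond member as well as the bond itself.

```shell
#!/bin/sh
# feature_diff FILE_A FILE_B
# Print the offload feature lines that differ between two saved
# `ethtool -k` dumps. Both file names are placeholders.
feature_diff() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # Drop the "Features for <iface>:" header, strip indentation, and sort
    # so the two dumps can be compared line by line.
    grep -v '^Features for' "$1" | sed 's/^[[:space:]]*//' | LC_ALL=C sort > "$tmp_a"
    grep -v '^Features for' "$2" | sed 's/^[[:space:]]*//' | LC_ALL=C sort > "$tmp_b"
    # comm -3 hides lines common to both files; lines unique to the
    # second file are prefixed with a tab, which is relabelled here.
    LC_ALL=C comm -3 "$tmp_a" "$tmp_b" |
        awk '/^\t/ { sub(/^\t/, ""); print "B only: " $0; next }
             { print "A only: " $0 }'
    rm -f "$tmp_a" "$tmp_b"
}

# Example (hypothetical file names):
#   feature_diff node1-eno1.txt node2-eno1.txt
```

No output means the two dumps list the same features with the same values.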

Proper strange issue, anyone got any advice for the BCM57504 card? I'll keep trying things but it's proper strange
 
Can you post the output of the following commands (from a node where it's working and from one where it isn't):

Code:
cat /etc/network/interfaces
cat /etc/network/interfaces.d/sdn

ip a
 
Working system
https://paste.n9.uk/?3f742395b8606f43#2JkUfxBhEpEviBK3gD9SbEahNK6YWEfE4oHxCJHKm8Re

Non working
https://paste.n9.uk/?d068f772f5c37401#HUzenbr8oJXLGApxEQr1XEqsmbeSjteFMpbFBZEGHnrn

Text was too long to post here

I can't see any differences bar the ones you'd expect like IPs and MACs
 
Network configuration looks fine at first glance. Can you elaborate a bit more on what isn't working? The management network is working on each node, so I assume you're having connectivity issues in the VMs? Could you post the configuration of a VM that is working and one that isn't (ideally located on the servers whose network configuration you sent), plus information about exactly which connection flows are broken?

Do you have the firewall activated?
 
The management interface is working on each host and has never had a problem. No firewalls are configured on the Proxmox hosts.

VM is configured as `virtio=00:1a:4a:4c:72:72,bridge=vmbr0,tag=103` as I wanted to rule out the sdn bridges being the issue.

What I have noticed is that, from the VM console, I can run `arp -an` and see some ARPs getting answered but not all. This points me to a bond issue with the layer3+4 hashing. But I've checked on the switch side and the bond to each server is exactly the same, so it feels like the network card is being funky.
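One check that might help narrow down a bond problem like this is whether both bond members ended up in the same LACP aggregator. In 802.3ad mode only the members of the active aggregator carry traffic, so differing aggregator IDs would be worth a look, though that may not be what is happening here. The sketch below is only an illustration: the function name `slave_aggregators` is hypothetical, `bond0` is an assumed bond name, and the parsing assumes the usual layout of `/proc/net/bonding/<bond>`.

```shell
#!/bin/sh
# slave_aggregators [FILE]
# Print "<slave> <aggregator id>" for each bond member listed in a
# /proc/net/bonding file. Defaults to bond0, which is an assumed name.
slave_aggregators() {
    awk '
        # Remember which slave block we are currently inside.
        /^Slave Interface:/ { slave = $3 }
        # In the layout assumed here, the "Active Aggregator Info" summary
        # comes before the first slave block, so slave is still empty for
        # that line and it is skipped.
        /Aggregator ID:/ && slave != "" { print slave, $NF }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example: run on both a working and a non-working node and compare.
#   slave_aggregators
```

If the members report different IDs, the next place to look would be the LACP partner details in the same file and the multi-chassis LAG configuration on the two switches.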

I did have Intel E810 cards and all this was working, but I changed the cards because of an issue where rebooting the server would cause problems. These BCM57504 cards do not have that problem, but now VLANs seem a bit screwy. Switch-wise it's the same config I used with the E810s.