Broadcom BCM57504 - VLAN issues over Bond

FingerlessGloves

I've got a BCM57504 connected to two upstream switches using an 802.3ad bond, and off that bond I have vmbr0. I have vmbr0.108 for the management interface, and SDN configured to create bridges on top of that bridge for each network I want to use in VMs.

I have 3 identical servers. I did something to get it working on one of them, and I thought it was `ethtool -K ethhere rx-vlan-offload off`, but I've applied the same to the others and they still don't work. VLANs can't be completely broken, because I can reach the management interface, which requires a VLAN.

Comparing with `ethtool -k` there are no differences, and the same goes for the `interfaces` file, because I've been rolling that out across the nodes with Ansible.
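
For anyone wanting to repeat that comparison, here is a minimal sketch, assuming the `ethtool -k` output from each node has been saved to a file first. The function name `feature_diff`, the file names and the host names are made up for illustration; only `ethtool -k` itself comes from the post above.

```shell
#!/bin/sh
# Sketch: compare the offload feature lists of two nodes.
# Assumed (hypothetical) preparation, run from an admin machine:
#   ssh node1 ethtool -k nic1_1 > node1.txt
#   ssh node2 ethtool -k nic1_1 > node2.txt

feature_diff() {
    # $1, $2: files holding saved `ethtool -k <iface>` output
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || return 2
    # Drop the "Features for <iface>:" header so differing names are ignored
    grep -v '^Features for' "$1" | sort > "$tmp_a"
    grep -v '^Features for' "$2" | sort > "$tmp_b"
    rc=0
    diff "$tmp_a" "$tmp_b" || rc=$?
    rm -f "$tmp_a" "$tmp_b"
    # 0 = identical feature lists, 1 = at least one feature differs
    return "$rc"
}

# Usage: feature_diff node1.txt node2.txt
```

No output and an exit status of 0 means the two feature lists match, which is what was seen here.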

Proper strange issue, anyone got any advice for the BCM57504 card? I'll keep trying things but it's proper strange
 
Can you post the output of the following commands (from a node where it's working and one where it isn't):

Code:
cat /etc/network/interfaces
cat /etc/network/interfaces.d/sdn

ip a
 
Working system
https://paste.n9.uk/?3f742395b8606f43#2JkUfxBhEpEviBK3gD9SbEahNK6YWEfE4oHxCJHKm8Re

Non working
https://paste.n9.uk/?d068f772f5c37401#HUzenbr8oJXLGApxEQr1XEqsmbeSjteFMpbFBZEGHnrn

Text was too long to post here

I can't see any differences apart from the ones you'd expect, like IPs and MACs.
 
Network configuration looks fine at first glance. Can you elaborate a bit more on what isn't working? Management network is working on each node? So I assume you're having issues with connectivity in the VMs? Could you post the configuration of a VM that is working and one that isn't (ideally located on one of the servers that you sent me the network configuration from), plus information about exactly which connection flows are broken?

Do you have the firewall activated?
 
The management interface is working on each host and that's never had a problem. No firewalls are configured on the Proxmox hosts.

VM is configured as `virtio=00:1a:4a:4c:72:72,bridge=vmbr0,tag=103` as I wanted to rule out the sdn bridges being the issue.

What I have noticed is that, from the VM console, `arp -an` shows some ARPs getting answered but not all. That points me to a bond issue with the layer3+4 hashing. But I've checked switch side and the bond to each server is exactly the same, so it feels like the network card is being funky.
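
One way to look at the host side of the bond is the kernel's bonding status file. This is a rough sketch, assuming the bond is called `bond0` (the thread does not give the bond name) and that the file follows the usual `/proc/net/bonding/<bond>` layout; `bond_slave_summary` is a made-up helper name.

```shell
#!/bin/sh
# Sketch: print one line per bond member with its link state and
# 802.3ad aggregator ID. bond0 is an assumed name, not from the thread.

bond_slave_summary() {
    awk -F': ' '
        /^Slave Interface:/              { slave = $2 }
        slave != "" && /^MII Status:/    { mii = $2 }
        slave != "" && /^Aggregator ID:/ { print slave, "mii=" mii, "agg=" $2 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Usage: bond_slave_summary
#    or: bond_slave_summary /proc/net/bonding/bond0
```

With an 802.3ad bond, members that report different aggregator IDs have not been grouped into one LAG by LACP, so that is worth ruling out before blaming the hash policy.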

I did have Intel E810 cards and all this was working, but I changed the cards because of an issue where rebooting the server would cause problems. These BCM57504 cards do not have that problem, but now VLANs seem a bit screwy. Switch wise it's the same config I used with the E810s.
 
Did you, by chance, ever find a solution for the issues you encountered? There have been some additional reports cropping up regarding this NIC model lately. I looked at the changelogs for the broadcom drivers as well, but couldn't find any suspicious commits.
 
Hi Shanreich,

The issue was a mixture of two problems: the cards and the MikroTik switches.

On Proxmox I have to add a post-up for the VLAN offload, otherwise I've found that sometimes the VLANs on a bridge don't work at all. It's not very consistent, but turning off the offload fixes it.
`post-up ethtool -K nic1_1 rx-vlan-offload off`
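
To confirm the post-up actually took effect after a reboot, something like the sketch below could be used. It only reads `ethtool -k` output; the helper name `rx_vlan_offload_state` is made up, and `nic1_1` is the interface name from the line above (other bond members would need checking under their own names, which are not given here).

```shell
#!/bin/sh
# Sketch: extract the rx-vlan-offload value from `ethtool -k` output on stdin.

rx_vlan_offload_state() {
    awk -F': ' '$1 == "rx-vlan-offload" { print $2 }'
}

# Usage: ethtool -k nic1_1 | rx_vlan_offload_state
# Should print "off" once the post-up line has been applied.
```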

On the MikroTik side, I believe MLAG had a bug in sharing the MACs (hosts) between the two switches, but since being on 7.22.1 it's been fine. The switches are MikroTik CRS520s.

With this configuration it's been stable for 2 months now, with no issues when restarting the Proxmox servers or the switches.