Proxmox 7, Mellanox ConnectX-4, and VLAN-aware bridges

VictorSTS

Hello,

I'm setting up a new cluster using Mellanox ConnectX-5 and ConnectX-4 Lx cards. The ConnectX-5 cards haven't given me any issues (yet?), but the ConnectX-4 Lx cards don't work quite as flawlessly.

After solving a very slow boot issue related to an old firmware version, it now seems that I won't be able to use more than 512 VLANs in a VLAN-aware bridge. When booting such a server I get a lot of messages saying "netdev vlans list size (XXXX) > (512) max vport list size, some vlans will be dropped".

AFAIK, I can restrict the supported VLANs on the bridge by adjusting the bridge-vids parameter in /etc/network/interfaces, but maybe someone has more experience with this hardware and knows how to increase that limit to cover the whole VLAN range. Not that I need them now, but it would be nice to know.

Thanks in advance.
 
I've never had a Mellanox card to experiment with, but you could try disabling VLAN offloading with ethtool and see if that helps (man ethtool and ethtool -h should get you started).
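
For instance, a minimal sketch, assuming the physical NIC is named enp1s0f0 (adjust to your device; rx-vlan-filter, rxvlan and txvlan are standard ethtool feature names, but support varies by driver):

Code:
# show the current VLAN-related offload settings
ethtool -k enp1s0f0 | grep -i vlan

# try turning off hardware VLAN filtering and VLAN offloads
ethtool -K enp1s0f0 rx-vlan-filter off
ethtool -K enp1s0f0 rxvlan off txvlan off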

I hope this helps!
 
In any case, I'd prefer 512 offloaded VLANs to any higher non-offloaded number.
I'm not sure that offloading provides that much of a performance benefit (at least I have yet to see numbers that would justify it, compared to the limited functionality and the occasional breakage due to firmware/driver mismatches).

In any case - keep us posted if you have new findings!
 
You can indeed limit the number of VLANs with:

Code:
iface vmbr0
    ....
    bridge-vids 10-20 50-100

I know that the ConnectX-3 was limited to 64 VLANs. (It was working fine with the older kernel in Proxmox 5.x; maybe offloading was disabled back then, but the newer 5.4 kernel introduced this limit.)

I don't remember the ConnectX-4 being limited to 512 VLANs. (Maybe something has changed in recent kernels?)

I don't think disabling offloading via ethtool works; I tried it with my ConnectX-3 and it never worked.

With ifupdown2, you can change and reload the bridge-vids values without any problem, as sketched below.
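
A minimal sketch, assuming the bridge is vmbr0 (ifreload -a is the standard ifupdown2 command to re-apply the configuration without a reboot):

Code:
# edit the bridge-vids line for vmbr0 in /etc/network/interfaces, then:
ifreload -a

# verify which VLANs the bridge now carries
bridge vlan show dev vmbr0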
 
Had a chance to test some other Mellanox hardware and collect some info:

ConnectX-3: 128 VLAN limit
ConnectX-4 Lx: 512 VLAN limit (tested)
ConnectX-4: 4096 VLAN limit
ConnectX-5: 512 VLAN limit (tested)

It seems the Linux kernel eventually introduced a mechanism that initializes all the VLANs in a VLAN-aware bridge, and that's why I'm getting those messages. If the bridge is not VLAN-aware and you use more VLANs than the limit for your NIC, the extra VLANs simply won't work.
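
A quick way to check whether a NIC is hitting the limit is to look for the mlx5 driver message quoted above (a sketch; the exact wording may vary by kernel version):

Code:
dmesg | grep "max vport list size"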

As a workaround I have opened a feature request to be able to set bridge-vids from the GUI. I can't imagine the disaster that could arise from having my management/Corosync/Ceph networks on VLANs above the NIC limit (1000, 1001, 1002, etc.) and getting my manually set bridge-vids overwritten with the default for whatever reason.

Edit: the ConnectX-5 also has a 512 VLAN limit in a VLAN-aware bridge.
 
The CX6 also has a 512 VLAN limit. Upvoted your bug report. It's an easy fix via the CLI, though.
 
Did you have a chance to try the OFED driver instead of the "inbox" one that comes with Proxmox (Debian)? Maybe it's a limitation of the driver, as I haven't found any docs mentioning the limit yet.

I haven't tried it myself, but installing the driver on Proxmox does not seem trivial due to the mix of a Debian base distro and the custom Ubuntu-based kernel used by Proxmox.
 
We experienced the same issue with our ConnectX-6 Lx cards. The solution was to disable VLAN filtering with the "rx-vlan-filter off" ethtool option. To make the setting permanent, include the line "pre-up ethtool -K $IFACE rx-vlan-filter off" (without quotes) under the interface's parameters in /etc/network/interfaces.

This works with both the inbox and OFED drivers. The fix came directly from Mellanox/NVIDIA engineering.
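
A minimal sketch of what that looks like in /etc/network/interfaces, assuming a single physical NIC named enp1s0f0 bridged by vmbr0 (the names and address are examples; $IFACE is expanded by ifupdown to the interface currently being processed):

Code:
iface enp1s0f0 inet manual
    pre-up ethtool -K $IFACE rx-vlan-filter off

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    bridge-ports enp1s0f0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094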
 
Last edited:
Hello,

I have the same issue with my Mellanox cards.

What does your /etc/network/interfaces look like?

I've put

Code:
pre-up ethtool -K $IFACE rx-vlan-filter off

But it doesn't work.

Ty,

Ivan
 
That's how we do it:

Code:
sed -i 's/bridge-vids 2-4094/bridge-vids 2-512/' /etc/network/interfaces
Be careful: this makes all bridges only able to use VLAN IDs 2-512 (the maximum number of VLANs for that card).
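
The resulting bridge stanza then looks something like this (a sketch; vmbr0 and the port name are examples):

Code:
auto vmbr0
iface vmbr0 inet manual
    bridge-ports enp1s0f0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-512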
 
@Ivan, where do you use that pre-up line?

AFAIR, $IFACE is replaced with the name of the interface currently being processed, so if you put that line in the bridge stanza you will be "disabling" VLAN offload on the bridge itself, whereas you have to do it on the physical interface instead.

Either limit the allowed VLANs, as jsterr showed you above, or set the interface name explicitly in the pre-up directive. Oh, and if you are using bonding, add a pre-up directive for every physical interface involved in the bridge, as in the sketch below!
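
A sketch of the bonded case, assuming two physical NICs named enp1s0f0/enp1s0f1 (names and bond mode are examples); the pre-up goes on each slave, not on the bond or the bridge:

Code:
iface enp1s0f0 inet manual
    pre-up ethtool -K enp1s0f0 rx-vlan-filter off

iface enp1s0f1 inet manual
    pre-up ethtool -K enp1s0f1 rx-vlan-filter off

auto bond0
iface bond0 inet manual
    bond-slaves enp1s0f0 enp1s0f1
    bond-mode 802.3ad
    bond-miimon 100

auto vmbr0
iface vmbr0 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094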
 
@iffster "solved" as in everything is working fine and we can ignore this message?

mlx5e_vport_context_update_vlans:186:(pid 1448): netdev vlans list size (4095) > (512) max vport list size, some vlans will be dropped

I'm fine with not using more than 512 VLANs per host. The message gives me a bit of anxiety, but it does seem that everything is indeed working.

I do need VLAN tags higher than 512, just not more than 512 VLANs in total.

The card was running firmware 14.28.2006 and showed no warning; since 14.32.1010 it shows this warning, but seems to work fine.
 
You can also set the VLANs in the file according to your needs; for example:

bridge-vids 2-10 100 124 1000

should work, according to the docs: https://manpages.ubuntu.com/manpages/jammy/en/man5/interfaces-bridge.5.html
 
I'm aware of that, but it would mean a lot of work when creating a new VLAN for a customer: I'd not only have to configure the switches, I'd also have to configure every host that will use it.

However, as far as I can see it works fine. With the command:
Bash:
bridge vlan show dev vmbrX

I see all the VLANs in use by the VMs listed there, instead of all the VLANs from 2-4094 as usual.

So it looks like the kernel programs the bridge with the needed VLANs as we go.
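
For illustration, output in the spirit of what I see (a made-up sketch, not from a real host; the exact format depends on the iproute2 version), with a VM NIC tap100i0 tagged with VLAN 1000 on a bond0-backed bridge:

Code:
# bridge vlan show
port              vlan-id
bond0             1 PVID Egress Untagged
                  1000
tap100i0          1000 PVID Egress Untagged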

Please note that I'm bonding two ports and using a bridge on top of that.
 
