Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF

Hi!

Fresh install on Supermicro X11DPi-N(T) using Proxmox v6.1.

These messages are looping through the log and saturating one core:
[ 911.977646] i40e 0000:60:00.1: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
[ 955.642636] i40e 0000:60:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on

The interesting part is that the PCI address belongs to one of the two links in a bond, and that port is not even connected (NO-CARRIER / link down).

After about 20 minutes the logging stopped and the load went back to normal.

This bug report relates the problem to VLANs:
https://sourceforge.net/p/e1000/bugs/575/

Indeed, I am using VLANs, actually QinQ.

I was unable to find any settings to adjust. Any ideas?

Kind regards
Kevin
 
I had this error as well. Based on the bug report, I went ahead and restricted the VLANs to only the ones I needed, and the error then went away. I have the Supermicro X11SPM-TF motherboard, so I would agree with the bug report that it has to do with the X722's hardware VLAN filter capacity.

Code:
lspci | egrep -i --color 'network|ethernet'
b5:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 08)
b5:00.1 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 08)

Code:
ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: on
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]


Note that I installed ifupdown2. The full config:
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

iface eno2 inet manual

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3
        bond-lacp-rate 1

auto vmbr0
iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 10 20

auto vmbr0.10
iface vmbr0.10 inet static
        address 192.168.10.10/24
        gateway 192.168.10.1
 
What I did notice as odd was that the version of the i40e driver in use wasn't listed on the Intel site.

Code:
2.10.19.82 (latest)
2.10.19.30
2.9.21
2.8.43
2.7.29
2.7.26
2.7.12
2.7.11


Code:
lspci -nk
b5:00.0 0200: 8086:37d2 (rev 08)
        Subsystem: 15d9:37d2
        Kernel driver in use: i40e
        Kernel modules: i40e
b5:00.1 0200: 8086:37d2 (rev 08)
        Subsystem: 15d9:37d2
        Kernel driver in use: i40e
        Kernel modules: i40e

Code:
modinfo i40e | grep -i version
version:        2.8.20-k
srcversion:     61EE7A7018ED9599D5BADBB
vermagic:       5.3.18-3-pve SMP mod_unload modversions
 
I see this as well.

I have changed the network config to only specify the VLANs I am interested in via the bridge-vids option in /etc/network/interfaces, instead of all of them, since that Proxmox bug mentioned that the X710 has limited VLAN filter resources. I haven't seen the issue in a few hours at least; time will tell, I guess.

Code:
auto vmbr0
iface vmbr0 inet static
   address 192.168.1.3/24
   gateway 192.168.1.1
   bridge-ports enp1s0f0
   bridge-stp off
   bridge-fd 0
   bridge-vlan-aware yes
   bridge-vids 42 69

My card also came with the newest firmware already on it, while Proxmox with kernel 5.4 doesn't ship the newest driver. I haven't tried, nor do I want to deal with, installing the latest driver from Intel's website (v3.3.x).

Code:
[ 1.342348] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k
[ 1.347833] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[ 1.418697] i40e 0000:01:00.0: fw 8.13.63341 api 1.12 nvm 8.15 0x80009507 1.1853.0 [8086:1572] [8086:0000]
[ 1.423476] i40e 0000:01:00.0: The driver for the device detected a newer version of the NVM image v1.12 than expected v1.9. Please install the most recent version of the network driver.
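For reference, ethtool -i shows the driver version and firmware/NVM version a port is actually running, which can be compared against the versions listed on Intel's download page (the interface name below is just an example):

Code:
# report driver, driver version and firmware-version for a given port
ethtool -i enp1s0f0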
 
Hi

I have a similar issue, but networking seems to work fine; I just don't like the logs:

Code:
# modinfo i40e | grep -i version
version: 2.8.20-k
srcversion: B1FBC8992487C269EA33AA4
vermagic: 5.4.78-2-pve SMP mod_unload modversions


Code:
Jan 18 13:35:05 jupiter kernel: [ 11.664705] i40e: Intel(R) Ethernet Connection XL710 Network Driver - version 2.8.20-k
Jan 18 13:35:05 jupiter kernel: [ 11.665165] xhci_hcd 0000:00:14.0: xHCI Host Controller
Jan 18 13:35:05 jupiter kernel: [ 11.665176] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 1
Jan 18 13:35:05 jupiter kernel: [ 11.667396] xhci_hcd 0000:00:14.0: hcc params 0x200077c1 hci version 0x100 quirks 0x0000000000009810
Jan 18 13:35:05 jupiter kernel: [ 11.667458] xhci_hcd 0000:00:14.0: cache line size of 32 is not supported


Code:
Jan 18 14:44:27 jupiter kernel: [ 145.060365] i40e 0000:3d:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
Jan 18 14:44:27 jupiter kernel: [ 145.060584] i40e 0000:3d:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
Jan 18 14:44:27 jupiter kernel: [ 145.060819] i40e 0000:3d:00.0: Error I40E_AQ_RC_ENOSPC, forcing overflow promiscuous on PF
Jan 18 14:44:27 jupiter kernel: [ 145.061119] i40e 0000:3d:00.0: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on
Jan 18 14:44:27 jupiter kernel: [ 145.063400] i40e 0000:3d:00.1: Error I40E_AQ_RC_ENOSPC adding RX filters on PF, promiscuous mode forced on

Any thoughts?
 
Just set "offload-rx-vlan-filter off" in your bond config to solve the problem. Hardware is not able to handle it ;)
 
Still see this in dmesg:

Code:
[ 26.648130] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
[ 26.651855] bonding: bond0 is being created...
[ 26.732023] IPv6: ADDRCONF(NETDEV_CHANGE): eno2: link becomes ready
[ 26.732935] i40e 0000:3d:00.0 eno1: already using mac address a4:bf:01:86:15:5b
[ 26.767216] irq 61: Affinity broken due to vector space exhaustion.
[ 26.768149] irq 62: Affinity broken due to vector space exhaustion.
[ 26.768949] irq 63: Affinity broken due to vector space exhaustion.
[ 26.769599] irq 64: Affinity broken due to vector space exhaustion.
[ 26.770234] irq 65: Affinity broken due to vector space exhaustion.
[ 26.770887] irq 66: Affinity broken due to vector space exhaustion.
[ 26.771563] irq 67: Affinity broken due to vector space exhaustion.
[ 26.772218] irq 68: Affinity broken due to vector space exhaustion.
[ 26.772825] irq 69: Affinity broken due to vector space exhaustion.
[ 26.773419] irq 70: Affinity broken due to vector space exhaustion.
[ 26.774015] irq 71: Affinity broken due to vector space exhaustion.
[ 26.774600] irq 72: Affinity broken due to vector space exhaustion.
[ 26.775228] irq 73: Affinity broken due to vector space exhaustion.
[ 26.775822] irq 74: Affinity broken due to vector space exhaustion.
[ 26.776411] irq 75: Affinity broken due to vector space exhaustion.
[ 26.776985] irq 76: Affinity broken due to vector space exhaustion.
[ 26.777678] irq 93: Affinity broken due to vector space exhaustion.
[ 26.778230] irq 94: Affinity broken due to vector space exhaustion.
[ 26.778758] irq 95: Affinity broken due to vector space exhaustion.
[ 26.779329] irq 96: Affinity broken due to vector space exhaustion.
[ 26.779854] irq 97: Affinity broken due to vector space exhaustion.
[ 26.780366] irq 98: Affinity broken due to vector space exhaustion.
[ 26.780889] irq 99: Affinity broken due to vector space exhaustion.
[ 26.781379] irq 100: Affinity broken due to vector space exhaustion.
[ 26.781867] irq 101: Affinity broken due to vector space exhaustion.
[ 26.782351] irq 102: Affinity broken due to vector space exhaustion.
[ 26.782813] irq 103: Affinity broken due to vector space exhaustion.
[ 26.783329] irq 112: Affinity broken due to vector space exhaustion.
[ 26.783773] irq 113: Affinity broken due to vector space exhaustion.
[ 26.784211] irq 114: Affinity broken due to vector space exhaustion.
[ 26.784664] irq 115: Affinity broken due to vector space exhaustion.
[ 26.785084] irq 116: Affinity broken due to vector space exhaustion.
[ 26.786441] bond0: (slave eno1): Enslaving as a backup interface with an up link
[ 26.851399] i40e 0000:3d:00.1 eno2: set new mac address a4:bf:01:86:15:5b
[ 26.889448] irq 169: Affinity broken due to vector space exhaustion.
[ 26.890979] irq 170: Affinity broken due to vector space exhaustion.
[ 26.891549] irq 171: Affinity broken due to vector space exhaustion.
[ 26.892079] irq 172: Affinity broken due to vector space exhaustion.
[ 26.892568] irq 173: Affinity broken due to vector space exhaustion.
[ 26.893017] irq 174: Affinity broken due to vector space exhaustion.
[ 26.893448] irq 175: Affinity broken due to vector space exhaustion.
[ 26.893871] irq 176: Affinity broken due to vector space exhaustion.
[ 26.894282] irq 177: Affinity broken due to vector space exhaustion.
[ 26.894680] irq 178: Affinity broken due to vector space exhaustion.
[ 26.895099] irq 179: Affinity broken due to vector space exhaustion.
[ 26.895482] irq 180: Affinity broken due to vector space exhaustion.
[ 26.895861] irq 181: Affinity broken due to vector space exhaustion.
[ 26.896235] irq 182: Affinity broken due to vector space exhaustion.
[ 26.896625] irq 183: Affinity broken due to vector space exhaustion.
[ 26.896970] irq 184: Affinity broken due to vector space exhaustion.
[ 26.897439] irq 201: Affinity broken due to vector space exhaustion.
[ 26.897771] irq 202: Affinity broken due to vector space exhaustion.
[ 26.898095] irq 203: Affinity broken due to vector space exhaustion.
[ 26.898411] irq 204: Affinity broken due to vector space exhaustion.
[ 26.898724] irq 205: Affinity broken due to vector space exhaustion.
[ 26.899076] irq 206: Affinity broken due to vector space exhaustion.
[ 26.899393] irq 207: Affinity broken due to vector space exhaustion.
[ 26.899706] irq 208: Affinity broken due to vector space exhaustion.
[ 26.900011] irq 209: Affinity broken due to vector space exhaustion.
[ 26.900335] irq 210: Affinity broken due to vector space exhaustion.
[ 26.900624] irq 211: Affinity broken due to vector space exhaustion.
[ 26.900906] irq 212: Affinity broken due to vector space exhaustion.
[ 26.901181] irq 213: Affinity broken due to vector space exhaustion.
[ 26.901455] irq 214: Affinity broken due to vector space exhaustion.
[ 26.901725] irq 215: Affinity broken due to vector space exhaustion.
[ 26.901988] irq 216: Affinity broken due to vector space exhaustion.
[ 26.903033] bond0: (slave eno2): Enslaving as a backup interface with an up link
[ 26.907560] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
[ 26.933296] vmbr0: port 1(bond0) entered blocking state
[ 26.933576] vmbr0: port 1(bond0) entered disabled state
[ 26.933940] device bond0 entered promiscuous mode
[ 26.934194] device eno1 entered promiscuous mode
[ 26.934463] device eno2 entered promiscuous mode
[ 26.937361] vmbr0: port 1(bond0) entered blocking state
[ 26.937616] vmbr0: port 1(bond0) entered forwarding state
[ 27.049355] device bond0 left promiscuous mode
[ 27.049594] device eno1 left promiscuous mode
[ 27.049844] device eno2 left promiscuous mode
[ 27.146285] bpfilter: Loaded bpfilter_umh pid 4506
[ 27.146439] Started bpfilter
[ 27.895286] i40e 0000:3d:00.1: eno2 is entering allmulti mode.
[ 27.896363] i40e 0000:3d:00.0: eno1 is entering allmulti mode.


*************

auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto ens803f0
iface ens803f0 inet manual

auto ens803f1
iface ens803f1 inet manual

auto bond0
iface bond0 inet manual
slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
offload-rx-vlan-filter off

auto vmbr0
iface vmbr0 inet static
address 192.168.183.10
netmask 255.255.255.0
gateway 192.168.183.1
bridge-ports bond0
bridge-vlan-aware yes
bridge-stp off
bridge-fd 0


************
 
Note that the offload-rx-vlan-filter option in /etc/network/interfaces requires the ethtool package (which is not installed by default).
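A minimal sketch of installing it and then checking that the setting actually took effect on a bond member (the interface name is just an example):

Code:
apt install ethtool
ifreload -a                              # re-apply /etc/network/interfaces with ifupdown2
ethtool -k eno1 | grep rx-vlan-filter    # should now report: rx-vlan-filter: off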
 
Here is my "success" story with this bug.


Getting rid of the logging was good but not the solution. I silenced the logs with:

Code:
auto vmbr0
iface vmbr0 inet manual
#iface vmbr0 inet static
        bridge-ports bond1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 100

Note the bridge-vids. I currently only have one VLAN. The log was clean, but the system was still hanging as the OP reported.

offload-rx-vlan-filter alone did not fix the issue.

I did a firmware upgrade of the NICs and had to turn several offloading features off for the NICs and bonds. No idea if setting them on the bond alone would be sufficient, but since ethtool -k reports the flags differently for the physical NICs and the bond, I guess they have to be set at least on the physical interfaces.

iface enp33s0f1 inet manual
offload-rxvlan off
offload-txvlan off
offload-tso off
offload-rx-vlan-filter off

iface enp33s0f2 inet manual
offload-rxvlan off
offload-txvlan off
offload-tso off
offload-rx-vlan-filter off


auto bond0
iface bond0 inet static
address 172.16.0.11
netmask 255.255.255.0
bond-slaves enp33s0f1 enp33s0f2
bond-miimon 100
bond-mode broadcast
offload-rxvlan off
offload-txvlan off
offload-tso off
offload-rx-vlan-filter off


Now the problem is gone here.
 
Looks like this did not solve my problem after all. Does anyone else have an idea how to fix this permanently?
 
Here are some observations I've made. Maybe others can relate:

After rebooting host1, host3 also loses all of its links according to KNET. These are independent bonds in my case. The links themselves did not go down; I still had pings running over them. This must be a problem at some higher or lower layer.

Code:
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] link: host: 1 link: 0 is down
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] link: host: 1 link: 1 is down
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] host: host: 1 has no active links
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] host: host: 1 (passive) best link: 0 (pri: 1)
Feb  6 22:55:14 pve4 corosync[3599]:   [KNET  ] host: host: 1 has no active links
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] link: host: 3 link: 0 is down
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] link: host: 3 link: 1 is down
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] host: host: 3 has no active links
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb  6 22:55:21 pve4 corosync[3599]:   [KNET  ] host: host: 3 has no active links
Feb  6 22:55:22 pve4 corosync[3599]:   [KNET  ] rx: host: 3 link: 0 is up
Feb  6 22:55:22 pve4 corosync[3599]:   [KNET  ] rx: host: 3 link: 1 is up

Even though pings keep going back and forth, Ceph complains:

Code:
Feb  6 23:05:46 pve4 ceph-osd[362861]: 2022-02-06T23:05:46.906+0100 7fac59de1700 -1 osd.9 12160 heartbeat_check: no reply from 172.16.0.13:6812 osd.12 since back 2022-02-06T22:55:17.509263+0100 front 2022-02-06T22:55:17.509285+0100 (oldest deadline 2022-02-06T22:55:38.609106+0100)
 
This is actually keeping us from configuring a new PVE host successfully. The motherboard is a Supermicro board with a 2nd-gen Xeon Silver and two 10G SFP+ NICs, X710 based.

It fails with VLANs configured.
 
Hi,
Any news on this topic?
I am also working with an Intel Xeon Silver CPU and two 10G SFP+ NICs based on the Intel X710.
 
