Hello everyone,
I have come across a very peculiar issue. Some time ago my institution acquired six identical Dell R7515 servers, which I've provisioned into two Proxmox clusters of three nodes each. They all have a dual-port 10Gbit Intel X710 NIC, with both ports aggregated into an LACP bond.
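The bond on each node is configured the standard Proxmox way in /etc/network/interfaces, roughly like this (a sketch; the hash policy, bridge name, and addresses here are placeholders, the port names are the real ones):
Code:
auto bond0
iface bond0 inet manual
        bond-slaves ens5f0np0 ens5f1np1
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer3+4

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.11/24
        gateway 192.0.2.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0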
The first cluster works without any problems; on the second one, however, upload is capped at around 6Mbit/s.
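The cap is easy to reproduce with iperf3 from the nodes of the second cluster to a host outside it (target address below is a placeholder):
Code:
# upload direction: the client (the affected node) sends
root@paw1-pve2-r7515:~# iperf3 -c 192.0.2.100 -t 30
# download direction for comparison: -R makes the server send instead
root@paw1-pve2-r7515:~# iperf3 -c 192.0.2.100 -t 30 -R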
All servers are running the same Proxmox version and kernel, and all have the same firmware, BIOS, etc.
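Driver, firmware, and PCI IDs can be compared directly on each node like this (interface name as on these machines):
Code:
# kernel driver, firmware, and NVM versions for one port of the X710
root@ocean-pve1-r7515:~# ethtool -i ens5f0np0
# PCI device/subsystem IDs, to confirm the cards really are identical
root@ocean-pve1-r7515:~# lspci -nn | grep -i ethernet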
The working cluster initializes seemingly without any issues:
Code:
root@ocean-pve1-r7515:~# dmesg -T | grep i40e
[Tue Dec 17 16:54:44 2024] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[Tue Dec 17 16:54:44 2024] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0: fw 9.840.76614 api 1.15 nvm 9.40 0x8000e9c2 22.5.7 [8086:1572] [8086:0006]
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0: MAC address: 6c:fe:54:7b:e9:20
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0: FW LLDP is enabled
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0 eth2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0: PCI-Express: Speed 8.0GT/s Width x8
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 48 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1: fw 9.840.76614 api 1.15 nvm 9.40 0x8000e9c2 22.5.7 [8086:1572] [8086:0006]
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1: MAC address: 6c:fe:54:7b:e9:21
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1: FW LLDP is enabled
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1 eth3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1: PCI-Express: Speed 8.0GT/s Width x8
[Tue Dec 17 16:54:44 2024] i40e 0000:02:00.1: Features: PF-id[1] VFs: 64 VSIs: 66 QP: 48 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
[Tue Dec 17 16:54:45 2024] i40e 0000:02:00.1 ens5f1np1: renamed from eth3
[Tue Dec 17 16:54:45 2024] i40e 0000:02:00.0 ens5f0np0: renamed from eth2
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.1 ens5f1np1: set new mac address 6c:fe:54:7b:e9:20
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.0 ens5f0np0: entered allmulticast mode
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.1 ens5f1np1: entered allmulticast mode
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.0 ens5f0np0: entered promiscuous mode
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.1 ens5f1np1: entered promiscuous mode
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.0: entering allmulti mode.
[Tue Dec 17 16:54:58 2024] i40e 0000:02:00.1: entering allmulti mode.
But on the other cluster dmesg shows errors (the firmware looks newer here because I tried a recent Dell FW upgrade from 22.x to 23.x for the NIC today, but the result is the same either way):
Code:
root@paw1-pve2-r7515:~# dmesg -T | grep i40e
[Fri Jan 17 17:58:10 2025] i40e: Intel(R) Ethernet Connection XL710 Network Driver
[Fri Jan 17 17:58:10 2025] i40e: Copyright (c) 2013 - 2019 Intel Corporation.
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: fw 9.851.77832 api 1.15 nvm 9.50 0x8000f255 23.0.8 [8086:1572] [8086:0006]
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: MAC address: 6c:fe:54:7b:fe:00
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: FW LLDP is enabled
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: couldn't add VEB, err -EIO aq_err I40E_AQ_RC_EINVAL
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: couldn't add VEB
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: Couldn't create FDir VSI
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0 eth2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: PCI-Express: Speed 8.0GT/s Width x8
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 48 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: fw 9.851.77832 api 1.15 nvm 9.50 0x8000f255 23.0.8 [8086:1572] [8086:0006]
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: MAC address: 6c:fe:54:7b:fe:01
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: FW LLDP is enabled
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: couldn't add VEB, err -EIO aq_err I40E_AQ_RC_EINVAL
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: couldn't add VEB
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: Couldn't create FDir VSI
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1 eth3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: PCI-Express: Speed 8.0GT/s Width x8
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 48 RSS FD_ATR DCB VxLAN Geneve PTP VEPA
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.0 ens5f0np0: renamed from eth2
[Fri Jan 17 17:58:10 2025] i40e 0000:02:00.1 ens5f1np1: renamed from eth3
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.1 ens5f1np1: set new mac address 6c:fe:54:7b:fe:00
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.0 ens5f0np0: entered allmulticast mode
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.1 ens5f1np1: entered allmulticast mode
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.0 ens5f0np0: entered promiscuous mode
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.1 ens5f1np1: entered promiscuous mode
[Fri Jan 17 17:58:23 2025] i40e 0000:02:00.1: entering allmulti mode.
[Fri Jan 17 17:58:24 2025] i40e 0000:02:00.0: entering allmulti mode.
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.1 ens5f1np1: NETDEV WATCHDOG: CPU: 13: transmit queue 23 timed out 5160 ms
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.1 ens5f1np1: tx_timeout: VSI_seid: 389, Q 23, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.1 ens5f1np1: tx_timeout recovery level 1, txqueue 23
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0 ens5f0np0: NETDEV WATCHDOG: CPU: 13: transmit queue 15 timed out 5232 ms
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0 ens5f0np0: tx_timeout: VSI_seid: 388, Q 15, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0 ens5f0np0: tx_timeout recovery level 1, txqueue 15
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.1: VSI seid 389 Tx ring 0 disable timeout
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0: VSI seid 388 Tx ring 0 disable timeout
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.1: entering allmulti mode.
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
[Fri Jan 17 17:58:28 2025] i40e 0000:02:00.0: entering allmulti mode.
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.0 ens5f0np0: NETDEV WATCHDOG: CPU: 13: transmit queue 23 timed out 5688 ms
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.0 ens5f0np0: tx_timeout: VSI_seid: 388, Q 23, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.0 ens5f0np0: tx_timeout recovery level 2, txqueue 23
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.1 ens5f1np1: NETDEV WATCHDOG: CPU: 13: transmit queue 3 timed out 5504 ms
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.1 ens5f1np1: tx_timeout: VSI_seid: 389, Q 3, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.1 ens5f1np1: tx_timeout recovery level 2, txqueue 3
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.0: VSI seid 388 Tx ring 0 disable timeout
[Fri Jan 17 17:58:34 2025] i40e 0000:02:00.1: VSI seid 389 Tx ring 0 disable timeout
[Fri Jan 17 17:58:37 2025] i40e 0000:02:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
[Fri Jan 17 17:58:37 2025] i40e 0000:02:00.0: entering allmulti mode.
[Fri Jan 17 17:58:37 2025] i40e 0000:02:00.1: entering allmulti mode.
[Fri Jan 17 17:58:41 2025] i40e 0000:02:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
[Fri Jan 17 17:58:41 2025] i40e 0000:02:00.0: entering allmulti mode.
[Fri Jan 17 17:58:41 2025] i40e 0000:02:00.1: entering allmulti mode.
I've tried running other OSes (Rocky Linux 8 and 9, Debian 12), and even without any network configuration they show similar dmesg messages.
I have also tried Intel's out-of-tree i40e driver, but it only made things worse: with it I lost all network connectivity, I couldn't even ping the gateway.
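In case it matters, I built and loaded the out-of-tree driver the usual way from Intel's source tarball (a rough sketch; the version is a placeholder):
Code:
# build and install the out-of-tree module from Intel's tarball
root@paw1-pve2-r7515:~# tar xf i40e-<version>.tar.gz
root@paw1-pve2-r7515:~# cd i40e-<version>/src && make install
# swap the in-kernel module for the freshly built one
root@paw1-pve2-r7515:~# rmmod i40e && modprobe i40e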