Hello,
I have two Supermicro servers (SM-HV01 and SM-HV02) running Proxmox VE 7.3-4. The servers are directly connected with two 10 Gbit/s fiber DAC cables (enp3s0f0 and enp3s0f1) and one 10 Gbit/s Ethernet cable (eno2). The outside interface of each server (eno1) is connected to the datacenter uplink switch. The fiber interfaces are bonded with LACP (802.3ad). The interfaces eno1 and eno2 (driver: igb) work fine and show no problems. Since I upgraded to Proxmox VE 7.3-4, the fiber interfaces go down under heavy TX load (see the log in the Syslog section). Interestingly, only SM-HV01 produces this error. The only workaround is rebooting the hypervisor SM-HV01.
What I have already tried:
- disable offloading on both servers and interfaces
Code:
ethtool -K enp3s0f0 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
ethtool -K enp3s0f1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
- disable PCIe ASPM in /etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off"
- update the ixgbe driver to 5.18.6
- disable port enp3s0f0 or enp3s0f1
- try another bond mode (active-backup); since that did not help, I switched back to LACP
- disable jumbo frames on enp3s0f0 and enp3s0f1
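For the ASPM step, it is worth confirming that the parameter actually reached the running kernel. A change in /etc/default/grub only takes effect after update-grub and a reboot, and as far as I know, Proxmox installs that boot through systemd-boot read /etc/kernel/cmdline instead. The following is a minimal sketch for checking this; the helper name has_kernel_param is my own and not part of any tool.

```shell
# Hypothetical helper: succeeds if PARAM appears as a whole word in CMDLINE.
# Usage: has_kernel_param PARAM CMDLINE
has_kernel_param() {
    param=$1
    cmdline=$2
    # Pad with spaces so that only complete, space-separated words match.
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check the command line of the running kernel, if it is readable.
if [ -r /proc/cmdline ]; then
    if has_kernel_param pcie_aspm=off "$(cat /proc/cmdline)"; then
        echo "pcie_aspm=off is active"
    else
        echo "pcie_aspm=off is NOT active"
    fi
fi
```

If this reports that the parameter is not active, the ASPM test above did not really test anything and should be repeated.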
NIC
Host (SM-HV01 and SM-HV02)
Code:
lspci -nnk | grep -A2 Ethernet
eno1
01:00.0 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
DeviceName: Onboard Intel Ethernet 1
Subsystem: Super Micro Computer Inc I350 Gigabit Network Connection [15d9:1521]
Kernel driver in use: igb
--
eno2
01:00.1 Ethernet controller [0200]: Intel Corporation I350 Gigabit Network Connection [8086:1521] (rev 01)
DeviceName: Onboard Intel Ethernet 2
Subsystem: Super Micro Computer Inc I350 Gigabit Network Connection [15d9:1521]
Kernel driver in use: igb
--
enp3s0f0 and enp3s0f1
03:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
Subsystem: Hewlett-Packard Company Ethernet 10Gb 2-port 560SFP+ Adapter [103c:17d3]
Kernel driver in use: ixgbe
Kernel modules: ixgbe
03:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection [8086:10fb] (rev 01)
Subsystem: Hewlett-Packard Company Ethernet 10Gb 2-port 560SFP+ Adapter [103c:17d3]
Kernel driver in use: ixgbe
Kernel modules: ixgbe
Driver
Host (SM-HV01)
Code:
root@SM-HV01:~# ethtool -i enp3s0f0
driver: ixgbe
version: 5.18.6
firmware-version: 0x80000835, 1.1200.0
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
root@SM-HV01:~# ethtool -i enp3s0f1
driver: ixgbe
version: 5.18.6
firmware-version: 0x80000835, 1.1200.0
expansion-rom-version:
bus-info: 0000:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
Host (SM-HV02)
Code:
root@SM-HV02:~# ethtool -i enp3s0f0
driver: ixgbe
version: 5.18.6
firmware-version: 0x80000811, 1.1099.0
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
root@SM-HV02:~# ethtool -i enp3s0f1
driver: ixgbe
version: 5.18.6
firmware-version: 0x80000811, 1.1099.0
expansion-rom-version:
bus-info: 0000:03:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
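Note that the two hosts report different NIC firmware in the output above: 0x80000835, 1.1200.0 on SM-HV01 and 0x80000811, 1.1099.0 on SM-HV02. I cannot say whether this is related to the problem. To make the comparison easy to repeat, here is a small sketch that pulls just the firmware line out of the ethtool -i output; the function name fw_version is my own.

```shell
# Hypothetical helper: print the firmware-version value from `ethtool -i` output on stdin.
fw_version() {
    awk -F': ' '$1 == "firmware-version" { print $2 }'
}

# Demo with lines copied from the SM-HV01 output above.
printf '%s\n' \
    'driver: ixgbe' \
    'version: 5.18.6' \
    'firmware-version: 0x80000835, 1.1200.0' | fw_version

# On a real host it would be used like this:
#   ethtool -i enp3s0f0 | fw_version
```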
Network configuration
Host (SM-HV01)
Code:
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto enp3s0f0
iface enp3s0f0 inet manual
bond-master bond1
auto enp3s0f1
iface enp3s0f1 inet manual
bond-master bond1
auto eno2.11
iface eno2.11 inet static
address 0.0.0.0
auto eno2.21
iface eno2.21 inet static
address 10.x.x.x/27
alias HVCL01
#SM-VSW-HVCL01
auto eno2.22
iface eno2.22 inet static
address 0.0.0.0
mask 0.0.0.0
auto bond1
iface bond1 inet static
address 0.0.0.0/32
bond-slaves enp3s0f0 enp3s0f1
bond-miimon 100
bond-mode 802.3ad
bond-lacp-rate 1
auto bond1.31
iface bond1.31 inet static
address 0.0.0.0
alias ISCSI01
auto bond1.32
iface bond1.32 inet static
address 0.0.0.0
alias ISCSI02
auto bond1.41
iface bond1.41 inet static
address 0.0.0.0
alias DMZ01
auto bond1.42
iface bond1.42 inet static
address 0.0.0.0
alias DMZ02
auto vmbr0
iface vmbr0 inet static
address 10.x.x.x/24
bridge-ports eno1
bridge-stp off
bridge-fd 0
#OUTSIDE
auto vmbr11
iface vmbr11 inet static
address 10.x.x.x/24
gateway 10.x.x.x
bridge-ports eno2.11
bridge-stp off
bridge-fd 0
alias MGMT01
#SM-VSW-MGMT01
auto vmbr22
iface vmbr22 inet static
address 0.0.0.0
bridge-ports eno2.22
bridge-stp off
bridge-fd 0
alias FWCL01
mask 0.0.0.0
#SM-VSW-FWCL01
auto vmbr31
iface vmbr31 inet static
address 0.0.0.0
bridge-ports bond1.31
bridge-stp off
bridge-fd 0
alias ISCSI01
#SM-VSW-ISCSI01
auto vmbr32
iface vmbr32 inet static
address 0.0.0.0
bridge-ports bond1.32
bridge-stp off
bridge-fd 0
alias ISCSI02
#SM-VSW-ISCSI02
auto vmbr41
iface vmbr41 inet static
address 0.0.0.0/32
bridge-ports bond1.41
bridge-stp off
bridge-fd 0
#SM-VSW-DMZ01
auto vmbr42
iface vmbr42 inet static
address 0.0.0.0
bridge-ports bond1.42
bridge-stp off
bridge-fd 0
#SM-VSW-DMZ02
Host (SM-HV02)
Code:
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto enp3s0f0
iface enp3s0f0 inet manual
bond-master bond1
auto enp3s0f1
iface enp3s0f1 inet manual
bond-master bond1
auto eno2.11
iface eno2.11 inet static
address 0.0.0.0
auto eno2.21
iface eno2.21 inet static
address 10.x.x.x/27
alias HVCL01
#SM-VSW-HVCL01
auto eno2.22
iface eno2.22 inet static
address 0.0.0.0
mask 0.0.0.0
auto bond1
iface bond1 inet static
address 0.0.0.0/32
bond-slaves enp3s0f0 enp3s0f1
bond-miimon 100
bond-mode 802.3ad
bond-lacp-rate 1
auto bond1.31
iface bond1.31 inet static
address 0.0.0.0
alias ISCSI01
auto bond1.32
iface bond1.32 inet static
address 0.0.0.0
alias ISCSI02
auto bond1.41
iface bond1.41 inet static
address 0.0.0.0
alias DMZ01
auto bond1.42
iface bond1.42 inet static
address 0.0.0.0
alias DMZ02
auto vmbr0
iface vmbr0 inet static
address 10.x.x.x/24
bridge-ports eno1
bridge-stp off
bridge-fd 0
#OUTSIDE
auto vmbr11
iface vmbr11 inet static
address 10.x.x.x/24
gateway 10.x.x.x
bridge-ports eno2.11
bridge-stp off
bridge-fd 0
alias MGMT01
#SM-VSW-MGMT01
auto vmbr22
iface vmbr22 inet static
address 0.0.0.0
bridge-ports eno2.22
bridge-stp off
bridge-fd 0
alias FWCL01
mask 0.0.0.0
#SM-VSW-FWCL01
auto vmbr31
iface vmbr31 inet static
address 0.0.0.0
bridge-ports bond1.31
bridge-stp off
bridge-fd 0
alias ISCSI01
#SM-VSW-ISCSI01
auto vmbr32
iface vmbr32 inet static
address 0.0.0.0
bridge-ports bond1.32
bridge-stp off
bridge-fd 0
alias ISCSI02
#SM-VSW-ISCSI02
auto vmbr41
iface vmbr41 inet static
address 0.0.0.0/32
bridge-ports bond1.41
bridge-stp off
bridge-fd 0
#SM-VSW-DMZ01
auto vmbr42
iface vmbr42 inet static
address 0.0.0.0
bridge-ports bond1.42
bridge-stp off
bridge-fd 0
#SM-VSW-DMZ02
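To see how the bond itself views the two ports, the kernel exposes the bond state in /proc/net/bonding/bond1. The sketch below condenses that file to one line per slave (interface, MII status, link failure count). The function name bond_slave_summary is my own, and I am assuming the usual field labels of the bonding driver ("Slave Interface", "MII Status", "Link Failure Count").

```shell
# Hypothetical helper: summarize each slave from /proc/net/bonding/<bond> on stdin.
# Prints: <interface> <MII status> <link failure count>
bond_slave_summary() {
    awk -F': ' '
        /^Slave Interface/              { iface = $2 }
        # The bond has its own "MII Status" line before the first slave; skip it.
        /^MII Status/ && iface != ""    { status = $2 }
        /^Link Failure Count/           { print iface, status, $2 }
    '
}

# Only run against the real file when it exists (bond1 is the bond from the config above).
if [ -r /proc/net/bonding/bond1 ]; then
    bond_slave_summary < /proc/net/bonding/bond1
fi
```

Running this on both hosts before and after a hang should show whether the link failure counters rise on one port only or on both.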
Syslog
Host (SM-HV01)
Code:
Dec 31 15:02:39 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: Detected Tx Unit Hang Tx Queue <19> TDH, TDT <0>, <5> next_to_use <5> next_to_clean <0> tx_buffer_info[next_to_clean] time_stamp <1000031da> jiffies <1000035c8>
Dec 31 15:02:39 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: tx hang 4 detected on queue 19, resetting adapter
Dec 31 15:02:39 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: Reset adapter
Dec 31 15:02:40 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: RXDCTL.ENABLE for one or more queues not cleared within the polling period
Dec 31 15:02:40 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: TXDCTL.ENABLE for one or more queues not cleared within the polling period
Dec 31 15:02:40 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: PCIe transaction pending bit also did not clear.
Dec 31 15:02:40 SM-HV01 kernel: ixgbe 0000:03:00.1: primary disable timed out
Dec 31 15:02:40 SM-HV01 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:40 SM-HV01 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:40 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: detected SFP+: 3
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:41 SM-HV01 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: Detected Tx Unit Hang Tx Queue <21> TDH, TDT <0>, <1> next_to_use <1> next_to_clean <0> tx_buffer_info[next_to_clean] time_stamp <10000370c> jiffies <10000373f>
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: tx hang 5 detected on queue 21, resetting adapter
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: Reset adapter
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: RXDCTL.ENABLE for one or more queues not cleared within the polling period
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: TXDCTL.ENABLE for one or more queues not cleared within the polling period
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: PCIe transaction pending bit also did not clear.
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1: primary disable timed out
Dec 31 15:02:41 SM-HV01 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:41 SM-HV01 kernel: ixgbe 0000:03:00.1 enp3s0f1: detected SFP+: 3
Host (SM-HV02)
Code:
Dec 31 15:02:36 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:36 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:40 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:02:40 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:40 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:42 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:42 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:47 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:02:47 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:47 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:48 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:48 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:48 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:02:48 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:48 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:50 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:50 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:53 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:02:53 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:54 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:55 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:55 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:02:58 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:02:58 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:02:59 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:02:59 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:02:59 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
Dec 31 15:03:02 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Down
Dec 31 15:03:02 SM-HV02 kernel: bond1: (slave enp3s0f1): speed changed to 0 on port 2
Dec 31 15:03:02 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely down, disabling slave
Dec 31 15:03:03 SM-HV02 kernel: ixgbe 0000:03:00.1 enp3s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Dec 31 15:03:03 SM-HV02 kernel: bond1: (slave enp3s0f1): link status definitely up, 10000 Mbps full duplex
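To get an overview of how often and on which queues the hang occurs, the "tx hang N detected on queue Q" lines can be counted per interface and queue. This is only a sketch based on the message format shown in the SM-HV01 log above; the function name count_tx_hangs is my own.

```shell
# Hypothetical helper: count ixgbe "tx hang ... detected on queue ..." messages
# per interface and queue from kernel log lines on stdin.
count_tx_hangs() {
    awk '
        /tx hang [0-9]+ detected on queue/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "hang")  { iface = $(i - 2) }   # e.g. "enp3s0f1:"
                if ($i == "queue") { queue = $(i + 1) }   # e.g. "19,"
            }
            sub(/:$/, "", iface)
            sub(/,$/, "", queue)
            count[iface " queue " queue]++
        }
        END {
            for (key in count) { print key ": " count[key] }
        }
    ' | sort
}

# On a real host it would be fed from the kernel log, for example:
#   journalctl -k | count_tx_hangs
```

For the two hang messages in the SM-HV01 excerpt above, this reports one hang each on queue 19 and queue 21 of enp3s0f1.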
Thank you for your help!
sebeschn