Hello:
I have been experiencing this for ~1.5 years through several versions of Proxmox. Workarounds exist, but I'm asking about a proper fix.
The system is an Intel NUC8i5BEH running the latest BIOS, "BECFL357.86A.0087.2020.1209.1115 12/09/2020".
The host OS is currently up to date: proxmox-ve 6.4-1.
I have loaded the optional kernel 5.11.17-1-pve, since others reported that 5.11 solved their Ethernet problems.
The onboard NIC is an I219-V:
Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
Subsystem: Intel Corporation Ethernet Connection (6) I219-V
Flags: bus master, fast devsel, latency 0, IRQ 137
Memory at c0b00000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Kernel driver in use: e1000e
Kernel modules: e1000e
As mentioned, this has been present across multiple versions of PVE and multiple kernels. I am not certain which version of e1000e is included with 5.11.17-1-pve, but it has changed from the half-dozen kernel revisions preceding it (or at least it no longer reports a 3.2.x-y version string):
Code:
# modinfo -k 5.11.17-1-pve e1000e
filename: /lib/modules/5.11.17-1-pve/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
license: GPL v2
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: 8543CA62F65379D0D09CCD6
root@pve01:~# modinfo -k 5.4.114-1-pve e1000e
filename: /lib/modules/5.4.114-1-pve/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
version: 3.2.6-k
license: GPL v2
description: Intel(R) PRO/1000 Network Driver
author: Intel Corporation, <linux.nics@intel.com>
srcversion: A9698026892EE8F2061C993
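For what it's worth, the version the loaded driver reports can also be checked at runtime with ethtool (assuming the interface name eno1 from the logs below):
Code:
# Query the driver name, version, and firmware for the NIC behind eno1.
# On newer kernels where e1000e dropped its own version string, the
# "version" field typically shows the kernel version instead.
ethtool -i eno1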
The problem can be triggered by running iperf3 in an Ubuntu guest: "iperf3 -c <server> -t 240 -P 8".
Running the same from the host OS doesn't seem to trigger the issue.
I have "VLAN aware" enabled on the bridge interface.
The workaround is to disable TSO on the host with 'ethtool -K eno1 tso off' (capital -K; lowercase -k only displays the offload settings).
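To make the workaround survive reboots, one option (a sketch, assuming the stock ifupdown setup PVE uses; the address and gateway here are hypothetical placeholders, and vmbr0/eno1 are taken from my logs) is a post-up hook in /etc/network/interfaces:
Code:
# /etc/network/interfaces (fragment) - hypothetical example
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        # Disable TCP segmentation offload on the physical port each
        # time the bridge comes up, working around the unit hang.
        post-up /sbin/ethtool -K eno1 tso off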
I did not have this problem when running Ubuntu 18.04 LTS on the same hardware.
When the problem is triggered, these messages are logged:
Code:
[Tue May 18 15:14:19 2021] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <2e>
TDT <57>
next_to_use <57>
next_to_clean <2d>
buffer_info[next_to_clean]:
time_stamp <10524cc6e>
next_to_watch <2e>
jiffies <10524ce60>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Tue May 18 15:14:21 2021] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <2e>
TDT <57>
next_to_use <57>
next_to_clean <2d>
buffer_info[next_to_clean]:
time_stamp <10524cc6e>
next_to_watch <2e>
jiffies <10524d059>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Tue May 18 15:14:23 2021] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <2e>
TDT <57>
next_to_use <57>
next_to_clean <2d>
buffer_info[next_to_clean]:
time_stamp <10524cc6e>
next_to_watch <2e>
jiffies <10524d248>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Tue May 18 15:14:25 2021] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <2e>
TDT <57>
next_to_use <57>
next_to_clean <2d>
buffer_info[next_to_clean]:
time_stamp <10524cc6e>
next_to_watch <2e>
jiffies <10524d440>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[Tue May 18 15:14:26 2021] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[Tue May 18 15:14:27 2021] vmbr0: port 1(eno1) entered disabled state
[Tue May 18 15:14:32 2021] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[Tue May 18 15:14:32 2021] vmbr0: port 1(eno1) entered blocking state
[Tue May 18 15:14:32 2021] vmbr0: port 1(eno1) entered forwarding state
Is this a bug in e1000e? Is it fixed in the current revision of e1000e but not yet included in PVE, or is it still broken upstream? The "PC vendor" here is Intel itself; if this is Intel's problem, how do I convince them of that?
Thanks