e1000 driver hang

Hello, same issue here on NUC8I7BEH.
I tried the workaround and it's not working, even with all offloading disabled:
Code:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

My network use isn't intensive, but when I start a Windows 10 VM (with GPU passthrough) the problem occurs more often.

If you ran
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

did you reset it back? How?
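
As far as I know, settings made with `ethtool -K` are not persistent, so a reboot (or reloading the driver) should bring back the defaults. To restore them without a reboot, the same features can be switched back on. A minimal sketch, assuming all of these features were on by default on your NIC (check with `ethtool -k eno1` first); `restore_offloads_cmd` is a made-up helper name that only prints the command, so it can be reviewed before running it as root:

```shell
# Hypothetical helper: prints the ethtool command that switches the
# features from the "off" command above back on. Nothing is changed here.
restore_offloads_cmd() {
    iface="$1"
    # tso depends on sg and tx checksumming, so those are switched on
    # in the same command.
    printf 'ethtool -K %s sg on tx on rx on tso on gso on gro on rxvlan on txvlan on\n' "$iface"
}

# Print the command for eno1; run the printed line as root to apply it.
restore_offloads_cmd eno1
```

If a `post-up` line in `/etc/network/interfaces` turns the features off, it has to be removed as well, otherwise they go off again the next time the interface comes up.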
 
I had this problem several times when a task was using the network intensively. I added `post-up ethtool -K eno1 tso off gso off` to `/etc/network/interfaces`; whether it helps remains to be seen.


Code:
# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.21.151
    netmask 255.255.255.0
    gateway 192.168.21.1
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    # https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-302895
    post-up ethtool -K eno1 tso off gso off
 
I'm using a Lenovo ThinkCentre M710q as a proxmox host.

Ethernet interface as follows:
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]

Looking at the console of the host, I found the errors "Detected Hardware Unit Hang:" and "Reset adapter unexpectedly".
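
To see how often this happens without watching the console, the kernel log can be searched for those two messages. A small sketch; `count_hangs` is a hypothetical helper name, and the exact wording of the messages may differ between driver versions:

```shell
# count_hangs: reads kernel log text on stdin and prints how many lines
# contain one of the two messages quoted above.
count_hangs() {
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    grep -c -E 'Detected Hardware Unit Hang|Reset adapter unexpectedly' || true
}

# Example invocations (need a real kernel log, so not run here):
#   dmesg | count_hangs
#   journalctl -k -b | count_hangs
```

A count that stays at 0 over several days of normal use is a hint, not proof, that the workaround is holding.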

I have placed "post-up ethtool -K enp0s31f6 tso off gso off" under the vmbr0 section of /etc/network/interfaces. I'm not sure what traffic prompted this failure but I don't think there's anything too significant on this network. No big RAID file storage or anything.

When the network interface went down, I wasn't able to log in as root via the Proxmox host's console. If I had been able to, it would have been possible to power down the various VMs and containers before restarting the host. Would anyone care to explain why it wasn't possible to log in as root via the console, please?
 
Just a quick note to say I'm also experiencing this on an ASRock Z490m-ITC/ac motherboard.

In my case the system was up for days. Then I added an Ubuntu VM with PCI passthrough of the Intel WiFi card, and everything was fine until I created a network bridge in the host between the VM's network interface and the Intel WiFi card and started hostapd to create an access point, i.e. WiFi (wlan0) -> VM's bridge interface (br0) -> VM's interface (en0) -> VM host's interface -> host bridge (vmbr0) -> eth0.

Previously I had created the AP on the Proxmox host itself, but the kernel panicked and took the entire machine down, which leads to a 12+ hour RAID5 re-verification. To mitigate this I moved the AP into a VM, but then started having issues with timeouts on the e1000 network interface.

Maybe, just maybe, the information above will help with tracking down what traffic seems to cause the issue.

In my case the workaround was to use the second 1GB network interface on the motherboard, a Realtek 8125. That required compiling drivers from source, which turned out to be fairly trivial, as Realtek did a good job of providing a script to help!
 
Same issue on my ASUS Z270i, I219-V.
I've seen many reports of this on the forum.
The common solution is "ethtool -K eno1 tso off gso off", but I still hit the hang several times after setting it.
Not sure whether it's caused by the kernel or by PVE.
Mine is running the newer kernel (apt upgrade, stable).
 
I had this issue today as well. It happened after I started a VM with PCIE passthrough.

My ethernet card is Intel I219-LM. The kernel driver in use is e1000e. This one had hang issues.
My other ethernet card, which I use for PCIE passthrough, is Intel I210. The kernel driver in use is igb.

Is there any supported way to swap the module "igb" for "e1000e"? The igb seems to support the I219-LM.
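
One way to see which kernel module claims a given NIC is to compare its PCI ID with the aliases the module declares. A rough sketch: `pci_alias_prefix` is a hypothetical helper, and the ID 8086:15b8 is the I219-V from an earlier post, not the I219-LM, so substitute whatever `lspci -nn` reports for your card:

```shell
# pci_alias_prefix VENDOR:DEVICE
# Turns an ID such as 8086:15b8 into the prefix used in module alias
# strings (upper-case hex, padded to eight digits).
pci_alias_prefix() {
    id="$1"
    vendor=$(printf '%s' "${id%%:*}" | tr 'a-f' 'A-F')
    device=$(printf '%s' "${id##*:}" | tr 'a-f' 'A-F')
    printf 'pci:v0000%sd0000%s\n' "$vendor" "$device"
}

# Example invocations (need the modules installed, so not run here):
#   modinfo -F alias e1000e | grep -F "$(pci_alias_prefix 8086:15b8)"
#   modinfo -F alias igb    | grep -F "$(pci_alias_prefix 8086:15b8)"
```

If the second command prints nothing, igb does not list that device, and swapping the modules is unlikely to work.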
 
same e1000e driver Fail++

  • Proxmox 6.3 with all latest non-enterprise updates as of 20dec22
  • Circa 2010 Dell Optiplex 960 generic box w/o any 3rd party HW additions other than HDs
  • Dell BIOS A18 (2019 Spectre/Meltdown mc updates)
  • Q9550 2.83 GHz
  • Intel 82567LM-3 NIC chip
  • Single VM consisting of Proxmox Backup Server Beta (5.4.64-1-pve)
  • Only passed-through HW is 5x SATA (AHCI mode) through to PBS VM

*le sigh*
 
Hello there!

I'm experiencing the same issue on a NUC8i3BEH with an I219-V NIC. Can someone please advise me whether I'm applying the fix properly?

Here's what I have on PCI:
Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
        Subsystem: Intel Corporation Ethernet Connection (6) I219-V
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 134
        Region 0: Memory at d0b00000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee003b8  Data: 0000
        Kernel driver in use: e1000e
        Kernel modules: e1000e


Here's what I have in '/etc/network/interfaces':
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.8.101
        netmask 255.255.255.0
        gateway 192.168.8.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0


And here's what I applied:
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
        post-up ethtool -K eno1 tso off gso off

auto vmbr0
iface vmbr0 inet static
        address 192.168.8.101
        netmask 255.255.255.0
        gateway 192.168.8.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        post-up ethtool -K eno1 tso off gso off

Is this post-up line supposed to be added under both eno1 and vmbr0? Should I add/change anything else?
Many thanks for your assistance!
 
I'd set post-up only on the "real" interfaces and not on the virtual ones. One thing I'd add is the full path to ethtool, and make sure that it is installed.

For example:
Code:
...
iface eno1 inet manual
        post-up /usr/sbin/ethtool -K $IFACE tso off gso off 2> /dev/null
...
 
Thanks for a real quick answer!

Is there any drawback from applying post-up to both interfaces? I've seen (here: https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-303366) that it may not work if applied only to 'real' interface.

Regarding your example: while I understand the path to ethtool, I am not sure what "2> /dev/null" does. Can you please briefly explain why I should add it?
 
I didn't test it myself. I will do it tonight after work and report back.
File descriptor 2 is standard error (stderr). In case of an error, you might want to redirect it to nirvana (/dev/null), just in case scripts or dependent programs try to interpret the output of the interfaces scripts.
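
A tiny demonstration of what the redirection does, using a stand-in function (`noisy` is made up for this example) instead of ethtool:

```shell
# noisy: stand-in for any command that writes to both output streams.
noisy() {
    echo "normal output"          # goes to stdout (file descriptor 1)
    echo "error output" >&2       # goes to stderr (file descriptor 2)
}

# With stderr redirected to /dev/null, only the stdout line is printed.
noisy 2> /dev/null
```

The downside is that real problems are hidden too, for example a "command not found" if ethtool is not installed, so it may be worth leaving the redirection off while testing.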

Edit: You can also test whether it works with:
Code:
ethtool -k eno1
and check whether the features have been set to off after ifdown and ifup.
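
Since the full feature list is long, the check can be narrowed to the two features the workaround touches. A sketch only; `offloads_off` is a hypothetical helper that reads `ethtool -k` output on stdin and relies on the top-level lines starting at column one, as they do in the listings in this thread:

```shell
# offloads_off: succeeds only if both tcp-segmentation-offload and
# generic-segmentation-offload are reported as "off" on stdin.
offloads_off() {
    # Count the matching top-level lines; "|| true" keeps a zero count
    # from being treated as a failure of the substitution.
    n=$(grep -c -E '^(tcp-segmentation-offload|generic-segmentation-offload): off' || true)
    [ "$n" -eq 2 ]
}

# Example invocation (needs the real interface, so not run here):
#   ethtool -k eno1 | offloads_off && echo "TSO/GSO are off" || echo "still on"
```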
 
Many thanks! I would also appreciate if you could indeed come back with your observations/experience.


Thanks. Here's the output:
Code:
root@pve:/sbin# ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
 
Is that the output after you ran ifdown and ifup? If so, it clearly didn't work...
 
Ok, so it worked only after I put it under both interfaces (eno1 and vmbr0). Here's the output:

Code:
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

Not sure if it resolves the issue yet. I will need to test it later.
 
I can confirm that it resolves the issue, although it is just a workaround.
 
Thanks. Can you point me to the best way to check if this workaround resolved the issue on my setup?
I've previously tried uploading/downloading large files to/from the network drive hosted on my NUC. I also tried iperf3. However, none of those tests make me 100% sure, as this issue seems to pop up quite randomly and it's hard to catch it red-handed.
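
One thing that might help with catching it: the kernel keeps a per-interface counter of link state changes in sysfs. The sketch below rests on the assumption that an adapter reset takes the link down and up again and therefore bumps this counter; I have not verified that for e1000e. `link_flaps` is a hypothetical helper name:

```shell
# link_flaps IFACE [SYSFS_DIR]
# Prints how often the link state of IFACE has changed since the
# interface was registered, or 0 if the counter cannot be read.
link_flaps() {
    iface="$1"
    base="${2:-/sys/class/net}"   # second argument only exists to ease testing
    cat "$base/$iface/carrier_changes" 2>/dev/null || echo 0
}

# Example (needs the real interface and an iperf3 server, so not run here):
#   before=$(link_flaps eno1)
#   iperf3 -c <server> -t 600
#   after=$(link_flaps eno1)
#   echo "link changes during test: $((after - before))"
```

Comparing the counter once a day over a week of normal use is probably more telling than a single stress run.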
 
I can replicate the issue easily if I pass one NIC through to a Windows VM and just let it boot. No heavy use, no large files. Just try to RDP into the VM and voila. Just passing the NIC through would crash the kernel driver. After this workaround there hasn't been any crash.
 
Oh, ok. I don't do passthrough because a couple of VMs make use of this 'real' interface. I also don't have direct access to the host. That's my only NIC and I run the NUC in headless mode. Losing connectivity would be highly problematic then ;) I guess I'll run some "stress tests" and see how it goes.