e1000 driver hang

May 11, 2019
Hello, same issue here on a NUC8i7BEH.
I tried the workaround and it's not working, even with full offload disabled:
Code:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

I don't have intensive network use, but when I spawn a Win 10 VM (with GPU passthrough) the problem occurs more often.

If you ran
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

did you reset it back? How?
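For reference, `ethtool -K` changes are not persistent; a reboot or driver reload restores the defaults. A sketch for flipping the features back on immediately, without rebooting (assuming the interface name eno1 from the command above; untested here):

```shell
# Sketch: undo the offload workaround without rebooting.
# ethtool -K settings last only until the next reboot/driver reload,
# so "resetting" is simply turning the same features back on.
reset_offloads() {
  local ifc="$1"   # e.g. eno1
  ethtool -K "$ifc" gso on gro on tso on tx on rx on rxvlan on txvlan on sg on
}
# Verify afterwards with:
#   ethtool -k eno1 | grep -E 'segmentation-offload|receive-offload'
```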
 

Tony-bb48

New Member
Apr 20, 2020
I've had this problem several times when a task uses the network intensively. I added `post-up ethtool -K eno1 tso off gso off` to `/etc/network/interfaces`; the results remain to be seen.


Code:
# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.21.151
    netmask 255.255.255.0
    gateway 192.168.21.1
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0
    # https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-302895
    post-up ethtool -K eno1 tso off gso off
 

PickleRick63

New Member
Sep 28, 2020
I'm using a Lenovo ThinkCentre M710q as a Proxmox host.

Ethernet interface as follows:
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8]

Looking at the host's console, I found the "Detected Hardware Unit Hang:" and "Reset adapter unexpectedly" errors.

I have placed "post-up ethtool -K enp0s31f6 tso off gso off" under the vmbr0 section of /etc/network/interfaces. I'm not sure what traffic prompted this failure but I don't think there's anything too significant on this network. No big RAID file storage or anything.

When the network interface went down, I wasn't able to log in as root via the Proxmox host's console. If I had been able to, I could have powered down the various VMs & containers before restarting the host. Would anyone care to explain why it wasn't possible to log in as root via the console, please?
 

dominicc

New Member
Oct 22, 2020
Just a quick note to say I'm also experiencing this on an ASRock Z490M-ITX/ac motherboard.

In my case the system was up for days. Then I added an Ubuntu VM with PCI passthrough of the Intel WiFi card, and everything was fine until I created a network bridge on the host between the VM's network interface and the Intel WiFi card and started up hostapd to create an access point, i.e. WiFi (wlan0) -> VM's bridge interface (br0) -> VM's interface (en0) -> VM host's interface -> host bridge (vmbr0) -> eth0.

Previously I had created the AP on the Proxmox host itself, but it kernel-panicked and took the entire machine down, which leads to a 12+ hour RAID5 re-verification. To mitigate this I moved the AP into a VM, but then started having issues with the e1000 network interface timing out.

Maybe, just maybe, the information above will help with tracking down what traffic seems to cause the issue.

In my case the workaround was to use the second 1Gb network interface on the motherboard, a Realtek 8125. That required compiling the driver from source, which turned out to be fairly trivial, as Realtek did a good job of providing a script to help!
 

jhyang

New Member
Oct 22, 2020
Same issue on my ASUS Z270i, I219-V.
I've seen many reports of this on the forum.
The common workaround is "ethtool -K eno1 tso off gso off", but I still suffered the hang several times after setting it.
Not sure whether it's caused by the kernel or by PVE?
Mine is running the newer kernel (apt upgrade, stable).
 

Taylan

Member
Oct 19, 2020
I had this issue today as well. It happened after I started a VM with PCIe passthrough.

My ethernet card is an Intel I219-LM; the kernel driver in use is e1000e. This one had the hang issues.
My other ethernet card, which I use for PCIe passthrough, is an Intel I210; the kernel driver in use is igb.

Is there any supported way to swap in the module "igb" in place of "e1000e"? The igb driver seems to support the I219-LM.
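As a side note, a sketch for checking which driver is actually bound to an interface and which PCI device IDs a module claims (a device can only be driven by a module whose alias table matches its ID; it is an assumption here that igb does not list the I219's ID, and the 8086:156f ID for the I219-LM is also an assumption):

```shell
# Sketch: inspect driver binding vs. a module's claimed PCI IDs.
nic_driver() {
  # "$1" is the interface name, e.g. eno1
  ethtool -i "$1" | awk '/^driver:/ {print $2}'
}
module_ids() {
  # "$1" is the module name, e.g. igb or e1000e
  modinfo -F alias "$1" | grep '^pci:'
}
# Usage on the host:
#   nic_driver eno1                 # expected: e1000e for an I219
#   module_ids igb | grep -i 156F   # empty if igb doesn't claim the I219-LM
```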
 

fiveangle

New Member
Dec 23, 2020
same e1000e driver Fail++

  • Proxmox 6.3 with all latest non-enterprise updates as of 20dec22
  • Circa 2010 Dell Optiplex 960 generic box w/o any 3rd party HW additions other than HDs
  • Dell BIOS A18 (2019 Spectre/Meltdown mc updates)
  • Q9550 2.83Ghz
  • Intel 82567LM-3 NIC chip
  • Single VM consisting of Proxmox Backup Server Beta (5.4.64-1-pve)
  • Only passed-through HW is 5x SATA (AHCI mode) through to PBS VM

*le sigh*
 

namelessx

New Member
May 2, 2020
Hello there!

I'm experiencing the same issue on a NUC8i3BEH with an I219-V NIC. Can someone please advise me if I'm applying the fix properly?

Here's what I have on PCI:
Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
        Subsystem: Intel Corporation Ethernet Connection (6) I219-V
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 134
        Region 0: Memory at d0b00000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee003b8  Data: 0000
        Kernel driver in use: e1000e
        Kernel modules: e1000e


Here's what I have in '/etc/network/interfaces':
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.8.101
        netmask 255.255.255.0
        gateway 192.168.8.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0


And here's what I applied:
Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
        post-up ethtool -K eno1 tso off gso off

auto vmbr0
iface vmbr0 inet static
        address 192.168.8.101
        netmask 255.255.255.0
        gateway 192.168.8.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        post-up ethtool -K eno1 tso off gso off

Is this post-up line supposed to be added under both eno1 and vmbr0? Should I add/change anything else?
Many thanks for your assistance!
 

Taylan

Member
Oct 19, 2020
I'm experiencing the same issue on NUC8i3BEH with I219-V NIC. Can someone please advise me if I'm applying the fix properly?
[...]
Is this post-up line supposed to be added under both eno1 and vmbr0? Should I add/change anything else?
I'd set post-up only on the "real" interfaces, not on the virtual ones. One other thing to add would be the full path to ethtool; also make sure that it is installed. (ifupdown exports $IFACE as the name of the interface being brought up, so the stanza works for any physical interface.)

For example:
Code:
...
iface eno1 inet manual
        post-up /usr/sbin/ethtool -K $IFACE tso off gso off 2> /dev/null
...
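On making sure ethtool is installed: a minimal sketch, assuming the Debian package name (PVE is Debian-based):

```shell
# Sketch: install ethtool only if it is missing.
ensure_ethtool() {
  command -v ethtool >/dev/null 2>&1 || apt install -y ethtool
}
# After installing, confirm the path to use in the post-up line:
#   command -v ethtool   # typically /usr/sbin/ethtool on PVE (assumption)
```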
 

namelessx

New Member
May 2, 2020
Thanks for a real quick answer!

I'd set post-up only on the "real" interfaces and not on the virtual ones. [...]
post-up /usr/sbin/ethtool -K $IFACE tso off gso off 2> /dev/null
Is there any drawback to applying post-up to both interfaces? I've seen (here: https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-303366) that it may not work if applied only to the 'real' interface.

Regarding your example: while I clearly understand the path to ethtool, I am not sure what "2> /dev/null" does. Can you please briefly explain why I should add it?
 

Taylan

Member
Oct 19, 2020
Is there any drawback to applying post-up to both interfaces? [...] I am not sure what "2> /dev/null" does.
I didn't test it myself; I'll do it tonight after work and report back.
File descriptor 2 is standard error (stderr). In case of an error, you might want to send it to the void (/dev/null) so that scripts or dependent programs that try to interpret the output of the interfaces scripts don't trip over it, just in case.
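A quick illustration of what "2> /dev/null" actually does (plain shell, nothing NIC-specific; the path /nonexistent-path is made up):

```shell
# File descriptor 2 is stderr; "2> /dev/null" discards only the error stream.
ls /nonexistent-path 2> /dev/null || true   # the error message is suppressed
echo "stdout still works" 2> /dev/null      # stdout is unaffected
```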

Edit: you can also test whether it works with:
Code:
ethtool -k eno1
and check whether the features have been set to off after an ifdown and ifup.
 

namelessx

New Member
May 2, 2020
I didn't test it myself. I would do it tonight after work and report back. [...]
Many thanks! I would also appreciate if you could indeed come back with your observations/experience.


You can also test, if it works with:
Code:
ethtool -k eno1
and check the features, if they've been set to off, after ifdown and ifup.

Thanks. Here's the output:
Code:
root@pve:/sbin# ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
 

Taylan

Member
Oct 19, 2020
Thanks. Here's the output:
Code:
tcp-segmentation-offload: on
generic-segmentation-offload: on
[...]
Is this the output after you've run ifdown and ifup? If so, it clearly didn't work...
 

namelessx

New Member
May 2, 2020
OK, so it worked only after I put it under both interfaces (eno1 and vmbr0). Here's the output:

Code:
Features for eno1:
rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
        tx-tcp-segmentation: off
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: off
        tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]

Not sure if it resolves the issue yet. I will need to test it later.
 

Taylan

Member
Oct 19, 2020
Ok, so it worked only after I put it under both interfaces (eno1 and vmbr0). Here's the output:
Code:
tcp-segmentation-offload: off
generic-segmentation-offload: off
[...]
Not sure if it resolves the issue yet. I will need to test it later.
I can confirm that it resolves the issue, although it is just a workaround.
 

namelessx

New Member
May 2, 2020
I can confirm that it resolves the issue, although it is just a workaround.
Thanks. Can you point me to the best way to check whether this workaround resolved the issue on my setup?
I've previously tried uploading/downloading large files to/from the network drive hosted on my NUC, and I also tried iperf3. However, none of those tests makes me 100% sure, as this issue seems to pop up quite randomly and it's hard to catch red-handed.
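For what it's worth, a hypothetical stress loop along those lines (iperf3 flags are standard; the peer address 192.168.8.50 is made up, and assumes a peer running `iperf3 -s`):

```shell
# Sketch: sustained load while watching the kernel log for the hang signature.
repro_load() {
  local peer="$1"                   # e.g. 192.168.8.50 (hypothetical)
  iperf3 -c "$peer" -t 600 -P 4     # 10 minutes, 4 parallel streams
  iperf3 -c "$peer" -t 600 -P 4 -R  # same again, reverse direction
}
# In another terminal, watch for the driver complaint:
#   dmesg -w | grep -i 'Detected Hardware Unit Hang'
```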
 

Taylan

Member
Oct 19, 2020
Thanks. Can you point me to the best way to check if this workaround resolved the issue on my setup?
I've previously tried uploading/downloading large files to/from the network drive hosted on my NUC. I also tried iperf3. However, none of those tests make me 100% sure as this issue seems to pop-up quite randomly and it's hard to catch it red-handed.
I can replicate the issue easily: pass a NIC through to a Windows VM and just let it boot. No heavy use, no large files; just try to RDP into the VM and voilà, the passthrough alone crashes the kernel driver. After applying this workaround there hasn't been any crash.
 

namelessx

New Member
May 2, 2020
I can replicate the issue easily if I passthrough one NIC to a Windows-VM and let it just boot. No heavy use, no large files. Just try RDP into the VM and voila. Just passthrough the NIC and it would crash the kernel driver. After this workaround there hasn't been any crash.
Oh, OK. I don't do passthrough because a couple of VMs make use of this 'real' interface. I also don't have direct access to the host; that's my only NIC and I run the NUC headless. Losing connectivity would be highly problematic then ;) I guess I'll run some "stress tests" and will see how it goes.
 
