e1000 driver hang

I'm still getting the same "Detected Hardware Unit Hang" errors sporadically when using PVE kernel 5.4.

Code:
Mar 19 20:11:15 pve-host1.local kernel: [30377.339967] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

I recall earlier advice to run `ethtool -K <ADAPTER> gso off gro off tso off` (and possibly also `tx off rx off`).

Could anyone who has one of these adapters, and has stopped the "Detected Hardware Unit Hang" errors with the ethtool feature settings:

  • Confirm what the exact ethtool command should be?
  • Confirm how it should be applied as a "post-up" (in /etc/network/interfaces?) so that the workaround is applied at each adapter reset/reboot?
  • Confirm whether they have seen any performance degradation from the workaround?

Thanks!
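For reference, a minimal bash sketch of the workaround being asked about. It only assembles and runs the `ethtool -K <iface> <feature> off ...` form already quoted in this thread. The function name `disable_offloads` and the `DRY_RUN` variable are hypothetical, and the default feature list (`tso gso`) is simply the set later posters reported as sufficient, not an Intel or Proxmox recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable the given offload features on one NIC.
# Usage: disable_offloads IFACE [FEATURE...]   (defaults to: tso gso)
# With DRY_RUN=1 the ethtool command is printed instead of executed.
disable_offloads() {
    local iface="$1"
    shift
    local features=("$@")
    if [ "${#features[@]}" -eq 0 ]; then
        features=(tso gso)   # minimal set reported to stop the hang in this thread
    fi

    # Build: ethtool -K IFACE f1 off f2 off ...
    local cmd=(ethtool -K "$iface")
    local f
    for f in "${features[@]}"; do
        cmd+=("$f" off)
    done

    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"   # changing NIC features requires root
    fi
}

# Example (prints the command without touching the NIC):
# DRY_RUN=1 disable_offloads eno1   ->  ethtool -K eno1 tso off gso off
```

Running it without `DRY_RUN` changes the features only until the next reboot or ifdown/ifup; the replies below discuss making it persistent with post-up.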
 
I, too, can confirm that ethtool -K eno1 tso off gso off mitigates the issue for me.

I have been having this issue for months now and did not realize it. I assumed it was a connection issue with the ISP or host (neither of which would even investigate). It turns out it was being logged in syslog the whole time, and I was too stubborn to take a look.

The connection always came back on its own in my case, but still caused massive headaches. I'm very happy that it's mostly resolved at the moment.

That being said, there's no definition for eno1 in the /etc/network/interfaces file. So I added the post-up to vmbr0 instead, which seems like it would still work since presumably both come up at the same time?

Code:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback


# vmbr0: Bridging. Make sure to use only MAC addresses that were assigned to you.
auto vmbr0
iface vmbr0 inet static
        address 1.2.3.4/24
        gateway 1.2.3.254
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        post-up ethtool -K eno1 tso off gso off

Is there a better way I should have done this?
 
I have a one-line stanza for eno1 in my file and added it there ... no idea if that's correct :-(
Maybe someone can advise.

Code:
iface eno1 inet manual
        post-up /sbin/ethtool -K eno1 tso off gso off
 
I have this issue on Linux 5.5.9 with an Intel Corporation Ethernet Connection (7) I219-LM adapter. I have not yet tried the mitigation through `ethtool -K eno1 tso off gso off`, but someone mentioned this was supposed to have been fixed in 5.4.18, and I'm running a newer kernel. I would rather not use the mitigation, since I've read that it severely affects network performance, which is key for my application. Does anyone have any suggestions? Thanks!
 
They are two different bugs.

The ethtool workaround is for an old bug on some chipsets that Intel can't (or won't) fix.

The kernel 5.5/5.4 fix is for a bug introduced in kernel 5.0/5.1.

So please try ethtool too.
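To see which side of that line a host is on, a small bash sketch that compares kernel version strings with GNU `sort -V`. The helper name `kernel_at_least` is hypothetical, and the 5.4.18 threshold is only the version mentioned above. Per the reply, a kernel at or above it can still show the older chipset bug, so passing this check is not a reason to skip the ethtool workaround.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if kernel version $1 is >= version $2.
# Relies on GNU sort's version ordering (-V); suffixes such as "-1-pve"
# are compared as part of the string.
kernel_at_least() {
    local have="$1" want="$2"
    # The smaller version sorts first; if that is $want, then $have >= $want.
    [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

# Example: compare the running kernel with the 5.4.18 mentioned above.
# if kernel_at_least "$(uname -r)" 5.4.18; then echo "at or above 5.4.18"; fi
```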
 
To those who are unable to fix this with ethtool: I realized that NICs also have a VLAN offload feature, and if you have VLANs on your host, this offload can trigger the issue as well.

To disable all offloading on the NIC, the following command can be used:

Bash:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

This resolved the issue for me, even though the ethtool settings given earlier in this thread didn't work. It's at least worth trying.
 
After installing ethtool on the node ...

Code:
apt install ethtool

... I first tried putting the post-up under eno1 and rebooted. Checking with "ethtool -k eno1 | grep offload" showed that it had not worked (tso was still enabled). After placing the post-up line under both the "vmbr0" config (which has eno1 as a bridge port) and the "eno1" config, the settings were applied as expected after a reboot:

Code:
iface eno1 inet manual
        post-up ethtool -K eno1 tso off gso off

auto vmbr0
iface vmbr0 inet static
        address X.X.X.X/Y
        gateway X.X.X.X
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        post-up ethtool -K eno1 tso off gso off
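The check described above can be scripted. A rough bash sketch (bash 4+, for the associative array), assuming the usual `ethtool -k` output of one `label: on|off` line per top-level feature. The function `offloads_still_on` and the short-name-to-label table are illustrative and cover only the features named in this thread.

```shell
#!/usr/bin/env bash
# Illustrative mapping from `ethtool -K` short names to the labels `ethtool -k` prints
# (only the features mentioned in this thread).
declare -A OFFLOAD_LABEL=(
    [rx]=rx-checksumming
    [tx]=tx-checksumming
    [sg]=scatter-gather
    [tso]=tcp-segmentation-offload
    [gso]=generic-segmentation-offload
    [gro]=generic-receive-offload
    [rxvlan]=rx-vlan-offload
    [txvlan]=tx-vlan-offload
)

# Hypothetical helper: read `ethtool -k IFACE` output on stdin and print every
# requested short feature name that is still "on" (one per line).
offloads_still_on() {
    local state f label
    state="$(cat)"   # read stdin once, then test each feature against it
    for f in "$@"; do
        label="${OFFLOAD_LABEL[$f]:-}"
        if [ -z "$label" ]; then
            echo "unknown feature: $f" >&2
            continue
        fi
        # Top-level lines look like "tcp-segmentation-offload: on" or "...: off [fixed]".
        if grep -q "^${label}: on" <<<"$state"; then
            echo "$f"
        fi
    done
}

# Example, after a reboot (prints nothing if both are off):
# ethtool -k eno1 | offloads_still_on tso gso
```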

I'm not yet sure whether only tso and gso need to be off, or whether the full post-up line
"ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off" is required, but I thought I'd start with the minimal set disabled and check whether eno1 still hangs...
 
I also had the feeling it didn't work at first, but wasn't sure how to verify it properly. With this info I will also put it in both places :-)
 
Yeah, most likely only certain offloads need to be disabled and the others can be left on. However, I don't want to debug it further on a production server myself, and the CPU load increase from having no offloading at all is negligible (at least on my servers), even when utilizing 1 Gbps at 100%.
 
As you don't have "auto eno1", ifupdown1 doesn't execute the post-up in the eno1 section.

eno1 is simply brought up ("ip link set eno1 up") via "bridge-ports eno1", and the post-up in vmbr0 is executed after both vmbr0 and eno1 are up.
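A quick way to check for the situation described above on a given host: a hypothetical bash helper `has_auto` that reports whether an interfaces file contains an `auto` line naming a given interface. It only inspects the file text (it does not follow `source` directives and ignores `allow-hotplug`), so treat it as a sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the ifupdown interfaces file $1 has an
# "auto" line that names interface $2.
has_auto() {
    local file="$1" iface="$2"
    # "auto" lines may name several interfaces, e.g. "auto lo eno1".
    awk -v want="$iface" '
        $1 == "auto" { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit !found }
    ' "$file"
}

# Example:
# has_auto /etc/network/interfaces eno1 || echo "no 'auto eno1': put the post-up under vmbr0"
```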
 
I think I have the very same problem as everyone else here. Under heavy network load the NIC seems to go down, and I can't reach Proxmox or any of the VMs through SSH or the web GUI. The network load comes primarily from one of the VMs, which runs Sabnzbd.

This is my log output, which repeats over and over:

Code:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   TDH                  <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   TDT                  <1>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_use          <1>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_clean        <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] buffer_info[next_to_clean]:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   time_stamp           <15a14b4cb>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_watch        <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   jiffies              <15a14b8b8>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_watch.status <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] MAC Status             <40080083>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY Status             <796d>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY 1000BASE-T Status  <3800>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY Extended Status    <3000>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PCI Status             <10>
Mar 14 14:17:11 yggdrasil kernel: [6045454.164399] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Mar 14 14:17:11 yggdrasil kernel: [6045454.164439] vmbr0: port 1(eno1) entered disabled state
Mar 14 14:17:18 yggdrasil kernel: [6045461.092765] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Mar 14 14:17:18 yggdrasil kernel: [6045461.092810] vmbr0: port 1(eno1) entered blocking state
Mar 14 14:17:18 yggdrasil kernel: [6045461.092812] vmbr0: port 1(eno1) entered forwarding state

I'm running Proxmox on an Intel NUC 7i5BNK.

I'm going to try the suggested solution:
Bash:
ethtool -K <device name> gso off tso off

And see how it works, I really hope we find a permanent fix for this soon.
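To see whether the hang keeps recurring after applying a workaround, a small sketch that counts "Detected Hardware Unit Hang" lines per interface. It assumes log lines shaped like the ones quoted above (the interface name, followed by a colon, right before "Detected"); the helper name `count_unit_hangs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "Detected Hardware Unit Hang" messages per interface
# in kernel log text read from stdin. Matches lines like:
#   "... e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:"
count_unit_hangs() {
    awk '/Detected Hardware Unit Hang/ {
            for (i = 1; i < NF; i++)
                if ($(i + 1) == "Detected") { sub(/:$/, "", $i); n[$i]++ }
         }
         END { for (k in n) print k, n[k] }'
}

# Examples:
# journalctl -k | count_unit_hangs          # kernel messages from the current boot
# count_unit_hangs < /var/log/syslog
```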
 
Morning!

I have a problem with the eno1 port flapping on the switch, so I tried swapping the cable etc. It is an Intel card in a Lenovo M93 box.

So I tried the kernel from a post here... no effect:

Bash:
Linux 5.4.24-1-pve #1 SMP PVE 5.4.24-1 (Mon, 09 Mar 2020 12:59:46 +0100)


I am now testing with all offloads disabled:

Bash:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

Hope this gets fixed soon.

Regards
 
Just to report back here: since adding "ethtool -K eno1 tso off gso off" as a post-up (about a week ago), I haven't had any further occurrences of the "Detected Hardware Unit Hang" issue. So it looks like only "tso off gso off" is required, not all the other parameters.
 
