e1000 driver hang

n1nj4888

Member
Jan 13, 2019
I'm still getting the same "Detected Hardware Unit Hang" errors sporadically when using PVE kernel 5.4.

Code:
Mar 19 20:11:15 pve-host1.local kernel: [30377.339967] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

I recall there was earlier advice to set "ethtool -K <ADAPTER> gso off gro off tso off", and possibly also "tx off rx off".

Could anyone who has one of these adapters, and who stopped the "Detected Hardware Unit Hang" errors through the ethtool feature settings:

  • Confirm what the exact ethtool command should be?
  • Confirm how this should be applied as a "post-up" (in /etc/network/interfaces?) so that the workaround is applied at each adapter reset/reboot?
  • Confirm whether they have seen any performance degradation from the above workaround?

Thanks!

Stoiko Ivanov

Proxmox Staff Member
May 2, 2018

rewen

Member
Oct 26, 2017
I, too, can confirm that ethtool -K eno1 tso off gso off mitigates the issue for me.

I had been having this issue for months without realizing it. I assumed it was a connection issue with the ISP or the host (neither of which would even investigate). It turns out it was being logged in syslog the whole time, and I was too stubborn to take a look.

The connection always came back on its own in my case, but still caused massive headaches. I'm very happy that it's mostly resolved at the moment.
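
Since the hangs are logged by the kernel, a quick count of the message shows how often the NIC has actually been hanging. A minimal sketch, assuming the message format quoted in this thread ("... e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:"); `count_hangs` is a hypothetical helper, not part of any existing tool:

```bash
#!/usr/bin/env bash
# count_hangs IFACE
# Hypothetical helper: reads kernel log text on stdin and prints how many
# "Detected Hardware Unit Hang" messages were logged for IFACE.
# Assumes the e1000e message format quoted in this thread.
count_hangs() {
  local iface="$1"
  # -F: match the literal string, -c: print the number of matching lines.
  # grep exits 1 when the count is 0; "|| true" keeps that from looking like an error.
  grep -F -c -- " ${iface}: Detected Hardware Unit Hang" || true
}
```

For example "count_hangs eno1 < /var/log/syslog", or "journalctl -k | count_hangs eno1" (journalctl -k only covers the current boot).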

That being said, there's no stanza for eno1 in my /etc/network/interfaces file, so I added the post-up line to vmbr0 instead. That seems like it should still work, since presumably both come up at the same time?

Code:
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback


# vmbr0: Bridging. Make sure to use only MAC addresses that were assigned to you.
auto vmbr0
iface vmbr0 inet static
        address 1.2.3.4/24
        gateway 1.2.3.254
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
        post-up ethtool -K eno1 tso off gso off

Is there a better way I should have done this?
 

Apollon77

Member
Sep 24, 2018
I have a one-line stanza for eno1 in my file and added it there ... no idea if that's correct :-(
Maybe someone can advise.

Code:
iface eno1 inet manual
        post-up /sbin/ethtool -K eno1 tso off gso off
 

ambyjkl

New Member
Mar 23, 2020
I have this issue on Linux 5.5.9 with an Intel Corporation Ethernet Connection (7) I219-LM adapter. I have not yet tried the mitigation via `ethtool -K eno1 tso off gso off`, but someone mentioned this was supposed to have been fixed in 5.4.18 and I'm running a newer kernel. I would rather not use the mitigation, since I've read it severely affects network performance, which is key for my application. Does anyone have any suggestions? Thanks!
 

spirit

Famous Member
Apr 2, 2010
www.odiso.com
These are two different bugs.

The ethtool workaround is for an old bug on some chipsets that Intel can't fix (or doesn't want to fix).

The fix in kernel 5.4/5.5 is for a different bug that was introduced in kernel 5.0/5.1.

So please try the ethtool workaround too.
 

tssge

New Member
Mar 20, 2020
To those who are unable to fix this with ethtool: I realized that NICs also have a VLAN offload feature, and if you have VLANs on your host, this offload can cause the issue as well.

To disable all offloading on the NIC, the following command can be used:

Bash:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

This resolved the issue for me, even though the ethtool settings given earlier in this thread didn't. It's at least worth trying.
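
When working through a longer feature list like the one above, it can help to turn each feature off separately, so it is obvious which ones a particular NIC/driver accepts. A minimal sketch, not something posted in this thread: `disable_offloads` is a hypothetical helper, and the `ETHTOOL` override variable is an assumption added only so the function can be dry-run without touching a real NIC.

```bash
#!/usr/bin/env bash
# disable_offloads IFACE FEATURE...
# Hypothetical helper: runs "ethtool -K IFACE FEATURE off" once per feature
# and reports each result, returning 1 if any feature could not be changed.
# $ETHTOOL (assumption, for dry runs/testing) overrides the ethtool command.
disable_offloads() {
  local iface="$1"; shift
  local ethtool="${ETHTOOL:-ethtool}"
  local feature failed=0
  for feature in "$@"; do
    if "$ethtool" -K "$iface" "$feature" off; then
      echo "ok: $feature off"
    else
      echo "failed: $feature" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

For example, as root: "disable_offloads eno1 gso gro tso tx rx rxvlan txvlan sg". Like a plain ethtool -K call, this only changes the running state; it still needs a post-up (or similar) to survive a reboot.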
 

n1nj4888

Member
Jan 13, 2019
After installing ethtool on the node ...

Code:
apt install ethtool

... I first tried it this way (with the post-up under eno1) and rebooted. I checked whether the post-up had taken effect with "ethtool -k eno1 | grep offload" and could see that it had not (tso was still enabled)... After placing the post-up line under both the "vmbr0" config (eno1 is its bridge port) and the "eno1" config, the settings were applied as expected after a reboot...

Code:
iface eno1 inet manual
        post-up ethtool -K eno1 tso off gso off

auto vmbr0
iface vmbr0 inet static
        address X.X.X.X/Y
        gateway X.X.X.X
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        post-up ethtool -K eno1 tso off gso off
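
The "ethtool -k eno1 | grep offload" check above can be turned into a small pass/fail test, for example to run after each reboot. A minimal sketch, assuming the usual "name: on/off" layout of ethtool -k output, where tso and gso appear as tcp-segmentation-offload and generic-segmentation-offload; `offloads_still_on` is a hypothetical helper:

```bash
#!/usr/bin/env bash
# offloads_still_on [FEATURE...]
# Hypothetical helper: reads "ethtool -k IFACE" output on stdin and prints each
# listed feature that is still "on". Returns 1 if any are on, 0 otherwise.
offloads_still_on() {
  local out feature state any_on=0
  local -a features=("$@")
  # Default to the two features disabled in this thread (tso and gso).
  if [ "${#features[@]}" -eq 0 ]; then
    features=(tcp-segmentation-offload generic-segmentation-offload)
  fi
  out=$(cat)
  for feature in "${features[@]}"; do
    # Top-level lines look like "tcp-segmentation-offload: off [fixed]";
    # indented sub-feature lines start with a tab and never match $1 == f.
    state=$(printf '%s\n' "$out" |
      awk -F': ' -v f="$feature" '$1 == f { split($2, a, " "); print a[1] }')
    if [ "$state" = "on" ]; then
      echo "$feature"
      any_on=1
    fi
  done
  return "$any_on"
}
```

For example "ethtool -k eno1 | offloads_still_on && echo 'tso/gso are off'"; other long names such as generic-receive-offload can be passed as arguments.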

I'm not yet sure whether turning off just tso and gso is enough, or whether the full post-up line
"ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off" is required, but I thought I'd start with the minimal set disabled and check whether eno1 still hits the hardware hang...

Apollon77

Member
Sep 24, 2018
I also had the feeling that it did not work in the first place, but wasn't sure how to verify it correctly. With this info I will also put it in both places :)
 

tssge

New Member
Mar 20, 2020
Yeah, most probably certain offloads need to be disabled and others can be left on. However, I don't want to debug it further on a production server myself, and the CPU load increase from having no offloading at all is negligible, even at 100% utilization of 1 Gbps (at least on my servers).
 

spirit

Famous Member
Apr 2, 2010
www.odiso.com
As you don't have "auto eno1", ifupdown1 doesn't execute the post-up in the eno1 section.

eno1 is simply brought up with "ip link set eno1 up" because of "bridge-ports eno1", and the post-up in vmbr0 is executed after both vmbr0 and eno1 are up.
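
Following that explanation, one can check one's own config for a post-up that ifupdown1 may silently skip: a post-up in a stanza for an interface that has no "auto" line. A minimal sketch; `postup_without_auto` is a hypothetical helper, and it only looks at "auto" lines (not "allow-hotplug", and not files pulled in via "source"):

```bash
#!/usr/bin/env bash
# postup_without_auto IFACE
# Hypothetical helper: reads an /etc/network/interfaces-style file on stdin.
# If the "iface IFACE" stanza contains a post-up but IFACE appears on no
# "auto" line, prints a warning and returns 1; otherwise returns 0.
postup_without_auto() {
  awk -v ifc="$1" '
    # IFACE may be listed among several names on an "auto" line.
    $1 == "auto" { for (i = 2; i <= NF; i++) if ($i == ifc) has_auto = 1 }
    # A new iface stanza starts; track whether it belongs to IFACE.
    $1 == "iface" { in_stanza = ($2 == ifc); next }
    # Other top-level keywords also end the current stanza.
    $1 == "auto" || $1 ~ /^allow-/ || $1 == "mapping" || $1 ~ /^source/ { in_stanza = 0 }
    in_stanza && $1 == "post-up" { has_postup = 1 }
    END {
      if (has_postup && !has_auto) {
        printf "warning: iface %s has post-up but no \"auto %s\" line\n", ifc, ifc
        exit 1
      }
    }
  '
}
```

For example "postup_without_auto eno1 < /etc/network/interfaces". With the config above it would warn for eno1 but not for vmbr0, which matches why only the vmbr0 post-up took effect.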
 

Kribbstar

New Member
Mar 28, 2020
I think I have the very same problem as everyone else here. Under heavy network load the NIC seems to go down, and I can't reach Proxmox or any of the VMs through SSH or the web GUI. The network load comes primarily from one of the VMs running Sabnzbd.

This is my log output, which repeats over and over:

Code:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   TDH                  <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   TDT                  <1>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_use          <1>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_clean        <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] buffer_info[next_to_clean]:
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   time_stamp           <15a14b4cb>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_watch        <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   jiffies              <15a14b8b8>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633]   next_to_watch.status <0>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] MAC Status             <40080083>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY Status             <796d>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY 1000BASE-T Status  <3800>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PHY Extended Status    <3000>
Mar 14 14:17:10 yggdrasil kernel: [6045453.108633] PCI Status             <10>
Mar 14 14:17:11 yggdrasil kernel: [6045454.164399] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Mar 14 14:17:11 yggdrasil kernel: [6045454.164439] vmbr0: port 1(eno1) entered disabled state
Mar 14 14:17:18 yggdrasil kernel: [6045461.092765] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Mar 14 14:17:18 yggdrasil kernel: [6045461.092810] vmbr0: port 1(eno1) entered blocking state
Mar 14 14:17:18 yggdrasil kernel: [6045461.092812] vmbr0: port 1(eno1) entered forwarding state

I'm running Proxmox on an Intel NUC 7i5BNK.

I'm going to try the suggested solution:
Bash:
ethtool -K <device name> gso off tso off

And see how it works. I really hope we find a permanent fix for this soon.
 

trottelvottel

New Member
Mar 30, 2020
Morning!

I have the problem of the eno1 port flapping on the switch, so I tried swapping the cable etc. It is an Intel card in a Lenovo M93 box.

So I tried the kernel from a post here... no effect:

Bash:
Linux 5.4.24-1-pve #1 SMP PVE 5.4.24-1 (Mon, 09 Mar 2020 12:59:46 +0100)


I am now testing with all offloads disabled:

Bash:
ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

Hope this will be fixed soon.

Regards

n1nj4888

Member
Jan 13, 2019
Just to report back here... Since adding "ethtool -K eno1 tso off gso off" as a post-up (about a week ago), I haven't had any further occurrences of the "Detected Hardware Unit Hang" issue. So it looks like only "tso off gso off" is required, not all the other parameters.

dynek

New Member
Oct 30, 2019
PVE kernel 5.4.24-1 did not fix this for me; I'm also still using ethtool.