e1000e eno1: Detected Hardware Unit Hang:

bogo22 · Nov 7, 2019

Hello,

I am using Proxmox on an Intel NUC (NUC8i3BEH2).
It has been running quite stably for a long time, but recently there has been more sustained network throughput (5-10 mb/s for several hours) on a QEMU VM, and from time to time I get the message "Detected Hardware Unit Hang" (see the dmesg output). When I reboot the system it works for maybe 1-2 hours, and then (after the sustained network throughput) I get the message again. I also tried ethtool -K eno1 tso off gso off, but the hang still appears.
I have some LXC containers connected directly to bridge vmbr0 and some QEMU VMs, all of which use VirtIO.

How can I fix the problem? Thanks for your help.

pveversion -v

proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-4.15: 5.4-6
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1

cat /etc/network/interfaces

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.2.12
netmask 255.255.255.0
gateway 192.168.2.1
bridge_ports eno1
bridge_stp off
bridge_fd 0

ethtool -k eno1

Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
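Reading the listing above, `tcp-segmentation-offload` and `generic-segmentation-offload` are off, but several other offloads, including `generic-receive-offload` and rx/tx checksumming, are still on. A minimal sketch for pulling that list out of `ethtool -k` output; `enabled_offloads` is a hypothetical helper name, and it assumes ethtool's usual `name: state [suffix]` line format.

```shell
#!/usr/bin/env bash
# Print the offload features that are "on" and still toggleable, i.e. lines of
# `ethtool -k IFACE` output (read from stdin) that are not marked "[fixed]".
enabled_offloads() {
  # Lines look like "name: on" or "<tab>name: off [fixed]"; with ": " as the
  # separator, $2 is exactly "on" only for enabled, non-fixed features.
  awk -F': ' '$2 == "on" { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on the host:
#   ethtool -k eno1 | enabled_offloads
```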

dmesg | grep e1000e

Bash:

[    2.605546] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    2.605547] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    2.606144] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    3.009976] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[    3.078037] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 94:c6:91:a4:ec:d2
[    3.078039] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    3.078127] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
[    3.078831] e1000e 0000:00:1f.6 eno1: renamed from eth0
[   12.376548] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

dmesg error output (see below; the full output is on pastebin):

Bash:

[72589.451401] vmbr0: port 2(veth900i0) entered blocking state
[72589.451402] vmbr0: port 2(veth900i0) entered disabled state
[72589.451451] device veth900i0 entered promiscuous mode
[72589.856974] eth0: renamed from vethHS0XH5
[72590.410454] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[72590.410483] vmbr0: port 2(veth900i0) entered blocking state
[72590.410484] vmbr0: port 2(veth900i0) entered forwarding state
[86023.359460] hrtimer: interrupt took 23694 ns
[97377.240263] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <22>
                 TDT                  <2f>
                 next_to_use          <2f>
                 next_to_clean        <21>
               buffer_info[next_to_clean]:
                 time_stamp           <101725292>
                 next_to_watch        <22>
                 jiffies              <1017253e0>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[97381.276165] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <22>
                 TDT                  <2f>
                 next_to_use          <2f>
                 next_to_clean        <21>
               buffer_info[next_to_clean]:
                 time_stamp           <101725292>
                 next_to_watch        <22>
                 jiffies              <1017257d1>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[121439.796191] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                  TDH                  <4>
                  TDT                  <9a>
                  next_to_use          <9a>
                  next_to_clean        <3>
                buffer_info[next_to_clean]:
                  time_stamp           <101ce1456>
                  next_to_watch        <4>
                  jiffies              <101ce1ee0>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[121440.692004] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[121440.692068] vmbr0: port 1(eno1) entered disabled state
[121445.846522] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[121445.846594] vmbr0: port 1(eno1) entered blocking state
[121445.846597] vmbr0: port 1(eno1) entered forwarding state
[123496.079024] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                  TDH                  <44>
                  TDT                  <9b>
                  next_to_use          <9b>
                  next_to_clean        <43>
                buffer_info[next_to_clean]:
                  time_stamp           <101d5f5e3>
                  next_to_watch        <44>
                  jiffies              <101d5f700>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[124251.524535] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[124251.524619] vmbr0: port 1(eno1) entered disabled state
[124256.099190] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[124256.099270] vmbr0: port 1(eno1) entered blocking state
[124256.099276] vmbr0: port 1(eno1) entered forwarding state
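The hang records can be read a little more concretely. TDH and TDT are the hardware head and the driver tail of the transmit descriptor ring, so TDT minus TDH (modulo the ring size) is the number of descriptors handed to the NIC but not yet consumed, and `jiffies` minus `time_stamp` is how long the oldest unfinished descriptor has been waiting. The sketch below assumes a 256-entry TX ring (the size `ethtool -g eno1` reports later in the thread) and HZ=250; the first two records above are consistent with 250, since they share a `time_stamp` and `jiffies` advances by 0x7d1 - 0x3e0 = 1009 ticks over the roughly 4.04 s between their dmesg timestamps. `hang_summary` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough decoding of one e1000e "Detected Hardware Unit Hang" record.
# Arguments are the hex values from the log without the angle brackets:
#   hang_summary TDH TDT time_stamp jiffies [HZ] [RING]
# Assumptions: HZ (kernel tick rate) defaults to 250, RING (TX ring size) to 256.
hang_summary() {
  local tdh=$1 tdt=$2 stamp=$3 now=$4 hz=${5:-250} ring=${6:-256}
  # Descriptors queued by the driver (tail) that the hardware (head) has not
  # consumed yet; the ring wraps around, hence the modulo.
  local pending=$(( (16#$tdt - 16#$tdh + ring) % ring ))
  # Age of the oldest unfinished descriptor, in jiffies.
  local age=$(( 16#$now - 16#$stamp ))
  awk -v p="$pending" -v a="$age" -v hz="$hz" \
    'BEGIN { printf "pending=%d age_jiffies=%d age_s=%.3f\n", p, a, a / hz }'
}
```

For the first record, `hang_summary 22 2f 101725292 1017253e0` prints `pending=13 age_jiffies=334 age_s=1.336`, i.e. 13 descriptors waiting and the oldest one stuck for about 1.3 s.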

bogo22 · Nov 7, 2019

UPDATE:

It seems to be a driver issue (other people get the same error: https://forum.proxmox.com/threads/e1000-driver-hang.58284 ).

chudak · Sep 9, 2020

@bogo22

I am seeing the same errors too now:

Code:

Sep 09 14:46:40 pve pvestatd[1206]: storage 'ISOs-SMB' is not online
Sep 09 14:46:42 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <ab>
                              TDT                  <e4>
                              next_to_use          <e4>
                              next_to_clean        <aa>
                            buffer_info[next_to_clean]:
                              time_stamp           <10010824a>
                              next_to_watch        <ab>
                              jiffies              <100108a20>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Sep 09 14:46:42 pve kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly

How did you fix it?
Thx

kifeo · Sep 24, 2020

Same issue here on a NUC and an 8200 SFF.

chudak · Sep 24, 2020

Anybody tried this driver https://downloadcenter.intel.com/do...et-Adapter-Complete-Driver-Pack?product=82186 ?

tom · Sep 24, 2020

Yuri Weinstein said:
Anybody tried this driver https://downloadcenter.intel.com/do...et-Adapter-Complete-Driver-Pack?product=82186 ?

Please stop triple-posting! Posting the same question three times within 30 minutes is really annoying.

Lickermad · Mar 19, 2021

ethtool -K eth0 tx off rx off

Disabling TCP checksum offloading worked for me

chudak · Mar 19, 2021

Lickermad said:
ethtool -K eth0 tx off rx off

Disabling TCP checksum offloading worked for me

Did you lose any performance after that?

Lickermad · Mar 19, 2021

The 100 client sessions that used to drop are now working stably, with no problems.

chudak · Mar 29, 2021

Lickermad said:
ethtool -K eth0 tx off rx off

Disabling TCP checksum offloading worked for me

How can I tell whether my tx/rx offloading is on or off?

I see:

root@pve:~# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

In other words, if I set it with ethtool -K eth0 tx off rx off, how do I undo it if needed?

Thx

Lickermad · Mar 29, 2021

chudak said:
How can I tell whether my tx/rx offloading is on or off?

I see:

root@pve:~# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

In other words, if I set it with ethtool -K eth0 tx off rx off, how do I undo it if needed?

Thx

# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on
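For reference, `ethtool -a` prints the pause (flow-control) parameters, so the RX/TX lines above describe flow control, not checksum offload. The checksum offload state is listed by `ethtool -k IFACE` (lowercase k) as `rx-checksumming` and `tx-checksumming`, `ethtool -K IFACE rx on tx on` turns it back on, and settings made with `ethtool -K` do not survive a reboot. A minimal sketch for saving the current state before applying the workaround so it can be restored later; `checksum_restore_args` is a hypothetical helper, and the interface name must be your own (eno1 here rather than eth0).

```shell
#!/usr/bin/env bash
# Turn `ethtool -k IFACE` output (stdin) into the arguments that restore the
# current rx/tx checksum offload state, e.g. "rx on tx on".
# Exits non-zero if either feature line is missing from the input.
checksum_restore_args() {
  awk -F': ' '
    { state = $2; sub(/ .*/, "", state) }   # drop suffixes like " [fixed]"
    $1 == "rx-checksumming" { rx = state }
    $1 == "tx-checksumming" { tx = state }
    END {
      if (rx != "" && tx != "") { printf "rx %s tx %s\n", rx, tx } else { exit 1 }
    }'
}

# Usage on the host:
#   saved=$(ethtool -k eno1 | checksum_restore_args)   # e.g. "rx on tx on"
#   ethtool -K eno1 rx off tx off                      # the workaround
#   ethtool -K eno1 $saved                             # undo it later (unquoted on purpose)
```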

chudak · Mar 30, 2021

So I did:
ethtool -K eth0 tx off rx off

and changed the network device on my VMs from E1000 to VirtIO (paravirtualized).
I tested downloads/uploads of several GB and have seen no errors so far.
Speed seems a bit better, too.

Thx for the clue @Lickermad !
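For reference, on Proxmox the guest NIC model is the first key of the VM's `netN` option, as in `net0: e1000=<MAC>,bridge=vmbr0` in `qm config` output, and `qm set <vmid> --net0 ...` replaces it. A sketch of switching the model while keeping the existing MAC address and other options; `to_virtio` and VMID 100 are hypothetical, and the guest needs VirtIO network drivers (Windows guests need the virtio-win drivers). This only changes the emulated NIC inside the VM; the host's eno1 still uses the e1000e driver.

```shell
#!/usr/bin/env bash
# Rewrite a Proxmox netN value from an e1000 model (e1000, e1000e,
# e1000-82540em, ...) to virtio, keeping the MAC and all other options.
to_virtio() {
  # Only the leading "<model>=" key is replaced; the rest is passed through.
  sed -E 's/^e1000[-a-z0-9]*=/virtio=/'
}

# Usage on the host (check `qm config 100` first):
#   cur=$(qm config 100 | sed -n 's/^net0: //p')
#   qm set 100 --net0 "$(printf '%s\n' "$cur" | to_virtio)"
```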

chudak · Apr 7, 2021

Lickermad said:
# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on

I spoke too soon...
I was uploading ~10 GB with multiple threads and still hit the error:

Code:

Apr 07 14:21:21 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19518>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:23 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19710>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:25 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19900>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:27 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19af8>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:29 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19ce8>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:29 pve kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly

However, this time the VM did not hang or become unresponsive, which is at least better behavior.

michaels2408 · May 6, 2021

chudak,
When did this problem start for you? I have a secondary Proxmox node, PVE2, running on an Asus board with an Intel PRO/1000, and it only recently (in the last two months) started behaving like this.


[ 1902.246655] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
                 TDH                  <bd>
                 TDT                  <e5>
                 next_to_use          <e5>
                 next_to_clean        <bc>
               buffer_info[next_to_clean]:
                 time_stamp           <100061a8f>
                 next_to_watch        <bd>
                 jiffies              <100061c48>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3c00>
               PHY Extended Status    <3000>
               PCI Status             <10>

The virtual machine is accessed across a 1 Gb network from the primary Proxmox node, named PVE. I guess I need to go back through the old logs to see if I can spot when the problem started.
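One way to find when the hangs started is to search all retained kernel logs for the first match. A minimal sketch: `hang_report` is a hypothetical helper that reads log text on stdin and prints the match count and the earliest matching line, assuming the input is in chronological order. Note that `journalctl -k` implies the current boot only, so the usage below matches `_TRANSPORT=kernel` instead, which spans every boot kept in a persistent journal.

```shell
#!/usr/bin/env bash
# Count "Detected Hardware Unit Hang" lines in log text on stdin and print
# the first (earliest, assuming chronological input) occurrence.
hang_report() {
  awk '/Detected Hardware Unit Hang/ { n++; if (n == 1) first = $0 }
       END {
         if (n) printf "count=%d\nfirst=%s\n", n, first
         else print "count=0"
       }'
}

# Usage on the host:
#   journalctl _TRANSPORT=kernel -o short-iso --no-pager | hang_report
# Or, if rsyslog writes /var/log/kern.log (zcat -f also reads plain files):
#   ls -tr /var/log/kern.log* | xargs zcat -f | hang_report
```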

prx · May 15, 2021

Try this workaround - https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-390709

dengolius · Mar 8, 2023

Today I ran into this issue with one of the latest versions of Proxmox:

Bash:

Mar  8 10:19:08 ox1 kernel: [7660556.021229] e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Mar  8 10:19:12 ox1 kernel: [7660559.987660] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Mar  8 10:19:14 ox1 kernel: [7660561.910802] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   TDH                  <0>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   TDT                  <1>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   next_to_use          <1>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   next_to_clean        <0>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] buffer_info[next_to_clean]:
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   time_stamp           <17225a522>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   next_to_watch        <0>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   jiffies              <17225a700>
Mar  8 10:19:14 ox1 kernel: [7660561.910802]   next_to_watch.status <0>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] MAC Status             <80083>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] PHY Status             <796d>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] PHY 1000BASE-T Status  <7800>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] PHY Extended Status    <3000>
Mar  8 10:19:14 ox1 kernel: [7660561.910802] PCI Status             <10>

This happened when the server became unavailable.

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
01:00.0 RAID bus controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection

poutinelova · May 28, 2023

Same here, on a Lenovo M720q:

syslog output:

Bash:

May 22 19:22:49 basestar kernel: [186242.857338] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
May 22 19:22:49 basestar kernel: [186242.857338]   TDH                  <79>
May 22 19:22:49 basestar kernel: [186242.857338]   TDT                  <a1>
May 22 19:22:49 basestar kernel: [186242.857338]   next_to_use          <a1>
May 22 19:22:49 basestar kernel: [186242.857338]   next_to_clean        <78>
May 22 19:22:49 basestar kernel: [186242.857338] buffer_info[next_to_clean]:
May 22 19:22:49 basestar kernel: [186242.857338]   time_stamp           <102c54e19>
May 22 19:22:49 basestar kernel: [186242.857338]   next_to_watch        <79>
May 22 19:22:49 basestar kernel: [186242.857338]   jiffies              <102c555a8>
May 22 19:22:49 basestar kernel: [186242.857338]   next_to_watch.status <0>
May 22 19:22:49 basestar kernel: [186242.857338] MAC Status             <40080083>
May 22 19:22:49 basestar kernel: [186242.857338] PHY Status             <796d>
May 22 19:22:49 basestar kernel: [186242.857338] PHY 1000BASE-T Status  <3c00>
May 22 19:22:49 basestar kernel: [186242.857338] PHY Extended Status    <3000>
May 22 19:22:49 basestar kernel: [186242.857338] PCI Status             <10>
May 22 19:22:49 basestar kernel: [186242.953122] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
May 22 19:22:49 basestar kernel: [186243.043502] vmbr0: port 1(eno1) entered disabled state
May 22 19:22:53 basestar kernel: [186246.739797] e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
May 22 19:22:53 basestar kernel: [186246.739854] vmbr0: port 1(eno1) entered blocking state
May 22 19:22:53 basestar kernel: [186246.739858] vmbr0: port 1(eno1) entered forwarding state

lspci:

Code:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)
    DeviceName: Onboard - Ethernet
    Subsystem: Lenovo Ethernet Connection (7) I219-V
    Kernel driver in use: e1000e
    Kernel modules: e1000e

sysadminfromhell · Jul 4, 2023

For me it's super weird, because I have two identical NUCs: same model (Intel NUC10FNH), 10th Gen i7, same Intel NIC.
One has this problem, and it can only be fixed by disabling TSO, GSO and GRO. I have put the permanent fix in the /etc/network/interfaces file and will see if it helps for good.
The other one works normally (apart from sometimes reaching 90 degrees).
Why does this happen on one device but not on another identical one?
Same PVE version, same BIOS, same network, same everything. They're identical.
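One common way to make such a fix permanent is an ifupdown `post-up` line on the physical interface, so `ethtool` runs every time the interface comes up. The stanza in the comments below is an example, not the poster's actual file. `offloads_disabled` is a hypothetical helper to confirm after a reboot that TSO, GSO and GRO really are off; it reads `ethtool -k` output on stdin and succeeds only if all three are reported off.

```shell
#!/usr/bin/env bash
# Example stanza in /etc/network/interfaces (not taken from this thread):
#   iface eno1 inet manual
#       post-up /sbin/ethtool -K eno1 tso off gso off gro off
#
# Succeed only if TSO, GSO and GRO are all present and "off" in the input.
offloads_disabled() {
  awk -F': ' '
    { state = $2; sub(/ .*/, "", state) }   # drop suffixes like " [fixed]"
    $1 == "tcp-segmentation-offload" ||
    $1 == "generic-segmentation-offload" ||
    $1 == "generic-receive-offload" { seen++; if (state != "off") bad++ }
    END { if (seen == 3 && bad == 0) exit 0; exit 1 }'
}

# Usage on the host after a reboot:
#   ethtool -k eno1 | offloads_disabled && echo "TSO/GSO/GRO are off"
```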

spirit · Jul 4, 2023

sysadminfromhell said:
Why does this happen on one device but not on another identical one?

This has been a known bug with the I219-V, I218-V, ... NIC models for years.
Almost all bug reports come from NUC users (just check the forum).
I think it's simply because they are cheap cards and can't handle the offloading load very well.
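To check whether a host has one of the NIC models named here, its `lspci` output can be filtered. A rough sketch: `suspect_nics` is hypothetical and matches only the I218/I219 names mentioned in this thread (other e1000e parts, such as the 82574L listed earlier, are not flagged), and, as the two identical NUCs above show, a match does not mean the hang will actually occur.

```shell
#!/usr/bin/env bash
# Print Ethernet controllers from `lspci` output (stdin) whose model is one
# of the I218/I219 variants discussed in this thread. grep exits 1 if none.
suspect_nics() {
  grep -E 'Ethernet controller: .*I21[89]-(V|LM)'
}

# Usage on the host:
#   lspci | suspect_nics
```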

keeka · Jul 4, 2023

sysadminfromhell said:
For me it's super weird, because I have two identical NUCs: same model (Intel NUC10FNH), 10th Gen i7, same Intel NIC.
One has this problem, and it can only be fixed by disabling TSO, GSO and GRO. I have put the permanent fix in the /etc/network/interfaces file and will see if it helps for good.
The other one works normally (apart from sometimes reaching 90 degrees).
Why does this happen on one device but not on another identical one?
Same PVE version, same BIOS, same network, same everything. They're identical.

I have an earlier Intel NIC affected by the same issue. I found I only needed to disable offloading when I had configured VLANs on the interface. Traffic volume didn't appear to be a factor.
