e1000e eno1: Detected Hardware Unit Hang:

bogo22

Nov 4, 2016
Hello,

I am using Proxmox on an Intel NUC (NUC8i3BEH2).
It has been running stably for a long time, but recently one QEMU VM has had sustained network throughput (5-10 mb/s for several hours), and from time to time I get the message "Detected Hardware Unit Hang" (see the dmesg output below). After a reboot the system works for maybe 1-2 hours, and then (after that sustained throughput) I get the message again. I also tried ethtool -K eno1 tso off gso off, but the hang still occurs.
I have some LXC containers connected directly to the bridge vmbr0, and some QEMU VMs, which all use VirtIO.

How can I fix the problem? Thanks for your help.


pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-4.15: 5.4-6
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.18-1-pve: 5.0.18-3
pve-kernel-5.0.15-1-pve: 5.0.15-1
pve-kernel-4.15.18-18-pve: 4.15.18-44
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1

cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.2.12
netmask 255.255.255.0
gateway 192.168.2.1
bridge_ports eno1
bridge_stp off
bridge_fd 0
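The ethtool -K change from the first post does not survive a reboot. One way to reapply it at boot is a post-up line in /etc/network/interfaces, for example in the vmbr0 stanza above. A minimal sketch of a helper such a line could call; the path /usr/local/sbin/e1000e-offload.sh and the ETHTOOL override (for dry runs) are hypothetical, and turning these offloads off is a workaround reported in this thread, not a confirmed fix:

```shell
#!/bin/sh
# Hypothetical helper, e.g. saved as /usr/local/sbin/e1000e-offload.sh.
# Disables the offloads discussed in this thread on one interface.
# Set ETHTOOL=echo to print the command instead of running it (a convention
# of this sketch, not an ethtool feature).

disable_offloads() {
    iface="$1"
    if [ -z "$iface" ]; then
        echo "usage: disable_offloads IFACE" >&2
        return 2
    fi
    # tso/gso: TCP and generic segmentation offload (tried in the first post)
    # tx/rx:   transmit and receive checksum offload (suggested later on)
    ${ETHTOOL:-ethtool} -K "$iface" tso off gso off tx off rx off
}

# Run only when executed under the name above, not when sourced elsewhere.
if [ -n "${1:-}" ] && [ "${0##*/}" = "e1000e-offload.sh" ]; then
    disable_offloads "$1"
fi
```

Usage: make the script executable and add "post-up /usr/local/sbin/e1000e-offload.sh eno1" as an extra line under "iface vmbr0 inet static". Putting the ethtool command directly on the post-up line would work equally; the script only keeps the list of offloads in one place.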

ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
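In this listing tso and gso are already off, but rx/tx checksumming, scatter-gather and GRO are still on. Features marked [fixed] cannot be changed with ethtool -K, so they are not candidates for further testing. A minimal sketch (the function name is hypothetical) that lists the features that are on and still changeable, reading ethtool -k output on stdin:

```shell
# List features from `ethtool -k` output (stdin) that are "on" and not
# "[fixed]", i.e. the ones `ethtool -K` could still switch off.
changeable_on_features() {
    awk -F': ' '
        NF >= 2 {
            name = $1
            sub(/^[ \t]+/, "", name)     # sub-features are indented
            if ($2 == "on") print name   # "on [fixed]" does not match
        }
    '
}
```

Usage: ethtool -k eno1 | changeable_on_features. For the listing above this prints rx-checksumming, tx-checksumming, tx-checksum-ip-generic, scatter-gather, tx-scatter-gather, generic-receive-offload, rx-vlan-offload, tx-vlan-offload and receive-hashing.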


dmesg | grep e1000e
[    2.605546] e1000e: Intel(R) PRO/1000 Network Driver - 3.2.6-k
[    2.605547] e1000e: Copyright(c) 1999 - 2015 Intel Corporation.
[    2.606144] e1000e 0000:00:1f.6: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
[    3.009976] e1000e 0000:00:1f.6 0000:00:1f.6 (uninitialized): registered PHC clock
[    3.078037] e1000e 0000:00:1f.6 eth0: (PCI Express:2.5GT/s:Width x1) 94:c6:91:a4:ec:d2
[    3.078039] e1000e 0000:00:1f.6 eth0: Intel(R) PRO/1000 Network Connection
[    3.078127] e1000e 0000:00:1f.6 eth0: MAC: 13, PHY: 12, PBA No: FFFFFF-0FF
[    3.078831] e1000e 0000:00:1f.6 eno1: renamed from eth0
[   12.376548] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

dmesg error output (see below; the full output was posted on pastebin):
[72589.451401] vmbr0: port 2(veth900i0) entered blocking state
[72589.451402] vmbr0: port 2(veth900i0) entered disabled state
[72589.451451] device veth900i0 entered promiscuous mode
[72589.856974] eth0: renamed from vethHS0XH5
[72590.410454] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[72590.410483] vmbr0: port 2(veth900i0) entered blocking state
[72590.410484] vmbr0: port 2(veth900i0) entered forwarding state
[86023.359460] hrtimer: interrupt took 23694 ns
[97377.240263] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <22>
                 TDT                  <2f>
                 next_to_use          <2f>
                 next_to_clean        <21>
               buffer_info[next_to_clean]:
                 time_stamp           <101725292>
                 next_to_watch        <22>
                 jiffies              <1017253e0>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[97381.276165] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                 TDH                  <22>
                 TDT                  <2f>
                 next_to_use          <2f>
                 next_to_clean        <21>
               buffer_info[next_to_clean]:
                 time_stamp           <101725292>
                 next_to_watch        <22>
                 jiffies              <1017257d1>
                 next_to_watch.status <0>
               MAC Status             <40080083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[121439.796191] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                  TDH                  <4>
                  TDT                  <9a>
                  next_to_use          <9a>
                  next_to_clean        <3>
                buffer_info[next_to_clean]:
                  time_stamp           <101ce1456>
                  next_to_watch        <4>
                  jiffies              <101ce1ee0>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[121440.692004] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[121440.692068] vmbr0: port 1(eno1) entered disabled state
[121445.846522] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[121445.846594] vmbr0: port 1(eno1) entered blocking state
[121445.846597] vmbr0: port 1(eno1) entered forwarding state
[123496.079024] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                  TDH                  <44>
                  TDT                  <9b>
                  next_to_use          <9b>
                  next_to_clean        <43>
                buffer_info[next_to_clean]:
                  time_stamp           <101d5f5e3>
                  next_to_watch        <44>
                  jiffies              <101d5f700>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[124251.524535] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[124251.524619] vmbr0: port 1(eno1) entered disabled state
[124256.099190] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
[124256.099270] vmbr0: port 1(eno1) entered blocking state
[124256.099276] vmbr0: port 1(eno1) entered forwarding state
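Reading the dump: TDH is the transmit descriptor head (the next descriptor the NIC will process) and TDT is the tail the driver advances when it queues packets, so next_to_use matching TDT is expected. jiffies minus time_stamp is how long the oldest queued descriptor has been waiting. A minimal sketch of that arithmetic (function name hypothetical), assuming a 256-entry TX ring (the TX size ethtool -g reports later in the thread) and HZ=250 (an assumption about the Proxmox kernel's CONFIG_HZ; the two dumps at 97377 s and 97381 s are 1009 jiffies apart over about 4.04 s, which fits 250):

```shell
# Summarise one e1000e "Detected Hardware Unit Hang" dump.
# Arguments: TDH TDT time_stamp jiffies [ring_size] [hz]
# (hex values copied from the dump without the angle brackets).
# Defaults are assumptions: a 256-entry TX ring and HZ=250.
# Check HZ with: grep 'CONFIG_HZ=' "/boot/config-$(uname -r)"
hang_summary() {
    tdh=$((0x$1)); tdt=$((0x$2))
    stamp=$((0x$3)); now=$((0x$4))
    ring=${5:-256}; hz=${6:-250}
    # Descriptors queued by the driver (up to TDT) that the hardware has not
    # processed yet (from TDH on); the modulo handles ring wrap-around.
    pending=$(( (tdt - tdh + ring) % ring ))
    # How long the oldest queued descriptor has been waiting.
    age=$(( now - stamp ))
    echo "pending_descriptors=$pending age_jiffies=$age age_ms=$(( age * 1000 / hz ))"
}
```

For the first dump, hang_summary 22 2f 101725292 1017253e0 prints pending_descriptors=13 age_jiffies=334 age_ms=1336. Four seconds later TDH and time_stamp are unchanged (age 1343 jiffies), so the NIC made no progress on the TX ring in between. In the Apr 07 dump further down the ring wraps: hang_summary f3 16 101a193f4 101a19518 gives 35 pending descriptors.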
 
chudak

May 11, 2019
@bogo22

I am seeing the same errors too now:

Sep 09 14:46:40 pve pvestatd[1206]: storage 'ISOs-SMB' is not online
Sep 09 14:46:42 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <ab>
                              TDT                  <e4>
                              next_to_use          <e4>
                              next_to_clean        <aa>
                            buffer_info[next_to_clean]:
                              time_stamp           <10010824a>
                              next_to_watch        <ab>
                              jiffies              <100108a20>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Sep 09 14:46:42 pve kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly

How did you fix it?
Thx
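Whether a workaround helps is easier to judge by counting hang reports and the resets that follow them over a comparable period of load than by watching for a single message. A minimal sketch (function name hypothetical) that counts both per interface in kernel log text, e.g. journalctl -k --since today | count_e1000e_events:

```shell
# Count e1000e hang reports and unexpected adapter resets per interface in
# kernel log text read on stdin (journalctl or dmesg format).
count_e1000e_events() {
    awk '
        # The interface name is two fields after "e1000e", e.g.
        # "e1000e 0000:00:1f.6 eno1: ..." -> "eno1".
        function iface_of(   i, name) {
            for (i = 1; i < NF - 1; i++)
                if ($i == "e1000e") {
                    name = $(i + 2)
                    sub(/:$/, "", name)
                    return name
                }
            return "unknown"
        }
        /Detected Hardware Unit Hang/ { n = iface_of(); hang[n]++; seen[n] = 1 }
        /Reset adapter unexpectedly/  { n = iface_of(); reset[n]++; seen[n] = 1 }
        END {
            for (n in seen)
                printf "%s hangs=%d resets=%d\n", n, hang[n], reset[n]
        }
    '
}
```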
 

Lickermad

Mar 19, 2021
The 100 client sessions that used to crash are now running stably, with no problems.
 

Attachment: Снимок экрана 2021-03-19 в 20.50.34.png (screenshot taken 2021-03-19 at 20.50.34)
chudak
ethtool -K eth0 tx off rx off

Disabling TCP checksum offloading worked for me

How can I tell whether tx/rx is on or off?

I see:

root@pve:~# ethtool -g eno1
Ring parameters for eno1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 256
RX Mini: 0
RX Jumbo: 0
TX: 256

In other words, if I run ethtool -K eth0 tx off rx off, how do I undo it if needed?

Thx
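ethtool -g prints ring sizes and ethtool -a prints pause (flow-control) parameters; neither shows offload state. Offloads are listed by ethtool -k (lowercase k) and changed by ethtool -K, so tx off rx off is undone with ethtool -K <iface> tx on rx on. On this host the NIC is eno1, not eth0. A minimal sketch (function name hypothetical) that turns the current ethtool -k output into an ethtool -K command restoring the same state, so it can be saved before experimenting; only the six offloads mapped below are handled:

```shell
# Turn `ethtool -k IFACE` output (stdin) into an `ethtool -K` command that
# restores the current state of a few offloads. Only the long names mapped
# below are handled; everything else is ignored.
restore_cmd() {
    awk -v iface="$1" '
        BEGIN {
            # long name printed by `ethtool -k` -> short name for `ethtool -K`
            short["rx-checksumming"] = "rx"
            short["tx-checksumming"] = "tx"
            short["scatter-gather"] = "sg"
            short["tcp-segmentation-offload"] = "tso"
            short["generic-segmentation-offload"] = "gso"
            short["generic-receive-offload"] = "gro"
            cmd = "ethtool -K " iface
        }
        {
            split($0, kv, ": ")
            name = kv[1]
            sub(/^[ \t]+/, "", name)
            if (name in short) {
                split(kv[2], state, " ")   # drop suffixes such as "[fixed]"
                cmd = cmd " " short[name] " " state[1]
            }
        }
        END { print cmd }
    '
}
```

Usage: ethtool -k eno1 | restore_cmd eno1. With the feature listing from the first post this prints "ethtool -K eno1 rx on tx on sg on tso off gso off gro on".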
 

Lickermad


# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate: on
RX: on
TX: on
 
chudak
So I did:
ethtool -K eth0 tx off rx off

I also changed the network device on my VMs from E1000 to VirtIO (paravirtualized).
I tested downloading/uploading several GB and have seen no errors so far.
Speed also seems a bit better.

Thx for the clue @Lickermad !
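Switching a VM's NIC model from E1000 to VirtIO can be done in the GUI or with qm set. If the netN value is rewritten by hand, keeping the existing MAC avoids the guest getting a new address (relevant for DHCP reservations). A minimal sketch (function name hypothetical) that rewrites the value printed by qm config, assuming the usual model=MAC,bridge=... layout:

```shell
# Rewrite the value of a Proxmox "netN" option so the NIC model becomes
# virtio while the MAC address and all other options are kept.
# Expects the "model=MAC,..." form that `qm config` prints; input without a
# MAC after the first "=" is returned unchanged.
to_virtio() {
    printf '%s\n' "$1" |
        sed -E 's/^[a-z0-9_-]+=([0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5})/virtio=\1/'
}
```

Usage, with a hypothetical VM ID 101: qm set 101 --net0 "$(to_virtio "$(qm config 101 | sed -n 's/^net0: //p')")". The guest sees a different NIC type afterwards, so a Windows guest needs the VirtIO drivers and a Linux guest may need its network configuration adjusted.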
 
chudak

I spoke too soon...
While uploading ~10 GB with multiple threads, I still hit the error:

Apr 07 14:21:21 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19518>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:23 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19710>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:25 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19900>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:27 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19af8>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:29 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                              TDH                  <f3>
                              TDT                  <16>
                              next_to_use          <16>
                              next_to_clean        <f2>
                            buffer_info[next_to_clean]
                              time_stamp           <101a193f4>
                              next_to_watch        <f3>
                              jiffies              <101a19ce8>
                              next_to_watch.status <0>
                            MAC Status             <40080083>
                            PHY Status             <796d>
                            PHY 1000BASE-T Status  <3800>
                            PHY Extended Status    <3000>
                            PCI Status             <10>
Apr 07 14:21:29 pve kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly

However, this time the VM did not hang or become unresponsive, which is at least better behavior.
 

michaels2408

Oct 14, 2020
chudak,
When did this problem start for you? I have a secondary Proxmox node, PVE2, running on an Asus board with the Intel PRO/1000, and it only recently (in the last two months) started behaving like this.
[ 1902.246655] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
                 TDH                  <bd>
                 TDT                  <e5>
                 next_to_use          <e5>
                 next_to_clean        <bc>
               buffer_info[next_to_clean]:
                 time_stamp           <100061a8f>
                 next_to_watch        <bd>
                 jiffies              <100061c48>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3c00>
               PHY Extended Status    <3000>
               PCI Status             <10>

The virtual machine is accessed over a 1 Gb network from the primary Proxmox node, named PVE. I guess I need to go back through the old logs to see if I can spot when the problem started.
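To find when the hangs started, the logs have to reach back far enough. journalctl -k only covers the current boot (it implies -b); journalctl --list-boots shows which boots are still stored, and journalctl _TRANSPORT=kernel searches kernel messages from all of them, provided the journal is persistent. A minimal sketch (function name hypothetical) that prints the first and last hang report and a count, e.g. journalctl _TRANSPORT=kernel -o short-iso | first_last_hang:

```shell
# Print the first and last e1000e hang reports in log text (stdin) and how
# many there were; exit status 1 if there are none.
first_last_hang() {
    awk '
        /Detected Hardware Unit Hang/ {
            if (n == 0) first = $0
            last = $0
            n++
        }
        END {
            if (n == 0) exit 1
            print "first: " first
            print "last:  " last
            print "count: " n
        }
    '
}
```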
 
