pve 6.4 crashing frequently with e1000e "Detected Hardware Unit Hang"

Good morning,

our PVE 6.4-13 host with kernel 5.4.128-1-pve crashes every few days with the following error message:

Code:
Sep 01 01:55:42 tokoeka sshd[2837222]: Disconnected from invalid user developer 162.243.91.84 port 48918 [preauth]
Sep 01 01:55:55 tokoeka sshd[2837252]: rexec line 24: Deprecated option UsePrivilegeSeparation
Sep 01 01:55:58 tokoeka kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
TDH                  <a7>
TDT                  <e0>
next_to_use          <e0>
next_to_clean        <a6>
buffer_info[next_to_clean]:
time_stamp           <110102cdf>
next_to_watch        <b3>
jiffies              <110102f79>
next_to_watch.status <0>


I saw this discussed on the Linux kernel mailing list before, but it should already be fixed in the 5.x kernels. There are workarounds, but they affect performance. Any other ideas? Would upgrading to PVE 7 help here?

Thanks for some ideas,
Thommie
 
Hi,

Some (usually older) Intel NICs using the e1000e driver are affected by a HW/FW issue when TSO (TCP Segmentation Offloading) and/or GSO (Generic Segmentation Offloading) is active. A kernel patch that heuristically disabled those features on a superset of affected devices was NAK'd because it would also have caught too many devices that work fine; see this reply from an Intel kernel dev:

There are many PCH2 devices with different SKU's. Not all devices have
this problem (Tx hand). We do not want to set disabling TSO as the
default version. Let's keep this option for all other users.
Also, this is very old known HW bug - unfortunately we didn't fixed it.
Our more new devices have not this problem.
-- http://patchwork.ozlabs.org/project/netdev/patch/1623942.pXzBnfQ100@rocinante.m.i2n/#2176391

If this is the issue you're affected by, then the only actual workaround is to disable TSO and GSO:
Bash:
ethtool -K <interface> tso off gso off

That should not really have a big impact on network throughput, but may increase CPU usage a bit.
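To check that the change actually took effect, you can look at the feature list printed by `ethtool -k` (lowercase k). The following is only a small sketch: `offload_state` is a hypothetical helper name, and it assumes the usual `feature-name: on|off` output format of `ethtool -k`.

```shell
# Hypothetical helper: reads `ethtool -k` output on stdin and prints the
# state ("on" or "off") of the top-level feature named in $1.
offload_state() {
    # Split each line at ": "; indented sub-features do not match because
    # their first field starts with whitespace.
    awk -F': ' -v feat="$1" '$1 == feat { split($2, s, " "); print s[1] }'
}

# Example usage (interface name taken from the log above):
#   ethtool -k enp0s31f6 | offload_state tcp-segmentation-offload
#   ethtool -k enp0s31f6 | offload_state generic-segmentation-offload
```

Both should report "off" after running the `ethtool -K` command above.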

To do that automatically on boot, you could add a post-up line to the respective bridge stanza in the /etc/network/interfaces configuration.
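As a rough illustration, such a stanza could look like the sketch below. The interface name enp0s31f6 is taken from the log above; the bridge name vmbr0, the addresses, and the ethtool path are assumptions and need to be adapted to your setup (check the path with `command -v ethtool`).

```
# /etc/network/interfaces (excerpt, example values only)
iface enp0s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
        # placeholder addresses, replace with your own
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
        # disable TSO and GSO on the physical NIC once the bridge is up
        post-up /sbin/ethtool -K enp0s31f6 tso off gso off
```

The post-up command runs each time the bridge is brought up, so the setting is reapplied after a reboot or after a network restart.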
 