e1000 driver hang

I'm now on the 5.3.10 kernel too, with PVE 6.1 ... before that I again had such a case, with not only the messages but also an ethernet restart ... let's see if it is different now.
 
Dec 11 13:13:00 vmhost03 systemd[1]: Starting Proxmox VE replication runner...
Dec 11 13:13:01 vmhost03 systemd[1]: pvesr.service: Succeeded.
Dec 11 13:13:01 vmhost03 systemd[1]: Started Proxmox VE replication runner.
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] link: host: 1 link: 0 is down
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] link: host: 2 link: 0 is down
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] host: host: 1 has no active links
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 11 13:13:35 vmhost03 corosync[1132]: [KNET ] host: host: 2 has no active links
Dec 11 13:13:36 vmhost03 corosync[1132]: [TOTEM ] Token has not been received in 1237 ms
Dec 11 13:13:36 vmhost03 corosync[1132]: [TOTEM ] A processor failed, forming new configuration.
Dec 11 13:13:36 vmhost03 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <dd>
TDT <37>
next_to_use <37>
next_to_clean <dd>
buffer_info[next_to_clean]:
time_stamp <103aef7d9>
next_to_watch <de>
jiffies <103aef991>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 11 13:13:36 vmhost03 pvestatd[1148]: storage 'vault' is not online
Dec 11 13:13:38 vmhost03 corosync[1132]: [TOTEM ] A new membership (3.4d0) was formed. Members left: 1 2
Dec 11 13:13:38 vmhost03 corosync[1132]: [TOTEM ] Failed to receive the leave message. failed: 1 2
Dec 11 13:13:38 vmhost03 corosync[1132]: [CPG ] downlist left_list: 2 received
Dec 11 13:13:38 vmhost03 corosync[1132]: [QUORUM] This node is within the non-primary component and will NOT provide any services.
Dec 11 13:13:38 vmhost03 corosync[1132]: [QUORUM] Members[1]: 3
Dec 11 13:13:38 vmhost03 corosync[1132]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [status] notice: node lost quorum
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] notice: members: 3/1007
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [status] notice: members: 3/1007
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] crit: received write while not quorate - trigger resync
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] crit: leaving CPG group
Dec 11 13:13:38 vmhost03 pve-ha-lrm[1190]: unable to write lrm status file - unable to open file '/etc/pve/nodes/vmhost03/lrm_status.tmp.1190' - Permission denied
Dec 11 13:13:38 vmhost03 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <dd>
TDT <37>
next_to_use <37>
next_to_clean <dd>
buffer_info[next_to_clean]:
time_stamp <103aef7d9>
next_to_watch <de>
jiffies <103aefb89>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] notice: start cluster connection
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] crit: cpg_join failed: 14
Dec 11 13:13:38 vmhost03 pmxcfs[1007]: [dcdb] crit: can't initialize service
Dec 11 13:13:40 vmhost03 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <dd>
TDT <37>
next_to_use <37>
next_to_clean <dd>
buffer_info[next_to_clean]:
time_stamp <103aef7d9>
next_to_watch <de>
jiffies <103aefd80>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
Dec 11 13:13:41 vmhost03 kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Dec 11 13:13:41 vmhost03 kernel: vmbr0: port 1(eno1) entered disabled state
Dec 11 13:13:44 vmhost03 pmxcfs[1007]: [dcdb] notice: members: 3/1007
Dec 11 13:13:44 vmhost03 pmxcfs[1007]: [dcdb] notice: all data is up to date
Dec 11 13:13:46 vmhost03 pvestatd[1148]: storage 'vault' is not online
Dec 11 13:13:48 vmhost03 kernel: e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Dec 11 13:13:48 vmhost03 kernel: vmbr0: port 1(eno1) entered blocking state
Dec 11 13:13:48 vmhost03 kernel: vmbr0: port 1(eno1) entered forwarding state
Dec 11 13:13:51 vmhost03 corosync[1132]: [KNET ] rx: host: 2 link: 0 is up
Dec 11 13:13:51 vmhost03 corosync[1132]: [KNET ] rx: host: 1 link: 0 is up
Dec 11 13:13:51 vmhost03 corosync[1132]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Dec 11 13:13:51 vmhost03 corosync[1132]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Dec 11 13:13:51 vmhost03 corosync[1132]: [TOTEM ] A new membership (1.4d4) was formed. Members joined: 1 2
Dec 11 13:13:51 vmhost03 corosync[1132]: [CPG ] downlist left_list: 0 received
Dec 11 13:13:51 vmhost03 corosync[1132]: [CPG ] downlist left_list: 0 received
Dec 11 13:13:51 vmhost03 corosync[1132]: [CPG ] downlist left_list: 0 received
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: members: 1/1205, 2/1031, 3/1007
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: starting data syncronisation
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: members: 1/1205, 2/1031, 3/1007
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: starting data syncronisation
Dec 11 13:13:51 vmhost03 corosync[1132]: [QUORUM] This node is within the primary component and will provide service.
Dec 11 13:13:51 vmhost03 corosync[1132]: [QUORUM] Members[3]: 1 2 3
Dec 11 13:13:51 vmhost03 corosync[1132]: [MAIN ] Completed service synchronization, ready to provide service.
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: node has quorum
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: received sync request (epoch 1/1205/0000000F)
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: received sync request (epoch 1/1205/0000000F)
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: received all states
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: leader is 1/1205
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: synced members: 1/1205, 2/1031
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [dcdb] notice: waiting for updates from leader
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: received all states
Dec 11 13:13:51 vmhost03 pmxcfs[1007]: [status] notice: all data is up to date
 
mmm,
maybe this one:
https://bugzilla.kernel.org/show_bug.cgi?id=47331

Code:
SB 2019-09-15 12:55:16 UTC
This is still an issue.

I'm running OpenWRT with Kernel 4.14.131.  Any reasonable load on the Intel onboard I217LM NIC causes it to hardware fault repeatedly. 

[  917.996439] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
[  917.996439]   TDH                  <db>
[  917.996439]   TDT                  <f1>
[  917.996439]   next_to_use          <f1>
[  917.996439]   next_to_clean        <db>
[  917.996439] buffer_info[next_to_clean]:
[  917.996439]   time_stamp           <10000efce>
[  917.996439]   next_to_watch        <db>
[  917.996439]   jiffies              <10000f168>
[  917.996439]   next_to_watch.status <0>
[  917.996439] MAC Status             <80083>
[  917.996439] PHY Status             <796d>
[  917.996439] PHY 1000BASE-T Status  <3800>
[  917.996439] PHY Extended Status    <3000>
[  917.996439] PCI Status             <10>

I can confirm the "ethtool -K enp0s25 gso off gro off tso off" workaround does indeed appear to work

Can you try disabling gso, gro and tso?
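For reference, a minimal sketch of how to check the current offload state and turn all three off at runtime (eno1 is just the interface name from the logs above; adjust to your NIC, and note the setting is lost on reboot/reset):

Code:
# show the current offload settings
ethtool -k eno1 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload'
# disable tso, gso and gro on the fly
ethtool -K eno1 tso off gso off gro off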
 
I tried transferring the same file (103GB, via scp to a server in the same data center, Hetzner) before and after running this command...

103GB  67.7MB/s  25:58  (before)
103GB  37.7MB/s  46:38  (after)

The transfer rate is almost halved...

Strange, that's not what you should expect. Disabling tso and gso should bring the interface back to full speed, because latency is added in the driver patch when segmentation offload is used on these Intel ethernet cards.
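If you want to rule out scp and disk overhead, a raw TCP throughput test between the two hosts would be more telling; a rough sketch with iperf3 (assuming it is installed on both ends, the hostname is a placeholder):

Code:
# on the receiving host
iperf3 -s
# on the sending host, run once with offloads enabled and once with them disabled
iperf3 -c other-host.example.com -t 30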
 
Do I need to execute "ethtool -K eth0 gso off gro off tso off" after each reboot?
I saw somewhere that people are executing this as a post-up command;
the question is: should I add the post-up to the interface itself or to the vmbr0 bridge?
 
Do I need to execute "ethtool -K eth0 gso off gro off tso off" after each reboot?
I saw somewhere that people are executing this as a post-up command;
the question is: should I add the post-up to the interface itself or to the vmbr0 bridge?

Yes, it's lost after a reboot / NIC reset. Add the post-up to the interface itself, as it needs to be applied to the physical NIC and not to the bridge.
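A minimal sketch of what that could look like in /etc/network/interfaces (interface and bridge names taken from this thread; the address is a placeholder, adjust to your setup):

Code:
auto eno1
iface eno1 inet manual
        # re-apply the offload workaround whenever the physical NIC comes up
        post-up /sbin/ethtool -K eno1 tso off gso off gro off

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0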
 
So glad I found this post, so thanks for posting!
For quite a while I had noticed hiccups under heavy (network) load, happening once in a while.
But recently it seemed to have gotten worse; I'm not sure whether it was a result of changes I made to my system.
In the logs I found the behavior at least since kernel 4.15.18-16-pve (SMP PVE 4.15.18-41).
Currently it's happening and reproducible on 5.3.13-1-pve (SMP PVE 5.3.13-1).
Adding "post-up /sbin/ethtool -K enp0s25 tso off gso off" to the enp0s25 iface in /etc/network/interfaces solved it.
 
Do you use NAT in your setup?

Is disabling tso alone enough?

I found a discussion about tso and NAT, with a reply from an Intel driver dev:
https://patchwork.ozlabs.org/patch/1098997/

Code:
Sasha Neftin, May 22, 2019, 10:58 a.m. UTC | #3
On 5/21/2019 18:42, Juliana Rodrigueiro wrote:
> So I ask myself, how actually feasible is it to gamble the usage of "ethtool"
> to turn on or off TSO every time the network configuration changes?
Hello Juliana,
There are many PCH2 devices with different SKU's.  Not all devices have
this problem (Tx hang). We do not want to set disabling TSO as the
default version. Let's keep this option for all other users.
Also, this is very old known HW bug - unfortunately we didn't fixed it.
Our more new devices have not this problem.

TL;DR: don't expect this to be fixed, buy a new card. Thanks Intel :/
 
Do you use NAT in your setup?
Yep, I run Sophos UTM in a VM, which has several virtio NICs and a passthrough Intel NIC.
Inside the UTM GUI I cannot change offload settings like in pfSense, but apparently Sophos solves it with udev rules for several types of NICs.
In my case the passthrough NIC gets its rule from this part of the rule file (also based on ethtool, see [1]):
Code:
# e1000e: disable TSO for Intel 82574L (errata 17, #30345)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10d3", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10f6", RUN+="/lib/udev/nic-disable-tso"
And it's working: checking inside UTM shows that for the Intel NIC tso=off, gso=on, gro=on, and for the virtio NICs tso=on, gso=on, gro=on.
I searched through the UTM kernel messages and could not find NIC hangs or resets, so it seems everything inside the UTM VM is functioning well.

Is disabling tso alone enough?
Yes, tso=off is enough, so I have adjusted the post-up line to only disable tso and will monitor it running like that for a while.
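The adjusted post-up line now looks like this (same enp0s25 iface stanza as before):

Code:
post-up /sbin/ethtool -K enp0s25 tso off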

It appears that the pfSense devs have disabled tso by default; from the documentation [2] and [3]:
Checking this option will disable hardware TCP segmentation offloading (TSO, TSO4, TSO6). TSO causes the NIC to handle splitting up packets into MTU-sized chunks rather than handling that at the OS level. This can be faster for servers and appliances as it allows the OS to offload that task to dedicated hardware, but when acting as a firewall or router this behavior is highly undesirable as it actually increases the load as this task has already been performed elsewhere on the network, thus breaking the end-to-end principle by modifying packets that did not originate on this host.

Warning
This option is not desirable for routers and firewalls, but can benefit workstations and appliances. It is disabled by default, and should remain disabled unless the firewall is acting primarily or solely in an appliance/endpoint role.

Do not uncheck this option unless directed to do so by a support representative. This offloading is broken in some hardware drivers, and can negatively impact performance on affected network cards and roles.

This documentation is primarily written with non-virtualized hardware in mind, I guess.
So perhaps the best advice when running a virtualized router/firewall would be to disable tso not only inside the VM, but on the host as well?

TL;DR: don't expect this to be fixed, buy a new card. Thanks Intel :/
If this offloading feature has been so hit-and-miss for years, perhaps its logic should be reversed: disabled by default, enabled by choice.

[1] https://community.sophos.com/produc...9-312-intel-82572ei-e1000e-hardware-unit-hang
[2] https://docs.netgate.com/pfsense/en...ing.html#hardware-tcp-segmentation-offloading
[3] https://docs.netgate.com/pfsense/en...-pfsense-software-to-work-with-proxmox-virtio
 
>>If this offloading feature has been so hit-and-miss for years, perhaps its logic should be reversed: disabled by default, enabled by choice.
I don't know; not everybody is using a routed setup (currently this breaks live migration, until we have an anycast gateway with the coming SDN feature),
and maybe it's really an Intel bug. (I'll need to test my Mellanox card with a routed setup + NAT to compare.)

Maybe it could be disabled by default for some specific Intel models, but we would need a list of specific devices, like in your udev rule, for example (see the sketch below):

ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10d3",
ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10f6"
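For what it's worth, a hedged sketch of what such a rule could look like on a plain Proxmox host, calling ethtool directly since the /lib/udev/nic-disable-tso helper is Sophos-specific (device IDs taken from the two lines above; the rule file name is made up and the ethtool path may differ):

Code:
# /etc/udev/rules.d/71-intel-nic-disable-tso.rules (hypothetical)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10d3", RUN+="/sbin/ethtool -K %k tso off"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10f6", RUN+="/sbin/ethtool -K %k tso off"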


@users in this thread, can you send the result of
# lspci -nn

so we can see the exact model of your Intel NIC?
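For example, limited to the Ethernet controllers and showing which driver is bound:

Code:
lspci -nnk | grep -A 3 -i 'ethernet controller'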
 
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection I219-V [8086:1570] (rev 21)
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (4) I219-V [8086:15d8] (rev 21)

from several different generations of Intel NUCs
 
00:19.0 Ethernet controller [0200]: Intel Corporation 82579LM Gigabit Network Connection [8086:1502] (rev 06)
 
00:19.0 Ethernet controller [0200]: Intel Corporation 82567LM-3 Gigabit Network Connection [8086:10de] (rev 02)
10:02.0 Ethernet controller [0200]: Intel Corporation 82541PI Gigabit Ethernet Controller [8086:107c] (rev 05)
30:00.0 Ethernet controller [0200]: Intel Corporation 82574L Gigabit Network Connection [8086:10d3]

# misc rules for NICs

# e1000: errata for Intel 82546GB adapters corrupting memory if TSO enabled (#25910)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x1079", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x107b", RUN+="/lib/udev/nic-disable-tso"

# e1000e: errata for Intel 82583V adapters corrupting memory if TSO enabled (#27887)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x150c", RUN+="/lib/udev/nic-disable-tso"

# e1000e: disable ASPM on Intel 82583V (#30711)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x150c", RUN+="/lib/udev/nic-disable-aspm"

# e1000e: disable TSO for Intel 82574L (errata 17, #30345)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10d3", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10f6", RUN+="/lib/udev/nic-disable-tso"

# e1000e: disable TSO for 82572EI (errata 7, #30669)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x107d", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x107e", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x107f", RUN+="/lib/udev/nic-disable-tso"
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10b9", RUN+="/lib/udev/nic-disable-tso"

# e1000e: disable TSO for 82571EB (errata 7, #34608)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x105f", RUN+="/lib/udev/nic-disable-tso"

# bnx2x: avoid driver bug by disabling GRO (#28846)
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x14e4", ATTRS{device}=="0x168e", RUN+="/lib/udev/nic-disable-gro"

# e1000e: disable GRO for some affected versions of 82546GB
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086", ATTRS{device}=="0x10b5", ATTRS{subsystem_vendor}=="0x8086", ATTRS{subsystem_device}=="0x1199", RUN+="/lib/udev/nic-disable-gro"

#i40e: disable TSO to avoid network issues on X710 for 10GbE SFP+
SUBSYSTEM=="net", ACTION=="add", ATTRS{vendor}=="0x8086",ATTRS{device}=="0x1572", RUN+="/lib/udev/nic-disable-tso"
 
I had such a "hang" again today ... so "ethtool -K eno1 tso off gso off" did not help for me :-(
kernel 5.3.10-1-pve
 
