e1000 driver hang

Hi, the ethtool fix works fine, and I've set up a post-up action to run it each time eno1 comes up.
But after adding a new network interface with VLAN tag 20 to one of my VMs, the host freeze issue returned.
Hmm, interesting, I was not aware of that.

Maybe you can use a VLAN-aware bridge? (The tagging will then happen on the bridge, and no ethX.Y interface will be created.)
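For example, something like this in /etc/network/interfaces (a minimal sketch; vmbr0, eno1, and the addresses are assumptions, adjust to your setup):

Code:
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        # make the bridge VLAN-aware; guest NICs then just set their VLAN tag
        bridge-vlan-aware yes
        # VLAN IDs the bridge is allowed to carry
        bridge-vids 2-4094

The VM's virtual NIC then only needs its VLAN tag (e.g. 20) set on the network device; no extra eno1.20 interface is created on the host.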
 
Sadly, the problem returned after upgrading to Proxmox 8.

I had success with:
Code:
iface eno2 inet manual
 post-up /usr/sbin/ethtool -K $IFACE gso off tso off 2> /dev/null
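
To apply the change immediately on an interface that is already up, the same command can also be run once by hand (eno2 assumed):
Code:
/usr/sbin/ethtool -K eno2 gso off tso off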

Now it only flaps every now and then, when there is enough load.
 

Note that it can also be done cleanly with ifupdown2:

Code:
iface eno2 inet manual
         gso-offload off
         tso-offload off

There are also gro-offload, lro-offload, ufo-offload, tx-offload, and rx-offload.
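
To check which offloads are actually active at runtime, something like this should work (eno2 assumed):
Code:
/usr/sbin/ethtool -k eno2 | grep -E 'segmentation|offload'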

(What is your NIC model? An Intel card in a NUC? Because those have been known for years to be buggy with offloading.)
 
My board is an E3C246D2I, so the NIC is an Intel I219-LM.

I have already tried turning off everything you mentioned. It still flaps...
 
OK, so everything was working fine for a few weeks, but now I have experienced two dropouts (nowhere near as bad as before):

Code:
Jul 16 13:17:01 proxmox CRON[898372]: pam_unix(cron:session): session closed for user root
Jul 16 13:19:14 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d87e28>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:16 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88020>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:18 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88210>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:20 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88408>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:20 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Jul 16 13:19:20 proxmox kernel: vmbr0: port 1(enp0s31f6) entered disabled state
Jul 16 13:19:24 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
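
For reference, a quick way to count how many of these hangs have happened since boot, and to record the driver version (interface name taken from the log above):
Code:
journalctl -k | grep -c 'Detected Hardware Unit Hang'
ethtool -i enp0s31f6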
 
Hello all.

I read all 10 pages here; it is very interesting to follow the ongoing investigations. I, too, ran into the issue, using an Intel NUC8i7BEH with the infamous I219-V Ethernet controller. I switched from an Intel NUC6i3SYH to this newer device and was immediately confused when, after some indeterminate uptime, the system stopped responding - most obvious from the HDD LED no longer being active :oops:

I have applied the changes suggested in https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-463259 and will monitor the NUC closely.

Thanks a lot for all your efforts up to this point :)
 
Hi all,

thanks so much for all the details!

I used post-up in the config ("post-up /usr/sbin/ethtool -K $IFACE tso off gso off"):
Code:
iface lo inet loopback

iface eno2 inet manual
# BEGIN ANSIBLE MANAGED BLOCK
        post-up /usr/bin/logger -p debug -t ifup "Disabling tso and gso (offloading) for $IFACE..." && \
        /usr/sbin/ethtool -K $IFACE tso off gso off && \
        /usr/bin/logger -p debug -t ifup "Disabled tso and gso (offloading) for $IFACE." || \
        true
# END ANSIBLE MANAGED BLOCK

auto vmbr0
iface vmbr0 inet static
address ....

This was applied via an Ansible playbook (network-interface-disable-segment-offloading.yaml):
Code:
- hosts: pve
  gather_facts: no
  vars:
    conf: '/etc/network/interfaces'
    hook: |
      post-up /usr/bin/logger -p debug -t ifup "Disabling tso and gso (offloading) for $IFACE..." && \
      /usr/sbin/ethtool -K $IFACE tso off gso off && \
      /usr/bin/logger -p debug -t ifup "Disabled tso and gso (offloading) for $IFACE." || \
      true
  tasks:
    - name: Disabling segment offloading on Ethernet eno2
      ansible.builtin.blockinfile:
        path: "{{ conf }}"
        insertafter: ^iface eno2 inet manual
        block: "{{ hook | indent(8, first=true) }}"
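
It can be run in the usual way, for instance (inventory path is a placeholder):
Code:
ansible-playbook -i inventory.ini network-interface-disable-segment-offloading.yaml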

EDIT: For a week now I have not seen any "Detected Hardware Unit Hang:" messages with my "Intel Corporation Ethernet Connection (11) I219-LM" e1000e.
 
