e1000 driver hang

Hi, the ethtool fix works fine, and I've set up a post-up action to run it each time eno1 comes up.
But after adding a new network interface with VLAN tag 20 to one of my VMs, the host freeze issue returned.
Hmm, interesting, I was not aware of that.

Maybe you can use a VLAN-aware bridge? (The tagging will happen on the bridge, and no ethX.y interface will be created.)
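
For reference, a minimal VLAN-aware bridge stanza in /etc/network/interfaces could look like the sketch below (interface name, address and VLAN range are only placeholders, adjust them to your host):

Code:
auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        gateway 192.168.1.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094

The VM's virtual NIC then gets its VLAN tag set on its network device instead of via a separate eno1.20 interface.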
 
Sadly the problem returned after upgrading to Proxmox 8.

I had some success with:
Code:
iface eno2 inet manual
 post-up /usr/sbin/ethtool -K $IFACE gso off tso off 2> /dev/null

Now it only flaps every now and then, when there is enough load.
 

Note that this can also be done cleanly with ifupdown2:

Code:
iface eno2 inet manual
         gso-offload off
         tso-offload off

There are also gro-offload, lro-offload, ufo-offload, tx-offload and rx-offload.
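
For illustration, a stanza that switches all of these off at once might look like the sketch below (the option names are the ones listed above; whether every one of them is needed depends on the NIC and driver):

Code:
iface eno2 inet manual
        gso-offload off
        tso-offload off
        gro-offload off
        lro-offload off
        ufo-offload off
        tx-offload off
        rx-offload off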

(What is your NIC model? An Intel card in a NUC? They have been known for years to be buggy with offloading.)
 
My board is an E3C246D2I, so the NIC is an Intel I219-LM.

I have already tested that and turned off everything you mentioned. It still flaps...
 
OK, so everything was working fine for a few weeks, but now I have experienced 2 dropouts (nowhere near as bad as before).

Code:
Jul 16 13:17:01 proxmox CRON[898372]: pam_unix(cron:session): session closed for user root
Jul 16 13:19:14 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d87e28>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:16 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88020>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:18 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88210>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:20 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
  TDH                  <54>
  TDT                  <ab>
  next_to_use          <ab>
  next_to_clean        <53>
buffer_info[next_to_clean]:
  time_stamp           <104d87c9b>
  next_to_watch        <54>
  jiffies              <104d88408>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3c00>
PHY Extended Status    <3000>
PCI Status             <10>
Jul 16 13:19:20 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
Jul 16 13:19:20 proxmox kernel: vmbr0: port 1(enp0s31f6) entered disabled state
Jul 16 13:19:24 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
 
Hello all.

I read all 10 pages here; it is very interesting to follow the ongoing investigations. I, too, ran into the issue using an Intel NUC8i7BEH with the infamous I219-V Ethernet controller. I switched from an Intel NUC6i3SYH to this newer device and was immediately confused that, after some indeterminate uptime, the system no longer responds - most obviously by the HDD LED not being active anymore :oops:

I have done the changes as suggested in https://forum.proxmox.com/threads/e1000-driver-hang.58284/post-463259 and will monitor the NUC closely.

Thanks a lot for all your efforts up to this point :)
 
Hi all,

thanks so much for all the details!

I used a post-up hook in the config ("post-up /usr/sbin/ethtool -K $IFACE tso off gso off"):
iface lo inet loopback

iface eno2 inet manual
# BEGIN ANSIBLE MANAGED BLOCK
        post-up /usr/bin/logger -p debug -t ifup "Disabling tso and gso (offloading) for $IFACE..." && \
        /usr/sbin/ethtool -K $IFACE tso off gso off && \
        /usr/bin/logger -p debug -t ifup "Disabled tso and gso (offloading) for $IFACE." || \
        true
# END ANSIBLE MANAGED BLOCK

auto vmbr0
iface vmbr0 inet static
        address ....

I added this block using an Ansible playbook (network-interface-disable-segment-offloading.yaml):
- hosts: pve
  gather_facts: no
  vars:
    conf: '/etc/network/interfaces'
    hook: |
      post-up /usr/bin/logger -p debug -t ifup "Disabling tso and gso (offloading) for $IFACE..." && \
      /usr/sbin/ethtool -K $IFACE tso off gso off && \
      /usr/bin/logger -p debug -t ifup "Disabled tso and gso (offloading) for $IFACE." || \
      true
  tasks:
    - name: Disabling segment offloading on Ethernet eno2
      ansible.builtin.blockinfile:
        path: "{{ conf }}"
        insertafter: ^iface eno2 inet manual
        block: "{{ hook | indent(8, first=true) }}"

EDIT: and for a week now I have not seen any "Detected Hardware Unit Hang:" messages with my "Intel Corporation Ethernet Connection (11) I219-LM" e1000e.
 
Another solution to the problem:

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 128
RX Mini: 0
RX Jumbo: 0
TX: 256

# ethtool -G eth0 rx 4096
# ethtool -G eth0 tx 4096

# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096

Added it to my file /etc/network/interfaces:
pre-up /sbin/ethtool -G eth0 rx 4096
pre-up /sbin/ethtool -G eth0 tx 4096
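
In context, those pre-up lines belong inside the stanza of the physical interface, roughly like this (eth0 and the manual method are placeholders; keep whatever stanza your NIC already has):

Code:
iface eth0 inet manual
        pre-up /sbin/ethtool -G eth0 rx 4096
        pre-up /sbin/ethtool -G eth0 tx 4096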
 
I appreciate the share - thank you.

How long have you had this running error-free in /etc/network/interfaces?
Did you have /etc/network/interfaces loaded with a different configuration prior?
 
It's been working for a week now.
I just used two commands:
# ethtool -G eth0 rx 4096
# ethtool -G eth0 tx 4096
and since then everything has been stable.
The lines
pre-up /sbin/ethtool -G eth0 rx 4096
pre-up /sbin/ethtool -G eth0 tx 4096
were added by analogy with the earlier example
iface eno2 inet manual
post-up /usr/sbin/ethtool -K $IFACE gso off tso off 2> /dev/null
That option ran for me, but had no effect.
The server has not been rebooted since.
 
The "ethtool -G eth0 rx 4096" option let the network card run a little longer, but still did not solve the problem.
Found the root of the problem:

Dec 15 01:34:50 pve kernel: [90765.947260] ------------[ cut here ]------------
Dec 15 01:34:50 pve kernel: [90765.947486] igb: Failed to read reg 0x4094!
Dec 15 01:34:50 pve kernel: [90765.947744] WARNING: CPU: 46 PID: 489374 at drivers/net/ethernet/intel/igb/igb_main.c:747 igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.947969] Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables softdo>
Dec 15 01:34:50 pve kernel: [90765.948021] enclosure scsi_transport_sas crc32_pclmul nvme igb xhci_pci xhci_pci_renesas i2c_algo_bit ahci nvme_core dca libahci tg3 xhci_hcd i2c_piix4 megaraid_sas
Dec 15 01:34:50 pve kernel: [90765.950543] CPU: 46 PID: 489374 Comm: kworker/46:2 Tainted: P O 5.15.131-1-pve #1
Dec 15 01:34:50 pve kernel: [90765.950838] Hardware name: ASUSTeK COMPUTER INC. RS520A-E11-RS12U/KMPA-U16 Series, BIOS 0601 04/08/2022
Dec 15 01:34:50 pve kernel: [90765.951147] Workqueue: events igb_watchdog_task [igb]
Dec 15 01:34:50 pve kernel: [90765.951455] RIP: 0010:igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.951770] Code: c7 c6 74 94 37 c0 e8 9b 00 e1 d5 48 8b bb 30 ff ff ff e8 fa 1d 76 d5 84 c0 74 16 44 89 ee 48 c7 c7 40 a1 37 c0 e8 d6 cf d7 d5 <0f> 0b e9 f8 fb fd ff e9 13 fc fd>
Dec 15 01:34:50 pve kernel: [90765.952401] RSP: 0018:ffffb9b303d27db8 EFLAGS: 00010286
Dec 15 01:34:50 pve kernel: [90765.952731] RAX: 0000000000000000 RBX: ffff9199eea60ed0 RCX: ffff91b88eba0588
Dec 15 01:34:50 pve kernel: [90765.953056] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff91b88eba0580
Dec 15 01:34:50 pve kernel: [90765.953382] RBP: ffffb9b303d27dd0 R08: 0000000000000003 R09: 0000000000000001
Dec 15 01:34:50 pve kernel: [90765.953713] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
Dec 15 01:34:50 pve kernel: [90765.954041] R13: 0000000000004094 R14: 00000000000146a3 R15: 0000000002e3383e
Dec 15 01:34:50 pve kernel: [90765.954369] FS: 0000000000000000(0000) GS:ffff91b88eb80000(0000) knlGS:0000000000000000
Dec 15 01:34:50 pve kernel: [90765.954704] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 15 01:34:50 pve kernel: [90765.955044] CR2: 0000000000000030 CR3: 00000001b03ce000 CR4: 0000000000350ee0
Dec 15 01:34:50 pve kernel: [90765.955380] Call Trace:
Dec 15 01:34:50 pve kernel: [90765.955724] <TASK>
Dec 15 01:34:50 pve kernel: [90765.956060] ? show_regs.cold+0x1a/0x1f
Dec 15 01:34:50 pve kernel: [90765.956401] ? igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.956750] ? __warn+0x8c/0x100
Dec 15 01:34:50 pve kernel: [90765.957084] ? igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.957424] ? report_bug+0xa4/0xd0
Dec 15 01:34:50 pve kernel: [90765.957768] ? handle_bug+0x39/0x90
Dec 15 01:34:50 pve kernel: [90765.958099] ? exc_invalid_op+0x19/0x70
Dec 15 01:34:50 pve kernel: [90765.958430] ? asm_exc_invalid_op+0x1b/0x20
Dec 15 01:34:50 pve kernel: [90765.958767] ? igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.959108] ? igb_rd32.cold+0x3a/0x46 [igb]
Dec 15 01:34:50 pve kernel: [90765.959440] igb_update_stats+0x3c1/0x880 [igb]
Dec 15 01:34:50 pve kernel: [90765.959780] igb_watchdog_task+0xa8/0x480 [igb]
Dec 15 01:34:50 pve kernel: [90765.960111] process_one_work+0x22b/0x3d0
Dec 15 01:34:50 pve kernel: [90765.960437] worker_thread+0x53/0x420
Dec 15 01:34:50 pve kernel: [90765.960770] ? process_one_work+0x3d0/0x3d0
Dec 15 01:34:50 pve kernel: [90765.961098] kthread+0x12a/0x150
Dec 15 01:34:50 pve kernel: [90765.961427] ? set_kthread_struct+0x50/0x50
Dec 15 01:34:50 pve kernel: [90765.961759] ret_from_fork+0x22/0x30
Dec 15 01:34:50 pve kernel: [90765.962085] </TASK>
Dec 15 01:34:50 pve kernel: [90765.962406] ---[ end trace ac2f404cbbb250cb ]---
Dec 15 01:34:50 pve kernel: [90765.962735] igb 0000:81:00.0 enp129s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 15 01:34:52 pve kernel: [90767.910867] igb 0000:81:00.0 enp129s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 15 01:34:54 pve kernel: [90769.894864] igb 0000:81:00.0 enp129s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
Dec 15 01:34:56 pve kernel: [90771.910899] igb 0000:81:00.0 enp129s0f0: malformed Tx packet detected and dropped, LVMMC:0xffffffff
...and so on...

Judging by information found online, the problem lies with the Intel I350 network card.
 
In the end I bought a new network card (Broadcom NetXtreme BCM59719-4P) compatible with my server. I switched the entire network over to it and there have been no more problems.
 
Same problem here. I had interruptions and errors while transferring files with restic over SFTP (sftp error 255) from a VM. dmesg on the Proxmox host pointed me to this thread.

For now it is working for me:

Code:
root@pve01:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface eno1 inet manual
        post-up /usr/sbin/ethtool -K $IFACE gso off tso off gro off 2> /dev/null

auto vmbr0
iface vmbr0 inet static
        address 192.168.76.10/24
        gateway 192.168.76.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        post-up /usr/sbin/ethtool -K $IFACE gso off tso off gro off 2> /dev/null

iface vmbr0 inet6 static
        address fe80::ea6a:64ff:fed7:d6c/64
        gateway fd00::b2f2:8ff:feb8:7368
        post-up /usr/sbin/ethtool -K $IFACE gso off tso off gro off 2> /dev/null

iface wlp2s0 inet manual

Code:
root@pve01:~# lspci|grep Ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-V (rev 10)

PVE 8.1.3
Code:
root@pve01:~# uname -a
Linux pve01 6.5.11-7-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.11-7 (2023-12-05T09:44Z) x86_64 GNU/Linux
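
To confirm that the hook actually took effect, checking the offload features should show them as off; a quick way to do that (the feature names are the ones ethtool -k prints):

Code:
root@pve01:~# ethtool -k eno1 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload'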
 
Thanks! Of the several resolutions in this thread that seemed to work for some and not for others, I used this one and the problem looks to be resolved.
 
I don't know if the solution is what's causing my new issue here:
https://forum.proxmox.com/threads/n...-shutdown-to-reenable-gpu-passthrough.144046/

I was experiencing the same issue mentioned in this thread, and the mentioned solution worked for me:
cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual
        # https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-10
        # fix for proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
        post-up /usr/bin/logger -p debug -t ifup "Disabling segmentation offload for eno1" && /sbin/ethtool -K $IFACE tso off gso off && /usr/bin/logger -p debug -t ifup "Disabled offload for eno1"

auto vmbr0
iface vmbr0 inet static
        address 192.168.0.253/24
        gateway 192.168.0.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
        # https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-10
        # fix for proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: Reset adapter unexpectedly
        post-up /usr/bin/logger -p debug -t ifup "Disabling segmentation offload for eno1" && /sbin/ethtool -K $IFACE tso off gso off && /usr/bin/logger -p debug -t ifup "Disabled offload for eno1"

source /etc/network/interfaces.d/*

However, the new issue is that the NIC gets disabled randomly and does not re-enable automatically:
Mar 28 08:32:06 proxmox kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
Mar 28 08:32:06 proxmox kernel: vmbr0: port 1(enp0s31f6) entered disabled state

The bad thing is that my GPU is passed through, so I can't access the Proxmox terminal to bring the interface back up; I always need to force shut down the server.

Specs:
Intel I219V
CPU(s) 8 x 12th Gen Intel(R) Core(TM) i3-12100 (1 Socket)
Kernel Version Linux 6.5.13-1-pve (2024-02-05T13:50Z)
 
I was getting these freezes and errors on a Proxmox host I recently installed on a Lenovo mini PC. The fix kingp0dd posted just above was working for me, but I recently noticed that some kernel updates had come down from Proxmox, so I reverted the changes and went back to the original interfaces file. So far no errors and everything is working normally. I'll let it run like this until I see otherwise; perhaps there was a driver update in one of those new kernel packages.


EDIT: OK, everything was great for a few hours, but I've started to get the hang errors again, so I'm going to put the fix back into the interfaces file.
 
It's 4/22/2024. I updated my Proxmox to 8.0 and this problem came up again.

LOG:
Apr 21 21:32:23 nyc kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <0>
TDT <8>
next_to_use <8>
next_to_clean <0>
buffer_info[next_to_clean]:
time_stamp <10065fc97>
next_to_watch <0>
jiffies <100660008>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
----------------------------------------------------------------------------------------
Version:

pve-manager/8.1.10/4b06efb5db453f29 (running kernel: 6.5.13-5-pve)
----------------------------------------------------------------------------------------
FIX was this:

ethtool -K <interface> tso off gso off

in my case:
ethtool -K eno1 tso off gso off
----------------------------------------------------------------------------------------
NOTE:
Not sure whether it degrades performance or not; I'm also not sure why this issue came up again a few years later.
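
Note that an ethtool command run by hand does not survive a reboot; to make the fix persistent, the same post-up approach from earlier in the thread can be used (eno1 as in my case):

Code:
iface eno1 inet manual
        post-up /usr/sbin/ethtool -K $IFACE tso off gso off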
 
