The Proxmox 6.8.12-9-pve kernel has introduced a problem with the e1000e driver: the network connection is lost after some hours.

6.8.12-11-pve was released for both PVE and PBS; there were some changes to the ABI:
https://git.proxmox.com/?p=pve-kernel.git;a=shortlog;h=refs/heads/bookworm-6.8

ABI stands for Application Binary Interface. It defines the low-level interface between the kernel and its modules (such as device drivers), specifying how compiled code interacts with the kernel at the binary level. This includes details like register usage, memory layout, calling conventions, and symbol versions of exported kernel functions and variables.
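For example, you can compare the running kernel's ABI with the version a module was built against (a quick illustration; output will differ per system):
Code:
# ABI/version of the running kernel
uname -r
# kernel version the currently installed e1000e module was built for
modinfo e1000e | grep vermagic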
 
I still have this issue with kernel 6.8.12-11-pve as well. Will this be fixed in a future kernel, or do I have to apply the ethtool workaround?
 
For what it's worth (and it may help in the debugging process): I have three systems in my home lab with Intel I217/I219 NICs.

Only the I217-LM (rev 04) is affected by freezes. The other systems with the I219-V (rev 21) and I219-V (rev 31) are working fine with all kernels up to 6.8.12-10-pve (I have not tested -11 so far).

The main production servers with Intel X710 and X550 NICs are also working fine with kernels -8, -9, and -10.
 
We have a few machines with `Intel Corporation Ethernet Connection (17) I219-LM (rev 11)` that are having this issue on the 6.8.12-11-pve kernel.
Thanks for letting us know. We should wait for a fix in a future kernel. We hope to have someone from Proxmox looking into this soon.
Meanwhile, we are still running the pinned 6.8.12-8.
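For anyone who wants to do the same, pinning can be done with proxmox-boot-tool (a sketch; check the exact version string with the list command first):
Code:
# show installed kernels and the current pin
proxmox-boot-tool kernel list
# keep booting the known-good kernel
proxmox-boot-tool kernel pin 6.8.12-8-pve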
 
I also have this issue: all my LXCs lose connection at sporadic intervals, but always many hours apart. I have Uptime Kuma set up on a different server to monitor this for me, and it shows the outages.

Network card:

07:00.0 Ethernet controller: Intel Corporation I210 Gigabit Network Connection (rev 03)

PVE:

proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)
 
I upgraded my system on Tuesday from 6.8.12-10-pve to 6.8.12-11-pve and it is still online, so yesterday I updated another system, which is technically identical. That one froze just a few hours after the upgrade. The main difference is that the second one is also running PBS in a PVE VM.
 
Is it possible that this very serious bug is still present on a network card this common, used in millions of PCs? Absurd.
 
I forgot to mention that I had implemented the offloading config in /etc/network/interfaces on my system. So that did help here.

[Screenshot of the offloading config in /etc/network/interfaces]
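For reference, the stanza looks roughly like this (a sketch mirroring the fix quoted later in this thread; interface names and the exact set of disabled features may differ on your system):
Code:
iface vmbr0 inet static
    ...
    bridge-ports enp0s31f6
    post-up ethtool -K enp0s31f6 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off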
 
Is it possible that this very serious bug is still present on a network card this common, used in millions of PCs? Absurd.

Yeah, normally the Proxmox team and upstream are pretty good about this, but this is kinda nuts. It's been a couple of months now.
 
Yeah, normally the Proxmox team and upstream are pretty good about this, but this is kinda nuts. It's been a couple of months now.
Sadly, it's not months but years. This thread has existed since September 2019.
 
I have similar problems: my Proxmox host occasionally drops off the network. It had been stable for over a year, but this just started happening now, so I guess it is related to an upgrade, as mentioned earlier in this thread. I have to unplug and replug the Ethernet cable to get it back.
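(For what it's worth, a software link bounce sometimes brings the interface back without touching the cable; this may be worth trying before a physical replug:)
Code:
ip link set enp0s31f6 down && ip link set enp0s31f6 up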

I am running the 6.8.12-11-pve kernel.

This is what I get when I run ethtool -i enp0s31f6:
Code:
driver: e1000e
version: 6.8.12-11-pve
firmware-version: 2.3-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It just happened again, and the logs report a Hardware Unit Hang:
dmesg | tail -100

Code:
MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292707.471193] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116dd941>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292709.455165] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116de101>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292711.439132] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116de8c1>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292713.486122] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116df0c0>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292715.470168] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116df880>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292717.454083] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116e0040>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292718.471934] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[292718.558360] vmbr0: port 1(enp0s31f6) entered disabled state
[292726.136923] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[292726.136966] vmbr0: port 1(enp0s31f6) entered blocking state
[292726.136974] vmbr0: port 1(enp0s31f6) entered forwarding state


journalctl --since "10 minutes ago" --no-pager | grep -Ei 'network|link|enp0s31f6|vmbr0|e1000e'

Code:
Jun 05 20:31:37 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:31:39 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
...
Jun 05 20:36:45 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:36:47 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:36:48 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
Jun 05 20:36:48 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered disabled state
Jun 05 20:36:56 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jun 05 20:36:56 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered blocking state
Jun 05 20:36:56 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered forwarding state
Jun 05 20:37:43 pve-acer-veriton systemd[1252867]: Listening on dirmngr.socket - GnuPG network certificate management daemon.

ip -s link show enp0s31f6

Code:
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether d4:61:37:01:c8:33 brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast           
     35711121667  38257968      0    9749    2413 1347507
    TX:    bytes   packets errors dropped carrier collsns           
    221063815294 157587082      0       0       0       0

After consulting ChatGPT, I did the following:

Disabled Energy Efficient Ethernet (EEE)
EEE can apparently cause link flapping or power-saving quirks.

Created a /etc/systemd/system/disable-eee.service file.

Code:
[Unit]
Description=Disable EEE on enp0s31f6
After=network.target

[Service]
ExecStart=/sbin/ethtool --set-eee enp0s31f6 eee off
Type=oneshot
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Activated it with:
Code:
systemctl daemon-reload
systemctl enable --now disable-eee.service
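You can verify afterwards that EEE is actually off (assuming the NIC and driver report EEE state):
Code:
ethtool --show-eee enp0s31f6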

Tuned e1000e driver settings
Created /etc/modprobe.d/e1000e.conf and filled it with:
Code:
options e1000e InterruptThrottleRate=0,0 RxIntDelay=0 TxIntDelay=0
options e1000e enable_eee=0

Applied those changes:
Code:
update-initramfs -u -k all
reboot
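After the reboot, a quick sanity check that the options took effect is to read the module parameters from sysfs (paths as I understand them; verify on your own system):
Code:
cat /sys/module/e1000e/parameters/InterruptThrottleRate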

That seemed to work: my host was stable for two days, but today it acted up again. I then found this post and applied the ethtool fix suggested here, putting it in /etc/network/interfaces as such:

Code:
iface vmbr0 inet static
    address 192.168.X.X/24
    gateway 192.168.X.X
    bridge-ports enp0s31f6
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    post-up ethtool -K enp0s31f6 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
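To apply the change without a full reboot (assuming ifupdown2, which PVE 8 uses by default) and to verify the offloads are really off:
Code:
ifreload -a
ethtool -k enp0s31f6 | grep -E 'offload|scatter-gather'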

I guess all I can do now is wait and see if this helps.

Does anyone know if what I have done is legit or if it can have unintended consequences?
I noticed that none of you have done the EEE disabling or the driver tuning. Is this something that I should perhaps remove?
 
I usually do remove the power management stuff on Windows, because of "issues" with devices going to sleep and then not communicating, but in this case I am not sure it's the culprit, as all references point to the offloading problem.
At least on my end, that solved the stuck network adapter issue.