The Proxmox 6.8.12-9-pve kernel has introduced a problem with the e1000e driver: network connection is lost after some hours

I have similar problems: my Proxmox host occasionally drops off the network. It had been stable for over a year, but this just started happening, so I guess it is related to an upgrade, as mentioned earlier in this thread. I have to unplug and replug the Ethernet cable to get it back.

I am running the 6.8.12-11-pve kernel.

This is what I get when I run ethtool -i enp0s31f6:
Code:
driver: e1000e
version: 6.8.12-11-pve
firmware-version: 2.3-4
expansion-rom-version:
bus-info: 0000:00:1f.6
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

It just happened again, and the logs report a hardware unit hang:
dmesg | tail -100

Code:
MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292707.471193] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116dd941>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292709.455165] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116de101>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292711.439132] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116de8c1>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292713.486122] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116df0c0>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292715.470168] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116df880>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292717.454083] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
                  TDH                  <a1>
                  TDT                  <a6>
                  next_to_use          <a6>
                  next_to_clean        <a0>
                buffer_info[next_to_clean]:
                  time_stamp           <111694031>
                  next_to_watch        <a1>
                  jiffies              <1116e0040>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3800>
                PHY Extended Status    <3000>
                PCI Status             <10>
[292718.471934] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
[292718.558360] vmbr0: port 1(enp0s31f6) entered disabled state
[292726.136923] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[292726.136966] vmbr0: port 1(enp0s31f6) entered blocking state
[292726.136974] vmbr0: port 1(enp0s31f6) entered forwarding state


journalctl --since "10 minutes ago" --no-pager | grep -Ei 'network|link|enp0s31f6|vmbr0|e1000e'

Code:
Jun 05 20:31:37 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:31:39 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
...
Jun 05 20:36:45 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:36:47 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
Jun 05 20:36:48 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
Jun 05 20:36:48 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered disabled state
Jun 05 20:36:56 pve-acer-veriton kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jun 05 20:36:56 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered blocking state
Jun 05 20:36:56 pve-acer-veriton kernel: vmbr0: port 1(enp0s31f6) entered forwarding state
Jun 05 20:37:43 pve-acer-veriton systemd[1252867]: Listening on dirmngr.socket - GnuPG network certificate management daemon.

ip -s link show enp0s31f6

Code:
2: enp0s31f6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether d4:61:37:01:c8:33 brd ff:ff:ff:ff:ff:ff
    RX:    bytes   packets errors dropped  missed   mcast         
     35711121667  38257968      0    9749    2413 1347507
    TX:    bytes   packets errors dropped carrier collsns         
    221063815294 157587082      0       0       0       0

After consulting ChatGPT, I did the following:

Disabled Energy Efficient Ethernet (EEE)
EEE can apparently cause link flapping or power-saving quirks.

Created a /etc/systemd/system/disable-eee.service file.

Code:
[Unit]
Description=Disable EEE on enp0s31f6
After=network.target

[Service]
ExecStart=/sbin/ethtool --set-eee enp0s31f6 eee off
Type=oneshot
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Activated it with:
Code:
systemctl daemon-reexec
systemctl enable --now disable-eee.service

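To confirm the unit actually took effect, this should report EEE as disabled afterwards (assuming the NIC exposes its EEE state through ethtool):
Code:
# did the oneshot unit run cleanly?
systemctl status disable-eee.service --no-pager
# is EEE really off now?
ethtool --show-eee enp0s31f6
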
Tuned e1000e driver settings
Created /etc/modprobe.d/e1000e.conf and filled it with:
Code:
options e1000e InterruptThrottleRate=0,0 RxIntDelay=0 TxIntDelay=0
options e1000e enable_eee=0

Applied those changes:
Code:
update-initramfs -u -k all
reboot

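To double-check after the reboot that the driver actually accepted those options, modinfo lists the parameters this e1000e build knows, and (if I read the module loader right) any option it does not recognise is logged in dmesg as "unknown parameter ... ignored":
Code:
# which options does this e1000e build actually support?
modinfo -p e1000e
# did the kernel complain about any option it does not know?
dmesg | grep -i 'e1000e.*parameter'
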
That seemed to work: my host was stable for two days, but today it acted up again. I found this post and applied the ethtool fix suggested here, putting it in /etc/network/interfaces like this:

Code:
iface vmbr0 inet static
    address 192.168.X.X/24
    gateway 192.168.X.X
    bridge-ports enp0s31f6
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
        post-up ethtool -K enp0s31f6 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off

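To apply the same thing immediately without rebooting or reloading the interface, the post-up command can simply be run by hand once, and ethtool -k shows whether the offloads are really off (the grep patterns below are the usual ethtool -k feature labels):
Code:
# one-off, same command the post-up line runs
ethtool -K enp0s31f6 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
# verify
ethtool -k enp0s31f6 | grep -E 'segmentation|receive-offload|scatter-gather|checksumming|vlan-offload'
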
I guess all I can do now is wait and see if this helps.

Does anyone know if what I have done is legit, or whether it can have unintended consequences?
I noticed that none of you have done the EEE disabling or the driver tuning. Is this something I should perhaps remove?
Thank you! This is basically what I went through as well. Unfortunately, it did not help. On top of that, I had added my USB NICs to create a bond (active-backup), but _even then_ it didn't work, since the first NIC was alive but useless, so the second NIC never came into play. The only way I could get things going was round-robin, and with that I have lots of packet loss and latency. Unplugging the first NIC 'fixed' it until I could reboot.

Some are mentioning the I219 (rev 21) as the problem, but I'm running rev 10 and cannot make things work. I think I can afford to downgrade my kernel to .8 (as some are suggesting) now that I have a backup NIC.
```
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-V [8086:15bc] (rev 10)
DeviceName: Onboard - Ethernet
Subsystem: Lenovo Ethernet Connection (7) I219-V [17aa:312a]
Kernel driver in use: e1000e
Kernel modules: e1000e
```
 
Is there any progress on this topic? I just updated Proxmox, thinking this should be fixed by now since so much time has passed... and directly after a reboot the network was gone.
I thought I had bricked my device. After simply disconnecting and reconnecting the Ethernet cable, it showed up again right away. So I guess we still have the issue of the kernel not supporting this NIC correctly, right?
 
Well, it would be great if somebody from the Proxmox team would take the time to look into this...
Yes please, we need your help on this. Yesterday I updated my machine. Directly after a reboot it was lost on the network. Cable out, cable in -> there it was again. The next morning the device was lost on the network again. :(
Please offer an option for those of us stuck with these buggy NIC/kernel combinations :)
 
My advice would be to roll back or to get a different NIC. This is a kernel driver bug that goes back years and has resurfaced.
 
My advice would be to roll back or to get a different NIC. This is a kernel driver bug that goes back years and has resurfaced.
I'd guess that a lot of home labs are running on Intel NUC or some Lenovo/Fujitsu/Dell SFF PC, which are all rocking internal NICs. So no option there to change the NIC, as these systems do not offer a PCI slot to add a different NIC. Hence it would be great if the Team at Proxmox would look into the matter and maybe provide a workaround.
 
I'd guess that a lot of home labs are running on Intel NUC or some Lenovo/Fujitsu/Dell SFF PC, which are all rocking internal NICs. So no option there to change the NIC, as these systems do not offer a PCI slot to add a different NIC. Hence it would be great if the Team at Proxmox would look into the matter and maybe provide a workaround.
Please open a ticket.
 
Unfortunately, I am then unable to conduct tests with the team. Two of my systems are in my house, and they are working fine with their I219-V (rev 31) and I219-V (rev 21) NICs. The affected one is at a remote site, so I am unable to run any tests there, as I would lock myself out if anything went sideways. That system has an Intel I217-LM (rev 04), which IS affected. So for the time being, I am running it with a pinned .8 kernel, hoping that someone else is able to open a ticket AND has the affected system on site for tests.
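
For reference, pinning the older kernel looks roughly like this (the exact version string depends on which 6.8.12-8 build is still installed; proxmox-boot-tool takes care of the bootloader side):
```
# show the kernels that are still installed
proxmox-boot-tool kernel list
# pin the last known-good build (version string is an example)
proxmox-boot-tool kernel pin 6.8.12-8-pve
reboot
# later, to return to the default kernel selection
proxmox-boot-tool kernel unpin
```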
 
I received this answer:

Hello,
You don't need a subscription[0] to open a bug-report in our bugzilla: https://bugzilla.proxmox.com.
That being said - this particular issue - that some Intel NICs tend to run into unit hangs with some kernel versions - is known to some extent[1].

Usually installing the latest firmware for the NIC if available, or disabling offloading, resolves the issue.

Did you try these mitigations? If they don't help you can also try running the 6.14 opt-in kernel:

https://forum.proxmox.com/threads/o...e-8-available-on-test-no-subscription.164497/

Sadly the issue has been around - and every single fix sent to the kernel mailing list and applied in a kernel version usually causes some other Intel NICs to have similar issues - so there is no simple fix that fixes all Intel NICs, with all firmware versions provided by all hardware vendors.

If none of the suggestions help - and the issue is not in the list in [1] - feel free to open a new bugzilla entry and provide the journal since booting/dmesg and pveversion -v outputs that show the exact issue.


I hope this helps!


stoiko
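
For anyone who wants to try the 6.14 opt-in kernel mentioned above, installing it should be roughly this (package name assumed from the usual proxmox-kernel-X.Y naming on PVE 8):
```
apt update
apt install proxmox-kernel-6.14
reboot
# afterwards, confirm which kernel is actually running
uname -r
```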
 
After many attempts besides changing the kernel version, I switched to a USB 3 Gigabit adapter (ASIX AX88179) and have had no issues since.
Maybe I will try again in a few updates if a fix arrives.
 
I upgraded on the 26th and so far (6 days) this "e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:" has hit me twice.

Proxmox: 8.4.1
Kernel: 6.8.12-11-pve
NIC: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)

This is my home system, so when it hangs I have just renegotiated the link with: ethtool --negotiate eno1

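Until a proper fix lands, a crude watchdog along these lines could trigger that renegotiation automatically whenever the hang shows up in the kernel log (just a sketch for my setup; eno1 is assumed, and it would be run from cron or a systemd timer):

```
#!/bin/bash
# crude e1000e hang watchdog: if a unit hang was logged in the last
# two minutes, restart auto-negotiation (same as ethtool --negotiate)
IFACE=eno1
if journalctl -k --since "2 minutes ago" --no-pager | grep -q "Detected Hardware Unit Hang"; then
    ethtool -r "$IFACE"
    logger "e1000e watchdog: renegotiated $IFACE after unit hang"
fi
```
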
Reading through this thread, I will work through trying a firmware update, the 6.14 opt-in kernel, and lastly tuning the NIC with ethtool in /etc/network/interfaces.

Will report back.
 
I have just installed a new system, a Lenovo ThinkCentre 920q with an I219-LM (rev 10), which has the same problem. I am under the impression that only the LM version seems to be affected, not the V? I am really puzzled that a ThinkCentre 700 (which is two years older) has a more modern and more professional NIC (I219-V rev 31) than the 920q. It is hard to tell which system to buy when the vendors do not list the details of the NIC in the specs.
 
Thank you! This is basically what I went through as well. Unfortunately, it did not help. On top of that, I had added my USB NICs to create a bond (active-backup), but _even then_ it didn't work, since the first NIC was alive but useless, so the second NIC never came into play. The only way I could get things going was round-robin, and with that I have lots of packet loss and latency. Unplugging the first NIC 'fixed' it until I could reboot.

Some are mentioning the I219 (rev 21) as the problem, but I'm running rev 10 and cannot make things work. I think I can afford to downgrade my kernel to .8 (as some are suggesting) now that I have a backup NIC.
```
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-V [8086:15bc] (rev 10)
DeviceName: Onboard - Ethernet
Subsystem: Lenovo Ethernet Connection (7) I219-V [17aa:312a]
Kernel driver in use: e1000e
Kernel modules: e1000e
```

Just as a follow-up: I ran the community script, and since rebooting I appear to be in the clear of any issues.
 
I am working on some scripts to automate this.

## Features
- **Auto-detection**: Automatically finds the primary network interface
- **Kernel-agnostic**: Works with any kernel version (no need to downgrade)
- **Bridge support**: Special handling for Proxmox bridge interfaces (vmbr0, etc.)
- **Physical interface restart**: For bridges, also restarts underlying physical interfaces
- **Hardware hang detection**: Detects and handles Intel e1000e controller hangs from kernel logs
- **Hardware-level reset**: Module reload and PCI reset for hardware hangs (e1000e, etc.); a bare-bones sketch of this follows after the list
- **Driver-aware resets**: Different reset strategies based on network controller driver
- **Connectivity checking**: Tests network before and after restart with multiple targets
- **Retry logic**: Attempts multiple times if first try fails
- **DHCP renewal**: Automatically attempts DHCP lease renewal
- **Extended diagnostics**: Comprehensive logging and troubleshooting information
- **Logging**: Comprehensive logging to `/var/log/network-fix.log`
- **Safe operation**: Checks for root privileges and interface existence
- **Interactive mode**: Prompts before restarting if network appears working
- **ethtool fallback**: Uses the ethtool approach as a fallback option if module reload doesn't work
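
The hardware-level reset above boils down to something like the following (a stripped-down sketch of the idea; the actual script adds detection, logging, retries and connectivity checks, and figures out the interface and PCI address itself):

```
#!/bin/bash
# minimal sketch of the module-reload / PCI-reset recovery path
IFACE=eno2                                                    # example interface
PCI=$(basename "$(readlink "/sys/class/net/$IFACE/device")")  # e.g. 0000:00:1f.6

# 1) reload the e1000e driver
modprobe -r e1000e
modprobe e1000e

# 2) if the NIC is still wedged, remove the PCI device and rescan the bus
echo 1 > "/sys/bus/pci/devices/$PCI/remove"
echo 1 > /sys/bus/pci/rescan
```
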
**Command output of a successful test**
```
root@proxmox:~/Network_Tools# ./fix-network.sh
=== Proxmox Network Interface Restart Script ===
[2025-07-12 23:04:13] [INFO] Script started
Using network interface: vmbr0
[2025-07-12 23:04:13] [INFO] Using network interface: vmbr0
Checking current network connectivity...
Network appears to be working. Are you sure you want to restart the interface? (y/N)
y
Attempt 1/3 to restart network interface...
[2025-07-12 23:04:16] [INFO] Attempting to restart interface: vmbr0
[2025-07-12 23:04:16] [INFO] Interface vmbr0 is a bridge with members: eno2 fwpr100p0 fwpr102p0 fwpr103p0 fwpr104p0 fwpr105p0 fwpr106p0 fwpr108p0 fwpr109p0 fwpr110p0 fwpr113p0 fwpr117p0 fwpr119p0
Detected bridge interface vmbr0 with members: eno2 fwpr100p0 fwpr102p0 fwpr103p0 fwpr104p0 fwpr105p0 fwpr106p0 fwpr108p0 fwpr109p0 fwpr110p0 fwpr113p0 fwpr117p0 fwpr119p0
Restarting bridge member: eno2
[2025-07-12 23:04:16] [INFO] Restarting physical bridge member: eno2
[2025-07-12 23:04:16] [INFO] Bridge member eno2 uses driver: e1000e
Intel e1000e controller detected for bridge member eno2, using proactive hardware reset
[2025-07-12 23:04:16] [INFO] Intel e1000e controller detected for bridge member eno2, applying proactive hardware reset
[2025-07-12 23:04:16] [INFO] Attempting hardware-level reset for eno2 (driver: e1000e)
Performing hardware reset for eno2...
Step 1: Attempting ethtool reset...
[2025-07-12 23:04:16] [INFO] Attempting ethtool reset for eno2
[2025-07-12 23:04:16] [INFO] ethtool reset successful for eno2
Actual changes:
tx-checksum-ipv4: off [requested on]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [requested on]
tx-checksum-fcoe-crc: off [requested on]
tx-checksum-sctp: off [requested on]
rx-checksum: on
Step 2: Detected Intel e1000e controller, attempting module reload...
[2025-07-12 23:04:21] [INFO] Attempting e1000e module reload for hardware hang recovery
[2025-07-12 23:04:21] [INFO] Interface eno2 PCI address: 0000:00:1f.6
Removing e1000e module...
[2025-07-12 23:04:23] [INFO] e1000e module removed successfully
Reloading e1000e module...
[2025-07-12 23:04:26] [INFO] e1000e module reload completed successfully
Step 3: Applying Proxmox community workaround...
Applying Proxmox forum workaround: disabling problematic features...
[2025-07-12 23:04:31] [INFO] Disabling problematic ethtool features for eno2 (Proxmox forum workaround)
[2025-07-12 23:04:31] [INFO] Disabled gso for eno2
[2025-07-12 23:04:31] [INFO] Disabled gro for eno2
[2025-07-12 23:04:31] [INFO] Disabled tso for eno2
[2025-07-12 23:04:31] [INFO] Disabled tx for eno2
[2025-07-12 23:04:31] [INFO] Disabled rx for eno2
Actual changes:
tx-vlan-hw-insert: off [not requested]
rx-vlan-hw-parse: off
[2025-07-12 23:04:31] [INFO] Disabled rxvlan for eno2
[2025-07-12 23:04:31] [INFO] Disabled txvlan for eno2
[2025-07-12 23:04:31] [INFO] Disabled sg for eno2
[2025-07-12 23:04:31] [INFO] Current features for eno2:
Creating persistent configuration for eno2...
[2025-07-12 23:04:31] [INFO] Creating persistent ethtool configuration for eno2
[2025-07-12 23:04:32] [INFO] Created and enabled persistent ethtool workaround service for eno2
✓ Persistent configuration created: /etc/systemd/system/ethtool-workaround-eno2.service
Hardware reset procedure completed for eno2
[2025-07-12 23:04:35] [INFO] Hardware reset procedure completed successfully for eno2
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr100p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr102p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr103p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr104p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr105p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr106p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr108p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr109p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr110p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr113p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr117p0
[2025-07-12 23:04:35] [DEBUG] Skipping virtual interface: fwpr119p0
Bridge member summary: 1 physical, 12 virtual (skipped)
[2025-07-12 23:04:35] [INFO] Bridge member summary: 1 physical interfaces processed, 12 virtual interfaces skipped
Waiting for bridge members to stabilize...
Bringing down interface vmbr0...
[2025-07-12 23:04:40] [INFO] Interface vmbr0 brought down successfully
Bringing up interface vmbr0...
[2025-07-12 23:04:43] [INFO] Interface vmbr0 brought up successfully
Restarting networking service...
[2025-07-12 23:04:50] [INFO] Networking service restarted successfully
Waiting for bridge to stabilize...
[2025-07-12 23:05:00] [INFO] Interface status after restart: 5: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
Attempting DHCP renewal...
[2025-07-12 23:05:00] [INFO] Attempting DHCP renewal for interface vmbr0
[2025-07-12 23:05:02] [INFO] DHCP release successful for vmbr0
[2025-07-12 23:05:04] [INFO] DHCP renewal successful for vmbr0
Verifying network connectivity...
Bridge interface detected, allowing extra time for stabilization...
Attempt 1/20: Network still not reachable, waiting 5s...
Attempt 2/20: Network still not reachable, waiting 5s...
Network connectivity restored!
[2025-07-12 23:05:29] [INFO] Network connectivity verified after 3 attempts
Network interface restart completed successfully!
[2025-07-12 23:05:29] [INFO] Network interface restart completed successfully on attempt 1
root@proxmox:~/Network_Tools#
```
 