Proxmox kernel 6.8.12-9-pve introduced a problem with the e1000e driver: network connection lost after some hours

Yes. I just noticed, because the issue returned. But I don't have that version available as a fallback. Only .10 and .9 and an older 5.x.... kernel, but will that be compatible?
You can try to pin the .8 version and see if the problem is still there. I was facing the same problem and only saw the .9 and .10 versions available, so I installed the .8 version manually and it's working fine. It seems to be compatible, but it needs to keep running for the next hours or days to confirm there are no compatibility issues.

By the way, even if there are no compatibility issues today, we have no guarantee that none will appear later, which is why we need someone to look into this ASAP.
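For anyone who wants to try the pinning suggested above, here is a minimal sketch. It assumes the host boots through `proxmox-boot-tool` and that the 6.8.12-8 kernel is already installed (the install step is left out on purpose, since the exact package name depends on the repository). The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers added for illustration, so by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: pin a known-good kernel so the host keeps booting it.
# Assumptions: proxmox-boot-tool manages the boot entries and the
# kernel passed as $1 (for example 6.8.12-8-pve) is already installed.

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pin_kernel() {
    version="$1"
    run proxmox-boot-tool kernel list            # show the installed kernels
    run proxmox-boot-tool kernel pin "$version"  # keep booting this version
}

# Usage (as root): DRY_RUN=0 pin_kernel 6.8.12-8-pve, then reboot.
# To undo later:   proxmox-boot-tool kernel unpin
```

After the reboot, `uname -r` should report the pinned version if the pin took effect.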
 
My NIC is an I219-LM in a ThinkCentre M920q and I was facing the same issue. Rolling back to .8 seems to have fixed it.
That was a nightmare to debug lol
 
Has someone tried Kernel 6.14 which is contained in Proxmox 8.4 as opt in?
I'm running Linux 6.14.0-2-pve at the moment, but it has the same problem as 6.8.12-10 :(

I have
Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (11) I219-LM
        DeviceName: Onboard Lan
        Subsystem: Hewlett-Packard Company Ethernet Connection (11) I219-LM
        Flags: bus master, fast devsel, latency 0, IRQ 125, IOMMU group 8
        Memory at e1200000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: e1000e
        Kernel modules: e1000e
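To check quickly whether a host is affected, the driver name can be pulled out of output like the one above. This is a minimal sketch: `driver_in_use` is a hypothetical helper name, and it only assumes the `Kernel driver in use:` line format shown in this post.

```shell
#!/bin/sh
# Sketch only: extract the kernel driver from `lspci -v` style text.
# Reads the text on stdin and prints the driver name(s), one per line.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use:[[:space:]]*//p'
}

# Usage on a real host (PCI address taken from the output above):
#   lspci -v -s 00:1f.6 | driver_in_use
```

`ethtool -i eno1` is another way to see the driver; it prints it on a `driver:` line.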
 
Yep, it seems to be something with all NICs using e1000e.
This is mine:

Code:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (17) I219-LM (rev 11)
        Subsystem: Dell Ethernet Connection (17) I219-LM
        Flags: bus master, fast devsel, latency 0, IRQ 124, IOMMU group 9
        Memory at 70500000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: e1000e
        Kernel modules: e1000e

I have changed the subject of this topic to cover this better and maybe attract someone who knows what is happening and can suggest a fix or workaround for kernels 6.8.12-9 and above.
 
Just changing the kernel didn't solve it.

I found more threads, and now I'm trying ethtool to disable some of the offload features.

Honestly, I wasn't planning to use this NIC; so much trouble :(
 
I have an M920q with the same onboard NIC running 6.8.12-10 and haven't seen any issues yet (it has been running for 8-9 months, across a couple of PVE kernels).

I'm running Zabbix monitoring on that host+NIC so if there are problems with the network, I will surely see them.
 
That's nice. We can see many reports here and in the other thread, so maybe it is something in the setup.
Here I have a bridge, and I pass the bridge down to a VM where I use it as WAN. For me it was most problematic when doing uploads.

After running `ethtool -K eno1 gso off gro off tso off tx off rx off rxvlan off txvlan off` to turn off (a lot of) features on kernel 6.8.12-8, it seems much more stable now. I spammed 100 GB over iperf and so far no disconnections \o/

I will now try with fewer features off (just `tso off gso off gro off`) and maybe retry with the newer kernel.
 
I tried it and it worked, or at least it has been working for 24 hours :)
In the last 24 hours the connection has not dropped, and there are no problems in the system log either.

My kernel is Linux 6.14.0-2-pve and I used the following:
Code:
ethtool -K eno1 gso off gro off tso off
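To back up a "nothing in the system log" check with a number, the hang reports can be counted. This is a minimal sketch: `count_hangs` is a hypothetical helper name, and the matched text is the `Detected Hardware Unit Hang` message that the e1000e driver logs, as quoted further down in this thread.

```shell
#!/bin/sh
# Sketch only: count e1000e hang reports in kernel log text on stdin.
# Prints 0 when nothing matches (grep alone would also return non-zero).
count_hangs() {
    grep -c 'e1000e.*Detected Hardware Unit Hang' || true
}

# Usage on a real host, kernel messages of the current boot only:
#   journalctl -k -b | count_hangs
```

A count that stays at 0 over a day or two of normal traffic is a reasonable sign that the workaround is holding, though it does not prove it.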
 
I had one issue here when disabling only those: after a while I noticed my link had gone down to 100 Mbps.
I'm testing with everything disabled now, although it could still be a faulty cable, for instance.
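The negotiated link speed can be read on the host itself, which helps to tell a renegotiation apart from a cable problem. This is a minimal sketch: `link_speed` is a hypothetical helper name, and it assumes the usual `Speed:` line in plain `ethtool <iface>` output.

```shell
#!/bin/sh
# Sketch only: print the negotiated speed from `ethtool <iface>` text
# on stdin, for example "1000Mb/s" or "100Mb/s".
link_speed() {
    sed -n 's/^[[:space:]]*Speed:[[:space:]]*//p'
}

# Usage on a real host:
#   ethtool eno1 | link_speed
```

If the speed drops again, comparing it before and after swapping the cable should show whether the cable is involved.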
 
OK, for me it stayed on gigabit; I just checked from the MikroTik hAP ax^2.
[Attachment: 1746693797610.png]
 
It seems promising. @timnis, do you know exactly what we are disabling with `ethtool -K eno1 gso off gro off tso off`? Is your system still running fine after some days? I will try it and then update to the latest kernel to see what happens.
 
Sorry I just saw now that you tagged timnis, my bad.

They are described on the man page https://linux.die.net/man/8/ethtool (-K).
rx on|off
Specifies whether RX checksumming should be enabled.
tx on|off
Specifies whether TX checksumming should be enabled.
sg on|off
Specifies whether scatter-gather should be enabled.
tso on|off
Specifies whether TCP segmentation offload should be enabled.
ufo on|off
Specifies whether UDP fragmentation offload should be enabled
gso on|off
Specifies whether generic segmentation offload should be enabled
gro on|off
Specifies whether generic receive offload should be enabled
lro on|off
Specifies whether large receive offload should be enabled
rxvlan on|off
Specifies whether RX VLAN acceleration should be enabled
txvlan on|off
Specifies whether TX VLAN acceleration should be enabled
ntuple on|off
Specifies whether Rx ntuple filters and actions should be enabled
rxhash on|off
Specifies whether receive hashing offload should be enabled
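The short names above are what `ethtool -K` accepts, while `ethtool -k` (lowercase) reports the current state under longer names such as `tcp-segmentation-offload`. As a minimal sketch, the hypothetical helper below lists which of the offloads discussed in this thread are still on. It assumes the usual `name: on|off` layout of `ethtool -k` output.

```shell
#!/bin/sh
# Sketch only: list the offloads from this thread that are still enabled.
# Reads `ethtool -k <iface>` output on stdin. Short name -> reported name:
#   rx -> rx-checksumming          tx -> tx-checksumming
#   sg -> scatter-gather           tso -> tcp-segmentation-offload
#   gso -> generic-segmentation-offload
#   gro -> generic-receive-offload
#   rxvlan -> rx-vlan-offload      txvlan -> tx-vlan-offload
#   rxhash -> receive-hashing
offloads_still_on() {
    awk -F': ' '
        /^(rx-checksumming|tx-checksumming|scatter-gather|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload|rx-vlan-offload|tx-vlan-offload|receive-hashing):/ && $2 ~ /^on/ { print $1 }
    '
}

# Usage on a real host:
#   ethtool -k eno1 | offloads_still_on
```

Running this after an `ethtool -K` call shows whether the settings were actually applied; features marked `[fixed]` in the output cannot be changed.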

In my case, just disabling tso, ufo, gso and gro did not work well, and I still got the hang messages in the log:
May 14 12:32:02 proxmox kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

But after disabling them all with `ethtool -K eno1 tso off ufo off gso off gro off rx off rxvlan off tx off txvlan off rxhash off`, it has worked much better for weeks. I added it as a post-up line in /etc/network/interfaces:

Code:
iface eno1 inet manual
        post-up /sbin/ethtool -K eno1 tso off ufo off gso off gro off rx off rxvlan off tx off txvlan off rxhash off
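Since people in this thread are testing different feature sets, a small helper can build the command from a list, so the same line can be tried at runtime and then pasted after `post-up`. This is a minimal sketch: `offload_off_cmd` is a hypothetical name, and it only prints the command, it does not run it.

```shell
#!/bin/sh
# Sketch only: build an `ethtool -K` command line that switches the
# given features off. Prints the command; nothing is executed.
offload_off_cmd() {
    iface="$1"
    shift
    cmd="/sbin/ethtool -K $iface"
    for feature in "$@"; do
        cmd="$cmd $feature off"
    done
    echo "$cmd"
}

# Usage:
#   offload_off_cmd eno1 tso gso gro
#   offload_off_cmd eno1 tso ufo gso gro rx rxvlan tx txvlan rxhash
```

On hosts that use ifupdown2, `ifreload -a` should re-apply /etc/network/interfaces without a reboot; treat that as an assumption about your setup and check with `ethtool -k eno1` afterwards.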
 
For a while it looked like it worked, but no....

With the latest kernel and `ethtool -K eno1 gso off gro off tso off` it somehow worked. In syslog there were no errors about eno1.
But I have a secondary PBS installed in a VM which pulls backups from the primary PBS (at another site) over an OpenZiti overlay network, and that worked really badly.

At the moment I'm back on kernel 6.8.12-8 and it works without any problem. When I have more time I will try to test again with ethtool and different parameters... Or hopefully a new kernel version will fix the problem :)