e1000e driver hang

Adding another datapoint.

Dell T5810 with a

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection I217-LM (rev 05)

It had been fine for more than a year on Linux version 6.8.4-2-pve, but I upgraded in mid-March and ~9 days later had the first e1000e "Detected Hardware Unit Hang" error. That was on Linux version 6.8.12-20-pve.

I didn't investigate it at the time and just assumed a random crash, but another ~9 days later the same thing happened.

I've set up a cron entry. It's pretty ugly: it just tries to ping my router, and if that fails twice in a row at a 15-minute interval, it forces a reboot:

Code:
*/15 * * * * root ping -c3 -W2 192.168.1.1 > /dev/null 2>&1 && rm -f /run/ping-watchdog-fail || { [ -f /run/ping-watchdog-fail ] && /usr/sbin/shutdown -r now || touch /run/ping-watchdog-fail; }
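For anyone who finds the one-liner hard to read, here is a rough sketch of the same logic as a shell function. The function name watchdog_check and the overridable PING_CMD / REBOOT_CMD / TARGET / FLAG variables are my own invention for illustration (they make it possible to dry-run without rebooting); the defaults match the cron entry above.

```shell
#!/bin/sh
# Sketch only: same logic as the cron one-liner, spelled out.
# All variable names and the function name are hypothetical.
TARGET="${TARGET:-192.168.1.1}"                     # router address from the cron entry
FLAG="${FLAG:-/run/ping-watchdog-fail}"             # marker left by a first failed check
PING_CMD="${PING_CMD:-ping -c3 -W2}"                # 3 probes, 2 s timeout each
REBOOT_CMD="${REBOOT_CMD:-/usr/sbin/shutdown -r now}"

watchdog_check() {
    # $PING_CMD and $REBOOT_CMD are left unquoted on purpose so that
    # their arguments are split into separate words.
    if $PING_CMD "$TARGET" >/dev/null 2>&1; then
        # Network is reachable: clear any earlier failure marker.
        rm -f "$FLAG"
    elif [ -f "$FLAG" ]; then
        # Second consecutive failure: reboot. The marker is not removed
        # here; /run is a tmpfs, so it disappears with the reboot.
        $REBOOT_CMD
    else
        # First failure: just remember it for the next run.
        touch "$FLAG"
    fi
}
```

To use it, you would add a final watchdog_check line, save the file somewhere like /usr/local/sbin/ping-watchdog (a made-up path), and call that from the same */15 cron schedule. Setting REBOOT_CMD="echo REBOOT" and PING_CMD=false lets you step through the failure path by hand without actually rebooting.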

This, combined with proxmox-boot-tool kernel pin 6.8.4-2-pve, means I should only have one more failure on the 6.8.12-* branch: the next watchdog reboot should bring the machine back up on the pinned kernel.

At some point I'll have to revisit newer Proxmox kernel releases to see if the regression has been properly fixed.