Problem Summary:
Two Proxmox nodes with Intel onboard NICs (I217/I219 family) suddenly began experiencing intermittent freezes, SSH drops, and bridge instability. Both nodes had been stable for years. The failures began simultaneously after recent kernel updates.
Symptoms Observed:
- SSH sessions dropping
- VPN VM losing connectivity
- vmbr0 flapping or freezing
- Host temporarily unreachable
- No logs indicating a clean shutdown
- Both nodes affected identically
- Only the onboard NIC (e1000e driver) showed issues
- PCIe NICs remained stable
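To tie symptoms like these to a specific driver, a check along the following lines may help. It is a minimal sketch, not a record of what was run on these nodes. The helper names `nic_driver` and `count_hangs` are hypothetical, and `SYSFS_ROOT` exists only so the function can be exercised against a fake directory tree. "Detected Hardware Unit Hang" is the message e1000e logs when it notices a stalled transmit unit; whether it appeared in the logs here is not known.

```shell
#!/bin/sh
# Hypothetical helpers; names and SYSFS_ROOT are illustrative assumptions.

# Print the kernel driver bound to an interface, e.g. "e1000e".
# Reads the sysfs "driver" symlink; fails if the interface has none
# (for example a bridge or other virtual device).
nic_driver() {
    link="${SYSFS_ROOT:-/sys}/class/net/$1/device/driver"
    [ -e "$link" ] || return 1
    basename "$(readlink -f "$link")"
}

# Count e1000e hang reports in kernel log text read from stdin.
# grep -c prints 0 but exits non-zero on no match, hence "|| true".
count_hangs() {
    grep -c 'Detected Hardware Unit Hang' || true
}
```

Assuming the onboard NIC is `eno1` (the name suggested by the blacklist file used in the fix) and the journal is persistent, `nic_driver eno1` would print the driver name and `journalctl -k -b -1 | count_hangs` would count hang reports from the previous boot.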
Root Cause:
A regression in the Intel e1000e driver introduced instability in the onboard NICs. Even when not actively carrying traffic, the NIC could hang internally and lock the kernel’s network stack. Both nodes shared the same NIC family and kernel version, so both were affected at the same time.
Solution:
- Migrated vmbr0 off the onboard NIC to a stable PCIe NIC
- Removed the failing NIC from the bridge
- Blacklisted the e1000e driver:
    echo "blacklist e1000e" > /etc/modprobe.d/blacklist-eno1.conf
    update-initramfs -u
- Rebooted the host
- Verified the driver was no longer loaded
- (Planned) Disable the onboard NIC in BIOS for permanent removal
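The verification step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact commands used: `module_loaded` and `bridge_ports` are hypothetical helper names, and `MODULES_FILE` and `INTERFACES_FILE` are overridable only so the functions can be tested against sample files. The defaults are `/proc/modules` and `/etc/network/interfaces`, the ifupdown-style file Proxmox uses for bridge definitions.

```shell
#!/bin/sh
# Hypothetical helpers; the overridable file paths are for testing only.

# Succeed if the named module appears in the loaded-module list.
module_loaded() {
    grep -q "^$1 " "${MODULES_FILE:-/proc/modules}"
}

# Print the bridge-ports value configured for a bridge in an
# ifupdown-style interfaces file.
bridge_ports() {
    awk -v br="$1" '
        $1 == "iface" { current = $2 }
        current == br && ($1 == "bridge-ports" || $1 == "bridge_ports") {
            for (i = 2; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n")
        }
    ' "${INTERFACES_FILE:-/etc/network/interfaces}"
}
```

On a fixed host, `module_loaded e1000e` should fail and `bridge_ports vmbr0` should print only the PCIe NIC's name. The check is worth doing because a `blacklist` line only stops alias-based autoloading; an explicit `modprobe e1000e` would still load the driver.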
Outcome:
- System stability fully restored
- No further freezes or network drops
- VPN VM autostarted correctly
- vmbr0 now runs exclusively on the healthy PCIe NIC
- Environment stable for 24+ hours post‑fix
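One way to back a stability claim like this with numbers is the kernel's per-interface `carrier_changes` counter in sysfs, which increments each time the link goes up or down. The sketch below is an assumption about how that could be sampled, not something taken from the original fix. `carrier_changes` and `flaps_during` are hypothetical helper names, and `SYSFS_ROOT` is overridable only for testing.

```shell
#!/bin/sh
# Hypothetical helpers; SYSFS_ROOT is overridable for testing only.

# Print how many times the link state of an interface has changed,
# as counted by the kernel in sysfs.
carrier_changes() {
    cat "${SYSFS_ROOT:-/sys}/class/net/$1/carrier_changes"
}

# Sample the counter twice, INTERVAL seconds apart, and print the
# number of link state changes seen in between.
flaps_during() {
    iface="$1"
    interval="$2"
    before=$(carrier_changes "$iface") || return 1
    sleep "$interval"
    after=$(carrier_changes "$iface") || return 1
    echo $((after - before))
}
```

For example, `flaps_during vmbr0 3600` would print 0 if the bridge link did not flap during that hour.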
Why This Matters:
Intel’s e1000e driver has a long history of regressions across kernel versions. A kernel update can destabilize previously reliable hardware. Removing the NIC from service is often safer than attempting to tune or patch around the issue.
Wordsmithed by AI, verified results WDT