Hardware and Software
- Proxmox VE 9.1.6
- Fujitsu T935
- Intel Corporation Ethernet Connection (3) I218-LM (rev 03)
- Kernel driver: e1000e
- Interface name: nic0 (yours may differ -- typically eno1 or enp3s0)
- Guest VM: Any (the fault is at the host NIC level)
- Router: Asus XT9 with DHCP reservation
Symptoms
- Network connectivity to and from all VMs drops intermittently -- no pattern, could be hours or days between occurrences
- The Proxmox host itself also loses connectivity during the dropout
- The physical ethernet link light stays green -- the interface shows as UP in the OS
- Running ip neigh show dev vmbr0 shows ARP entries going STALE or DELAY for the router
- No errors visible in the Proxmox web UI
- Running ping to the router or any external host fails silently
- SSH sessions drop, Home Assistant becomes unreachable, all VM traffic stops
Temporary fix (what most people discover first)
Physically unplug and replug the ethernet cable. Connectivity restores within seconds.
This works because unplugging the cable forces a hardware reset of the NIC, clearing the hung state. It is not a fix -- the dropout will return.
Root cause
The Intel I218-LM NIC uses the e1000e Linux kernel driver. There is a well-documented bug where hardware offloading features cause the NIC to enter a silent hang state. The kernel logs this as:
e1000e 0000:00:19.0 nic0: Detected Hardware Unit Hang
Check for this after a dropout with:
dmesg | grep -i "hang\|e1000e" | tail -20
The NIC continues to report itself as UP and the link light stays on, which makes this extremely difficult to diagnose. The ARP table going stale is a symptom of the underlying NIC hang, not the root cause.
This affects multiple Intel NIC models using the e1000e driver, including I217-LM, I218-LM, I219-LM and I219-V. It is not specific to any particular VM or workload.
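If you want a single number rather than scrolling through log lines, the check above can be wrapped in a small helper. This is only a sketch: the function name count_hangs is made up for illustration, and it matches nothing more than the message text quoted above.

```shell
# count_hangs (hypothetical helper): read kernel log text on stdin and print
# how many lines contain the e1000e hang message.
count_hangs() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # "|| true" hides that exit status so the function is safe under "set -e".
    grep -c 'Detected Hardware Unit Hang' || true
}

# Example, run as root on the Proxmox host:
#   dmesg | count_hangs
```

A result above 0 after a dropout suggests you are hitting this hang rather than a router-side problem.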
Permanent fix
Step 1 -- Identify your physical NIC name
lspci | grep -i ethernet
ip link show
Note the interface that is the bridge port for vmbr0.
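If the ip link output is hard to read, the bridge port can also be pulled straight from the network config. The helper below is a rough sketch, not an official Proxmox tool: the name bridge_ports_of is invented here, and it assumes the ifupdown-style layout with a bridge-ports line inside the bridge stanza.

```shell
# bridge_ports_of BRIDGE (hypothetical helper): read an interfaces file on
# stdin and print each port listed on the bridge-ports line of that bridge.
bridge_ports_of() {
    awk -v br="$1" '
        # Every "iface NAME ..." line starts a new stanza.
        $1 == "iface" { in_stanza = ($2 == br) }
        # Inside the wanted stanza, print every word after "bridge-ports".
        in_stanza && $1 == "bridge-ports" {
            for (i = 2; i <= NF; i++) print $i
        }
    '
}

# Example:
#   bridge_ports_of vmbr0 < /etc/network/interfaces
```

Use the name it prints in place of nic0 in the steps that follow.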
Step 2 -- Check offloading is currently enabled (confirms the issue applies to you)
ethtool -k nic0 | grep -E 'tcp-seg|generic-seg|generic-receive|rx-vlan|tx-vlan|scatter'
If any entries show on, proceed.
Step 3 -- Disable offloading immediately (temporary, to test)
Replace nic0 with your interface name:
ethtool -K nic0 gso off tso off rxvlan off txvlan off gro off tx off rx off sg off
Step 4 -- Make it permanent
Edit /etc/network/interfaces:
nano /etc/network/interfaces
Add a post-up line to your physical NIC stanza. The post-up method is required -- the offload-* directives do not reliably apply on boot:
Code:
auto lo
iface lo inet loopback
iface nic0 inet manual
    post-up ethtool -K nic0 gso off tso off rxvlan off txvlan off gro off tx off rx off sg off
auto vmbr0
iface vmbr0 inet dhcp
    bridge-ports nic0
    bridge-stp off
    bridge-fd 0
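Before reloading, it can be worth confirming that the post-up line really ended up inside the right stanza. The sketch below does that. The function name has_offload_postup is hypothetical, and it assumes the line is written exactly as "post-up ethtool -K IFACE ..." (a full path such as /sbin/ethtool would not match).

```shell
# has_offload_postup IFACE (hypothetical helper): read an interfaces file on
# stdin; succeed (exit 0) only if the stanza for IFACE contains a
# "post-up ethtool -K IFACE ..." line.
has_offload_postup() {
    awk -v nic="$1" '
        # Track whether the current stanza belongs to the requested interface.
        $1 == "iface" { in_stanza = ($2 == nic) }
        in_stanza && $1 == "post-up" && $2 == "ethtool" && $3 == "-K" && $4 == nic {
            found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example:
#   has_offload_postup nic0 < /etc/network/interfaces && echo "post-up line present"
```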
Reload networking:
ifreload -a
Verify offloading is off after reload:
ethtool -k nic0 | grep -E 'tcp-seg|generic-seg|generic-receive|rx-vlan|tx-vlan|scatter'
All entries should show off.
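To make that check scriptable (for example after a reboot), the ethtool output can be filtered down to just the offloads that are still on. This is a minimal sketch: the name offloads_still_on is invented here, and it assumes the usual "feature-name: on|off" layout of ethtool -k output, looking only at the top-level features from Step 2.

```shell
# offloads_still_on (hypothetical helper): read "ethtool -k IFACE" output on
# stdin and print each top-level offload from Step 2 whose state begins
# with "on". Indented sub-feature lines are ignored.
offloads_still_on() {
    awk -F': *' '
        $1 ~ /^(tcp-segmentation|generic-segmentation|generic-receive|rx-vlan|tx-vlan)-offload$|^scatter-gather$/ && $2 ~ /^on/ { print $1 }
    '
}

# Example:
#   ethtool -k nic0 | offloads_still_on
```

Empty output means everything in the list is off. Any name it prints is an offload that did not get disabled.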
Step 5 -- Fix invalid ARP responses (secondary fix)
Proxmox can also send ARP replies on the wrong interface, confusing the router. Prevent this permanently:
echo -e "net.ipv4.conf.all.arp_ignore=2\nnet.ipv4.conf.all.arp_announce=2" | tee /etc/sysctl.d/99-proxmox-arp.conf
sysctl -p /etc/sysctl.d/99-proxmox-arp.conf
Result
No further network dropouts. The fix survives reboots. No performance impact was observed on a home lab running Home Assistant and other lightweight VMs.
Important AI tools warning: conflicting and wrong diagnoses -- use multiple tools and verify everything
This section is worth reading before you spend hours chasing the wrong fix.
When the symptoms of this fault were put into Microsoft Copilot, it was adamant that the router was the cause. It pointed to the DHCP reservation, the ASUS firmware, and ARP staleness on the router side as the fault. Even after being told that the problem had been fixed with the steps above, Copilot continued to insist the router was at fault and suggested router-side fixes.
This is a known risk with AI assistants: they can latch onto a plausible-sounding diagnosis early and then defend it even when given contradicting evidence. In networking faults especially, where symptoms like ARP staleness and dropouts can have many different root causes, this kind of confirmation bias in an AI response can send you in completely the wrong direction and waste a significant amount of time.
The fault was ultimately diagnosed correctly by using Claude (Anthropic) to analyse the raw ARP table output, the interface names, the NIC hardware details, and the specific symptom of cable-replug restoring connectivity. That combination of clues pointed specifically to the e1000e hardware offloading hang rather than any router or ARP configuration issue.
The lesson here is practical:
- Do not rely on a single AI tool for complex technical diagnosis
- Provide raw command output rather than describing symptoms in plain language -- AI tools reason much more accurately from actual data
- If an AI diagnosis does not match what you are observing, try a different tool with the same data
- Cross-reference any AI suggestion against the relevant community forums (in this case the Proxmox forum, where this exact bug is documented across multiple threads)
- AI tools are genuinely useful for this kind of diagnosis but they are not infallible -- treat their output as a starting point for investigation, not a final answer