[SOLVED] Intel NIC e1000e hardware unit hang

So I'm planning to rebuild my Home Assistant server over the weekend, using Proxmox so that I can build HAOS in a VM and be considered "supported" by HA (today I run HA supervised, but it's fallen out of favour for prod builds). My HA server is an Intel NUC (NUC8I3BEH 8th Gen Core i3). The NIC shows up as:

Code:
% lspci -v | grep Ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (6) I219-V (rev 30)
    Subsystem: Intel Corporation Ethernet Connection (6) I219-V

I assume, therefore, that I am going to join the club of e1000e driver hangs, which doesn't fill me with joy (the NIC has been 100% stable running bare metal for the last 6 years).

My question: can I avoid this grief if I use the VirtIO network driver instead of Intel E1000? I'm not doing anything fancy with networking - no VLANs or weird IP config. I'll just have separate IP addresses for the host and each of the VM guests. I do have Gbps fibre from my ISP, so would like to maintain full Gbps performance.
That is a very good question, and my testing so far indicates the wisdom that led me to consider this is backwards. Assigning the VM to use an e1000e nic made all the problems go away for me. I'll do further testing later. But using the VirtIO option seems to me to be what was causing the crashes. Although I did disable TSO as well, curious if you would experience no problems just using the e1000e nic without doing any further changes. Please let us know!
 
OK - that's super interesting. And I've twigged that perhaps I've missed the point a bit. I was thinking of this as a problem that materialised on VM guests, but now I'm thinking the workaround (disabling GSO / TSO) is probably applied to the host, not the guests. This means the problem must be a complex interaction between config of the interface on the host and network driver chosen for the guests. Ouch.

In any case, I'll certainly share my experience. Although I have to confess I am considering alternatives to Proxmox, if it means I can avoid this issue. It occurs to me that Proxmox is arguably far more capable than I need (two VM guests with super-vanilla config).
 
OK - that's super interesting. And I've twigged that perhaps I've missed the point a bit. I was thinking of this as a problem that materialised on VM guests, but now I'm thinking the workaround (disabling GSO / TSO) is probably applied to the host, not the guests. This means the problem must be a complex interaction between config of the interface on the host and network driver chosen for the guests. Ouch.

In any case, I'll certainly share my experience. Although I have to confess I am considering alternatives to Proxmox, if it means I can avoid this issue. It occurs to me that Proxmox is arguably far more capable than I need (two VM guests with super-vanilla config).
Don't let me talk you out of it. Two hours of iperf3 so far without issues. And just now, I edited the config file to be as the last poster suggested, or atleast similar, and I'm going to test it again. I'm now testing:


Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
    # Disable offloads for e1000e driver issues
    # post-up /sbin/ethtool -K eno1 tso off gso off gro off
    post-up ethtool -K eno1 tso off

auto vmbr0
iface vmbr0 inet static
    address 10.10.10.1/24
    gateway 192.168.1.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    post-up ethtool -K vmbr0 tso off

# iface wls3f3 inet manual
# Wireless interface (Wi-Fi client)
auto wls3f3
iface wls3f3 inet dhcp
    wpa-ssid "backup"
    wpa-psk "backup"


source /etc/network/interfaces.d/*

and I'm not experiencing the same issues so far. I'll let it run Iperf3 overnight and see what happens.
 
I am stable with all updates installed. At least for the moment. I am using the offloading fix.
But to be sure, I bought two of these gadgets: https://store-eu.gl-inet.com/products/comet-gl-rm1-remote-keyboard-video-mouse
(They have a global and a US store)
Nice but can we be certain there is no hidden backdoor (even without using the cloud option)? Is the software open source and can it be checked by a community and compiled yourself? If I would use this, I would lock it down extremely tight. No traffic to the internet, for instance. I would make sure on my router/fw/gateway it only reacts to a few internal IP addresses, including a few that get handed out when I connect with VPN to my network (for remote access). Opening up physical access to the outside world is a really, really, REALLY, creepy thing, however nice it is.

And even locked down, what if I connect using a web interface and that web interface that is running on the client machine connects to the outside world as well through whatever is on the web page getting around my nicely locked down setup...?
 
Last edited: