e1000e if.6 eno1: NETDEV WATCHDOG: CPU: transmit queue timed out?

grocerylist

I've had a Proxmox system running for several months, but recently my VMs and Proxmox itself have started becoming unresponsive: VMs down, hypervisor down, no response to pings. I usually run headless, but I connected a monitor to try to figure out what was going on. I'm getting the following errors nonstop:

I searched and found a somewhat related post: https://forum.proxmox.com/threads/intel-nic-e1000e-hardware-unit-hang.106001/ where it was recommended to edit the interface config and turn TSO off. I don't understand the cause of the problem or what turning TSO off does, but regardless, this is still happening. I tried a reboot, and Proxmox and my VMs came back up, but only for a short time; then the NIC errors returned.

Can anyone help me diagnose and resolve this?

Thank you!
 
If you reboot the system and then quickly stop any VMs that have already auto-started, does the issue still appear with just Proxmox running?
Also, is there a specific reason you're running e1000e adapters instead of VirtIO ones? From what I've seen and read, VirtIO adapters usually perform better, except in a few edge cases. You do need to have the drivers installed, though, and depending on the OS, switching now might require you to set up the IP again if it is currently static.
 
I don't have any VMs that auto-start. I could reboot and see what happens if I never start up my VMs.

My VMs use VirtIO adapters... I think it's the Proxmox install that (autoselects?) e1000e?

Is there a way to change/force Proxmox to set up the interface as VirtIO rather than e1000e?
 
This is the main VM I'm concerned with, showing its Network Device set up as VirtIO.

I've now restarted without starting any VMs. I wish there was an easy way to monitor/report if/when I start getting the NETDEV WATCHDOG errors. Not sure what could be causing them.
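
One possible way to watch for them, as a minimal sketch: follow the kernel log and keep only the watchdog lines, each stamped with the time it was seen. The helper name filter_watchdog and the log path in the usage comment are made up for illustration; journalctl -k -f follows kernel messages on a systemd host.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print only the
# NETDEV WATCHDOG ones, each prefixed with the time the line was seen.
filter_watchdog() {
    while IFS= read -r line; do
        case "$line" in
            *"NETDEV WATCHDOG"*)
                printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
                ;;
        esac
    done
}

# Usage on the host (assumed log path; runs until interrupted):
#   journalctl -k -f | filter_watchdog >> /root/netdev-watchdog.log
```

Left running in the background or in a tmux session, this leaves a file showing when each timeout happened, which can then be compared against when VMs were started.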

Screenshot 2024-07-08 101042.png
 
Just to get clarification: are you running E1000 cards on the host or on the virtual machines? If on the host, are they really E1000 (very old) or one of the newer variants? Did you try turning off the offload as suggested? (Some of this hardware has features that don't work right with Linux.)
 
I'm not sure how the Proxmox install works, or why e1000e is set up on the host. My system has an Intel I219-LM NIC. I'm happy to reinstall to correct this, but I don't recall anywhere during installation to change network devices.

I did turn off TSO as suggested (or I think I did, not sure if I did correctly or not).
Screenshot 2024-07-08 102332.png
 
So it is a newer variant. That's fine. The way you have turned tso off doesn't seem right. Did you verify it is really off with:

Code:
ethtool -k eno1

I'm betting it didn't really get turned off. The thread you linked to suggests something like this:

Code:
iface vmbr0 inet manual
    <other stuff>
    post-up ethtool -K eno1 tso off gso off

Please re-read the thread you linked to, especially the response from oguz (#8). After rebooting you can use ethtool again to verify that it really did turn off the offload. If it is off and it still hangs then there's more troubleshooting to do.
 
Thanks for the help. I'm not sure what in the ethtool -k eno1 output indicates whether TSO is off or not.

I edited my interface config according to the code you posted above and oguz's response (#8), also adding post-up to iface eno1 inet manual:

iface lo inet loopback

iface eno1 inet manual
    post-up ethtool -K eno1 tso off gso off

auto vmbr0
iface vmbr0 inet manual
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 30,50,69

auto vmbr0.50
iface vmbr0.50 inet static
    address 192.168.50.11/24
    gateway 192.168.50.1
    post-up ethtool -K eno1 tso off gso off


After a reboot, my ethtool -k eno1 output is:
root@pve2:~# ethtool -k eno1
Features for eno1:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: off
generic-segmentation-offload: off
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off [fixed]
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-tunnel-remcsum-segmentation: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
tx-gso-list: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
rx-gro-list: off
macsec-hw-offload: off [fixed]
rx-udp-gro-forwarding: off
hsr-tag-ins-offload: off [fixed]
hsr-tag-rm-offload: off [fixed]
hsr-fwd-offload: off [fixed]
hsr-dup-offload: off [fixed]
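
For reading that output, the two lines that matter for this change are tcp-segmentation-offload (TSO) and generic-segmentation-offload (GSO). In the listing above both report off, so the offloads were disabled at boot. A small sketch to pull out just those two lines; the helper name show_seg_offload is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k` output on stdin and keep only the
# TSO and GSO summary lines.
show_seg_offload() {
    grep -E '^(tcp|generic)-segmentation-offload:'
}

# Usage on the host:
#   ethtool -k eno1 | show_seg_offload
```

If either line shows on after a reboot, the post-up command most likely did not run or used the lowercase -k (show) option instead of -K (set).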


I'll start my VM now, see how things go, and update if I get any more NETDEV WATCHDOG errors.

Thanks for the help!
 
