Trap error on e1000 network adapter

frankz

Member
Nov 16, 2020
Hello everyone, I have noticed that I often see trap errors on the Intel network card of this PC (Dell T40):


[394961.232725] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <9f>
TDT <df>
next_to_use <df>
next_to_clean <9e>
buffer_info[next_to_clean]:
time_stamp <105e1836d>
next_to_watch <9f>
jiffies <105e185e0>
next_to_watch.status <0>
MAC Status <80083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
 
hi,

could you post the output from the following commands (you can use code tags for better readability):
* pveversion -v
* lspci -vk | grep e1000 -C 10
* dmesg > dmesg.txt and attach it here
* dmidecode -t bios

and also there is a known issue with this driver and a workaround [0] posted some time ago, might be worth a shot to try that out too ;)

[0]: https://forum.proxmox.com/threads/e1000-driver-hang.58284/page-4#post-302307
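
In case it helps, the four commands above could be gathered in one go with a small script along these lines. This is only a sketch: the output directory name and the run_to_file helper are made up for illustration, and it assumes a POSIX shell run as root on the node (dmidecode and dmesg usually need root).

```shell
#!/bin/sh
# Collect the requested diagnostics into one directory.
# OUT_DIR and the file names are illustrative assumptions, not from the thread.
OUT_DIR="${OUT_DIR:-./e1000e-diag}"
mkdir -p "$OUT_DIR"

# run_to_file NAME CMD [ARGS...]
# Runs CMD if it is installed and saves stdout+stderr to $OUT_DIR/NAME.txt.
# A missing or failing command is noted in the file instead of aborting.
run_to_file() {
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUT_DIR/$name.txt" 2>&1 || echo "exit status: $?" >>"$OUT_DIR/$name.txt"
    else
        echo "skipped: $1 not found" >"$OUT_DIR/$name.txt"
    fi
}

run_to_file pveversion pveversion -v
run_to_file dmesg dmesg
run_to_file dmidecode dmidecode -t bios

# lspci needs a pipe, so it is handled separately.
if command -v lspci >/dev/null 2>&1; then
    lspci -vk | grep e1000 -C 10 >"$OUT_DIR/lspci.txt" 2>&1 || true
fi
```

The resulting files can then be pasted in code tags or attached to the post.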
 
Here is what you asked for and, above all, thanks for improving Proxmox!



Code:
proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
pve-kernel-helper: 7.1-12
pve-kernel-5.13: 7.1-7
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-5
pve-kernel-5.13.19-4-pve: 5.13.19-9
pve-kernel-5.13.19-3-pve: 5.13.19-7
pve-kernel-5.13.19-2-pve: 5.13.19-4
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.4.128-1-pve: 5.4.128-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 14.2.21-1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: 0.8.36+pve1
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-3
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-6
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-5
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1










    Flags: fast devsel
    Memory at fe010000 (32-bit, non-prefetchable) [size=4K]

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
    DeviceName: Onboard - Ethernet
    Subsystem: Dell Ethernet Connection (7) I219-LM
    Flags: bus master, fast devsel, latency 0, IRQ 140
    Memory at b2100000 (32-bit, non-prefetchable) [size=128K]
    Capabilities: [c8] Power Management version 3
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
    Kernel driver in use: e1000e
    Kernel modules: e1000e

01:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
    Subsystem: Intel Corporation 82576 Gigabit Network Connection
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at b2020000 (32-bit, non-prefetchable) [size=128K]
    Memory at b1c00000 (32-bit, non-prefetchable) [size=4M]
    I/O ports at 5020 [disabled] [size=32]
    Memory at b2044000 (32-bit, non-prefetchable) [size=16K]
    Expansion ROM at b1800000 [disabled] [size=4M]
    Capabilities: [40] Power Management version 3







Getting SMBIOS data from sysfs.
SMBIOS 3.1.1 present.

Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
    Vendor: Dell Inc.
    Version: 1.2.0
    Release Date: 02/05/2020
    Address: 0xF0000
    Runtime Size: 64 kB
    ROM Size: 32 MB
    Characteristics:
        PCI is supported
        PNP is supported
        BIOS is upgradeable
        BIOS shadowing is allowed
        Boot from CD is supported
        Selectable boot is supported
        EDD is supported
        5.25"/1.2 MB floppy services are supported (int 13h)
        3.5"/720 kB floppy services are supported (int 13h)
        3.5"/2.88 MB floppy services are supported (int 13h)
        Print screen service is supported (int 5h)
        8042 keyboard services are supported (int 9h)
        Serial services are supported (int 14h)
        Printer services are supported (int 17h)
        ACPI is supported
        USB legacy is supported
        BIOS boot specification is supported
        Function key-initiated network boot is supported
        Targeted content distribution is supported
        UEFI is supported
    BIOS Revision: 1.2

Handle 0xF049, DMI type 13, 22 bytes
BIOS Language Information
    Language Description Format: Long
    Installable Languages: 2
        en|US|iso8859-1
        <BAD INDEX>
    Currently Installed Language: en|US|iso8859-1
 

Attachments: dmesg.txt (81 KB)
Same error on another node:

Code:
[14386.898326] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                 TDH                  <68>
                 TDT                  <a2>
                 next_to_use          <a2>
                 next_to_clean        <67>
               buffer_info[next_to_clean]:
                 time_stamp           <10035b9ff>
                 next_to_watch        <68>
                 jiffies              <10035bcb0>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[14388.882386] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                 TDH                  <68>
                 TDT                  <a2>
                 next_to_use          <a2>
                 next_to_clean        <67>
               buffer_info[next_to_clean]:
                 time_stamp           <10035b9ff>
                 next_to_watch        <68>
                 jiffies              <10035bea0>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[14390.902386] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                 TDH                  <68>
                 TDT                  <a2>
                 next_to_use          <a2>
                 next_to_clean        <67>
               buffer_info[next_to_clean]:
                 time_stamp           <10035b9ff>
                 next_to_watch        <68>
                 jiffies              <10035c099>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[14392.882387] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
                 TDH                  <68>
                 TDT                  <a2>
                 next_to_use          <a2>
                 next_to_clean        <67>
               buffer_info[next_to_clean]:
                 time_stamp           <10035b9ff>
                 next_to_watch        <68>
                 jiffies              <10035c288>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[14393.874132] ------------[ cut here ]------------
[14393.874134] NETDEV WATCHDOG: enp0s25 (e1000e): transmit queue 0 timed out
[14393.874147] WARNING: CPU: 0 PID: 3403 at net/sched/sch_generic.c:467 dev_watchdog+0x24c/0x250
[14393.874152] Modules linked in: tcp_diag inet_diag binfmt_misc veth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp rtl8xxxu kvm_intel snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic ledtrig_audio kvm irqbypass rtl8192cu snd_hda_intel crct10dif_pclmul i915 snd_intel_dspcfg rtl_usb ghash_clmulni_intel snd_intel_sdw_acpi aesni_intel mt7601u rtl8192c_common snd_hda_codec ppdev crypto_simd rtlwifi cryptd mei_hdcp snd_hda_core drm_kms_helper mac80211 cec snd_hwdep rapl rc_core intel_cstate i2c_algo_bit snd_pcm cfg80211 fb_sys_fops snd_timer syscopyarea pcspkr efi_pstore intel_pch_thermal snd libarc4 sysfillrect sysimgblt soundcore at24 mei_me mei parport_pc fujitsu_laptop parport sparse_keymap mac_hid
[14393.874194]  tpm_infineon zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc drm ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c i2c_i801 i2c_smbus crc32_pclmul ahci libahci lpc_ich r8169 realtek ehci_pci xhci_pci ehci_hcd xhci_pci_renesas e1000e xhci_hcd tg3 video
[14393.874221] CPU: 0 PID: 3403 Comm: kvm Tainted: P           O      5.13.19-4-pve #1
[14393.874223] Hardware name: FUJITSU ESPRIMO P720/D3221-A1, BIOS V4.6.5.4 R1.34.0 for D3221-A1x 01/08/2015
[14393.874224] RIP: 0010:dev_watchdog+0x24c/0x250
[14393.874226] Code: ba 26 fd ff eb ab 4c 89 ff c6 05 65 fd 4f 01 01 e8 a9 ef f9 ff 44 89 e9 4c 89 fe 48 c7 c7 f8 d2 c8 a6 48 89 c2 e8 2c f9 19 00 <0f> 0b eb 8c 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 4d 89
[14393.874228] RSP: 0018:ffffbfcb00003e80 EFLAGS: 00010282
[14393.874229] RAX: 0000000000000000 RBX: ffff9d300e5ce000 RCX: ffff9d36fe2209c8
[14393.874230] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d36fe2209c0
[14393.874231] RBP: ffffbfcb00003eb0 R08: 0000000000000000 R09: ffffbfcb00003c60
[14393.874232] R10: ffffbfcb00003c58 R11: ffffffffa7355428 R12: ffff9d300e5ce080
[14393.874233] R13: 0000000000000000 R14: ffff9d300f1dc480 R15: ffff9d300f1dc000
[14393.874233] FS:  00007fa72ffff700(0000) GS:ffff9d36fe200000(0000) knlGS:00000005d4e46000
[14393.874235] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14393.874236] CR2: ffffa0034f9de890 CR3: 000000019c84c004 CR4: 00000000001726f0
[14393.874237] Call Trace:
[14393.874238]  <IRQ>
[14393.874240]  ? pfifo_fast_enqueue+0x150/0x150
[14393.874242]  call_timer_fn+0x2e/0x100
[14393.874246]  __run_timers.part.0+0x1d8/0x250
[14393.874248]  ? ktime_get+0x3e/0xa0
[14393.874250]  ? lapic_next_event+0x21/0x30
[14393.874254]  ? clockevents_program_event+0x8f/0xe0
[14393.874257]  run_timer_softirq+0x2a/0x50
[14393.874259]  __do_softirq+0xce/0x281
[14393.874262]  irq_exit_rcu+0xa2/0xd0
[14393.874265]  sysvec_apic_timer_interrupt+0x7c/0x90
[14393.874267]  </IRQ>
[14393.874267]  <TASK>
[14393.874268]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[14393.874270] RIP: 0010:vmx_do_interrupt_nmi_irqoff+0x13/0x20 [kvm_intel]
[14393.874278] Code: 5a 41 59 41 58 5e 5f 5a 59 58 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 e4 f0 6a 18 55 9c 6a 10 e8 ed a5 0f e5 <48> 89 ec 5d c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 49 89 f8
[14393.874279] RSP: 0018:ffffbfcb0aed3d50 EFLAGS: 00000086
[14393.874280] RAX: 0000000000000d70 RBX: ffff9d30a796cc80 RCX: 00000000d2e7b2a9
[14393.874281] RDX: ffffffff00000000 RSI: ffffffd1094db7de RDI: ffffffffa6400d70
[14393.874282] RBP: ffffbfcb0aed3d50 R08: 0000000000000000 R09: 0000000000000000
[14393.874283] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000800000ec
[14393.874283] R13: 0000000000000000 R14: ffffbfcb0abf63e0 R15: ffff9d30a796ccb8
[14393.874285]  ? asm_sysvec_spurious_apic_interrupt+0x20/0x20
[14393.874287]  vmx_handle_exit_irqoff+0xfb/0x260 [kvm_intel]
[14393.874292]  kvm_arch_vcpu_ioctl_run+0xbac/0x16f0 [kvm]
[14393.874331]  ? kvm_vcpu_ioctl+0x2ef/0x5f0 [kvm]
[14393.874349]  kvm_vcpu_ioctl+0x247/0x5f0 [kvm]
[14393.874367]  ? kvm_on_user_return+0x63/0xa0 [kvm]
[14393.874391]  ? __fget_files+0xa3/0xd0
[14393.874393]  __x64_sys_ioctl+0x91/0xc0
[14393.874395]  do_syscall_64+0x61/0xb0
[14393.874398]  ? do_syscall_64+0x6e/0xb0
[14393.874400]  ? do_syscall_64+0x6e/0xb0
[14393.874401]  ? do_syscall_64+0x6e/0xb0
[14393.874403]  ? do_syscall_64+0x6e/0xb0
[14393.874405]  ? common_interrupt+0x55/0xa0
[14393.874407]  ? asm_common_interrupt+0x8/0x40
[14393.874409]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[14393.874411] RIP: 0033:0x7fa740720cc7
[14393.874412] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[14393.874414] RSP: 002b:00007fa72fffa3c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[14393.874415] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fa740720cc7
[14393.874416] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001a
[14393.874417] RBP: 0000557bd4c90690 R08: 0000557bd2398d38 R09: 00000000ffffffff
[14393.874417] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[14393.874418] R13: 0000557bd27f41c0 R14: 0000000000000000 R15: 0000000000000000
[14393.874420]  </TASK>
[14393.874420] ---[ end trace 0fb7c3cf591448d2 ]---
[14393.874433] e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
 
thanks for the outputs :)

have you tried the workaround in the linked thread above?
Code:
ethtool -K <device name> gso off gro off tso off tx off rx off

replace <device name> with enp0s25 or eno1 depending on your network interface's name on your nodes (you can check with ip a)
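
To see whether the workaround actually took effect, the current offload state can be read back with `ethtool -k` (lowercase k). A rough sketch of a filter for that output follows. The function name offloads_still_on is made up for illustration, and it assumes the usual `feature-name: on|off` line format that ethtool prints.

```shell
#!/bin/sh
# offloads_still_on: reads `ethtool -k <device name>` output on stdin and
# prints the names of the five features targeted by the workaround
# (tso, gso, gro, tx, rx) that are still reported as "on".
offloads_still_on() {
    awk -F': *' '
        {
            name = $1
            sub(/^[ \t]+/, "", name)   # sub-features are indented
        }
        name ~ /^(tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload|rx-checksumming|tx-checksumming)$/ && $2 ~ /^on/ {
            print name
        }
    '
}
```

For example, `ethtool -k eno1 | offloads_still_on` should print nothing once all five features are off. Any name it does print is a feature the driver kept enabled.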
 
Hi, I haven't tried it yet because I wanted to ask you first: the machine is in production, so could running this ethtool command interrupt its operation?
 
I just ran the command you suggested. There are no messages from the Proxmox system so far, so I guess I have to wait. Also, if it works, how can we make it permanent?
 
great!

you can make it "permanent" by adding it as a post-up in your /etc/network/interfaces file:
Code:
iface enp0s25 inet manual
    # other configuration options here
    # post-up goes below
    post-up ethtool -K enp0s25 tso off gso off

(same for vmbr0, or whichever bridge interface is being used with the ethernet interface)

make sure ethtool is installed on your nodes :)
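
As a quick sanity check after editing the file, something like the following could confirm that the post-up line is really there for a given interface. It is only a sketch: the function name has_offload_postup is hypothetical, and it just looks for a `post-up ethtool -K <iface> ...` line, without checking which stanza the line belongs to.

```shell
#!/bin/sh
# has_offload_postup FILE IFACE
# Succeeds (exit status 0) if FILE contains a line of the form
#   post-up ethtool -K IFACE ...
# It does not verify which iface stanza the line sits under.
has_offload_postup() {
    grep -Eq "^[[:space:]]*post-up[[:space:]]+ethtool[[:space:]]+-K[[:space:]]+$2[[:space:]]" "$1"
}
```

For example, `has_offload_postup /etc/network/interfaces enp0s25 && echo "post-up present"`. Since post-up only runs when the interface is brought up, the live state is still worth checking with `ethtool -k enp0s25` after the next reboot.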
 
So, if I understand correctly, what I did is the following, highlighted by the arrow "<-----":



Code:
auto lo
iface lo inet loopback

iface eno1 inet manual
post-up ethtool -K eno1 tso off gso off    <------------------------------

iface enp1s0f0 inet manual

iface enp1s0f1 inet manual

iface enp2s0 inet manual

iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.3.33/24
    gateway 192.168.3.2
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress a4:bb:6d:56:39:5a
#LAN

auto vmbr1
iface vmbr1 inet static
    address 192.168.1.33/24
    bridge-ports enp4s0
    bridge-stp off
    bridge-fd 0
    hwaddress 00:e0:4c:68:0d:c2
#WAN

auto vmbr2
iface vmbr2 inet static
    address 192.168.2.33/24
    bridge-ports enp2s0
    bridge-stp off
    bridge-fd 0
    hwaddress 00:e0:4c:68:0e:b9
#Rete NFS

auto vmbr3
iface vmbr3 inet static
    address 192.168.9.33/24
    bridge-ports enp1s0f0
    bridge-stp off
    bridge-fd 0
#Secure

auto vmbr4
iface vmbr4 inet static
    address 192.168.10.33/24
    bridge-ports enp1s0f1
    bridge-stp off
    bridge-fd 0
#DMZ
 
yes, and also add it below all the options under vmbr0 (since eno1 is set as its bridge port), like the following:
Code:
iface eno1 inet manual
    post-up ethtool -K eno1 tso off gso off 

iface enp1s0f0 inet manual

iface enp1s0f1 inet manual

iface enp2s0 inet manual

iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.3.33/24
    gateway 192.168.3.2
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    hwaddress a4:bb:6d:56:39:5a
    post-up ethtool -K eno1 tso off gso off
#LAN
 
Excellent, so the same instruction goes under both the NIC and vmbr0. Thank you for the quick reply. I hope this will help other users and improve Proxmox.
 
