kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

md127

New Member
Feb 10, 2022
This error message appears in syslog when I start a VM. It also appears at random, often crashing the whole PVE host.

Code:
Feb 24 16:21:34 snuc2 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <a>
  TDT                  <6a>
  next_to_use          <6a>
  next_to_clean        <a>
buffer_info[next_to_clean]:
  time_stamp           <1012d10a7>
  next_to_watch        <b>
  jiffies              <1012d12f8>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 24 16:21:36 snuc2 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <a>
  TDT                  <6a>
  next_to_use          <6a>
  next_to_clean        <a>
buffer_info[next_to_clean]:
  time_stamp           <1012d10a7>
  next_to_watch        <b>
  jiffies              <1012d14f1>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 24 16:21:38 snuc2 kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <a>
  TDT                  <6a>
  next_to_use          <6a>
  next_to_clean        <a>
buffer_info[next_to_clean]:
  time_stamp           <1012d10a7>
  next_to_watch        <b>
  jiffies              <1012d16e0>
  next_to_watch.status <0>
MAC Status             <40080083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Feb 24 16:21:38 snuc2 kernel: ------------[ cut here ]------------
Feb 24 16:21:38 snuc2 kernel: NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
Feb 24 16:21:38 snuc2 kernel: WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24c/0x250
Feb 24 16:21:38 snuc2 kernel: Modules linked in: veth rpcsec_gss_krb5 nfsv4 nfs fscache netfs tcp_diag inet_diag vfio_pci vfio_virqfd vfio_iommu_type1 vfio ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd i915 cryptd rapl mei_hdcp drm_kms_helper intel_cstate cec intel_wmi_thunderbolt wmi_bmof efi_pstore rc_core pcspkr ee1004 i2c_algo_bit fb_sys_fops mei_me syscopyarea sysfillrect joydev sysimgblt input_leds mei intel_pch_thermal mac_hid acpi_pad zfs(PO) acpi_tad zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi nfsd scsi_transport_iscsi auth_rpcgss nfs_acl overlay lockd grace drm sunrpc ip_tables x_tables
Feb 24 16:21:38 snuc2 kernel:  autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c uas usb_storage hid_generic usbkbd usbhid hid crc32_pclmul i2c_i801 sdhci_pci xhci_pci xhci_pci_renesas cqhci e1000e i2c_smbus sdhci thunderbolt intel_lpss_pci ahci intel_lpss xhci_hcd libahci idma64 wmi video pinctrl_cannonlake
Feb 24 16:21:38 snuc2 kernel: CPU: 5 PID: 0 Comm: swapper/5 Tainted: P           O      5.13.19-3-pve #1
Feb 24 16:21:38 snuc2 kernel: Hardware name: Intel(R) Client Systems NUC10i7FNH/NUC10i7FNB, BIOS FNCML357.0055.2021.1202.1748 12/02/2021
Feb 24 16:21:38 snuc2 kernel: RIP: 0010:dev_watchdog+0x24c/0x250
Feb 24 16:21:38 snuc2 kernel: Code: ba 26 fd ff eb ab 4c 89 ff c6 05 d3 00 50 01 01 e8 a9 ef f9 ff 44 89 e9 4c 89 fe 48 c7 c7 80 d1 48 9b 48 89 c2 e8 9c fd 19 00 <0f> 0b eb 8c 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 4d 89
Feb 24 16:21:38 snuc2 kernel: RSP: 0018:ffffb0bf40270e80 EFLAGS: 00010282
Feb 24 16:21:38 snuc2 kernel: RAX: 0000000000000000 RBX: ffff95078f69ce00 RCX: 000000000000083f
Feb 24 16:21:38 snuc2 kernel: RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
Feb 24 16:21:38 snuc2 kernel: RBP: ffffb0bf40270eb0 R08: 0000000000000000 R09: ffffb0bf40270c60
Feb 24 16:21:38 snuc2 kernel: R10: ffffb0bf40270c58 R11: ffffffff9bb55428 R12: ffff95078f69ce80
Feb 24 16:21:38 snuc2 kernel: R13: 0000000000000000 R14: ffff950790cf0480 R15: ffff950790cf0000
Feb 24 16:21:38 snuc2 kernel: FS:  0000000000000000(0000) GS:ffff950f21c80000(0000) knlGS:0000000000000000
Feb 24 16:21:38 snuc2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 24 16:21:38 snuc2 kernel: CR2: 00007ffa96ba5130 CR3: 000000033f610004 CR4: 00000000003726e0
Feb 24 16:21:38 snuc2 kernel: Call Trace:
Feb 24 16:21:38 snuc2 kernel:  <IRQ>
Feb 24 16:21:38 snuc2 kernel:  ? pfifo_fast_enqueue+0x150/0x150
Feb 24 16:21:38 snuc2 kernel:  call_timer_fn+0x2c/0x100
Feb 24 16:21:38 snuc2 kernel:  __run_timers.part.0+0x1d8/0x250
Feb 24 16:21:38 snuc2 kernel:  ? ktime_get+0x3b/0xa0
Feb 24 16:21:38 snuc2 kernel:  ? lapic_next_deadline+0x2c/0x40
Feb 24 16:21:38 snuc2 kernel:  ? clockevents_program_event+0x8f/0xe0
Feb 24 16:21:38 snuc2 kernel:  run_timer_softirq+0x2a/0x50
Feb 24 16:21:38 snuc2 kernel:  __do_softirq+0xcb/0x281
Feb 24 16:21:38 snuc2 kernel:  irq_exit_rcu+0xa2/0xd0
Feb 24 16:21:38 snuc2 kernel:  sysvec_apic_timer_interrupt+0x7c/0x90
Feb 24 16:21:38 snuc2 kernel:  </IRQ>
Feb 24 16:21:38 snuc2 kernel:  <TASK>
Feb 24 16:21:38 snuc2 kernel:  asm_sysvec_apic_timer_interrupt+0x12/0x20
Feb 24 16:21:38 snuc2 kernel: RIP: 0010:cpuidle_enter_state+0xcc/0x360
Feb 24 16:21:38 snuc2 kernel: Code: 3d a1 b6 8d 65 e8 b4 78 7a ff 49 89 c6 0f 1f 44 00 00 31 ff e8 55 84 7a ff 80 7d d7 00 0f 85 01 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 0d 01 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89
Feb 24 16:21:38 snuc2 kernel: RSP: 0018:ffffb0bf40133e68 EFLAGS: 00000246
Feb 24 16:21:38 snuc2 kernel: RAX: ffff950f21cb4ec0 RBX: 0000000000000003 RCX: 000000000000001f
Feb 24 16:21:38 snuc2 kernel: RDX: 0000000000000000 RSI: 000000004f9a2282 RDI: 0000000000000000
Feb 24 16:21:38 snuc2 kernel: RBP: ffffb0bf40133ea0 R08: 0000480ed1e13e13 R09: 0000000000000018
Feb 24 16:21:38 snuc2 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffd0bf3fa80500
Feb 24 16:21:38 snuc2 kernel: R13: ffffffff9bc50e00 R14: 0000480ed1e13e13 R15: 0000000000000003
Feb 24 16:21:38 snuc2 kernel:  ? cpuidle_enter_state+0xbb/0x360
Feb 24 16:21:38 snuc2 kernel:  cpuidle_enter+0x2e/0x40
Feb 24 16:21:38 snuc2 kernel:  do_idle+0x1ff/0x2a0
Feb 24 16:21:38 snuc2 kernel:  cpu_startup_entry+0x20/0x30
Feb 24 16:21:38 snuc2 kernel:  start_secondary+0x11f/0x160
Feb 24 16:21:38 snuc2 kernel:  secondary_startup_64_no_verify+0xc2/0xcb
Feb 24 16:21:38 snuc2 kernel:  </TASK>
Feb 24 16:21:38 snuc2 kernel: ---[ end trace c08828b0f0bda29a ]---
Feb 24 16:21:38 snuc2 kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Feb 24 16:21:38 snuc2 kernel: vmbr0: port 1(eno1) entered disabled state
Feb 24 16:21:42 snuc2 kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Feb 24 16:21:42 snuc2 kernel: vmbr0: port 1(eno1) entered blocking state
Feb 24 16:21:42 snuc2 kernel: vmbr0: port 1(eno1) entered forwarding state
Feb 24 16:22:19 snuc2 pvedaemon[203245]: <root@pam> starting task UPID:snuc2:000342A2:0078F4F6:6217A2AB:qmstop:100:root@pam:
Feb 24 16:22:19 snuc2 pvedaemon[213666]: stop VM 100: UPID:snuc2:000342A2:0078F4F6:6217A2AB:qmstop:100:root@pam:
Feb 24 16:22:19 snuc2 kernel: vmbr0: port 4(tap100i0) entered disabled state
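For anyone decoding the dump above: TDH and TDT are the head and tail of the transmit descriptor ring, so a head that stays at 0xa while the tail sits at 0x6a means queued descriptors are not being consumed. The Python sketch below is not part of the original post; it parses one dump and does the arithmetic. The helper name `parse_hang_dump` is made up for illustration, and the subtraction assumes the tail has not wrapped around the ring.

```python
import re

# First dump from the log above (16:21:34), syslog prefix omitted.
DUMP = """\
  TDH                  <a>
  TDT                  <6a>
  next_to_use          <6a>
  next_to_clean        <a>
buffer_info[next_to_clean]:
  time_stamp           <1012d10a7>
  next_to_watch        <b>
  jiffies              <1012d12f8>
  next_to_watch.status <0>
"""

# Matches "name <hexvalue>" lines; header lines without <...> are skipped.
FIELD_RE = re.compile(r"^\s*(\S.*?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(text):
    """Return {field name: integer value} for every 'name <hex>' line."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


fields = parse_hang_dump(DUMP)

# Descriptors handed to the NIC but not yet processed (no wrap-around assumed).
pending = fields["TDT"] - fields["TDH"]

# How long the oldest unfinished buffer has been waiting, in jiffies.
stalled_jiffies = fields["jiffies"] - fields["time_stamp"]

print(f"pending descriptors: {pending}")    # 96
print(f"stalled for {stalled_jiffies} jiffies")  # 593
```

The same numbers repeat in all three dumps except for `jiffies`, which grows by roughly 500 between dumps logged two seconds apart. That would suggest about 250 jiffies per second on this kernel, but syslog timestamps only have one-second resolution, so treat it as a rough estimate.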
 
Hello,
I am not an expert, so take this comment as a best effort one.
From the information I see, it seems to be a problem related to the host Ethernet interface eno1.
If you know a specific VM that triggers the problem, I'd suggest that you:

  • create a bridge with no physical interface attached,
  • reconfigure the VM's network device to connect to this bridge,
  • start the VM once, twice, many times, looking for the bug,
  • if there turns out to be no relation between the VM and the hardware unit hang, the cause lies with the host Ethernet interface.
Secondly,
  • you may attach a USB Ethernet adapter to the host (just for testing),
  • repeat the same test as above,
  • if the bug does not appear, the problem is with the onboard Ethernet interface: check whether there are firmware/BIOS updates, or whether you need to install a firmware package in the OS.
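The "start the VM many times and look for the bug" step can be automated. Below is a minimal Python sketch, assuming it runs as root on the PVE host, that `qm start` / `qm stop` and `journalctl -k` behave as usual there, and that VM 100 (the ID seen in the log) is the one under test. The function names and the 30-second settle time are arbitrary choices for illustration.

```python
import subprocess
import time

HANG_MARKER = "Detected Hardware Unit Hang"


def count_hangs(log_text):
    """Count hang reports in a chunk of kernel log text."""
    return sum(1 for line in log_text.splitlines() if HANG_MARKER in line)


def kernel_log_since(since):
    """Return kernel messages logged since `since` ('YYYY-MM-DD HH:MM:SS')."""
    result = subprocess.run(
        ["journalctl", "-k", "--since", since, "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cycle_vm(vmid, cycles, settle_seconds=30,
             run=subprocess.run, read_log=kernel_log_since):
    """Start and stop a VM `cycles` times; return hang count per cycle.

    `run` and `read_log` are injectable so the loop can be tried out
    without touching a real host.
    """
    hangs_per_cycle = []
    for _ in range(cycles):
        started = time.strftime("%Y-%m-%d %H:%M:%S")
        run(["qm", "start", str(vmid)], check=True)
        time.sleep(settle_seconds)  # give the hang time to show up
        hangs_per_cycle.append(count_hangs(read_log(started)))
        run(["qm", "stop", str(vmid)], check=True)
    return hangs_per_cycle


# Example on the host (not executed here):
#   print(cycle_vm(100, cycles=5))
```

Running it once with the VM on the isolated bridge and once with the USB adapter gives two lists of counts that can be compared directly with the counts from the original setup.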
Apart from a potential hardware/software/firmware issue: I see you are using ZFS, so I'd suggest disabling ballooning in all VMs to check whether it is the cause.
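To see which VMs still have ballooning active, the VM config text can be checked for a `balloon: 0` line. This is a rough Python sketch, assuming the usual `key: value` layout of Proxmox VM config files, that snapshot sections begin with a `[name]` line, and that a missing `balloon` entry means the balloon device is left at its default (enabled). The function name is hypothetical.

```python
def ballooning_disabled(conf_text):
    """Return True if the current VM config explicitly sets 'balloon: 0'."""
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # Snapshot sections follow; only the current config matters.
            break
        if line.startswith("balloon:"):
            return line.split(":", 1)[1].strip() == "0"
    # No balloon entry: assumed to mean the default, i.e. not disabled.
    return False


# Example with inline config text (values are illustrative only):
print(ballooning_disabled("memory: 4096\nballoon: 0\n"))     # True
print(ballooning_disabled("memory: 4096\nballoon: 2048\n"))  # False
```

On the host, the text would typically come from `qm config <vmid>`, and `qm set <vmid> --balloon 0` turns ballooning off for a VM.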
 
