Detected Hardware Unit Hang

Smooky

Hello everyone.

I recently moved my system to new hardware.
So far everything is running fine.
My only problem is with a bond of 4 ports, where one port occasionally drops out for a moment.
Does anyone have an idea what can be done about it?
Here is the corresponding error message from the syslog:

Code:
Nov 28 17:56:51 pve1 kernel: [158434.578384] e1000e 0000:44:00.1 enp68s0f1: Detected Hardware Unit Hang:
Nov 28 17:56:51 pve1 kernel: [158434.578384]   TDH                  <47>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   TDT                  <a7>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   next_to_use          <a7>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   next_to_clean        <46>
Nov 28 17:56:51 pve1 kernel: [158434.578384] buffer_info[next_to_clean]:
Nov 28 17:56:51 pve1 kernel: [158434.578384]   time_stamp           <1025b398c>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   next_to_watch        <47>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   jiffies              <1025b3e61>
Nov 28 17:56:51 pve1 kernel: [158434.578384]   next_to_watch.status <0>
Nov 28 17:56:51 pve1 kernel: [158434.578384] MAC Status             <80383>
Nov 28 17:56:51 pve1 kernel: [158434.578384] PHY Status             <792d>
Nov 28 17:56:51 pve1 kernel: [158434.578384] PHY 1000BASE-T Status  <3800>
Nov 28 17:56:51 pve1 kernel: [158434.578384] PHY Extended Status    <3000>
Nov 28 17:56:51 pve1 kernel: [158434.578384] PCI Status             <10>
Nov 28 17:56:55 pve1 kernel: [158438.574873] e1000e 0000:44:00.1 enp68s0f1: Detected Hardware Unit Hang:
Nov 28 17:56:55 pve1 kernel: [158438.574873]   TDH                  <47>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   TDT                  <a7>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   next_to_use          <a7>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   next_to_clean        <46>
Nov 28 17:56:55 pve1 kernel: [158438.574873] buffer_info[next_to_clean]:
Nov 28 17:56:55 pve1 kernel: [158438.574873]   time_stamp           <1025b398c>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   next_to_watch        <47>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   jiffies              <1025b4248>
Nov 28 17:56:55 pve1 kernel: [158438.574873]   next_to_watch.status <0>
Nov 28 17:56:55 pve1 kernel: [158438.574873] MAC Status             <80383>
Nov 28 17:56:55 pve1 kernel: [158438.574873] PHY Status             <792d>
Nov 28 17:56:55 pve1 kernel: [158438.574873] PHY 1000BASE-T Status  <3800>
Nov 28 17:56:55 pve1 kernel: [158438.574873] PHY Extended Status    <3000>
Nov 28 17:56:55 pve1 kernel: [158438.574873] PCI Status             <10>
Nov 28 17:56:55 pve1 kernel: [158438.798380] ------------[ cut here ]------------
Nov 28 17:56:55 pve1 kernel: [158438.798392] NETDEV WATCHDOG: enp68s0f1 (e1000e): transmit queue 0 timed out
Nov 28 17:56:55 pve1 kernel: [158438.798406] WARNING: CPU: 12 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x24c/0x250
Nov 28 17:56:55 pve1 kernel: [158438.798413] Modules linked in: uas usb_storage joydev ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm nouveau crct10dif_pclmul ghash_clmulni_intel video aesni_intel snd_hda_codec_hdmi drm_ttm_helper ttm crypto_simd cryptd snd_hda_intel snd_intel_dspcfg drm_kms_helper snd_intel_sdw_acpi input_leds rapl snd_hda_codec cec rc_core snd_hda_core i2c_algo_bit snd_hwdep fb_sys_fops snd_pcm syscopyarea sysfillrect snd_timer wmi_bmof sysimgblt snd pcspkr efi_pstore soundcore mxm_wmi k10temp ccp mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nct6775 hwmon_vid vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor
Nov 28 17:56:55 pve1 kernel: [158438.798600]  zstd_compress raid6_pq hid_generic usbkbd usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul xhci_pci nvme xhci_pci_renesas gpio_amdpt ahci i2c_piix4 e1000e nvme_core libahci xhci_hcd wmi gpio_generic
Nov 28 17:56:55 pve1 kernel: [158438.798654] CPU: 12 PID: 0 Comm: swapper/12 Tainted: P           O      5.13.19-1-pve #1
Nov 28 17:56:55 pve1 kernel: [158438.798658] Hardware name: Micro-Star International Co., Ltd. MS-7B09/X399 SLI PLUS (MS-7B09), BIOS A.88 10/21/2021
Nov 28 17:56:55 pve1 kernel: [158438.798662] RIP: 0010:dev_watchdog+0x24c/0x250
Nov 28 17:56:55 pve1 kernel: [158438.798666] Code: 2a 27 fd ff eb ab 4c 89 ff c6 05 91 13 50 01 01 e8 39 f3 f9 ff 44 89 e9 4c 89 fe 48 c7 c7 d0 be e8 9b 48 89 c2 e8 53 f5 19 00 <0f> 0b eb 8c 0f 1f 44 00 00 55 48 89 e5 41 57 49 89 d7 41 56 4d 89
Nov 28 17:56:55 pve1 kernel: [158438.798671] RSP: 0018:ffffa8c900830e80 EFLAGS: 00010282
Nov 28 17:56:55 pve1 kernel: [158438.798678] RAX: 0000000000000000 RBX: ffff9c0420091a00 RCX: 0000000000000000
Nov 28 17:56:55 pve1 kernel: [158438.798680] RDX: ffff9c133d528640 RSI: ffff9c133d5189c0 RDI: 0000000000000300
Nov 28 17:56:55 pve1 kernel: [158438.798682] RBP: ffffa8c900830eb0 R08: 0000000000000000 R09: ffffa8c900830c60
Nov 28 17:56:55 pve1 kernel: [158438.798686] R10: ffffa8c900830c58 R11: ffff9c133d0fffe8 R12: ffff9c0420091a80
Nov 28 17:56:55 pve1 kernel: [158438.798688] R13: 0000000000000000 R14: ffff9c04285bc480 R15: ffff9c04285bc000
Nov 28 17:56:55 pve1 kernel: [158438.798691] FS:  0000000000000000(0000) GS:ffff9c133d500000(0000) knlGS:0000000000000000
Nov 28 17:56:55 pve1 kernel: [158438.798693] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 28 17:56:55 pve1 kernel: [158438.798695] CR2: 000079a6f6cb8cfc CR3: 0000000132c64000 CR4: 00000000003506e0
Nov 28 17:56:55 pve1 kernel: [158438.798698] Call Trace:
Nov 28 17:56:55 pve1 kernel: [158438.798700]  <IRQ>
Nov 28 17:56:55 pve1 kernel: [158438.798706]  ? pfifo_fast_enqueue+0x150/0x150
Nov 28 17:56:55 pve1 kernel: [158438.798709]  call_timer_fn+0x2e/0x100
Nov 28 17:56:55 pve1 kernel: [158438.798714]  __run_timers.part.0+0x1d8/0x250
Nov 28 17:56:55 pve1 kernel: [158438.798720]  ? ktime_get+0x3e/0xa0
Nov 28 17:56:55 pve1 kernel: [158438.798725]  ? lapic_next_event+0x21/0x30
Nov 28 17:56:55 pve1 kernel: [158438.798730]  ? clockevents_program_event+0x8f/0xe0
Nov 28 17:56:55 pve1 kernel: [158438.798734]  run_timer_softirq+0x2a/0x50
Nov 28 17:56:55 pve1 kernel: [158438.798739]  __do_softirq+0xce/0x281
Nov 28 17:56:55 pve1 kernel: [158438.798744]  irq_exit_rcu+0xa2/0xd0
Nov 28 17:56:55 pve1 kernel: [158438.798750]  sysvec_apic_timer_interrupt+0x7c/0x90
Nov 28 17:56:55 pve1 kernel: [158438.798756]  </IRQ>
Nov 28 17:56:55 pve1 kernel: [158438.798758]  asm_sysvec_apic_timer_interrupt+0x12/0x20
Nov 28 17:56:55 pve1 kernel: [158438.798762] RIP: 0010:cpuidle_enter_state+0xcc/0x360
Nov 28 17:56:55 pve1 kernel: [158438.798767] Code: 3d 51 48 ed 64 e8 24 8a 7a ff 49 89 c6 0f 1f 44 00 00 31 ff e8 c5 95 7a ff 80 7d d7 00 0f 85 01 01 00 00 fb 66 0f 1f 44 00 00 <45> 85 ff 0f 88 0d 01 00 00 49 63 cf 4c 2b 75 c8 48 8d 04 49 48 89
Nov 28 17:56:55 pve1 kernel: [158438.798770] RSP: 0018:ffffa8c900227e68 EFLAGS: 00000246
Nov 28 17:56:55 pve1 kernel: [158438.798774] RAX: ffff9c133d52cec0 RBX: 0000000000000002 RCX: 000000000000001f
Nov 28 17:56:55 pve1 kernel: [158438.798776] RDX: 0000000000000000 RSI: 0000000025a5a719 RDI: 0000000000000000
Nov 28 17:56:55 pve1 kernel: [158438.798779] RBP: ffffa8c900227ea0 R08: 00009019684bb1b7 R09: 0000000000000000
Nov 28 17:56:55 pve1 kernel: [158438.798781] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9c04105cac00
Nov 28 17:56:55 pve1 kernel: [158438.798783] R13: ffffffff9c65dd40 R14: 00009019684bb1b7 R15: 0000000000000002
Nov 28 17:56:55 pve1 kernel: [158438.798786]  ? cpuidle_enter_state+0xbb/0x360
Nov 28 17:56:55 pve1 kernel: [158438.798790]  cpuidle_enter+0x2e/0x40
Nov 28 17:56:55 pve1 kernel: [158438.798793]  do_idle+0x1ff/0x2a0
Nov 28 17:56:55 pve1 kernel: [158438.798796]  cpu_startup_entry+0x20/0x30
Nov 28 17:56:55 pve1 kernel: [158438.798800]  start_secondary+0x11f/0x160
Nov 28 17:56:55 pve1 kernel: [158438.798802]  secondary_startup_64_no_verify+0xc2/0xcb
Nov 28 17:56:55 pve1 kernel: [158438.798808] ---[ end trace 1bb546ab7a6508fc ]---
Nov 28 17:56:55 pve1 kernel: [158438.798821] e1000e 0000:44:00.1 enp68s0f1: Reset adapter unexpectedly
Nov 28 17:56:55 pve1 kernel: [158438.970159] bond0: (slave enp68s0f1): link status definitely down, disabling slave
Nov 28 17:56:58 pve1 kernel: [158441.826993] e1000e 0000:44:00.1 enp68s0f1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Nov 28 17:56:58 pve1 kernel: [158441.922186] bond0: (slave enp68s0f1): link status definitely up, 1000 Mbps full duplex

Regards
Marcel Follert
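
For readers who want to interpret the numbers in the hang report, here is a minimal Python sketch that decodes a few of the hexadecimal fields from the two reports above. It rests on assumptions: TDH and TDT are read as the head and tail of the transmit descriptor ring, `time_stamp` is read as the jiffies value when the stuck buffer was queued, and the tick rate is estimated from the two reports themselves rather than read from the kernel config. The helper name `parse_report` is made up for this example.

```python
import re

# Excerpts copied from the two hang reports above (values are hexadecimal).
REPORT_1 = """\
[158434.578384]   TDH                  <47>
[158434.578384]   TDT                  <a7>
[158434.578384]   time_stamp           <1025b398c>
[158434.578384]   jiffies              <1025b3e61>
"""
REPORT_2 = """\
[158438.574873]   TDH                  <47>
[158438.574873]   TDT                  <a7>
[158438.574873]   time_stamp           <1025b398c>
[158438.574873]   jiffies              <1025b4248>
"""

FIELD = re.compile(r"\[\s*(\d+\.\d+)\]\s+(\S+)\s+<([0-9a-f]+)>")


def parse_report(text):
    """Return (kernel_uptime_seconds, {field_name: int}) for one excerpt."""
    uptime = None
    fields = {}
    for match in FIELD.finditer(text):
        uptime = float(match.group(1))
        fields[match.group(2)] = int(match.group(3), 16)
    return uptime, fields


t1, r1 = parse_report(REPORT_1)
t2, r2 = parse_report(REPORT_2)

# Descriptors handed to the NIC but not yet fetched (assumed meaning of
# TDT - TDH; no ring wrap-around here because TDT > TDH).
pending = r1["TDT"] - r1["TDH"]

# Estimate the tick rate from how far jiffies advanced between the reports.
hz = (r2["jiffies"] - r1["jiffies"]) / (t2 - t1)

# How long the stuck buffer had been waiting at each report, in seconds.
age1 = (r1["jiffies"] - r1["time_stamp"]) / hz
age2 = (r2["jiffies"] - r2["time_stamp"]) / hz

print(f"pending descriptors: {pending}")
print(f"estimated HZ: {hz:.0f}")
print(f"buffer age: {age1:.1f} s, then {age2:.1f} s")
```

TDH and `time_stamp` are identical in both reports, which fits a transmit queue that made no progress between them.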
 
Hi oguz,

thanks for the tip.
I'll give it a try and then give you feedback.

Regards

Marcel
 
Hi saeft_3004,

so far it looks like it worked.
The error used to occur at least once a day, but now, after more than 24 hours, it has not come back.

Regards
Marcel
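
To confirm over a longer period that the hang really stays away, one option is to count the "Detected Hardware Unit Hang" lines per day and interface in the syslog. The following is only a sketch, assuming the syslog line format shown in the first post; the function name `count_hangs` and the log path mentioned in the comment are assumptions, not something taken from this thread.

```python
import re
from collections import Counter

# Matches lines such as:
# Nov 28 17:56:51 pve1 kernel: [158434.578384] e1000e 0000:44:00.1 enp68s0f1: Detected Hardware Unit Hang:
HANG = re.compile(
    r"^(\w{3}\s+\d+) \d\d:\d\d:\d\d \S+ kernel: \[\s*\d+\.\d+\] "
    r"e1000e \S+ (\S+): Detected Hardware Unit Hang"
)


def count_hangs(lines):
    """Count hang reports, keyed by (day, interface)."""
    counts = Counter()
    for line in lines:
        match = HANG.match(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Sample lines taken from the log in the first post.
sample = [
    "Nov 28 17:56:51 pve1 kernel: [158434.578384] e1000e 0000:44:00.1 enp68s0f1: Detected Hardware Unit Hang:",
    "Nov 28 17:56:51 pve1 kernel: [158434.578384]   TDH                  <47>",
    "Nov 28 17:56:55 pve1 kernel: [158438.574873] e1000e 0000:44:00.1 enp68s0f1: Detected Hardware Unit Hang:",
    "Nov 28 17:56:58 pve1 kernel: [158441.826993] e1000e 0000:44:00.1 enp68s0f1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None",
]

counts = count_hangs(sample)
for (day, iface), n in sorted(counts.items()):
    print(f"{day} {iface}: {n} hang report(s)")

# On a real system the lines could come from the log file instead, e.g.
# open("/var/log/syslog") (the path is an assumption and may differ).
```

A day with no entry for an interface means no hang was logged for it, which is the result to look for after applying a fix.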