eno1: Detected Hardware Unit Hang

amaxcz

New Member
May 12, 2013
Hello!

Latest Proxmox.
Hetzner EX52-NVMe hardware.
While restoring a backup over CIFS, the network adapter resets many times.


root@proxmox ~ # uname -a
Linux proxmox 5.0.18-1-pve #1 SMP PVE 5.0.18-3 (Thu, 8 Aug 2019 09:05:29 +0200) x86_64 GNU/Linux





===================================



[ 75.714187] ------------[ cut here ]------------
[ 75.714303] NETDEV WATCHDOG: eno1 (e1000e): transmit queue 0 timed out
[ 75.714427] WARNING: CPU: 8 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x221/0x230
[ 75.716322] Modules linked in: ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack ip_set_hash_net ip_set arc4 md4 cmac nls_utf8 cifs ccm fscache iptable_filter xt_nat xt_tcpudp iptable_nat ipt_MASQUERADE nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bpfilter softdog nfnetlink_log nfnetlink intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate intel_rapl_perf joydev input_leds wmi_bmof intel_wmi_thunderbolt intel_pch_thermal mac_hid acpi_pad tcp_bbr sch_fq vhost_net vhost tap ib_iser rdma_cm iw_cm sunrpc ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
[ 75.716352] async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 multipath linear hid_generic usbmouse usbkbd usbhid hid raid1 e1000e ahci i2c_i801 libahci wmi video pinctrl_cannonlake pinctrl_intel
[ 75.716956] CPU: 8 PID: 0 Comm: swapper/8 Not tainted 5.0.18-1-pve #1
[ 75.717066] Hardware name: Gigabyte Technology Co., Ltd. B360 HD3P-LM/B360HD3PLM-CF, BIOS F4 HZ 04/30/2019
[ 75.717198] RIP: 0010:dev_watchdog+0x221/0x230
[ 75.717307] Code: 00 49 63 4e e0 eb 92 4c 89 ef c6 05 7a a2 ef 00 01 e8 23 2b fc ff 89 d9 4c 89 ee 48 c7 c7 50 0a 5b 9e 48 89 c2 e8 11 d6 78 ff <0f> 0b eb c0 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 75.717468] RSP: 0018:ffff9ef37f203e68 EFLAGS: 00010286
[ 75.717578] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 75.717691] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff9ef37f216440
[ 75.717804] RBP: ffff9ef37f203e98 R08: 0000000000000000 R09: 0000000000000399
[ 75.717917] R10: 0000000000000774 R11: ffff9ef37f203cb8 R12: 0000000000000001
[ 75.718031] R13: ffff9ef36b370000 R14: ffff9ef36b3704c0 R15: ffff9ef36b2d2880
[ 75.718144] FS: 0000000000000000(0000) GS:ffff9ef37f200000(0000) knlGS:0000000000000000
[ 75.718271] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 75.718382] CR2: 00007f9c25d7d000 CR3: 000000037160e002 CR4: 00000000003626e0
[ 75.718507] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 75.718619] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 75.718732] Call Trace:
[ 75.718838] <IRQ>
[ 75.718945] ? pfifo_fast_enqueue+0x120/0x120
[ 75.719054] call_timer_fn+0x30/0x130
[ 75.719161] run_timer_softirq+0x3e4/0x420
[ 75.719268] ? ktime_get+0x3c/0xa0
[ 75.719375] ? lapic_next_deadline+0x26/0x30
[ 75.719481] ? clockevents_program_event+0x93/0xf0
[ 75.719591] __do_softirq+0xdc/0x2f3
[ 75.719697] irq_exit+0xc0/0xd0
[ 75.719803] smp_apic_timer_interrupt+0x79/0x140
[ 75.719913] apic_timer_interrupt+0xf/0x20
[ 75.720018] </IRQ>
[ 75.720123] RIP: 0010:cpuidle_enter_state+0xbd/0x450
[ 75.720231] Code: ff e8 37 2b 86 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 da 5a 8c ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
[ 75.720392] RSP: 0018:ffffb5fb0635be60 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 75.720520] RAX: ffff9ef37f222d80 RBX: ffffffff9e953d40 RCX: 000000000000001f
[ 75.720634] RDX: 00000011a0eab639 RSI: 000000002819aa06 RDI: 0000000000000000
[ 75.720747] RBP: ffffb5fb0635bea0 R08: 0000000000000000 R09: 0000000000022640
[ 75.720861] R10: 00000051a66394dd R11: ffff9ef37f221c04 R12: ffff9ef37f22d500
[ 75.720974] R13: 0000000000000003 R14: ffffffff9e953e78 R15: ffffffff9e953e60
[ 75.721091] cpuidle_enter+0x17/0x20
[ 75.721199] call_cpuidle+0x23/0x40
[ 75.721314] do_idle+0x23a/0x280
[ 75.721419] cpu_startup_entry+0x1d/0x20
[ 75.721526] start_secondary+0x1ab/0x200
[ 75.721633] secondary_startup_64+0xa4/0xb0
[ 75.721741] ---[ end trace d8a0eb4ade667d86 ]---
[ 75.721892] e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
[ 80.805773] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 589.311697] CIFS VFS: Cancelling wait for mid 1770 cmd: 5
[ 589.311808] CIFS VFS: Cancelling wait for mid 1771 cmd: 16
[ 589.311917] CIFS VFS: Cancelling wait for mid 1772 cmd: 6
[ 589.507096] CIFS VFS: Cancelling wait for mid 1773 cmd: 5
[ 589.507204] CIFS VFS: Cancelling wait for mid 1774 cmd: 16
[ 589.507322] CIFS VFS: Cancelling wait for mid 1775 cmd: 6
[ 591.403070] CIFS VFS: Close unmatched open
[ 591.403298] CIFS VFS: Close unmatched open
[ 1439.560547] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <21>
TDT <aa>
next_to_use <aa>
next_to_clean <20>
buffer_info[next_to_clean]:
time_stamp <10004555e>
next_to_watch <21>
jiffies <100045828>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 1441.576626] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
TDH <21>
TDT <aa>
next_to_use <aa>
next_to_clean <20>
buffer_info[next_to_clean]:
time_stamp <10004555e>
next_to_watch <21>
jiffies <100045a20>
next_to_watch.status <0>
MAC Status <40080083>
PHY Status <796d>
PHY 1000BASE-T Status <3800>
PHY Extended Status <3000>
PCI Status <10>
[ 1443.592516] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:





===================================




root@proxmox ~ # pveversion --verbose
proxmox-ve: 6.0-2 (running kernel: 5.0.18-1-pve)
pve-manager: 6.0-5 (running version: 6.0-5/f8a710d7)
pve-kernel-5.0: 6.0-6
pve-kernel-helper: 6.0-6
pve-kernel-5.0.18-1-pve: 5.0.18-3
ceph-fuse: 12.2.11+dfsg1-2.1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
libjs-extjs: 6.0.1-10
libknet1: 1.10-pve2
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-3
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-7
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-63
lxcfs: 3.0.3-pve60
novnc-pve: 1.0.0-60
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-5
pve-cluster: 6.0-5
pve-container: 3.0-5
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-2
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-7
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1




root@proxmox ~ # lshw -short
H/W path        Device  Class          Description
===================================================
                        system         B360 HD3P-LM (Default string)
/0                      bus            B360HD3PLM-CF
/0/0                    memory         64KiB BIOS
/0/3a                   memory         64GiB System Memory
/0/3a/0                 memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/1                 memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/2                 memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/3a/3                 memory         16GiB DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
/0/44                   memory         384KiB L1 cache
/0/45                   memory         1536KiB L2 cache
/0/46                   memory         12MiB L3 cache
/0/47                   processor      Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
/0/1                    generic
/0/100                  bridge         Intel Corporation
/0/100/1                bridge         Skylake PCIe Controller (x16)
/0/100/1/0              storage        Samsung Electronics Co Ltd
/0/100/2                display        Intel Corporation
/0/100/12               generic        Intel Corporation
/0/100/14               bus            Intel Corporation
/0/100/14/0     usb1    bus            xHCI Host Controller
/0/100/14/0/8           input          PS2toUSB Adapter
/0/100/14/1     usb2    bus            xHCI Host Controller
/0/100/14.2             memory         RAM memory
/0/100/16               communication  Intel Corporation
/0/100/17               storage        Intel Corporation
/0/100/1b               bridge         Intel Corporation
/0/100/1b/0             storage        Samsung Electronics Co Ltd
/0/100/1d               bridge         Intel Corporation
/0/100/1f               bridge         Intel Corporation
/0/100/1f.4             bus            Intel Corporation
/0/100/1f.5             bus            Intel Corporation
/0/100/1f.6     eno1    network        Intel Corporation
/0/2                    system         PnP device PNP0c02
/0/3                    system         PnP device PNP0c02
/0/4                    system         PnP device PNP0b00
/0/5                    generic        PnP device INT3f0d
/0/6                    system         PnP device PNP0c02
/0/7                    system         PnP device PNP0c02
/0/8                    system         PnP device PNP0c02
/0/9                    system         PnP device PNP0c02
/1                      power          To Be Filled By O.E.M.
 
What should I try?
- ethtool -K eno1 gso off gro off tso off
- kernel option pcie_aspm=off
- kernel option intel_idle.max_cstate=0 processor.max_cstate=1
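
For the kernel options in the list above, here is a minimal sketch of adding one to the kernel command line. It assumes the host boots through GRUB and keeps its settings in /etc/default/grub; hosts that boot another way keep the command line elsewhere. add_kernel_param is a hypothetical helper name used only for illustration, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# in the given file, unless that line already contains it.
# Assumption: the value is written on one line inside double quotes.
add_kernel_param() {
    grub_file=$1
    param=$2

    # Nothing to do if the parameter is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$param"; then
        return 0
    fi

    # Insert the parameter just before the closing quote (GNU sed -i).
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$grub_file"
}

# Example (as root), one option at a time:
#   add_kernel_param /etc/default/grub pcie_aspm=off
#   update-grub
#   reboot
```

After the reboot, cat /proc/cmdline shows whether the option is actually active. Trying one option at a time makes it easier to tell which one, if any, changes the behaviour.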
 
Hetzner EX52-NVMe hardware.

Just to note, this is desktop hardware, and you will probably never get server-grade stability and performance from desktop components. They are not built for 24/7 server loads and are not tested with Linux by the vendor (Gigabyte lists only Win10 for this mainboard).

I really recommend using server-grade components; there are reasons why server hardware is more expensive.
 

Sure, but for a cluster it's great: cheap and pretty fast. Why not?
Anyway, I have had similar problems even with server hardware, like Dell or HP.

So, any ideas? =)
 
Hi all.
I have a similar situation: an EX62-NVMe from Hetzner sometimes logs this error:
https://pastebin.com/Vbd6C0ns
After that the server freezes and I need to reboot.

pveversion
pve-manager/6.0-7/28984024 (running kernel: 5.0.21-1-pve)

Proxmox was installed on top of Debian 10, which was itself installed using the Hetzner Rescue-System.
 
Same issue here with a Lenovo TS430 and integrated 82576LM Adapter.

The only way to solve it is to reboot the machine, after which it runs fine for a random number of days.
 
It seems that disabling TSO (tcp-segmentation-offload) and GSO (generic-segmentation-offload) works with
Code:
ethtool -K <interface> tso off gso off
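
To confirm that the change took effect, the current offload state can be read back with ethtool -k (lowercase k). A minimal sketch that lists which of the three offloads discussed in this thread are still on; offloads_still_on is a hypothetical helper name, and it assumes the usual "feature-name: on/off" lines that ethtool -k prints:

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -k <interface>` output on stdin and print
# the names of the TSO/GSO/GRO features that are still enabled.
offloads_still_on() {
    awk -F': ' '
        {
            name = $1
            gsub(/^[ \t]+/, "", name)   # drop any leading indentation
        }
        (name == "tcp-segmentation-offload" ||
         name == "generic-segmentation-offload" ||
         name == "generic-receive-offload") && $2 ~ /^on/ {
            print name
        }
    '
}

# Example:
#   ethtool -k eno1 | offloads_still_on
# No output means all three offloads are off.
```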

Any thoughts on how to make this permanent between reboots?
 
This thread is very old, but I want to help tie it off in case someone else finds it.

You have to edit the interfaces file on your Proxmox host:
Code:
nano /etc/network/interfaces

and add this at the bottom, apart from the other lines (but change "eno2" to your own interface name):
Code:
post-up /sbin/ethtool -K eno2 tso off gso off gro off
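
If you would rather script that edit than do it by hand, here is a minimal sketch. It places the same post-up line directly under the interface's own "iface" stanza instead of at the bottom of the file, and it does nothing if the line is already there. ensure_offload_post_up is a hypothetical helper name, and the sketch assumes the interface already has an "iface <name> ..." line in the file.

```shell
#!/bin/sh
# Hypothetical helper: add the post-up ethtool line under "iface <name> ..."
# in an interfaces file. Safe to run more than once.
ensure_offload_post_up() {
    file=$1
    iface=$2
    line="    post-up /sbin/ethtool -K $iface tso off gso off gro off"

    # Already configured: leave the file untouched.
    if grep -qF "ethtool -K $iface " "$file"; then
        return 0
    fi

    # Refuse to guess if the interface has no stanza in this file.
    grep -qE "^iface[[:space:]]+$iface[[:space:]]" "$file" || return 1

    tmp=$(mktemp) || return 1
    # Copy every line; right after the matching iface line, emit the new one.
    awk -v iface="$iface" -v line="$line" '
        { print }
        $1 == "iface" && $2 == iface { print line }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example (as root, after backing up the file):
#   cp /etc/network/interfaces /etc/network/interfaces.bak
#   ensure_offload_post_up /etc/network/interfaces eno2
```

The setting is applied the next time the interface is brought up, for example after a reboot.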

 
