Proxmox 5.4-1 e1000e causing crash

SteveW

New Member
May 22, 2019
Hi,

I installed the latest version with all updates applied.

The server crashes roughly every 48 hours, and the kernel log shows "Detected Hardware Unit Hang" for the Intel Corporation Ethernet Connection (2) I219-LM (rev 31).

Please advise.

Further details below.

Linux scrypt 4.15.18-14-pve #1 SMP PVE 4.15.18-39 (Wed, 15 May 2019 06:56:23 +0200) x86_64 GNU/Linux

proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-9
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-42
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-37
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
    Subsystem: Fujitsu Technology Solutions Ethernet Connection (2) I219-LM
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0
    Interrupt: pin A routed to IRQ 142
    Region 0: Memory at ef200000 (32-bit, non-prefetchable) [size=128K]
    Capabilities: [c8] Power Management version 3
        Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee00498 Data: 0000
    Capabilities: [e0] PCI Advanced Features
        AFCap: TP+ FLR+
        AFCtrl: FLR-
        AFStatus: TP-
    Kernel driver in use: e1000e
    Kernel modules: e1000e

May 22 06:34:15 scrypt kernel: [101650.550599] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
May 22 06:34:15 scrypt kernel: [101650.550599] TDH <6d>
May 22 06:34:15 scrypt kernel: [101650.550599] TDT <80>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_use <80>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_clean <6c>
May 22 06:34:15 scrypt kernel: [101650.550599] buffer_info[next_to_clean]:
May 22 06:34:15 scrypt kernel: [101650.550599] time_stamp <101829e46>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_watch <6d>
May 22 06:34:15 scrypt kernel: [101650.550599] jiffies <101829f60>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_watch.status <0>
May 22 06:34:15 scrypt kernel: [101650.550599] MAC Status <80083>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY Status <796d>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY 1000BASE-T Status <7800>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY Extended Status <3000>
May 22 06:34:15 scrypt kernel: [101650.550599] PCI Status <10>
May 22 06:34:17 scrypt kernel: [101652.566531] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
May 22 06:34:17 scrypt kernel: [101652.566531] TDH <6d>
May 22 06:34:17 scrypt kernel: [101652.566531] TDT <80>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_use <80>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_clean <6c>
May 22 06:34:17 scrypt kernel: [101652.566531] buffer_info[next_to_clean]:
May 22 06:34:17 scrypt kernel: [101652.566531] time_stamp <101829e46>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_watch <6d>
May 22 06:34:17 scrypt kernel: [101652.566531] jiffies <10182a158>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_watch.status <0>
May 22 06:34:17 scrypt kernel: [101652.566531] MAC Status <80083>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY Status <796d>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY 1000BASE-T Status <7800>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY Extended Status <3000>
May 22 06:34:17 scrypt kernel: [101652.566531] PCI Status <10>
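
For anyone reading the two reports above: the hex fields can be decoded with plain shell arithmetic. This is only a rough sketch, assuming bash; the helper name hex_diff is made up for illustration and is not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper for reading the hex fields of the hang reports above.

# Difference between two hex values as printed in the log (no 0x prefix).
hex_diff() {
    echo $(( 16#$1 - 16#$2 ))
}

# Descriptors between TDH (head, advanced by the NIC) and TDT (tail, written
# by the driver). Plain subtraction only holds while the ring has not wrapped,
# which is the case here (tail > head).
echo "queued descriptors:   $(hex_diff 80 6d)"

# Age of the stuck buffer at each report (jiffies - time_stamp).
echo "age at first report:  $(hex_diff 101829f60 101829e46) jiffies"
echo "age at second report: $(hex_diff 10182a158 101829e46) jiffies"

# Jiffies elapsed between the two reports.
echo "between the reports:  $(hex_diff 10182a158 101829f60) jiffies"
```

On these numbers that gives 19 queued descriptors, a buffer age of 282 and then 786 jiffies, and 504 jiffies between the two reports. The kernel stamps the reports about 2.016 s apart, which works out to roughly 250 jiffies per second, so the stuck buffer was about 1.1 s and then 3.1 s old. TDH and time_stamp are identical in both reports, so the same descriptor appears to be stuck the whole time.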

It looks like the old problem has returned.
The server tends to last about 48 hours before crashing.

Settings for enp0s31f6:
    Supported ports: [ TP ]
    Supported link modes:   10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Supported pause frame use: No
    Supports auto-negotiation: Yes
    Advertised link modes:  10baseT/Half 10baseT/Full
                            100baseT/Half 100baseT/Full
                            1000baseT/Full
    Advertised pause frame use: No
    Advertised auto-negotiation: Yes
    Speed: 1000Mb/s
    Duplex: Full
    Port: Twisted Pair
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: on
    MDI-X: on (auto)
    Supports Wake-on: pumbg
    Wake-on: g
    Current message level: 0x00000007 (7)
                           drv probe link
    Link detected: yes

ethtool -K enp0s31f6 sg off tso off gro off
Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported
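
To confirm that the command above actually took effect, the current feature state can be read back with ethtool -k (lowercase). As far as I know, the "udp-fragmentation-offload" message comes from ethtool probing a feature that newer kernels no longer expose, and does not by itself mean the change failed. A small sketch, assuming bash; the function name offloads_still_on is hypothetical:

```shell
#!/usr/bin/env bash
# Print any of the three offloads that were meant to be disabled but still
# show "on". Reads `ethtool -k <iface>` output on stdin; prints nothing if
# all three are off.
offloads_still_on() {
    grep -E '^[[:space:]]*(scatter-gather|tcp-segmentation-offload|generic-receive-offload): on' || true
}

# On the affected host (needs ethtool installed):
# ethtool -k enp0s31f6 | offloads_still_on
```

Note that settings made with ethtool -K are lost on reboot or when the driver is reloaded, so they have to be reapplied each time.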
 
* Are there any BIOS/firmware upgrades available for your hardware?
* If so, please apply them; this can help resolve situations like this.
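
To compare against what the vendor offers, the installed BIOS version can be read from the DMI data the kernel exports under /sys/class/dmi/id. A minimal sketch, assuming bash; the function name bios_info and its optional directory argument are made up for illustration:

```shell
#!/usr/bin/env bash
# Print BIOS vendor, version and date from the kernel's DMI export.
# The optional argument (a directory) exists only so the function can be
# tried against sample files; by default it reads /sys/class/dmi/id.
bios_info() {
    local dir="${1:-/sys/class/dmi/id}"
    local f
    for f in bios_vendor bios_version bios_date; do
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}

bios_info

# Driver and NIC firmware version as reported by the driver (needs ethtool):
# ethtool -i enp0s31f6
```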
 
