Hi,
I installed the latest version with all updates.
The server crashes roughly every 48 hours and logs "Detected Hardware Unit Hang" for the Intel Corporation Ethernet Connection (2) I219-LM (rev 31).
Please advise.
Further details below.
Linux scrypt 4.15.18-14-pve #1 SMP PVE 4.15.18-39 (Wed, 15 May 2019 06:56:23 +0200) x86_64 GNU/Linux
proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
pve-manager: 5.4-5 (running version: 5.4-5/c6fdb264)
pve-kernel-4.15: 5.4-2
pve-kernel-4.15.18-14-pve: 4.15.18-39
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-9
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-51
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-13
libpve-storage-perl: 5.0-42
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-3
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-26
pve-cluster: 5.0-37
pve-container: 2.0-37
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-20
pve-firmware: 2.0-6
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-2
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-51
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
Subsystem: Fujitsu Technology Solutions Ethernet Connection (2) I219-LM
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 142
Region 0: Memory at ef200000 (32-bit, non-prefetchable) [size=128K]
Capabilities: [c8] Power Management version 3
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00498 Data: 0000
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: e1000e
Kernel modules: e1000e
May 22 06:34:15 scrypt kernel: [101650.550599] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
May 22 06:34:15 scrypt kernel: [101650.550599] TDH <6d>
May 22 06:34:15 scrypt kernel: [101650.550599] TDT <80>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_use <80>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_clean <6c>
May 22 06:34:15 scrypt kernel: [101650.550599] buffer_info[next_to_clean]:
May 22 06:34:15 scrypt kernel: [101650.550599] time_stamp <101829e46>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_watch <6d>
May 22 06:34:15 scrypt kernel: [101650.550599] jiffies <101829f60>
May 22 06:34:15 scrypt kernel: [101650.550599] next_to_watch.status <0>
May 22 06:34:15 scrypt kernel: [101650.550599] MAC Status <80083>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY Status <796d>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY 1000BASE-T Status <7800>
May 22 06:34:15 scrypt kernel: [101650.550599] PHY Extended Status <3000>
May 22 06:34:15 scrypt kernel: [101650.550599] PCI Status <10>
May 22 06:34:17 scrypt kernel: [101652.566531] e1000e 0000:00:1f.6 enp0s31f6: Detected Hardware Unit Hang:
May 22 06:34:17 scrypt kernel: [101652.566531] TDH <6d>
May 22 06:34:17 scrypt kernel: [101652.566531] TDT <80>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_use <80>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_clean <6c>
May 22 06:34:17 scrypt kernel: [101652.566531] buffer_info[next_to_clean]:
May 22 06:34:17 scrypt kernel: [101652.566531] time_stamp <101829e46>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_watch <6d>
May 22 06:34:17 scrypt kernel: [101652.566531] jiffies <10182a158>
May 22 06:34:17 scrypt kernel: [101652.566531] next_to_watch.status <0>
May 22 06:34:17 scrypt kernel: [101652.566531] MAC Status <80083>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY Status <796d>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY 1000BASE-T Status <7800>
May 22 06:34:17 scrypt kernel: [101652.566531] PHY Extended Status <3000>
May 22 06:34:17 scrypt kernel: [101652.566531] PCI Status <10>
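For anyone reading the two hang reports above, here is a rough sketch of how the numbers can be decoded. It is not an official e1000e tool. It assumes the driver's default TX ring of 256 descriptors (the post does not show the ring size) and a kernel tick rate of 250 Hz. The 250 Hz figure is inferred from the logs themselves: the jiffies value advances by 0x10182a158 - 0x101829f60 = 504 between two reports that are about 2.016 s apart. The names `parse_hang` and `summarize` are made up for this example.

```python
import re

# Four lines taken from the first hang report in the post.
SAMPLE = """\
May 22 06:34:15 scrypt kernel: [101650.550599] TDH <6d>
May 22 06:34:15 scrypt kernel: [101650.550599] TDT <80>
May 22 06:34:15 scrypt kernel: [101650.550599] time_stamp <101829e46>
May 22 06:34:15 scrypt kernel: [101650.550599] jiffies <101829f60>
"""

# Match only the fields needed here; the values are hexadecimal.
FIELD = re.compile(r"\]\s+(TDH|TDT|time_stamp|jiffies)\s+<([0-9a-fA-F]+)>")


def parse_hang(text):
    """Return the selected fields of one hang report as integers."""
    return {name: int(value, 16) for name, value in FIELD.findall(text)}


def summarize(fields, ring_size=256, hz=250):
    """ring_size and hz are assumptions, see the text above."""
    # Descriptors queued by the driver (tail) but not yet fetched (head).
    pending = (fields["TDT"] - fields["TDH"]) % ring_size
    # How long the oldest unfinished buffer has been waiting.
    stalled_jiffies = fields["jiffies"] - fields["time_stamp"]
    return pending, stalled_jiffies, stalled_jiffies / hz


print(summarize(parse_hang(SAMPLE)))  # (19, 282, 1.128)
```

Under those assumptions, 19 descriptors were waiting and the oldest had been stuck for a little over a second when the first report was logged. TDH and TDT are identical in the second report, so the transmit ring did not move at all in between.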
It looks like the old problem has returned.
The server tends to last about 48 hours before crashing.
Settings for enp0s31f6:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: No
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: on (auto)
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000007 (7)
drv probe link
Link detected: yes
ethtool -K enp0s31f6 sg off tso off gro off
Cannot get device udp-fragmentation-offload settings: Operation not supported
Cannot get device udp-fragmentation-offload settings: Operation not supported
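Settings changed with `ethtool -K` are lost on reboot, so the command has to be re-applied each time the host comes up. Below is a minimal sketch of a helper that builds the same command as above and reads the result back with `ethtool -k`. The function names are hypothetical. The long feature names (`scatter-gather`, `tcp-segmentation-offload`, `generic-receive-offload`) are the labels `ethtool -k` normally prints for `sg`, `tso` and `gro`. Running it needs root and the `ethtool` binary.

```python
import subprocess

IFACE = "enp0s31f6"              # interface name from the post
FEATURES = ("sg", "tso", "gro")  # the three offloads the post turns off


def build_offload_command(iface=IFACE, features=FEATURES, state="off"):
    """Return the argv list for `ethtool -K <iface> <feature> <state> ...`."""
    cmd = ["ethtool", "-K", iface]
    for feature in features:
        cmd += [feature, state]
    return cmd


def parse_features(text):
    """Parse `ethtool -k` style 'name: value' lines into a dict."""
    states = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            # Keep only on/off; drop trailing notes such as "[fixed]".
            states[name.strip()] = value.split()[0]
    return states


def apply_and_verify(iface=IFACE):
    """Apply the settings, then report what the kernel says is active."""
    # No check=True here: the post shows ethtool printing a warning on this
    # call, so the read-back below decides whether the change took effect.
    subprocess.run(build_offload_command(iface))
    shown = subprocess.run(["ethtool", "-k", iface], check=True,
                           capture_output=True, text=True).stdout
    states = parse_features(shown)
    wanted = ("scatter-gather", "tcp-segmentation-offload",
              "generic-receive-offload")
    return {name: states.get(name) for name in wanted}
```

Calling `apply_and_verify()` as root should return a dict with all three entries set to `"off"` if the change was accepted.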
Please open a new thread for this, thanks!
Can you also make sure you have turned off power-saving states, both the PCIe ones and the BIOS/EFI ones (e.g., extended C3 states)?
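To see what is currently active before changing anything in the BIOS, something like the following can be run on the host. It is only a sketch with made-up helper names. It reads the PCIe ASPM policy from `/sys/module/pcie_aspm/parameters/policy`, where the kernel marks the active policy with square brackets, and lists the idle states exposed for CPU 0. Both paths are standard sysfs locations but may be missing, depending on kernel configuration.

```python
import re
from pathlib import Path

ASPM_POLICY = Path("/sys/module/pcie_aspm/parameters/policy")
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name, or None if none is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def cpu_idle_states(cpuidle_dir=CPUIDLE_DIR):
    """Return the names of the idle states the kernel exposes, in order."""
    dirs = sorted(cpuidle_dir.glob("state[0-9]*"),
                  key=lambda p: int(p.name[len("state"):]))
    return [(d / "name").read_text().strip() for d in dirs]


if __name__ == "__main__":
    if ASPM_POLICY.exists():
        print("ASPM policy:", active_aspm_policy(ASPM_POLICY.read_text()))
    else:
        print("ASPM policy file not present")
    if CPUIDLE_DIR.exists():
        print("CPU idle states:", cpu_idle_states())
```

If ASPM turns out to be active, it can be disabled for a test boot with the `pcie_aspm=off` kernel parameter. Deep C-states can be limited with `intel_idle.max_cstate=1` on Intel systems. Whether either of these helps with this particular hang is something only testing on the affected machine can show.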