4.15 based test kernel for PVE 5.x available

Dark26

Member
Nov 27, 2017
With kernel 4.15.17-1, I get the same error when trying to compile the r8168 module.
 

0xFelix

Member
Oct 25, 2017
My HP DL160 G6 has problems getting its network up after updating to the 4.15.17-1-pve kernel.

The following lines are visible in dmesg regarding the network driver igb:

Code:
[    1.311414] igb: Intel(R) Gigabit Ethernet Network Driver - version 5.4.0-k
[    1.311415] igb: Copyright (c) 2007-2014 Intel Corporation.
[    1.523149] igb 0000:05:00.0: added PHC on eth0
[    1.523151] igb 0000:05:00.0: Intel(R) Gigabit Ethernet Network Connection
[    1.523154] igb 0000:05:00.0: eth0: (PCIe:2.5Gb/s:Width x4) f4:ce:46:b2:6a:70
[    1.523157] igb 0000:05:00.0: eth0: PBA No: Unknown
[    1.523160] igb 0000:05:00.0: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[    1.733315] igb 0000:05:00.1: added PHC on eth1
[    1.733317] igb 0000:05:00.1: Intel(R) Gigabit Ethernet Network Connection
[    1.733320] igb 0000:05:00.1: eth1: (PCIe:2.5Gb/s:Width x4) f4:ce:46:b2:6a:71
[    1.733323] igb 0000:05:00.1: eth1: PBA No: Unknown
[    1.733325] igb 0000:05:00.1: Using MSI-X interrupts. 8 rx queue(s), 8 tx queue(s)
[    1.736769] igb 0000:05:00.1 enp5s0f1: renamed from eth1
[    1.761176] igb 0000:05:00.0 enp5s0f0: renamed from eth0
[   20.364015] igb 0000:05:00.0 enp5s0f0: PCIe link lost, device now detached
[   20.396441] igb 0000:05:00.0 enp5s0f0: failed to initialize vlan filtering on this port
[   20.420555] igb 0000:05:00.0 enp5s0f0: failed to initialize vlan filtering on this port

The host is not reachable over its network connection after booting up.
Issuing the following commands makes the network functional:

Code:
rmmod igb
modprobe igb

Any ideas what that could be?

I also found this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1442638
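If the manual reload reliably fixes it, it can be automated at boot until the driver issue is resolved; a sketch as a systemd oneshot unit (the unit name and the ordering against network-pre.target are my assumptions, untested on this hardware):

```ini
# /etc/systemd/system/igb-reload.service (hypothetical unit name)
[Unit]
Description=Reload igb to clear the "PCIe link lost" state at boot
Before=network-pre.target
Wants=network-pre.target

[Service]
Type=oneshot
# "-" prefix: ignore rmmod failure if the module is not loaded yet
ExecStart=-/sbin/rmmod igb
ExecStart=/sbin/modprobe igb

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl daemon-reload && systemctl enable igb-reload.service`.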
 

masterdaweb

Active Member
Apr 17, 2017
We just uploaded a new 4.15 kernel (pve-kernel-4.15.17-1-pve: 4.15.17-8) to pvetest, should fix your issue.

Please test and give feedback, thx.
Hello @martin,

I run Proxmox on a Dell PowerEdge 11th-generation server. Last week, after a kernel update, my GRUB menu wasn't even shown and I had to reinstall Proxmox. I haven't tested this fix yet.

Do you think that was related to this bug?
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
My HP DL160 G6 has problems getting its network up after updating to the 4.15.17-1-pve kernel.
[...]
Any ideas what that could be?

which NIC do you use in that system? we cannot reproduce this here with any of our igb devices..
 

janos

Member
Aug 24, 2017
Hungary
which NIC do you use in that system? we cannot reproduce this here with any of our igb devices..
I have a DL160 G6; it has these NICs:

Code:
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
 

0xFelix

Member
Oct 25, 2017
As janos already said, my DL160 also has the following NICs:

Code:
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
I have a DL160 G6; it has these NICs:

Code:
05:00.0 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)
05:00.1 Ethernet controller: Intel Corporation 82576 Gigabit Network Connection (rev 01)

And do you also have the problem @0xFelix describes, or just the same server model/NIC?
 

a_d

New Member
Jan 4, 2018
I'm also having boot issues on a new Dell R740.

After disabling the quiet boot option I see the errors in the attached screenshots
(it stops for some time at the first one and then produces the stack trace):
rpviewer-2.png

rpviewer-3.png

Unfortunately I have not been able to scroll up in the remote console so far to see the first part of the stack trace.

With the latest 4.13 kernel it seems to work perfectly so far (I'm still testing the new hardware as a replacement for a production cluster); however, the installer kernel had issues with the QLogic 10GbE network cards.
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
I'm also having boot issues on a new Dell R740.
[...]
With the latest 4.13 kernel it seems to work perfectly so far.

could you collect a full boot log (e.g. via serial console)?
 

rotanid

New Member
Dec 18, 2012
Since updating to pve-kernel 4.15.17 (a week ago) we have had multiple cases where the e1000e NIC driver had problems and we had to reboot the machine to get the network running again. It happened on two different servers, multiple times on one of them.
The NIC:
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)


The following log messages reappear every few seconds while the network is down:

Code:
May 12 04:25:57 vh5 kernel: [114078.129015] e1000e 0000:00:1f.6 enp0s31f6:
Detected Hardware Unit Hang:
May 12 04:25:57 vh5 kernel: [114078.129015]   TDH                  <0>
May 12 04:25:57 vh5 kernel: [114078.129015]   TDT                  <6>
May 12 04:25:57 vh5 kernel: [114078.129015]   next_to_use          <6>
May 12 04:25:57 vh5 kernel: [114078.129015]   next_to_clean        <0>
May 12 04:25:57 vh5 kernel: [114078.129015] buffer_info[next_to_clean]:
May 12 04:25:57 vh5 kernel: [114078.129015]   time_stamp           <101b204c8>
May 12 04:25:57 vh5 kernel: [114078.129015]   next_to_watch        <0>
May 12 04:25:57 vh5 kernel: [114078.129015]   jiffies              <101b20740>
May 12 04:25:57 vh5 kernel: [114078.129015]   next_to_watch.status <0>
May 12 04:25:57 vh5 kernel: [114078.129015] MAC Status             <80083>
May 12 04:25:57 vh5 kernel: [114078.129015] PHY Status             <796d>
May 12 04:25:57 vh5 kernel: [114078.129015] PHY 1000BASE-T Status  <7800>
May 12 04:25:57 vh5 kernel: [114078.129015] PHY Extended Status    <3000>
May 12 04:25:57 vh5 kernel: [114078.129015] PCI Status             <10>


sometimes also:
Code:
May 12 04:26:19 vh5 kernel: [114100.142967] e1000e 0000:00:1f.6 enp0s31f6:
Reset adapter unexpectedly
May 12 04:26:23 vh5 kernel: [114104.613728] e1000e: enp0s31f6 NIC Link is Up
1000 Mbps Full Duplex, Flow Control: Rx/Tx
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
since updating to pve-kernel 4.15.17 (a week ago) we had multiple cases where the NIC driver e1000e had problems and we had to reboot the machine to get network running again. it happened on 2 different servers, multiple times on one of them.
...
sometimes also:
Code:
May 12 04:26:19 vh5 kernel: [114100.142967] e1000e 0000:00:1f.6 enp0s31f6:
Reset adapter unexpectedly
May 12 04:26:23 vh5 kernel: [114104.613728] e1000e: enp0s31f6 NIC Link is Up
1000 Mbps Full Duplex, Flow Control: Rx/Tx
Hi,
we had a similar issue with kernel 4.15.10 on one host with an e1000e NIC and swapped the NIC for a Broadcom (and updated to 4.15.17).

I tried to reproduce it in the lab, but even with heavy iperf traffic the issue doesn't occur. It looks like it only shows up with real live traffic.

Unfortunately I have nine further hosts with e1000e NICs :(

Udo
 

rotanid

New Member
Dec 18, 2012
I tried to reproduce it in the lab, but even with heavy iperf traffic the issue doesn't occur. It looks like it only shows up with real live traffic.
So far it has only happened during heavy backup traffic (BackupPC, rsync).
 

rotanid

New Member
Dec 18, 2012
Hm, so 5.2 has been released, probably without a fix for this pve-kernel-4.15 issue...
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
I still can't reproduce that - also with rsync (and iperf is heavy too - for hours 120MB/s in both directions).

Udo

Neither can we - issues that are not reproducible are obviously a lot harder to track down and fix.
 
Jun 8, 2016
Johannesburg, South Africa
We have unfortunately started receiving feedback that some guests intermittently have network performance degradation unless we disable GRO on the physical NICs:
Code:
/etc/rc.local
  ethtool -K eth0 gro off
  ethtool -K eth1 gro off

These NICs are part of an OVS active/backup bond interface.
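Instead of rc.local, the same workaround can be tied to the interfaces themselves so the setting is reapplied on every ifup; a sketch for /etc/network/interfaces (interface names taken from the snippet above, untested with the OVS bond described):

```text
# /etc/network/interfaces (fragment): persist the GRO workaround per NIC
auto eth0
iface eth0 inet manual
    post-up /sbin/ethtool -K eth0 gro off

auto eth1
iface eth1 inet manual
    post-up /sbin/ethtool -K eth1 gro off
```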

Furthermore, again seemingly intermittently, we experience massive upload speed limits on guests which were not a problem on the 4.13 kernel. Disabling TSO within the guest fixes the issue (the speedtest script doesn't test long enough to ramp up fully; running a Windows VM and searching for 'speed test' in Chrome yields Google's own speed test service, which shows 900 Mbps up and down).

Code:
[davidh@test ~]# ./speedtest_cli.py --server 1620
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from xxxxxxxxxx (xx.xx.xx.xx)...
Hosted by yyyyyy (Johannesburg) [3.21 km]: 14.675 ms
Testing download speed........................................
Download: 686.26 Mbit/s
Testing upload speed..................................................
Upload: 1.10 Mbit/s

[davidh@test ~]# ethtool -K eth0 tso off

[davidh@test ~]# ./speedtest_cli.py --server 1620
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from xxxxxxxxxx (xx.xx.xx.xx)...
Hosted by yyyyyy (Johannesburg) [3.21 km]: 0.856 ms
Testing download speed........................................
Download: 905.83 Mbit/s
Testing upload speed..................................................
Upload: 280.83 Mbit/s


Making changes in individual guests is not feasible. Could you perhaps release a test 4.15 kernel with the out-of-tree Intel drivers so that we can ascertain whether this problem then disappears?

We're running with OVS and continue to experience problems after disabling TSO, GSO, GRO and LRO on the physical interfaces, bond interfaces, vmbr0 and TAPs... Is there perhaps alternatively a way to advise the guest that TSO is not available?


Working perfectly on our pure Intel S2600WT servers. The in-tree Intel drivers additionally allow us to run without disabling GRO (generic receive offload), which we had to do on every Proxmox kernel before (it only appeared to affect older RHEL5 guests).
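For what it's worth, sweeping the offload settings across all the interfaces involved can be scripted instead of done by hand; a sketch (the interface names are examples, and the DRY_RUN switch, which only prints the commands, is there so the logic can be checked without hardware):

```shell
#!/bin/sh
# Disable TSO/GSO/GRO/LRO on each interface given as an argument.
# With DRY_RUN=1 the ethtool invocations are printed instead of executed.
disable_offloads() {
    for ifn in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool -K $ifn tso off gso off gro off lro off"
        else
            ethtool -K "$ifn" tso off gso off gro off lro off
        fi
    done
}

# Example: physical NICs and the bridge from the setup described above
DRY_RUN=1 disable_offloads eth0 eth1 vmbr0
```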
 
Nov 24, 2017
Cape Town (South Africa)
Hi,
we had a similar issue with kernel 4.15.10 on one host with an e1000e NIC and swapped it for a Broadcom (and updated to 4.15.17).
[...]
Udo
Same problem here, Proxmox 5.2.
It happened 20 minutes after a normal boot, following the update from kernel 4.13 to 4.15.17-2.

At first the host machine had to be force-reset since it lost its network connection (the error repeating non-stop on the KVM console).

On the second restart, with kernel 4.15.17-1, the problem reappeared shortly after.
It was discovered that a KVM guest with heavy nginx load was generating most of the traffic.
Stopping nginx on that guest made the console errors stop (the network connection was very unresponsive before stopping it) and everything went back to normal; starting nginx again brought back the errors and the network unresponsiveness, so it had to be stopped again for stability's sake. Not a nice cat-and-mouse game for a production machine.

Finally, before trying to go back to 4.13, and as suggested somewhere (I can't post links) for this kind of problem, the command:

ethtool -K eth0 gso off gro off tso off

solved the issue; the machine has been stable for 2 days now under various other heavy traffic loads.

On an identical machine with kernel 4.15.17 and a different usage profile (though with heavy ssh and rsync network load) the problem never appeared.

The errors reported by the kernel are below; I wasn't allowed to post the full boot log.



Code:
[  766.398855] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <fb>
                 TDT                  <1b>
                 next_to_use          <1b>
                 next_to_clean        <fa>
               buffer_info[next_to_clean]:
                 time_stamp           <10001bfa5>
                 next_to_watch        <fb>
                 jiffies              <10001c7e0>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  766.530531] ------------[ cut here ]------------
[  766.530532] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
[  766.530542] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:323 dev_watchdog+0x222/0x230
[  766.530543] Modules linked in: ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_NFLOG xt_physdev xt_comment xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack ip_set_hash_net ip_set iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c softdog nfnetlink_log nfnetlink intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 aesni_intel aes_x86_64 crypto_simd snd_pcm glue_helper cryptd snd_timer drm_kms_helper snd intel_cstate drm soundcore joydev input_leds i2c_algo_bit fb_sys_fops syscopyarea sysfillrect mei_me sysimgblt intel_rapl_perf
[  766.530566]  wmi mei serio_raw intel_pch_thermal video acpi_pad mac_hid pcspkr vhost_net vhost tap nfsd ib_iser auth_rpcgss nfs_acl lockd grace rdma_cm sunrpc iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq hid_generic usbkbd usbhid hid psmouse e1000e ptp pps_core i2c_i801 ahci libahci
[  766.530599] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P           O     4.15.17-1-pve #1
[  766.530600] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.21.0 for D3401-H1x                    05/15/2017
[  766.530601] RIP: 0010:dev_watchdog+0x222/0x230
[  766.530602] RSP: 0018:ffff9fdd6e443e58 EFLAGS: 00010286
[  766.530603] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[  766.530603] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9fdd6e456490
[  766.530604] RBP: ffff9fdd6e443e88 R08: 0000000000000001 R09: 00000000000003ac
[  766.530604] R10: ffff9fdd6e443e10 R11: 00000000000003ac R12: 0000000000000001
[  766.530605] R13: ffff9fdd1dd2c000 R14: ffff9fdd1dd2c478 R15: ffff9fdd26a57880
[  766.530606] FS:  0000000000000000(0000) GS:ffff9fdd6e440000(0000) knlGS:0000000000000000
[  766.530606] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  766.530607] CR2: 000007fffffdf478 CR3: 000000051080a001 CR4: 00000000003626e0
[  766.530608] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  766.530608] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  766.530609] Call Trace:
[  766.530610]  <IRQ>
[  766.530612]  ? dev_deactivate_queue.constprop.33+0x60/0x60
[  766.530614]  call_timer_fn+0x32/0x130
[  766.530615]  run_timer_softirq+0x1dd/0x430
[  766.530616]  ? ktime_get+0x43/0xa0
[  766.530618]  __do_softirq+0x109/0x29b
[  766.530620]  irq_exit+0xb6/0xc0
[  766.530621]  smp_apic_timer_interrupt+0x71/0x130
[  766.530622]  apic_timer_interrupt+0x84/0x90
[  766.530623]  </IRQ>
[  766.530624] RIP: 0010:cpuidle_enter_state+0xa8/0x2e0
[  766.530625] RSP: 0018:ffffbbcb862d3e58 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[  766.530626] RAX: ffff9fdd6e4628c0 RBX: 0000000000000001 RCX: 000000000000001f
[  766.530626] RDX: 000000b278c79e3f RSI: fffffff49110c0f1 RDI: 0000000000000000
[  766.530627] RBP: ffffbbcb862d3e90 R08: 0000000000000726 R09: 0000000000000018
[  766.530628] R10: ffffbbcb862d3e28 R11: 00000000000006f9 R12: ffff9fdd6e46cc00
[  766.530628] R13: ffffffffb1571c98 R14: 000000b278c79e3f R15: ffffffffb1571c80
[  766.530630]  ? cpuidle_enter_state+0x97/0x2e0
[  766.530631]  cpuidle_enter+0x17/0x20
[  766.530632]  call_cpuidle+0x23/0x40
[  766.530633]  do_idle+0x19a/0x200
[  766.530634]  cpu_startup_entry+0x73/0x80
[  766.530636]  start_secondary+0x1a6/0x200
[  766.530637]  secondary_startup_64+0xa5/0xb0
[  766.530638] Code: 37 00 49 63 4e e8 eb 92 4c 89 ef c6 05 a6 48 d8 00 01 e8 b2 21 fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 68 4c 19 b1 e8 de dd 7f ff <0f> 0b eb c0 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[  766.530657] ---[ end trace 0bd742d9a2d71859 ]---
[  766.530668] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
[  770.245169] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[  832.412731] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <15>
                 TDT                  <53>
                 next_to_use          <53>
                 next_to_clean        <15>
               buffer_info[next_to_clean]:
                 time_stamp           <100020665>
                 next_to_watch        <16>
                 jiffies              <100020858>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  834.400643] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <15>
                 TDT                  <53>
                 next_to_use          <53>
                 next_to_clean        <15>
               buffer_info[next_to_clean]:
                 time_stamp           <100020665>
                 next_to_watch        <16>
                 jiffies              <100020a49>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  836.412587] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <15>
                 TDT                  <53>
                 next_to_use          <53>
                 next_to_clean        <15>
               buffer_info[next_to_clean]:
                 time_stamp           <100020665>
                 next_to_watch        <16>
                 jiffies              <100020c40>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  838.396495] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <15>
                 TDT                  <53>
                 next_to_use          <53>
                 next_to_clean        <15>
               buffer_info[next_to_clean]:
                 time_stamp           <100020665>
                 next_to_watch        <16>
                 jiffies              <100020e30>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  840.412284] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <15>
                 TDT                  <53>
                 next_to_use          <53>
                 next_to_clean        <15>
               buffer_info[next_to_clean]:
                 time_stamp           <100020665>
                 next_to_watch        <16>
                 jiffies              <100021028>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <3800>
               PHY Extended Status    <3000>
               PCI Status             <10>
[  840.508144] e1000e 0000:00:1f.6 eth0: Reset adapter unexpectedly
[  844.686666] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 1105.398463] e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
                 TDH                  <13>
                 TDT                  <3e>
                 next_to_use          <3e>
                 next_to_clean        <13>
               buffer_info[next_to_clean]:
                 time_stamp           <10003104d>
                 next_to_watch        <14>
                 jiffies              <1000312f0>
                 next_to_watch.status <0>
               MAC Status             <80083>
               PHY Status             <796d>
               PHY 1000BASE-T Status  <7800>
               PHY Extended Status    <3000>
               PCI Status             <10>
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
South Tyrol/Italy
shop.maurer-it.com
Making individual virtual guest changes is not feasible, could you perhaps release a test 4.15 kernel with the out of tree Intel drivers so that we can ascertain whether or not this problem then disappears?

Initially we did not ship them because the newest publicly available versions were not compatible with newer kernels like 4.15; also, Intel people said they're the same as the in-tree ones, which, to be honest, I really do not believe. But in the meantime there have been some new releases. The e1000e module still needs some porting to changed internal kernel APIs; I'll try to port it and upload a 4.15 kernel with the newest out-of-tree modules to test, but that could possibly take some time...

Please consider working around this issue by using the 4.13 kernel (which still has the out-of-tree modules built in); we still provide stable/security updates for that version for now.
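Staying on the 4.13 kernel across reboots can be done by making GRUB boot it by default; a sketch (the entry titles below are placeholders; copy the exact "submenu"/"menuentry" titles from /boot/grub/grub.cfg on your system):

```text
# /etc/default/grub (fragment)
# Placeholder titles; take the real ones from /boot/grub/grub.cfg,
# then run update-grub to apply.
GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.13.16-2-pve"
```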
 

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
South Tyrol/Italy
shop.maurer-it.com
We built a pve-kernel 4.15.17 kernel with the out-of-tree Intel NIC drivers, i.e., e1000e, igb and ixgbe, and uploaded kernel and header (if you use DKMS for something) packages here:
http://download.proxmox.com/temp/out-of-tree-intel-drivers/

Some testing would be really appreciated, as we have not been able to reproduce this problem in-house yet.
Download them (e.g. with wget URL) and confirm the checksums:

Code:
# sha1sum pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb pve-headers-4.15.17-3-pve_4.15.17-11_amd64.deb
52269b5dc09dfbe1c8278e5a2aa48de9520b79bc  pve-headers-4.15.17-3-pve_4.15.17-11_amd64.deb
c5b46e528b86f6021458f95efdf839ea8339fe2b  pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb

# md5sum pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb pve-headers-4.15.17-3-pve_4.15.17-11_amd64.deb
1d23b120906f45691976ce509d112827  pve-headers-4.15.17-3-pve_4.15.17-11_amd64.deb
e1eae4572e98d3907ea5902aad90e43c  pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb

and install them with apt, e.g., for installing only the kernel:
Code:
apt install ./pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb
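For scripted installs, the checksum comparison can be done programmatically rather than by eye; a small sketch of the pattern (the helper name is mine, and the checksum in the comment is the one from the post above):

```shell
#!/bin/sh
# Verify a downloaded file against an expected SHA-1 before installing it.
# Succeeds (exit 0) only if the checksum matches.
verify_sha1() {
    file="$1"
    expected="$2"
    actual="$(sha1sum "$file" | awk '{print $1}')"
    [ "$actual" = "$expected" ]
}

# Example usage with the kernel package from the post:
# verify_sha1 pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb \
#     c5b46e528b86f6021458f95efdf839ea8339fe2b \
#     && apt install ./pve-kernel-4.15.17-3-pve_4.15.17-11_amd64.deb
```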
 
