Intel NUC e1000 kernel panic

trunet

Jul 12, 2018
Hello...

This is my first post here. I have been using Proxmox for around 6 months and run two clusters: a home cluster (as a homelab) and a production cluster (used for clients).

On my home cluster, I have an Intel NUC7i3BNH with BIOS BNKBL357.86A.0063.2018.0413.1542.

From time to time, the NUC's network dies completely with the following kernel panic; only a reboot brings it back:
Code:
Jul 12 03:19:04 proxmox1 kernel: [4783082.527554] vmbr0: port 1(eno1) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.527854] vmbr7: port 1(eno1.10) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528001] vmbr6: port 1(eno1.11) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528118] vmbr1: port 1(eno1.20) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528173] vmbr2: port 1(eno1.21) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528215] vmbr3: port 1(eno1.22) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528248] vmbr4: port 1(eno1.23) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528293] vmbr5: port 1(eno1.24) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.538201] ------------[ cut here ]------------
Jul 12 03:19:04 proxmox1 kernel: [4783082.538244] invalid opcode: 0000 [#1] SMP PTI
Jul 12 03:19:04 proxmox1 kernel: [4783082.538260] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss cfg80211 nfsv3 nfs_acl nfs lockd grace fscache veth ip_set ip6table_filter ip6_tables iptable_filter 8021q garp mrp softdog nfnetlink_log nfnetlink wmi_bmof intel_wmi_thunderbolt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf drm_kms_helper snd_pcm snd_timer snd drm soundcore ir_rc6_decoder pcspkr mei_me cp210x i2c_algo_bit fb_sys_fops usbserial mei syscopyarea sysfillrect sysimgblt shpchp intel_pch_thermal wmi rc_rc6_mce ir_lirc_codec lirc_dev ite_cir rc_core tpm_crb acpi_pad video mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm
Jul 12 03:19:04 proxmox1 kernel: [4783082.538495]  sunrpc ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq e1000e ptp pps_core i2c_i801 ahci libahci
Jul 12 03:19:04 proxmox1 kernel: [4783082.538573] CPU: 1 PID: 16446 Comm: kworker/1:1 Tainted: P        W  O     4.15.17-1-pve #1
Jul 12 03:19:04 proxmox1 kernel: [4783082.538600] Hardware name: Intel Corporation NUC7i3BNH/NUC7i3BNB, BIOS BNKBL357.86A.0063.2018.0413.1542 04/13/2018
Jul 12 03:19:04 proxmox1 kernel: [4783082.538639] Workqueue: events e1000_reset_task [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538663] RIP: 0010:e1000_flush_desc_rings+0x2da/0x2f0 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538684] RSP: 0018:ffffa5bd2122bd70 EFLAGS: 00010212
Jul 12 03:19:04 proxmox1 kernel: [4783082.538702] RAX: 00000000000000ce RBX: ffff96b5923e48c0 RCX: 00000000000000e3
Jul 12 03:19:04 proxmox1 kernel: [4783082.538726] RDX: 00000000000000ce RSI: 0000000000000246 RDI: 0000000000000246
Jul 12 03:19:04 proxmox1 kernel: [4783082.538749] RBP: ffffa5bd2122bda8 R08: 0000000000000002 R09: ffffa5bd2122bd3c
Jul 12 03:19:04 proxmox1 kernel: [4783082.538772] R10: 00000000000000fe R11: ffff96b59dc02938 R12: 000000003103f0fa
Jul 12 03:19:04 proxmox1 kernel: [4783082.538796] R13: ffff96b5923e4d78 R14: ffff96b592a41e00 R15: 0000000004008018
Jul 12 03:19:04 proxmox1 kernel: [4783082.538819] FS:  0000000000000000(0000) GS:ffff96b5bec80000(0000) knlGS:0000000000000000
Jul 12 03:19:04 proxmox1 kernel: [4783082.538845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 12 03:19:04 proxmox1 kernel: [4783082.538865] CR2: 00007ff4983f3008 CR3: 00000005f0c0a004 CR4: 00000000003626e0
Jul 12 03:19:04 proxmox1 kernel: [4783082.538888] Call Trace:
Jul 12 03:19:04 proxmox1 kernel: [4783082.538905]  e1000e_reset+0x4d4/0x770 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538925]  e1000e_down+0x1e3/0x210 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538944]  e1000e_reinit_locked+0x4c/0x70 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538965]  e1000_reset_task+0x58/0x60 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538983]  process_one_work+0x1e0/0x400
Jul 12 03:19:04 proxmox1 kernel: [4783082.538999]  worker_thread+0x4b/0x420
Jul 12 03:19:04 proxmox1 kernel: [4783082.539014]  kthread+0x105/0x140
Jul 12 03:19:04 proxmox1 kernel: [4783082.539027]  ? process_one_work+0x400/0x400
Jul 12 03:19:04 proxmox1 kernel: [4783082.539044]  ? kthread_create_worker_on_cpu+0x70/0x70
Jul 12 03:19:04 proxmox1 kernel: [4783082.539063]  ? do_syscall_64+0x73/0x130
Jul 12 03:19:04 proxmox1 kernel: [4783082.539078]  ? SyS_exit_group+0x14/0x20
Jul 12 03:19:04 proxmox1 kernel: [4783082.539093]  ret_from_fork+0x35/0x40
Jul 12 03:19:04 proxmox1 kernel: [4783082.539107] Code: ff ff 4c 89 ef e8 e7 fc ff ff e9 0f ff ff ff 4c 89 ef e8 da fc ff ff e9 1e fe ff ff 31 c0 45 31 e4 66 41 89 46 20 e9 76 fe ff ff <0f> 0b e8 6f ce 60 f5 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00
Jul 12 03:19:04 proxmox1 kernel: [4783082.539219] ---[ end trace 46a1a06df10b1c63 ]---
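Reading the trace innermost first, the e1000e reset worker (e1000_reset_task) goes through e1000e_reinit_locked, e1000e_down and e1000e_reset before faulting in e1000_flush_desc_rings. The bytes marked <0f> 0b on the Code: line are the x86 ud2 instruction, which is how x86 kernels implement BUG()/BUG_ON(), so this looks like an assertion inside the driver firing rather than a random crash. To compare future occurrences, here is a minimal Python sketch of a hypothetical helper (summarise_oops, not part of any Proxmox tool) that extracts the faulting function, the reliable call-trace frames and whether the trap was a ud2. It assumes the syslog-style format shown above.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Patterns for the syslog-prefixed oops format shown above (assumed; other
# log formats may need adjusting).
RIP_LINE = re.compile(r"RIP: [0-9a-f]+:([A-Za-z0-9_.]+)\+")
CODE_LINE = re.compile(r"Code: .*<([0-9a-f]{2})> ([0-9a-f]{2})")
# A frame follows either the "[timestamp]" or a bare "kernel:" prefix.
FRAME_LINE = re.compile(r"(?:\]|:)\s+(\?\s+)?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


@dataclass
class OopsSummary:
    faulting_function: Optional[str] = None
    reliable_frames: List[str] = field(default_factory=list)  # innermost first
    ud2_trap: bool = False  # faulting bytes were 0f 0b (x86 ud2, used by BUG())


def summarise_oops(log_text: str) -> OopsSummary:
    """Hypothetical helper: pull the key facts out of one kernel oops."""
    summary = OopsSummary()
    in_trace = False
    for line in log_text.splitlines():
        rip = RIP_LINE.search(line)
        code = CODE_LINE.search(line)
        if rip:
            summary.faulting_function = rip.group(1)
        elif code:
            # The byte in <...> is where RIP pointed when the CPU trapped.
            summary.ud2_trap = (code.group(1), code.group(2)) == ("0f", "0b")
            in_trace = False
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace:
            frame = FRAME_LINE.search(line)
            # Frames marked "?" are leftover stack values, not confirmed callers.
            if frame and frame.group(1) is None:
                summary.reliable_frames.append(frame.group(2))
    return summary
```

Fed the trace above (for example, the relevant slice of /var/log/kern.log), it should report e1000_flush_desc_rings as the faulting function and the chain e1000e_reset, e1000e_down, e1000e_reinit_locked, e1000_reset_task, process_one_work, worker_thread, kthread, ret_from_fork, skipping the "?" frames.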

I was running pve-kernel-4.15.17-1-pve when this last panic occurred; after the reboot, I'm now running pve-kernel-4.15.17-3-pve. I have the latest packages from the enterprise repository installed.

Any idea on what can be causing this?

Thanks,
Wagner Sartori Junior
I can't post URL links (new user restriction)... you can check the sourceforge.net URI /p/e1000/bugs/618/
The link would be: https://sourceforge.net/p/e1000/bugs/618/

pve-kernel-4.15.17-1-pve had the in-tree drivers for e1000e; we went back to the out-of-tree drivers for pve-kernel-4.15.17-3-pve, as quite a few problems were reported.
For the out-of-tree drivers, SourceForge is the correct place to go; for the in-tree drivers, it would be the intel-wired-lan mailing list, i.e.: https://lists.osuosl.org/mailman/listinfo/intel-wired-lan
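Which e1000e flavour a given kernel is actually running can be checked from the loaded module's version string, which `ethtool -i eno1` also prints on its "version:" line. Below is a minimal Python sketch. It assumes that the module exports its version under /sys/module/e1000e/version and that in-tree Intel drivers use a version ending in "-k" (the in-tree e1000e typically reports 3.2.6-k) while the SourceForge builds do not. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumed sysfs location of the loaded module's version string.
SYSFS_VERSION = Path("/sys/module/e1000e/version")


def read_e1000e_version(path: Path = SYSFS_VERSION) -> Optional[str]:
    """Return the loaded e1000e version string, or None if it is unavailable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def driver_source(version: Optional[str]) -> str:
    """Classify the driver using the assumed "-k" suffix convention for in-tree builds."""
    if version is None:
        return "unknown (module not loaded or no version exported)"
    return "in-tree" if version.endswith("-k") else "out-of-tree"
```

On the host, driver_source(read_e1000e_version()) should say "in-tree" under pve-kernel-4.15.17-1-pve and "out-of-tree" under pve-kernel-4.15.17-3-pve, if the suffix convention holds.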

Now, do you still experience the problem with the new kernel?
This could also be a memory problem; running memtest would be a good way to check for that...
The problem happens out of the blue, around once a week or so; I will have to wait and see if it happens again.
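To check whether the new kernel actually breaks the "once a week" pattern, the gaps between incidents can be computed from syslog. This is a minimal Python sketch with hypothetical helper names. It assumes that each incident logs exactly one "Workqueue: events e1000_reset_task" line, as in the trace above, and that all lines fall within one calendar year, since classic syslog timestamps omit the year and it must be supplied.

```python
from datetime import datetime
from typing import Iterable, List

# Assumed marker: appears once per oops in the trace format shown in this thread.
MARKER = "Workqueue: events e1000_reset_task"


def incident_times(lines: Iterable[str], year: int, marker: str = MARKER) -> List[datetime]:
    """Parse the leading syslog timestamp ("Jul 12 03:19:04") of each matching line."""
    times = []
    for line in lines:
        if marker in line:
            stamp = line[:15]  # classic syslog timestamps are 15 characters wide
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return sorted(times)


def gaps_in_days(times: List[datetime]) -> List[float]:
    """Days elapsed between consecutive incidents."""
    return [(later - earlier).total_seconds() / 86400
            for earlier, later in zip(times, times[1:])]
```

For example, opening /var/log/kern.log and passing the file object to incident_times(f, 2018), then the result to gaps_in_days, would list the intervals; a run of values near 7 would match the weekly pattern described here.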

I ran memtest for a whole day when I assembled it 6 months ago, and everything was fine. If it happens again on 4.15.17-3, I will give it another shot and let you know.

Thanks for the info, let's wait and see.