Intel NUC e1000 kernel panic

trunet

Jul 12, 2018
Hello...

This is my first post here. I have been using Proxmox for around 6 months and run a home cluster (a homelab) and a production cluster (used for clients).

On my home cluster, I have an Intel NUC7i3BNH with BIOS BNKBL357.86A.0063.2018.0413.1542.

From time to time, the NUC's network dies completely with the following kernel panic, and only a reboot brings it back:
Code:
Jul 12 03:19:04 proxmox1 kernel: [4783082.527554] vmbr0: port 1(eno1) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.527854] vmbr7: port 1(eno1.10) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528001] vmbr6: port 1(eno1.11) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528118] vmbr1: port 1(eno1.20) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528173] vmbr2: port 1(eno1.21) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528215] vmbr3: port 1(eno1.22) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528248] vmbr4: port 1(eno1.23) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.528293] vmbr5: port 1(eno1.24) entered disabled state
Jul 12 03:19:04 proxmox1 kernel: [4783082.538201] ------------[ cut here ]------------
Jul 12 03:19:04 proxmox1 kernel: [4783082.538244] invalid opcode: 0000 [#1] SMP PTI
Jul 12 03:19:04 proxmox1 kernel: [4783082.538260] Modules linked in: tcp_diag inet_diag rpcsec_gss_krb5 nfsv4 nfsd auth_rpcgss cfg80211 nfsv3 nfs_acl nfs lockd grace fscache veth ip_set ip6table_filter ip6_tables iptable_filter 8021q garp mrp softdog nfnetlink_log nfnetlink wmi_bmof intel_wmi_thunderbolt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf drm_kms_helper snd_pcm snd_timer snd drm soundcore ir_rc6_decoder pcspkr mei_me cp210x i2c_algo_bit fb_sys_fops usbserial mei syscopyarea sysfillrect sysimgblt shpchp intel_pch_thermal wmi rc_rc6_mce ir_lirc_codec lirc_dev ite_cir rc_core tpm_crb acpi_pad video mac_hid vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm
Jul 12 03:19:04 proxmox1 kernel: [4783082.538495]  sunrpc ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor zstd_compress raid6_pq e1000e ptp pps_core i2c_i801 ahci libahci
Jul 12 03:19:04 proxmox1 kernel: [4783082.538573] CPU: 1 PID: 16446 Comm: kworker/1:1 Tainted: P        W  O     4.15.17-1-pve #1
Jul 12 03:19:04 proxmox1 kernel: [4783082.538600] Hardware name: Intel Corporation NUC7i3BNH/NUC7i3BNB, BIOS BNKBL357.86A.0063.2018.0413.1542 04/13/2018
Jul 12 03:19:04 proxmox1 kernel: [4783082.538639] Workqueue: events e1000_reset_task [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538663] RIP: 0010:e1000_flush_desc_rings+0x2da/0x2f0 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538684] RSP: 0018:ffffa5bd2122bd70 EFLAGS: 00010212
Jul 12 03:19:04 proxmox1 kernel: [4783082.538702] RAX: 00000000000000ce RBX: ffff96b5923e48c0 RCX: 00000000000000e3
Jul 12 03:19:04 proxmox1 kernel: [4783082.538726] RDX: 00000000000000ce RSI: 0000000000000246 RDI: 0000000000000246
Jul 12 03:19:04 proxmox1 kernel: [4783082.538749] RBP: ffffa5bd2122bda8 R08: 0000000000000002 R09: ffffa5bd2122bd3c
Jul 12 03:19:04 proxmox1 kernel: [4783082.538772] R10: 00000000000000fe R11: ffff96b59dc02938 R12: 000000003103f0fa
Jul 12 03:19:04 proxmox1 kernel: [4783082.538796] R13: ffff96b5923e4d78 R14: ffff96b592a41e00 R15: 0000000004008018
Jul 12 03:19:04 proxmox1 kernel: [4783082.538819] FS:  0000000000000000(0000) GS:ffff96b5bec80000(0000) knlGS:0000000000000000
Jul 12 03:19:04 proxmox1 kernel: [4783082.538845] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 12 03:19:04 proxmox1 kernel: [4783082.538865] CR2: 00007ff4983f3008 CR3: 00000005f0c0a004 CR4: 00000000003626e0
Jul 12 03:19:04 proxmox1 kernel: [4783082.538888] Call Trace:
Jul 12 03:19:04 proxmox1 kernel: [4783082.538905]  e1000e_reset+0x4d4/0x770 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538925]  e1000e_down+0x1e3/0x210 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538944]  e1000e_reinit_locked+0x4c/0x70 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538965]  e1000_reset_task+0x58/0x60 [e1000e]
Jul 12 03:19:04 proxmox1 kernel: [4783082.538983]  process_one_work+0x1e0/0x400
Jul 12 03:19:04 proxmox1 kernel: [4783082.538999]  worker_thread+0x4b/0x420
Jul 12 03:19:04 proxmox1 kernel: [4783082.539014]  kthread+0x105/0x140
Jul 12 03:19:04 proxmox1 kernel: [4783082.539027]  ? process_one_work+0x400/0x400
Jul 12 03:19:04 proxmox1 kernel: [4783082.539044]  ? kthread_create_worker_on_cpu+0x70/0x70
Jul 12 03:19:04 proxmox1 kernel: [4783082.539063]  ? do_syscall_64+0x73/0x130
Jul 12 03:19:04 proxmox1 kernel: [4783082.539078]  ? SyS_exit_group+0x14/0x20
Jul 12 03:19:04 proxmox1 kernel: [4783082.539093]  ret_from_fork+0x35/0x40
Jul 12 03:19:04 proxmox1 kernel: [4783082.539107] Code: ff ff 4c 89 ef e8 e7 fc ff ff e9 0f ff ff ff 4c 89 ef e8 da fc ff ff e9 1e fe ff ff 31 c0 45 31 e4 66 41 89 46 20 e9 76 fe ff ff <0f> 0b e8 6f ce 60 f5 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00
Jul 12 03:19:04 proxmox1 kernel: [4783082.539219] ---[ end trace 46a1a06df10b1c63 ]---

I was running pve-kernel-4.15.17-1-pve before this last panic; after the reboot, I'm now running pve-kernel-4.15.17-3-pve. I have the latest packages from the enterprise repository installed.

Any idea what could be causing this?

Thanks,
Wagner Sartori Junior
 
I can't post URLs (new user restriction), but you can check sourceforge.net at the path /p/e1000/bugs/618/
 
The link would be: https://sourceforge.net/p/e1000/bugs/618/

pve-kernel-4.15.17-1-pve shipped the in-tree e1000e driver; we went back to the out-of-tree driver for pve-kernel-4.15.17-3-pve because quite a few problems were reported.
For the out-of-tree driver, SourceForge is the correct place to report issues; for the in-tree driver, it would be the intel-wired-lan mailing list: https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

Now, do you still experience the problem with the new kernel?
This could also be a memory problem, so running memtest would be a good way to check for that.
 
The problem happens out of the blue, around once a week or so, so I will have to wait and see if it happens again.

I ran memtest for a whole day when I assembled the machine 6 months ago, and everything was good. If the problem happens again on 4.15.17-3, I will run memtest again and let you know.

Thanks for the info, let's wait and see.
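
While waiting, a small script could help spot a recurrence in a saved kernel log. What follows is only a minimal sketch: it assumes log lines in the same format as the trace above (a bracketed uptime timestamp followed by a `RIP:` line), and the names `OopsHit`, `RIP_RE`, and `find_driver_oops` are hypothetical helpers made up for this example, not part of any Proxmox or kernel tooling.

```python
import re
from typing import Iterable, List, NamedTuple


class OopsHit(NamedTuple):
    timestamp: float  # seconds since boot, taken from the [....] field
    symbol: str       # function named on the RIP line
    module: str       # module shown in the trailing [brackets]


# Assumed line shape, copied from the trace in this thread:
# "... kernel: [4783082.538663] RIP: 0010:e1000_flush_desc_rings+0x2da/0x2f0 [e1000e]"
RIP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+RIP:\s+[0-9a-fA-F]{4}:"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)\])?"
)


def find_driver_oops(lines: Iterable[str], module: str = "e1000e") -> List[OopsHit]:
    """Return one OopsHit per RIP line whose faulting code lives in `module`."""
    hits = []
    for line in lines:
        match = RIP_RE.search(line)
        # Lines without a [module] suffix (core kernel code) are skipped.
        if match and match.group("mod") == module:
            hits.append(
                OopsHit(float(match.group("ts")), match.group("sym"), match.group("mod"))
            )
    return hits
```

Passing an open file object for whichever log holds the kernel messages (for example `find_driver_oops(open(path))`, with `path` chosen by you) would return one entry per fault inside the e1000e module, which makes it easy to compare the faulting function before and after the kernel change.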
 
