Hi all,
I've been running Proxmox on my own home server for a year now, and somewhat longer at work.
For the past 8 months, however, my home server has been suffering from intermittent CPU soft lockups. Until yesterday, the last occurrence was 4 months ago (August 1st and 2nd).
It seems to be related to the e1000e device; see the logs below.
Whenever the issue occurs, all VMs and LXCs eventually come to a halt, and a reboot (or waiting roughly 30 minutes) is the only thing that seems to fix it. The downside is that OPNsense stops, which means no internet, and since my Omada SDN controller also runs on this host, the WiFi goes down too. Through the Supermicro BMC I can access the console, but it is so slow that I prefer rebooting over waiting the 30 minutes.
Is there a known issue with the e1000e driver? (e.g. https://bugzilla.kernel.org/show_bug.cgi?id=218740)
I would like to upgrade to the new 8.3 release, but before upgrading I want to be sure that it will either fix the issue or at least not introduce new problems on top of it.
Any clues or tips on where to look further would be greatly appreciated.
My setup:
Code:
Proxmox 8.2.4 with kernel 6.8.8-1 (No subscription)
CPU: Intel i3-12100
Mobo: Supermicro X13SAE-F
Memory: 2x Kingston FURY KF548C38BBK2-32
NIC: Intel X520 + Zaram SFP+ (ixgbe unsupported SFP+ enabled)
Storage: Lexar NM790 for Proxmox + VM
Storage: 4x Western Digital Ultrastar DC HC320 3.5" 8 TB (passed through to TrueNAS VM), ZFS for CIFS/SMB file shares
Journal output from the latest occurrence (Jan 07):
Code:
Jan 07 13:49:06 server kernel: ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Down
Jan 07 13:49:06 server kernel: vmbr1: port 1(enp1s0f0) entered disabled state
Jan 07 13:49:10 server kernel: ixgbe 0000:01:00.0 enp1s0f0: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Jan 07 13:49:10 server kernel: vmbr1: port 1(enp1s0f0) entered blocking state
Jan 07 13:49:10 server kernel: vmbr1: port 1(enp1s0f0) entered forwarding state
Jan 07 13:57:04 server kernel: igc 0000:07:00.0 eno2: NIC Link is Down
Jan 07 13:57:04 server kernel: vmbr2: port 1(eno2) entered disabled state
Jan 07 13:57:04 server kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Down
Jan 07 13:57:05 server kernel: vmbr0: port 1(eno1) entered disabled state
Jan 07 13:57:14 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:57:14 server pvestatd[1526]: status update time (5.211 seconds)
Jan 07 13:57:24 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:57:24 server pvestatd[1526]: status update time (5.192 seconds)
Jan 07 13:57:35 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:57:35 server pvestatd[1526]: status update time (5.213 seconds)
Jan 07 13:57:44 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:57:44 server pvestatd[1526]: status update time (5.219 seconds)
Jan 07 13:57:52 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:58:02 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:58:12 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:58:22 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:58:31 server pvestatd[1526]: storage 'VM-Backup' is not online
Jan 07 13:58:34 server kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
Jan 07 13:58:34 server kernel: BUG: scheduling while atomic: kworker/4:0/1725868/0x00000002
Jan 07 13:58:34 server kernel: Modules linked in: ftdi_sio usbserial cdc_acm dm_snapshot cfg80211 cmac tcp_diag inet_diag nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_realtek snd_sof snd_hda_codec_generic snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus
Jan 07 13:58:34 server kernel: snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl mei_hdcp mei_pxp snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer cmdlinepart spi_nor wmi_bmof ast intel_cstate i2c_algo_bit snd mei_me ucsi_acpi pcspkr typec_ucsi mtd soundcore mei acpi_ipmi typec ipmi_si ipmi_devintf intel_pmc_core ipmi_msghandler intel_vsec pmt_telemetry acpi_tad acpi_pad pmt_class input_leds joydev mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci intel_lpss_pci spi_intel_pci nvme xhci_pci_renesas crc32_pclmul intel_lpss spi_intel igc nvme_core ixgbe e1000e ahci i2c_i801 xhci_hcd xfrm_algo i2c_smbus dca nvme_auth libahci idma64 mdio video wmi pinctrl_alderlake
Jan 07 13:58:34 server kernel: CPU: 4 PID: 1725868 Comm: kworker/4:0 Tainted: P O 6.8.8-1-pve #1
Jan 07 13:58:34 server kernel: Hardware name: Supermicro Super Server/X13SAE-F, BIOS 2.0a 02/17/2023
Jan 07 13:58:34 server kernel: Workqueue: events e1000_watchdog_task [e1000e]
Jan 07 13:58:34 server kernel: Call Trace:
Jan 07 13:58:34 server kernel: <TASK>
Jan 07 13:58:34 server kernel: dump_stack_lvl+0x76/0xa0
Jan 07 13:58:34 server kernel: dump_stack+0x10/0x20
Jan 07 13:58:34 server kernel: __schedule_bug+0x64/0x80
Jan 07 13:58:34 server kernel: __schedule+0x10f1/0x15e0
Jan 07 13:58:34 server kernel: ? clockevents_program_event+0xb3/0x140
Jan 07 13:58:34 server kernel: ? tick_program_event+0x43/0xa0
Jan 07 13:58:34 server kernel: ? hrtimer_reprogram+0x88/0xe0
Jan 07 13:58:34 server kernel: ? hrtimer_start_range_ns+0x138/0x390
Jan 07 13:58:34 server kernel: schedule+0x33/0x110
Jan 07 13:58:34 server kernel: schedule_hrtimeout_range_clock+0xbc/0x130
Jan 07 13:58:34 server kernel: ? __pfx_hrtimer_wakeup+0x10/0x10
Jan 07 13:58:34 server kernel: schedule_hrtimeout_range+0x13/0x30
Jan 07 13:58:34 server kernel: usleep_range_state+0x65/0xa0
Jan 07 13:58:34 server kernel: e1000e_read_phy_reg_mdic+0x98/0x2a0 [e1000e]
Jan 07 13:58:34 server kernel: e1000e_update_stats+0x52b/0x730 [e1000e]
Jan 07 13:58:34 server kernel: e1000_watchdog_task+0xf7/0xa90 [e1000e]
Jan 07 13:58:34 server kernel: process_one_work+0x16a/0x350
Jan 07 13:58:34 server kernel: worker_thread+0x306/0x440
Jan 07 13:58:34 server kernel: ? __pfx_worker_thread+0x10/0x10
Jan 07 13:58:34 server kernel: kthread+0xef/0x120
Jan 07 13:58:34 server kernel: ? __pfx_kthread+0x10/0x10
Jan 07 13:58:34 server kernel: ret_from_fork+0x44/0x70
Jan 07 13:58:34 server kernel: ? __pfx_kthread+0x10/0x10
Jan 07 13:58:34 server kernel: ret_from_fork_asm+0x1b/0x30
Jan 07 13:58:34 server kernel: </TASK>
Jan 07 13:58:34 server kernel: vmbr0: port 1(eno1) entered blocking state
Jan 07 13:58:34 server kernel: vmbr0: port 1(eno1) entered forwarding state
Jan 07 13:59:00 server kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [pvestatd:1526]
Jan 07 13:59:00 server kernel: Modules linked in: ftdi_sio usbserial cdc_acm dm_snapshot cfg80211 cmac tcp_diag inet_diag nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm ib_core cifs_md4 netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables 8021q garp mrp bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_hda_codec_realtek snd_sof snd_hda_codec_generic snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus
Jan 07 13:59:00 server kernel: snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi rapl mei_hdcp mei_pxp snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer cmdlinepart spi_nor wmi_bmof ast intel_cstate i2c_algo_bit snd mei_me ucsi_acpi pcspkr typec_ucsi mtd soundcore mei acpi_ipmi typec ipmi_si ipmi_devintf intel_pmc_core ipmi_msghandler intel_vsec pmt_telemetry acpi_tad acpi_pad pmt_class input_leds joydev mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci intel_lpss_pci spi_intel_pci nvme xhci_pci_renesas crc32_pclmul intel_lpss spi_intel igc nvme_core ixgbe e1000e ahci i2c_i801 xhci_hcd xfrm_algo i2c_smbus dca nvme_auth libahci idma64 mdio video wmi pinctrl_alderlake
Jan 07 13:59:00 server kernel: CPU: 2 PID: 1526 Comm: pvestatd Tainted: P W O 6.8.8-1-pve #1
Jan 07 13:59:00 server kernel: Hardware name: Supermicro Super Server/X13SAE-F, BIOS 2.0a 02/17/2023
Jan 07 13:59:00 server kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x227/0x2d0
Jan 07 13:59:00 server kernel: Code: 41 8d 4e 01 41 c1 e5 10 c1 e1 12 44 09 e9 89 c8 c1 e8 10 66 87 43 02 89 c2 c1 e2 10 81 fa ff ff 00 00 77 37 31 d2 eb 02 f3 90 <8b> 03 66 85 c0 75 f7 89 c6 66 31 f6 39 ce 74 7e c6 03 01 48 85 d2
Jan 07 13:59:00 server kernel: RSP: 0018:ffffab6141327ad8 EFLAGS: 00000202
Jan 07 13:59:00 server kernel: RAX: 00000000000c0101 RBX: ffffa03ec90c7448 RCX: 00000000000c0000
Jan 07 13:59:00 server kernel: RDX: 0000000000000000 RSI: 0000000000000101 RDI: ffffa03ec90c7448
Jan 07 13:59:00 server kernel: RBP: ffffab6141327af8 R08: 0000000000000000 R09: 0000000000000000
Jan 07 13:59:00 server kernel: R10: 00000000000001d1 R11: 0000000001000001 R12: ffffa04dffb359c0
Jan 07 13:59:00 server kernel: R13: 0000000000000000 R14: 0000000000000002 R15: ffffa03ec31ac618
Jan 07 13:59:00 server kernel: FS: 00007bde68620740(0000) GS:ffffa04dffb00000(0000) knlGS:0000000000000000
Jan 07 13:59:00 server kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 07 13:59:00 server kernel: CR2: 000058bcbbc91848 CR3: 000000010d5c0000 CR4: 0000000000f52ef0
Jan 07 13:59:00 server kernel: PKRU: 55555554
Jan 07 13:59:00 server kernel: Call Trace:
Jan 07 13:59:00 server kernel: <IRQ>
Jan 07 13:59:00 server kernel: ? show_regs+0x6d/0x80
Jan 07 13:59:00 server kernel: ? watchdog_timer_fn+0x206/0x290
Jan 07 13:59:00 server kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Jan 07 13:59:00 server kernel: ? __hrtimer_run_queues+0x105/0x280
Jan 07 13:59:00 server kernel: ? clockevents_program_event+0xb3/0x140
Jan 07 13:59:00 server kernel: ? hrtimer_interrupt+0xf6/0x250
Jan 07 13:59:00 server kernel: ? __sysvec_apic_timer_interrupt+0x4e/0x150
Jan 07 13:59:00 server kernel: ? sysvec_apic_timer_interrupt+0x8d/0xd0
Jan 07 13:59:00 server kernel: </IRQ>
Jan 07 13:59:00 server kernel: <TASK>
Jan 07 13:59:00 server kernel: ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
Jan 07 13:59:00 server kernel: ? native_queued_spin_lock_slowpath+0x227/0x2d0
Jan 07 13:59:00 server kernel: _raw_spin_lock+0x3f/0x60
Jan 07 13:59:00 server kernel: e1000e_get_stats64+0x23/0x140 [e1000e]
Jan 07 13:59:00 server kernel: dev_get_stats+0x5e/0x120
Jan 07 13:59:00 server kernel: dev_seq_printf_stats+0x49/0x100
Jan 07 13:59:00 server kernel: dev_seq_show+0x14/0x40
Jan 07 13:59:00 server kernel: seq_read_iter+0x2c6/0x4a0
Jan 07 13:59:00 server kernel: seq_read+0x11e/0x160
Jan 07 13:59:00 server kernel: proc_reg_read+0x69/0xb0
Jan 07 13:59:00 server kernel: vfs_read+0xad/0x390
Jan 07 13:59:00 server kernel: ksys_read+0x73/0x100
Jan 07 13:59:00 server kernel: __x64_sys_read+0x19/0x30
Jan 07 13:59:00 server kernel: x64_sys_call+0x23f0/0x24b0
Jan 07 13:59:00 server kernel: do_syscall_64+0x81/0x170
Jan 07 13:59:00 server kernel: ? syscall_exit_to_user_mode+0x89/0x260
Jan 07 13:59:00 server kernel: ? do_syscall_64+0x8d/0x170
Jan 07 13:59:00 server kernel: ? irqentry_exit+0x43/0x50
Jan 07 13:59:00 server kernel: ? exc_page_fault+0x94/0x1b0
Jan 07 13:59:00 server kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
Jan 07 13:59:00 server kernel: RIP: 0033:0x7bde6875719d
Jan 07 13:59:00 server kernel: Code: 31 c0 e9 c6 fe ff ff 50 48 8d 3d 66 54 0a 00 e8 49 ff 01 00 66 0f 1f 84 00 00 00 00 00 80 3d 41 24 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 48 83 ec
Jan 07 13:59:00 server kernel: RSP: 002b:00007fff14860eb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Jan 07 13:59:00 server kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007bde6875719d
Jan 07 13:59:00 server kernel: RDX: 0000000000002000 RSI: 000058bcc0f01bf0 RDI: 0000000000000008
Jan 07 13:59:00 server kernel: RBP: 0000000000002000 R08: 0000000000000000 R09: 00007bde68831d30
Jan 07 13:59:00 server kernel: R10: 000058bcc0f01bf0 R11: 0000000000000246 R12: 000058bcc0f01bf0
Jan 07 13:59:00 server kernel: R13: 000058bcbb2fb2a0 R14: 0000000000000008 R15: 000058bcc0e2c3a0
Jan 07 13:59:00 server kernel: </TASK>
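To compare occurrences without scrolling through the full traces, a small script can pull out just the relevant events (NIC link down/up, "scheduling while atomic", soft lockups) as a timeline. This is a minimal sketch, not a polished tool: the file name `lockup_timeline.py` is hypothetical, and the regexes assume journalctl's default short output format and the exact kernel message wording shown in the log above.

```python
import re
import sys
from dataclasses import dataclass
from typing import Iterable, List

# Assumed journalctl short format: "Mon DD HH:MM:SS host kernel: message".
# Lines from other processes (e.g. pvestatd) do not match and are skipped.
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: (.*)$")

# (event kind, pattern on the kernel message, format string for the detail column)
EVENT_PATTERNS = [
    ("link_down",
     re.compile(r"^(?P<drv>\S+) \S+ (?P<ifname>\S+): NIC Link is Down"),
     "{drv} {ifname} down"),
    ("link_up",
     re.compile(r"^(?P<drv>\S+) \S+ (?P<ifname>\S+): NIC Link is Up (?P<speed>.*)"),
     "{drv} {ifname} up ({speed})"),
    ("sched_atomic",
     re.compile(r"BUG: scheduling while atomic: (?P<task>\S+)"),
     "{task}"),
    ("soft_lockup",
     re.compile(r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>[^\]]+)\]"),
     "CPU#{cpu} stuck {secs}s [{task}]"),
]


@dataclass
class Event:
    time: str
    kind: str
    detail: str


def parse_events(lines: Iterable[str]) -> List[Event]:
    """Return the interesting kernel events found in journal lines, in order."""
    events: List[Event] = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # not a kernel message
        time, msg = m.groups()
        for kind, pattern, fmt in EVENT_PATTERNS:
            hit = pattern.search(msg)
            if hit:
                events.append(Event(time, kind, fmt.format(**hit.groupdict())))
                break
    return events


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print(f"usage: {argv[0]} JOURNAL_FILE", file=sys.stderr)
        return
    with open(argv[1], encoding="utf-8", errors="replace") as fh:
        for ev in parse_events(fh):
            print(f"{ev.time}  {ev.kind:<12}  {ev.detail}")


if __name__ == "__main__":
    main(sys.argv)
```

Usage, after rebooting out of a lockup: `journalctl -k -b -1 > journal.txt` (kernel messages from the previous boot), then `python3 lockup_timeline.py journal.txt`. Running it against each affected boot should make it easier to see whether the e1000e link flap on eno1 always precedes the lockup.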