Host hard crashes, PVE 8.1.4

I have a similar N100 mini PC from CWWK, the dual-NIC version, and can confirm a similar issue. It just crashes during the night, and the only way to bring it back is a reboot. The funny thing is that it only happens on Proxmox/Debian; the same box ran rock solid on Unraid for 30 days (trial) with basically the same load and a bunch of containers running.
 
For a long time I suspected heat issues, as the device is passively cooled, but removing the cover for better air circulation did not improve anything.

I started the device again and kept an eye on the log. When I started a Windows VM installation, a blue screen occurred, and the log showed: "CPU locked"
Any relevant conclusions here?

I'm currently testing Proxmox 7.4 on the buggy unit; could you please do the same so that we can share results?
@Skedaddle do you have time to run the test in parallel?
@mixedd please let us know your pveversion and kernel. I urge you, too, to try installing an older kernel so that we can test in parallel. I will continue testing over the weekend with 5.15.136 under PVE 7.4.17 as well as with the 'newest' 6.2 on PVE 8.1.
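
For anyone comparing notes, here is a minimal sketch that collects both values in one go. It assumes Python 3 is available on the host. `collect_versions` is a name made up for this example, and `pveversion -v` is only called if the binary is actually found.

```python
import platform
import shutil
import subprocess


def collect_versions():
    """Return the running kernel release and, if available, `pveversion -v` output."""
    info = {"kernel": platform.release(), "pveversion": None}
    # pveversion only exists on a Proxmox VE host; skip it elsewhere.
    if shutil.which("pveversion"):
        result = subprocess.run(
            ["pveversion", "-v"], capture_output=True, text=True, check=False
        )
        info["pveversion"] = result.stdout.strip()
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Pasting the output into a reply should make it easier to line up results across units.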
 
Actually, I'm still testing and trying to reproduce the issue.
So far I have set enp0 to autostart (it was previously enabled only on vmbr0), and, following a thread where someone had issues with jumbo frames, I disabled them on my switch in the UniFi controller.

If I manage to reproduce the issue, I will post all the relevant info I can find. I need to check tomorrow morning, after the scheduled nightly Plex tasks and other jobs have run, as they did the first time.
 
So in the end my issues turned out to be power supply related. I managed to reproduce the crash a couple of times during the nightly Plex library scan, which also does intro/credit detection; in other words, moderate to heavy load for that little N100 box. When it was on my table for a repaste (I thought about overheating at first), I decided to look at the PSU and was shocked that it's only 35 W. Combine 15 W from the N100 at full load with an older 7200 rpm SATA disk and it looks like a fine recipe for disaster.
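
To make the power argument concrete, here is a rough budget sketch. Only the 35 W PSU rating and the 15 W CPU figure come from the post above. The disk spin-up and board figures are assumed placeholders, not measurements, and `psu_headroom` is a name invented for this example. Swap in values from a datasheet or a wall meter before drawing conclusions.

```python
def psu_headroom(psu_watts, loads):
    """Return remaining watts after subtracting every load from the PSU rating."""
    return psu_watts - sum(loads.values())


# 35 W PSU and 15 W CPU come from the post above.
# The disk and board figures are ASSUMED placeholders, not measurements.
loads = {
    "n100_full_load": 15.0,
    "sata_7200_spinup_assumed": 25.0,
    "board_ram_nics_assumed": 8.0,
}

remaining = psu_headroom(35.0, loads)
status = "OK" if remaining >= 0 else "over budget"
print(f"headroom: {remaining:+.1f} W ({status})")
```

With these assumed numbers the budget goes negative, which would fit crashes that only show up under combined CPU and disk load.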
 
New to Proxmox here and having similar issues with a Topton N100 mini PC firewall box, 32 GB RAM, 1 TB NVMe. Initially it crashed repeatedly within a few minutes when running HASSIO, which was fixed by disabling split lock detection. After this fix things improved: it will run for days at a time (longest run 10 days) and then hang, with a power cycle needed. Temps are 48-51 °C. Topton suggested I get a fan, which is on order, but after reading the comments here I don't think it will help. I tried disabling C-states, but the BIOS has no such option. I suspect it happens when the web GUI shell is open; I have to do more testing to confirm.
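
A quick way to double-check that the split lock setting really is active on the running kernel, assuming it was applied through the `split_lock_detect=off` kernel parameter (the post does not say how it was done). The helper names are made up for this sketch, and the parser is deliberately simple: it does not handle quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into a {parameter: value} dict."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def split_lock_setting(cmdline):
    """Return the split_lock_detect value, or None if the parameter is absent."""
    return parse_cmdline(cmdline).get("split_lock_detect")


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        setting = split_lock_setting(proc.read_text())
        print(f"split_lock_detect = {setting or 'not set (kernel default)'}")
```

If the parameter is missing after a reboot, the bootloader config was probably not refreshed.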

System log shows hard and soft lockups, which doesn't seem too great. Any advice would be much appreciated. I don't know where to go from here except try to return the unit, which is a shame since, other than not working, it checks all the boxes for my application.

Mar 24 18:28:23 proxmox kernel: CPU: 0 PID: 224936 Comm: pvedaemon worke Tainted: P B D O 6.5.13-1-pve #1
Mar 24 18:28:23 proxmox kernel: Hardware name: Default string Default string/Default string, BIOS 5.27 09/28/2023
Mar 24 18:28:23 proxmox kernel: Call Trace:
Mar 24 18:28:23 proxmox kernel: <TASK>
Mar 24 18:28:23 proxmox kernel: dump_stack_lvl+0x48/0x70
Mar 24 18:28:23 proxmox kernel: dump_stack+0x10/0x20
Mar 24 18:28:23 proxmox kernel: __schedule_bug+0x64/0x80
Mar 24 18:28:23 proxmox kernel: __schedule+0x100d/0x1440
Mar 24 18:28:23 proxmox kernel: ? vprintk+0x42/0x80
Mar 24 18:28:23 proxmox kernel: ? _printk+0x60/0x90
Mar 24 18:28:23 proxmox kernel: do_task_dead+0x44/0x50
Mar 24 18:28:23 proxmox kernel: make_task_dead+0x15a/0x180
Mar 24 18:28:23 proxmox kernel: rewind_stack_and_make_dead+0x17/0x20
Mar 24 18:28:23 proxmox kernel: RIP: 0033:0x7dcfe3328349
Mar 24 18:28:23 proxmox kernel: Code: Unable to access opcode bytes at 0x7dcfe332831f.
Mar 24 18:28:23 proxmox kernel: RSP: 002b:00007ffc7e5cd588 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Mar 24 18:28:23 proxmox kernel: RAX: ffffffffffffffda RBX: 00007dcfe34229e0 RCX: 00007dcfe3328349
Mar 24 18:28:23 proxmox kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Mar 24 18:28:23 proxmox kernel: RBP: 0000000000000000 R08: ffffffffffffff78 R09: 00007dcfe342dac0
Mar 24 18:28:23 proxmox kernel: R10: 00007dcfe325e320 R11: 0000000000000246 R12: 00007dcfe34229e0
Mar 24 18:28:23 proxmox kernel: R13: 00007dcfe34282e0 R14: 00000000000001ab R15: 00007dcfe34282c8
Mar 24 18:28:23 proxmox kernel: </TASK>
Mar 24 18:30:40 proxmox kernel: watchdog: Watchdog detected hard LOCKUP on cpu 1
Mar 24 18:30:40 proxmox kernel: Modules linked in: udp_diag tcp_diag inet_diag cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm_intel kvm snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel irqbypass crct10dif_pclmul polyval_clmulni polyval_generic snd_sof_intel_hda_mlink ghash_clmulni_intel sha256_ssse3 soundwire_cadence sha1_ssse3 aesni_intel snd_sof_intel_hda snd_sof_pci crypto_simd i915 cryptd snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match mei_pxp mei_hdcp snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec rapl snd_hda_core snd_hwdep drm_buddy ttm snd_pcm drm_display_helper snd_timer intel_cstate wmi_bmof pcspkr cec
Mar 24 18:30:40 proxmox kernel: snd cmdlinepart rc_core soundcore spi_nor mei_me drm_kms_helper mtd mei i2c_algo_bit acpi_tad acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme crc32_pclmul xhci_pci xhci_pci_renesas nvme_core spi_intel_pci i2c_i801 spi_intel i2c_smbus nvme_common igc xhci_hcd ahci libahci video wmi
Mar 24 18:30:40 proxmox kernel: CPU: 1 PID: 2250658 Comm: qm Tainted: P B D W O 6.5.13-1-pve #1
Mar 24 18:30:40 proxmox kernel: Hardware name: Default string Default string/Default string, BIOS 5.27 09/28/2023
Mar 24 18:30:40 proxmox kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x7f/0x2d0
Mar 24 18:30:40 proxmox kernel: Code: 00 00 f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 5f 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e
Mar 24 18:30:40 proxmox kernel: RSP: 0018:ffffc0494a12ba90 EFLAGS: 00000002
Mar 24 18:30:40 proxmox kernel: RAX: 0000000000000001 RBX: ffff99b18f721050 RCX: 0000000000000000
Mar 24 18:30:40 proxmox kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff99b18f721050
Mar 24 18:30:40 proxmox kernel: RBP: ffffc0494a12bab0 R08: 0000000000000000 R09: 0000000000000000
Mar 24 18:30:40 proxmox kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000246
Mar 24 18:30:40 proxmox kernel: R13: 0000000000000000 R14: ffffc0494a12bc60 R15: 0000000000000000
Mar 24 18:30:40 proxmox kernel: FS: 00007b7e4100f740(0000) GS:ffff99b8dfa80000(0000) knlGS:0000000000000000
Mar 24 18:30:40 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 24 18:30:40 proxmox kernel: CR2: 00007b7e3f20afc0 CR3: 00000003208aa000 CR4: 0000000000752ee0
Mar 24 18:30:40 proxmox kernel: PKRU: 55555554
Mar 24 18:30:40 proxmox kernel: Call Trace:
Mar 24 18:30:40 proxmox kernel: <NMI>
Mar 24 18:30:40 proxmox kernel: ? show_regs+0x6d/0x80
Mar 24 18:30:40 proxmox kernel: ? watchdog_hardlockup_check+0x10c/0x1e0
Mar 24 18:30:40 proxmox kernel: ? watchdog_overflow_callback+0x6b/0x80
Mar 24 18:30:40 proxmox kernel: ? __perf_event_overflow+0x119/0x380
Mar 24 18:30:40 proxmox kernel: ? perf_event_overflow+0x19/0x30
Mar 24 18:30:40 proxmox kernel: ? handle_pmi_common+0x175/0x3f0
Mar 24 18:30:40 proxmox kernel: ? intel_pmu_handle_irq+0x11f/0x480
Mar 24 18:30:40 proxmox kernel: ? perf_event_nmi_handler+0x2b/0x50
Mar 24 18:30:40 proxmox kernel: ? nmi_handle+0x5d/0x160
Mar 24 18:30:40 proxmox kernel: ? default_do_nmi+0x47/0x130
Mar 24 18:30:40 proxmox kernel: ? exc_nmi+0x1d5/0x2a0
Mar 24 18:30:40 proxmox kernel: ? end_repeat_nmi+0x16/0x67
Mar 24 18:30:40 proxmox kernel: ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Mar 24 18:30:40 proxmox kernel: ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Mar 24 18:30:40 proxmox kernel: ? native_queued_spin_lock_slowpath+0x7f/0x2d0
Mar 24 18:30:40 proxmox kernel: </NMI>
Mar 24 18:30:40 proxmox kernel: <TASK>
Mar 24 18:30:40 proxmox kernel: _raw_spin_lock_irqsave+0x5c/0x80
Mar 24 18:30:40 proxmox kernel: folio_lruvec_lock_irqsave+0x60/0xa0
Mar 24 18:30:40 proxmox kernel: release_pages+0x269/0x4c0
Mar 24 18:30:40 proxmox kernel: ? unlink_anon_vmas+0x14b/0x1c0
Mar 24 18:30:40 proxmox kernel: free_pages_and_swap_cache+0x4a/0x60
Mar 24 18:30:40 proxmox kernel: tlb_batch_pages_flush+0x43/0x80
Mar 24 18:30:40 proxmox kernel: tlb_finish_mmu+0x73/0x1a0
Mar 24 18:30:40 proxmox kernel: unmap_region+0x119/0x160
Mar 24 18:30:40 proxmox kernel: do_vmi_align_munmap+0x37f/0x550
Mar 24 18:30:40 proxmox kernel: do_vmi_munmap+0xdf/0x190
Mar 24 18:30:40 proxmox kernel: __vm_munmap+0xae/0x180
Mar 24 18:30:40 proxmox kernel: __x64_sys_munmap+0x27/0x40
Mar 24 18:30:40 proxmox kernel: do_syscall_64+0x58/0x90
Mar 24 18:30:40 proxmox kernel: ? exit_to_user_mode_prepare+0x39/0x190
Mar 24 18:30:40 proxmox kernel: ? irqentry_exit_to_user_mode+0x17/0x20
Mar 24 18:30:40 proxmox kernel: ? irqentry_exit+0x43/0x50
Mar 24 18:30:40 proxmox kernel: ? exc_page_fault+0x94/0x1b0
Mar 24 18:30:40 proxmox kernel: entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Mar 24 18:30:40 proxmox kernel: RIP: 0033:0x7b7e4114f8f7
Mar 24 18:30:40 proxmox kernel: Code: 00 00 00 48 8b 15 09 05 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 0b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d9 04 0d 00 f7 d8 64 89 01 48
Mar 24 18:30:40 proxmox kernel: RSP: 002b:00007ffe9c87ca38 EFLAGS: 00000202 ORIG_RAX: 000000000000000b
Mar 24 18:30:40 proxmox kernel: RAX: ffffffffffffffda RBX: ffffffffffffff78 RCX: 00007b7e4114f8f7
Mar 24 18:30:40 proxmox kernel: RDX: 0000000000000000 RSI: 0000000000151000 RDI: 00007b7e3e6af000
Mar 24 18:30:40 proxmox kernel: RBP: 0000000000000016 R08: 0000000000151000 R09: 0000585b929b4180
Mar 24 18:30:40 proxmox kernel: R10: 606712f746acb496 R11: 0000000000000202 R12: 00007b7e41220820
Mar 24 18:30:40 proxmox kernel: R13: 0000585b929827a0 R14: 0000000000000151 R15: 00007b7e412222c8
Mar 24 18:30:40 proxmox kernel: </TASK>



Mar 24 18:31:08 proxmox kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [CPU 2/KVM:958]
Mar 24 18:31:08 proxmox kernel: Modules linked in: udp_diag tcp_diag inet_diag cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_hdmi coretemp kvm_intel kvm snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel irqbypass crct10dif_pclmul polyval_clmulni polyval_generic snd_sof_intel_hda_mlink ghash_clmulni_intel sha256_ssse3 soundwire_cadence sha1_ssse3 aesni_intel snd_sof_intel_hda snd_sof_pci crypto_simd i915 cryptd snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match mei_pxp mei_hdcp snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec rapl snd_hda_core snd_hwdep drm_buddy ttm snd_pcm drm_display_helper snd_timer intel_cstate wmi_bmof pcspkr cec
Mar 24 18:31:08 proxmox kernel: snd cmdlinepart rc_core soundcore spi_nor mei_me drm_kms_helper mtd mei i2c_algo_bit acpi_tad acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq simplefb dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c nvme crc32_pclmul xhci_pci xhci_pci_renesas nvme_core spi_intel_pci i2c_i801 spi_intel i2c_smbus nvme_common igc xhci_hcd ahci libahci video wmi
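
For tracking how often this happens and on which CPU, here is a small sketch that counts the two watchdog messages seen in the log above. The regular expressions are written against exactly those two lines, so other kernel versions may word them differently. `count_lockups` is a name invented for this example.

```python
import re
from collections import Counter

HARD = re.compile(r"Watchdog detected hard LOCKUP on cpu (\d+)")
SOFT = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s")


def count_lockups(lines):
    """Count hard and soft lockup reports per CPU in kernel log lines."""
    counts = Counter()
    for line in lines:
        if m := HARD.search(line):
            counts[("hard", int(m.group(1)))] += 1
        elif m := SOFT.search(line):
            counts[("soft", int(m.group(1)))] += 1
    return counts


# Two lines copied from the log above.
sample = [
    "Mar 24 18:30:40 proxmox kernel: watchdog: Watchdog detected hard LOCKUP on cpu 1",
    "Mar 24 18:31:08 proxmox kernel: watchdog: BUG: soft lockup - CPU#0 stuck for 52s! [CPU 2/KVM:958]",
]
for (kind, cpu), n in sorted(count_lockups(sample).items()):
    print(f"{kind} lockup on CPU {cpu}: {n}")
```

To run it on a real log, replace `sample` with `sys.stdin` (plus `import sys`) and pipe in the kernel messages from the previous boot, for example the output of `journalctl -k -b -1`.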
 
