Proxmox Freezing

dfunction

Oct 31, 2024
Hi,
For the last several weeks my machine has been freezing about once every 5 to 10 days, and I've never been able to figure out why. Since last night it has crashed five times, so I would appreciate some help. The last crash was so bad that I had to pull the power cord; the Supermicro BMC was only showing a blank screen.

I thought it might be a NIC issue, so I switched to a standard Intel card, but that did not help.

Machine specs:
  • Current Proxmox Virtual Environment 8.2.7 (all software up to date)
  • Supermicro Super Server/X13SAE-F motherboard, BIOS 3.3b 08/26/2024 (up to date)
  • 12th Gen Intel(R) Core(TM) i7-12700K


Here is what is in my Proxmox /var/lib/systemd/pstore/ directory:
Code:
<4>[   39.528159] ------------[ cut here ]------------
<4>[   39.528295] general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI
<4>[   39.528297] CPU: 7 PID: 400 Comm: spl_kmem_cache Tainted: P        W  OE      6.8.12-2-pve #1
<4>[   39.528298] Hardware name: Supermicro Super Server/X13SAE-F, BIOS 3.3b 08/26/2024
<4>[   39.528298] RIP: 0010:fbcon_scroll+0x75/0x1c0
<4>[   39.528303] Code: 25 bf 8b 90 d8 03 00 00 85 d2 74 23 b8 01 00 00 00 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 <c3> cc cc cc cc 80 bb ec 01 00 00 00 75 d4 48 8b 80 e0 03 00 00 8b
<4>[   39.528303] RSP: 0018:ffffb6abc12cf780 EFLAGS: 00010046
<4>[   39.528305] RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000000
<4>[   39.528305] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
<4>[   39.528306] RBP: ffffb6abc12cf7c0 R08: 0000000000000000 R09: 0000000000000000
<4>[   39.528306] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9da84029e000
<4>[   39.528306] R13: 0000000000000000 R14: 0000000000000030 R15: 0000000000000000
<4>[   39.528307] FS:  0000000000000000(0000) GS:ffff9db77f580000(0000) knlGS:0000000000000000
<4>[   39.528308] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[   39.528308] CR2: 00007f5e402e9c1c CR3: 0000000fbee36001 CR4: 0000000000f72ef0
<4>[   39.528309] PKRU: 55555554
<4>[   39.528309] Call Trace:
<4>[   39.528310]  <TASK>
<4>[   39.528311]  ? show_regs+0x6d/0x80
<4>[   39.528314]  ? die_addr+0x37/0xa0
<4>[   39.528315]  ? exc_general_protection+0x1db/0x480
<4>[   39.528318]  ? asm_exc_general_protection+0x27/0x30
<4>[   39.528321]  ? fbcon_scroll+0x75/0x1c0
<4>[   39.528323]  ? console_flush_all+0x17f/0x390
<4>[   39.528326]  ? console_unlock+0x56/0x130
<4>[   39.528327]  ? vprintk_emit+0xd6/0x330
<4>[   39.528329]  ? vmap_small_pages_range_noflush+0x345/0x620
<4>[   39.528330]  ? vprintk_default+0x1d/0x30
<4>[   39.528332]  ? vprintk+0x42/0x80
<4>[   39.528333]  ? _printk+0x60/0x90
<4>[   39.528334]  ? report_bug+0x156/0x1b0
<4>[   39.528336]  ? handle_bug+0x46/0x90
<4>[   39.528337]  ? exc_invalid_op+0x18/0x80
<4>[   39.528338]  ? asm_exc_invalid_op+0x1b/0x20
<4>[   39.528340]  ? vmap_small_pages_range_noflush+0x345/0x620
<4>[   39.528341]  ? __vmap_pages_range_noflush+0x11a/0x150
<4>[   39.528342]  ? alloc_pages_bulk_array_mempolicy+0xbd/0x240
<4>[   39.528344]  ? __vmalloc_node_range+0x4a1/0x8f0
<4>[   39.528347]  ? spl_cache_grow_work+0x8a/0x250 [spl]
<4>[   39.528356]  ? __vmalloc_node+0x4e/0x80
<4>[   39.528357]  ? spl_cache_grow_work+0x8a/0x250 [spl]
<4>[   39.528363]  ? __vmalloc+0x1e/0x30
<4>[   39.528364]  ? spl_cache_grow_work+0x8a/0x250 [spl]
<4>[   39.528370]  ? taskq_thread+0x27f/0x4c0 [spl]
<4>[   39.528377]  ? finish_task_switch.isra.0+0x8c/0x310
<4>[   39.528380]  ? __pfx_default_wake_function+0x10/0x10
<4>[   39.528382]  ? __pfx_spl_cache_grow_work+0x10/0x10 [spl]
<4>[   39.528388]  ? __pfx_taskq_thread+0x10/0x10 [spl]
<4>[   39.528394]  ? kthread+0xef/0x120
<4>[   39.528395]  ? __pfx_kthread+0x10/0x10
<4>[   39.528397]  ? ret_from_fork+0x44/0x70
<4>[   39.528398]  ? __pfx_kthread+0x10/0x10
<4>[   39.528399]  ? ret_from_fork_asm+0x1b/0x30
<4>[   39.528401]  </TASK>
<4>[   39.528401] Modules linked in: cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common x86_pkg_temp_thermal snd_sof_pci_intel_tgl i915(OE) intel_powerclamp snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation coretemp soundwire_bus snd_soc_core kvm_intel snd_compress ipmi_ssif xe ac97_bus snd_pcm_dmaengine kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec crct10dif_pclmul polyval_clmulni polyval_generic drm_gpuvm snd_hda_core ghash_clmulni_intel drm_exec sha256_ssse3 gpu_sched snd_hwdep drm_buddy sha1_ssse3 drm_suballoc_helper snd_pcm
<4>[   39.528423]  aesni_intel drm_ttm_helper cmdlinepart ttm snd_timer crypto_simd mei_hdcp mei_pxp cryptd spi_nor snd drm_display_helper rapl acpi_ipmi intel_cstate wmi_bmof pcspkr mtd ast cec soundcore ucsi_acpi mei_me ipmi_si intel_pmc_core typec_ucsi rc_core mei ipmi_devintf intel_vsec typec ipmi_msghandler pmt_telemetry pmt_class acpi_pad acpi_tad input_leds joydev mac_hid vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) rndis_host cdc_ether usbnet mii btrfs blake2b_generic xor hid_generic usbmouse usbhid hid raid6_pq libcrc32c xhci_pci nvme xhci_pci_renesas crc32_pclmul igb e1000e nvme_core xhci_hcd spi_intel_pci i2c_i801 i2c_algo_bit igc intel_lpss_pci ahci spi_intel i2c_smbus dca nvme_auth intel_lpss libahci idma64 video wmi pinctrl_alderlake
<4>[   39.528449] ---[ end trace 0000000000000000 ]---

I see the following in my journal all the time, though it does not seem important, since the crashes happen hours or days later:

Code:
[   20.889873] fwln103i0: entered allmulticast mode
[   20.889915] fwln103i0: entered promiscuous mode
[   20.889974] fwbr103i0: port 1(fwln103i0) entered blocking state
[   20.889985] fwbr103i0: port 1(fwln103i0) entered forwarding state
[   20.894718] fwbr103i0: port 2(veth103i0) entered blocking state
[   20.894725] fwbr103i0: port 2(veth103i0) entered disabled state
[   20.894733] veth103i0: entered allmulticast mode
[   20.894755] veth103i0: entered promiscuous mode
[   20.922118] eth0: renamed from vethjgSaMU
[   22.324199] audit: type=1400 audit(1730406859.047:30): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-104_</var/lib/lxc>" pid=2583 comm="apparmor_parser"
[   22.656567] vmbr0: port 5(veth104i0) entered blocking state
[   22.656733] vmbr0: port 5(veth104i0) entered disabled state
[   22.656870] veth104i0: entered allmulticast mode
[   22.657024] veth104i0: entered promiscuous mode
[   22.683853] eth0: renamed from vethN5mCzQ
[   23.074763] vmbr0: port 5(veth104i0) entered blocking state
[   23.074914] vmbr0: port 5(veth104i0) entered forwarding state
[   23.329629] audit: type=1400 audit(1730406860.052:31): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-105_</var/lib/lxc>" pid=2835 comm="apparmor_parser"
[   23.669414] vmbr0: port 6(veth105i0) entered blocking state
[   23.669646] vmbr0: port 6(veth105i0) entered disabled state
[   23.669881] veth105i0: entered allmulticast mode
[   23.670041] veth105i0: entered promiscuous mode
[   23.697287] eth0: renamed from vethuk1BrQ
[   24.075878] vmbr0: port 6(veth105i0) entered blocking state
[   24.076058] vmbr0: port 6(veth105i0) entered forwarding state
[   24.398308] tap107i0: entered promiscuous mode
[   24.431116] vmbr0: port 7(fwpr107p0) entered blocking state
[   24.431324] vmbr0: port 7(fwpr107p0) entered disabled state
[   24.431526] fwpr107p0: entered allmulticast mode
[   24.431701] fwpr107p0: entered promiscuous mode
[   24.431838] vmbr0: port 7(fwpr107p0) entered blocking state
[   24.431944] vmbr0: port 7(fwpr107p0) entered forwarding state
[   24.436588] fwbr107i0: port 1(fwln107i0) entered blocking state
[   24.436739] fwbr107i0: port 1(fwln107i0) entered disabled state
[   24.436855] fwln107i0: entered allmulticast mode
[   24.436984] fwln107i0: entered promiscuous mode
[   24.437120] fwbr107i0: port 1(fwln107i0) entered blocking state
[   24.437214] fwbr107i0: port 1(fwln107i0) entered forwarding state
[   24.442327] fwbr107i0: port 2(tap107i0) entered blocking state
[   24.442498] fwbr107i0: port 2(tap107i0) entered disabled state
[   24.442669] tap107i0: entered allmulticast mode
[   24.442850] fwbr107i0: port 2(tap107i0) entered blocking state
[   24.442946] fwbr107i0: port 2(tap107i0) entered forwarding state
[   25.497917] fwbr103i0: port 2(veth103i0) entered blocking state
[   25.498095] fwbr103i0: port 2(veth103i0) entered forwarding state
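These messages are the Linux bridge reporting port state changes as guests start: `veth…` interfaces belong to the containers, `tap…` to the VM, and the `fwbr`/`fwpr`/`fwln` devices are the extra bridge plumbing Proxmox adds when the guest firewall is enabled. One quick check is whether every port ends up `forwarding` rather than stuck in `blocking` or `disabled`. A minimal Python sketch, assuming the `bridge: port N(iface) entered STATE state` wording shown above; `final_port_states` and `ports_not_forwarding` are hypothetical helpers.

```python
import re

# Linux bridge port-state message, e.g.
# "[   24.442946] fwbr107i0: port 2(tap107i0) entered forwarding state"
PORT_RE = re.compile(r"(\S+): port (\d+)\((\S+)\) entered (\w+) state")


def final_port_states(log_text):
    """Return the last state seen for each (bridge, interface) pair."""
    states = {}
    for line in log_text.splitlines():
        match = PORT_RE.search(line)  # search, so dmesg or journalctl prefixes both work
        if match:
            bridge, _port_no, iface, state = match.groups()
            states[(bridge, iface)] = state  # later messages overwrite earlier ones
    return states


def ports_not_forwarding(log_text):
    """List (bridge, interface) pairs whose last reported state is not 'forwarding'."""
    return sorted(key for key, state in final_port_states(log_text).items()
                  if state != "forwarding")
```

In the excerpt above, each port that appears ends in `forwarding`, which fits the impression that these lines are routine guest start-up rather than a fault. The same functions can be fed the text of `dmesg` or `journalctl -k`.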

I'd appreciate any help!
 
No problem, here is the output from lspci -nnk:

Code:
00:00.0 Host bridge [0600]: Intel Corporation 12th Gen Core Processor Host Bridge/DRAM Registers [8086:4668] (rev 02)
    Subsystem: Super Micro Computer Inc 12th Gen Core Processor Host Bridge/DRAM Registers [15d9:1c48]
00:02.0 VGA compatible controller [0300]: Intel Corporation AlderLake-S GT1 [8086:4680] (rev 0c)
    Subsystem: Super Micro Computer Inc AlderLake-S GT1 [15d9:1c48]
    Kernel driver in use: vfio-pci
    Kernel modules: xe, i915
00:0a.0 Signal processing controller [1180]: Intel Corporation Platform Monitoring Technology [8086:467d] (rev 01)
    Kernel driver in use: intel_vsec
    Kernel modules: intel_vsec
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [8086:7ae0] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller [15d9:1c48]
    Kernel driver in use: xhci_hcd
    Kernel modules: xhci_pci
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake-S PCH Shared SRAM [8086:7aa7] (rev 11)
00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 [8086:7acc] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH Serial IO I2C Controller [15d9:1c48]
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #1 [8086:7acd] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH Serial IO I2C Controller [15d9:1c48]
    Kernel driver in use: intel-lpss
    Kernel modules: intel_lpss_pci
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake-S PCH HECI Controller #1 [8086:7ae8] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH HECI Controller [15d9:1c48]
    Kernel driver in use: mei_me
    Kernel modules: mei_me
00:16.3 Serial controller [0700]: Intel Corporation Device [8086:7aeb] (rev 11)
    Subsystem: Super Micro Computer Inc Device [15d9:1c48]
    Kernel driver in use: serial
00:17.0 SATA controller [0106]: Intel Corporation Alder Lake-S PCH SATA Controller [AHCI Mode] [8086:7ae2] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH SATA Controller [AHCI Mode] [15d9:1c48]
    Kernel driver in use: ahci
    Kernel modules: ahci
00:1a.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 [8086:7ac8] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH PCI Express Root Port [15d9:1c48]
    Kernel driver in use: pcieport
00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:7ac0] (rev 11)
    Kernel driver in use: pcieport
00:1b.4 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port [8086:7ac4] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH PCI Express Root Port [15d9:1c48]
    Kernel driver in use: pcieport
00:1c.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #1 [8086:7ab8] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH PCI Express Root Port [15d9:1c48]
    Kernel driver in use: pcieport
00:1c.1 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #2 [8086:7ab9] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH PCI Express Root Port [15d9:1c48]
    Kernel driver in use: pcieport
00:1c.3 PCI bridge [0604]: Intel Corporation Device [8086:7abb] (rev 11)
    Subsystem: Super Micro Computer Inc Device [15d9:1c48]
    Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation Alder Lake-S PCH PCI Express Root Port #9 [8086:7ab0] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH PCI Express Root Port [15d9:1c48]
    Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:7a88] (rev 11)
    Subsystem: Super Micro Computer Inc Device [15d9:1c48]
00:1f.3 Audio device [0403]: Intel Corporation Alder Lake-S HD Audio Controller [8086:7ad0] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S HD Audio Controller [15d9:1c48]
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel, snd_sof_pci_intel_tgl
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake-S PCH SMBus Controller [8086:7aa3] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH SMBus Controller [15d9:1c48]
    Kernel driver in use: i801_smbus
    Kernel modules: i2c_i801
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-S PCH SPI Controller [8086:7aa4] (rev 11)
    Subsystem: Super Micro Computer Inc Alder Lake-S PCH SPI Controller [15d9:1c48]
    Kernel driver in use: intel-spi
    Kernel modules: spi_intel_pci
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (17) I219-LM [8086:1a1c] (rev 11)
    DeviceName:  Intel Ethernet i219-LM
    Subsystem: Super Micro Computer Inc Ethernet Connection (17) I219-LM [15d9:1a1c]
    Kernel driver in use: e1000e
    Kernel modules: e1000e
01:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. NV2 NVMe SSD SM2267XT (DRAM-less) [2646:5017] (rev 03)
    Subsystem: Kingston Technology Company, Inc. NV2 NVMe SSD SM2267XT (DRAM-less) [2646:5017]
    Kernel driver in use: nvme
    Kernel modules: nvme
03:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. NV2 NVMe SSD SM2267XT (DRAM-less) [2646:5017] (rev 03)
    Subsystem: Kingston Technology Company, Inc. NV2 NVMe SSD SM2267XT (DRAM-less) [2646:5017]
    Kernel driver in use: nvme
    Kernel modules: nvme
04:00.0 PCI bridge [0604]: Integrated Technology Express, Inc. IT8893E PCIe to PCI Bridge [1283:8893] (rev 41)
06:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-LM [8086:15f2] (rev 03)
    DeviceName:  Intel Ethernet i225-LM
    Subsystem: Super Micro Computer Inc Ethernet Controller I225-LM [15d9:15f2]
    Kernel driver in use: igc
    Kernel modules: igc
07:00.0 PCI bridge [0604]: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150] (rev 06)
    Subsystem: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge [1a03:1150]
08:00.0 VGA compatible controller [0300]: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000] (rev 52)
    DeviceName:  ASPEED Video AST2600
    Subsystem: ASPEED Technology, Inc. ASPEED Graphics Family [1a03:2000]
    Kernel driver in use: ast
    Kernel modules: ast
09:00.0 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
    Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter [8086:a03c]
    Kernel driver in use: igb
    Kernel modules: igb
09:00.1 Ethernet controller [0200]: Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
    Subsystem: Intel Corporation Gigabit ET Dual Port Server Adapter [8086:a03c]
    Kernel driver in use: igb
    Kernel modules: igb

I had no idea there were potential issues with upgrading. Thanks for the heads-up. I appreciate any insight you may have.