Proxmox crashes with Kernel Error

christian.hahn

New Member
Feb 23, 2024
Hi!

I have installed Proxmox on a new Tuxedo Mini Server (https://www.tuxedocomputers.com/en/TUXEDO-Nano-Pro-Gen12.tuxedo).
After a while of setting up VMs, the server becomes so unresponsive that I can't do anything anymore and have to turn it off with the power switch.
This has happened twice, and the only thing I see in dmesg is:

[ 9326.421441] BUG: kernel NULL pointer dereference, address: 0000000000000510
[ 9326.421473] #PF: supervisor write access in kernel mode
[ 9326.421486] #PF: error_code(0x0002) - not-present page
[ 9326.421500] PGD 0 P4D 0
[ 9326.421534] CPU: 2 PID: 119 Comm: ksmd Tainted: P O 6.5.11-8-pve #1
[ 9326.421511] Oops: 0002 [#1] PREEMPT SMP NOPTI
[ 9326.421554] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./4X4-7040 Series/D5, BIOS P1.00 09/20/2023
[ 9326.421583] RIP: 0010:ksm_scan_thread+0x35c/0x2060
[ 9326.421602] Code: 82 f2 04 00 00 48 8b 03 48 89 df 49 89 45 00 e8 4a d6 ff ff 48 8b 43 10 48 89 de 48 8b 3d ec d9 3b 03 48 83 2d b4 d9 3b 03 01 <48> 83 a8 10 05 00 00 01 48 c7 43 10 00 00 00 00 e8 3f b0 00 00 49
[ 9326.421651] RSP: 0018:ffffae1300577e18 EFLAGS: 00010212
[ 9326.421665] RAX: 0000000000000000 RBX: ffff9c7348ba6200 RCX: 0000000000000000
[ 9326.421684] RDX: 0000000000000000 RSI: ffff9c7348ba6200 RDI: ffff9c7340222b00
[ 9326.421705] RBP: ffffae1300577ee0 R08: 0000000000000000 R09: 0000000000000000
[ 9326.421727] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9c734006d800
[ 9326.421751] R13: ffff9c7340ba6940 R14: 00007f67677c0000 R15: ffffcfed481f7000
[ 9326.421773] FS: 0000000000000000(0000) GS:ffff9c823e880000(0000) knlGS:0000000000000000
[ 9326.421802] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9326.421818] CR2: 0000000000000510 CR3: 000000010c234000 CR4: 0000000000750ee0
[ 9326.421836] PKRU: 55555554
[ 9326.421845] Call Trace:
[ 9326.421854] <TASK>
[ 9326.421866] ? show_regs+0x6d/0x80
[ 9326.421886] ? __die+0x24/0x80
[ 9326.421897] ? page_fault_oops+0x176/0x500
[ 9326.421916] ? srso_alias_return_thunk+0x5/0x7f
[ 9326.421940] ? psi_task_switch+0xd3/0x240
[ 9326.421961] ? do_user_addr_fault+0x31d/0x6a0
[ 9326.421977] ? exc_page_fault+0x83/0x1b0
[ 9326.421998] ? asm_exc_page_fault+0x27/0x30
[ 9326.422025] ? ksm_scan_thread+0x35c/0x2060
[ 9326.422038] ? ksm_scan_thread+0x346/0x2060
[ 9326.422055] ? __pfx_ksm_scan_thread+0x10/0x10
[ 9326.422068] kthread+0xef/0x120
[ 9326.422080] ? __pfx_kthread+0x10/0x10
[ 9326.422097] ret_from_fork+0x44/0x70
[ 9326.422111] ? __pfx_kthread+0x10/0x10
[ 9326.422125] ret_from_fork_asm+0x1b/0x30
[ 9326.422146] </TASK>
[ 9326.422152] Modules linked in: tcp_diag inet_diag veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables softdog bonding tls sunrpc binfmt_misc nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd snd_hda_codec_realtek amdgpu kvm_amd snd_hda_codec_generic ledtrig_audio kvm mt7921e snd_hda_codec_hdmi mt7921_common amdxcp btusb iommu_v2 irqbypass mt76_connac_lib btrtl drm_buddy crct10dif_pclmul snd_hda_intel btbcm gpu_sched polyval_clmulni mt76 btintel snd_intel_dspcfg polyval_generic drm_suballoc_helper snd_intel_sdw_acpi ghash_clmulni_intel drm_ttm_helper btmtk ttm aesni_intel snd_hda_codec mac80211 bluetooth drm_display_helper crypto_simd snd_hda_core cryptd cec snd_hwdep ecdh_generic snd_pcm rc_core ecc rapl pcspkr snd_timer cfg80211 drm_kms_helper k10temp snd ipmi_devintf i2c_algo_bit ccp soundcore libarc4 ipmi_msghandler amd_pmc joydev input_leds mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap drm efi_pstore dmi_sysfs
[ 9326.422238] ip_tables x_tables autofs4 btrfs blake2b_generic xor hid_generic usbkbd usbmouse usbhid raid6_pq simplefb uas usb_storage dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c xhci_pci nvme xhci_pci_renesas crc32_pclmul thunderbolt xhci_hcd nvme_core ahci ehci_pci r8169 i2c_piix4 i2c_hid_acpi libahci ehci_hcd nvme_common video realtek i2c_hid wmi hid
[ 9326.422460] CR2: 0000000000000510
[ 9326.422469] ---[ end trace 0000000000000000 ]---
[ 9326.547114] RIP: 0010:ksm_scan_thread+0x35c/0x2060
[ 9326.547127] Code: 82 f2 04 00 00 48 8b 03 48 89 df 49 89 45 00 e8 4a d6 ff ff 48 8b 43 10 48 89 de 48 8b 3d ec d9 3b 03 48 83 2d b4 d9 3b 03 01 <48> 83 a8 10 05 00 00 01 48 c7 43 10 00 00 00 00 e8 3f b0 00 00 49
[ 9326.547144] RSP: 0018:ffffae1300577e18 EFLAGS: 00010212
[ 9326.547152] RAX: 0000000000000000 RBX: ffff9c7348ba6200 RCX: 0000000000000000
[ 9326.547161] RDX: 0000000000000000 RSI: ffff9c7348ba6200 RDI: ffff9c7340222b00
[ 9326.547170] RBP: ffffae1300577ee0 R08: 0000000000000000 R09: 0000000000000000
[ 9326.547179] R10: 0000000000000001 R11: 0000000000000000 R12: ffff9c734006d800
[ 9326.547187] R13: ffff9c7340ba6940 R14: 00007f67677c0000 R15: ffffcfed481f7000
[ 9326.547196] FS: 0000000000000000(0000) GS:ffff9c823e880000(0000) knlGS:0000000000000000
[ 9326.547626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9326.548004] CR2: 0000000000000510 CR3: 000000067d388000 CR4: 0000000000750ee0
[ 9326.548376] PKRU: 55555554
[ 9326.548739] note: ksmd[119] exited with irqs disabled
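For readers wondering where the address 0x510 comes from: the bytes after `Code:` are the machine code around the crash, and the byte wrapped in `<...>` marks the start of the faulting instruction. The Python sketch below uses hypothetical helper names (not part of any tool) and decodes only one instruction form, REX.W `83 /n ib` with a `[reg+disp32]` operand. That is enough to show the faulting instruction is `sub qword ptr [rax+0x510], 1`. With RAX = 0, the write lands at 0x510, which matches both CR2 and the "supervisor write access" line. For real analysis, the kernel source tree ships `scripts/decodecode`, or the bytes can be disassembled with objdump. This snippet is only an illustration.

```python
# Values copied verbatim from the oops above (RAX, CR2 and the "Code:" line).
CODE_LINE = ("82 f2 04 00 00 48 8b 03 48 89 df 49 89 45 00 e8 4a d6 ff ff "
             "48 8b 43 10 48 89 de 48 8b 3d ec d9 3b 03 48 83 2d b4 d9 3b 03 01 "
             "<48> 83 a8 10 05 00 00 01 48 c7 43 10 00 00 00 00 e8 3f b0 00 00 49")
REGS = {"rax": 0x0000000000000000}
CR2 = 0x0000000000000510

# Group-1 ALU ops selected by the ModRM reg field, and 64-bit registers by rm field.
GROUP1_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]

def faulting_bytes(code_line: str) -> bytes:
    """Return the bytes starting at the <..>-marked faulting instruction."""
    tokens = code_line.split()
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return bytes(int(t.strip("<>"), 16) for t in tokens[start:])

def decode_group1_disp32(insn: bytes):
    """Decode only REX.W 83 /n ib with mod=10 ([reg+disp32]).

    Returns (mnemonic, base_register, displacement, immediate).
    """
    rex, opcode, modrm = insn[0], insn[1], insn[2]
    if rex != 0x48 or opcode != 0x83:
        raise ValueError("not a REX.W 83 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b10 or rm == 0b100:  # rm=100 would need a SIB byte
        raise ValueError("only [reg+disp32] addressing is handled")
    disp = int.from_bytes(insn[3:7], "little", signed=True)
    imm = int.from_bytes(insn[7:8], "little", signed=True)
    return GROUP1_OPS[reg], REG64[rm], disp, imm

mnemonic, base, disp, imm = decode_group1_disp32(faulting_bytes(CODE_LINE))
address = REGS[base] + disp
print(f"{mnemonic} qword ptr [{base}+{disp:#x}], {imm}")
print(f"effective address = {base}({REGS[base]:#x}) + {disp:#x} = {address:#x}")
print("matches CR2:", address == CR2)
```

This only shows *where* the NULL pointer was dereferenced inside `ksm_scan_thread`; it does not by itself tell whether the NULL value came from a kernel bug or from a flipped bit in RAM.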


Does anyone know what could be wrong?
 
Hi,

[ 9326.421441] BUG: kernel NULL pointer dereference, address: 0000000000000510
[ 9326.421473] #PF: supervisor write access in kernel mode
[ 9326.421486] #PF: error_code(0x0002) - not-present page
[ 9326.421500] PGD 0 P4D 0
I guess this machine does not have ECC RAM? This most likely sounds like faulty memory, which is not too uncommon with consumer, non-ECC RAM.
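Whether the board has ECC can be checked from the OS with `dmidecode`, whose type 16 ("Physical Memory Array") records include an `Error Correction Type` field. Below is a minimal Python sketch, assuming `dmidecode` is installed and the script runs as root; the helper names are hypothetical. The firmware fills in these DMI tables, and this board already reports "To Be Filled By O.E.M." as its hardware name, so treat the result as a hint rather than proof.

```python
import subprocess

def read_dmidecode() -> str:
    """Run `dmidecode -t 16` (Physical Memory Array records); requires root."""
    result = subprocess.run(
        ["dmidecode", "-t", "16"], capture_output=True, text=True, check=True
    )
    return result.stdout

def ecc_types(dmidecode_output: str) -> list[str]:
    """Collect every 'Error Correction Type' value from dmidecode text output."""
    values = []
    for line in dmidecode_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Error Correction Type":
            values.append(value.strip())
    return values

def has_ecc(dmidecode_output: str) -> bool:
    """True if any memory array reports an ECC mode (e.g. 'Single-bit ECC')."""
    return any("ECC" in t for t in ecc_types(dmidecode_output))
```

On the host, `has_ecc(read_dmidecode())` returns False unless some memory array reports an ECC type such as `Single-bit ECC` or `Multi-bit ECC`.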

I'd suggest letting memtest86+ run for a while to see whether it finds any errors.
This can be done by rebooting and selecting memtest86+ in the bootloader menu; it should be installed by default. You can also boot the latest Proxmox VE ISO from, e.g., a USB stick and start it from there under Advanced Options.

If multiple DIMMs are installed, you can of course also try taking modules out one at a time to find the faulty stick this way.

Depending on the amount of memory, this can take hours to days.
 
