Kernel oops - BUG: unable to handle page fault

Slyrack

Member
May 16, 2022
Hi,

I'm testing Proxmox 8 on a Dell Precision T7960 workstation with the new Xeon w5-3435X (Sapphire Rapids) and I get a kernel oops every time I try to launch a VM.
The computer has been configured to enable PCI(e) passthrough following the doc (1 / 2), but the oops occurs even when starting a VM without any PCI device attached to it, so I guess that is not the cause.
Strangely, I had to enable IOMMU by putting intel_iommu=on iommu=pt on the kernel command line, even though the doc says this is only necessary for pre-5.15 kernels.
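For reference, one way to confirm that both parameters actually reached the running kernel is to read /proc/cmdline. This is only a minimal sketch; the helper name `missing_kernel_params` is made up for illustration and is not part of any Proxmox tooling.

```python
def missing_kernel_params(cmdline, required=("intel_iommu=on", "iommu=pt")):
    """Return the required parameters that are absent from a kernel command line."""
    # Kernel parameters are whitespace-separated tokens.
    present = set(cmdline.split())
    return [param for param in required if param not in present]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    try:
        with open("/proc/cmdline") as f:
            running = f.read()
    except OSError:
        running = ""  # not on Linux, or /proc is unavailable
    print("missing parameters:", missing_kernel_params(running))
```

An empty list means both parameters are present on the booted command line.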
I tried kernels 6.2.16-3-pve (default from ISO), 6.2.16-15-pve and 6.2.16-16-pve (most up to date at the moment).
Every time I launch a VM, it starts up correctly at first and the kernel oopses a few seconds later (a Windows 10 VM previously restored from PBS gets as far as its circular loading animation).
Here is the kernel output:
[  143.155575] BUG: unable to handle page fault for address: ff2c3744a37f7cff
[  143.155583] #PF: supervisor write access in kernel mode
[  143.155586] #PF: error_code(0x0003) - permissions violation
[  143.155588] PGD 117801067 P4D 117802067 PUD 1001f3063 PMD 1236c4063 PTE 80000001237f7161
[  143.155593] Oops: 0003 [#1] PREEMPT SMP NOPTI
[  143.155596] CPU: 0 PID: 631 Comm: z_wr_iss Tainted: P           O       6.2.16-3-pve #1
[  143.155598] Hardware name: Dell Inc. Precision 7960 Tower/01G0M6, BIOS 1.1.10 07/27/2023
[  143.155600] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  143.155612] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d d5 78 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  143.155616] RSP: 0018:ff69853287b0f930 EFLAGS: 00010082
[  143.155619] RAX: 00000000ffffffff RBX: ff2c3744fd314000 RCX: ff2c3744a37f5000
[  143.155621] RDX: 00000000ffffffff RSI: ff2c3744fd314000 RDI: ff69853287b0fa80
[  143.155623] RBP: ff69853287b0f930 R08: 0000000000000000 R09: 0000000000000000
[  143.155625] R10: 0000000000000000 R11: 0000000000000000 R12: ff2c3744fd315000
[  143.155627] R13: ff69853287b0fa80 R14: 0000000000001000 R15: 0000000000000000
[  143.155629] FS:  0000000000000000(0000) GS:ff2c3753cfe00000(0000) knlGS:0000000000000000
[  143.155632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  143.155635] CR2: ff2c3744a37f7cff CR3: 000000010e620002 CR4: 0000000000773ef0
[  143.155637] PKRU: 55555554
[  143.155638] Call Trace:
[  143.155641]  <TASK>
[  143.155644]  fletcher_4_avx512f_native+0x1d/0xb0 [zcommon]
[  143.155658]  abd_fletcher_4_iter+0x71/0xe0 [zcommon]
[  143.155668]  abd_iterate_func+0x104/0x1e0 [zfs]
[  143.155789]  ? __pfx_abd_fletcher_4_iter+0x10/0x10 [zcommon]
[  143.155795]  ? __pfx_abd_fletcher_4_native+0x10/0x10 [zfs]
[  143.155912]  abd_fletcher_4_native+0x89/0xd0 [zfs]
[  143.156005]  ? txg_all_lists_empty+0x4f/0xa0 [zfs]
[  143.156091]  ? zio_vdev_io_done+0x4e/0x240 [zfs]
[  143.156169]  zio_checksum_compute+0x154/0x550 [zfs]
[  143.156240]  ? __kmem_cache_alloc_node+0x19d/0x340
[  143.156247]  ? spl_kmem_alloc+0xc3/0x120 [spl]
[  143.156257]  ? spl_kmem_alloc+0xc3/0x120 [spl]
[  143.156263]  ? __kmalloc_node+0x52/0xe0
[  143.156266]  ? spl_kmem_alloc+0xc3/0x120 [spl]
[  143.156273]  zio_checksum_generate+0x4d/0x80 [zfs]
[  143.156344]  zio_execute+0x94/0x170 [zfs]
[  143.156414]  taskq_thread+0x2ac/0x4d0 [spl]
[  143.156422]  ? __pfx_default_wake_function+0x10/0x10
[  143.156426]  ? __pfx_zio_execute+0x10/0x10 [zfs]
[  143.156497]  ? __pfx_taskq_thread+0x10/0x10 [spl]
[  143.156504]  kthread+0xe6/0x110
[  143.156508]  ? __pfx_kthread+0x10/0x10
[  143.156511]  ret_from_fork+0x29/0x50
[  143.156514]  </TASK>
[  143.156515] Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables bonding tls softdog sunrpc nfnetlink_log binfmt_misc nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit snd_sof_pci_intel_tgl x86_pkg_temp_thermal snd_sof_intel_hda_common intel_powerclamp soundwire_intel soundwire_generic_allocation coretemp soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_ctl_led snd_sof_utils snd_soc_hdac_hda kvm_intel snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_hda_codec_realtek kvm snd_hda_codec_generic snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine crct10dif_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg dell_wmi pmt_crashlog sha512_ssse3 pmt_telemetry ledtrig_audio snd_intel_sdw_acpi intel_sdsi pmt_class aesni_intel snd_virtuoso
[  143.156549]  snd_hda_codec crypto_simd snd_oxygen_lib snd_mpu401_uart cryptd dell_wmi_ddv snd_hda_core snd_rawmidi rapl snd_hwdep snd_seq_device dell_smbios snd_pcm dell_wmi_sysman intel_cstate sparse_keymap dcdbas ucsi_ccg pcspkr cmdlinepart firmware_attributes_class video snd_timer typec_ucsi dell_wmi_descriptor isst_if_mmio wmi_bmof isst_if_mbox_pci spi_nor idxd snd mei_me typec isst_if_common intel_vsec idxd_bus soundcore mtd mei input_leds mac_hid vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd drm efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor hid_generic usbmouse usbkbd usbhid hid raid6_pq libcrc32c simplefb rtsx_pci_sdmmc nvme xhci_pci i2c_nvidia_gpu xhci_pci_renesas nvme_core crc32_pclmul atlantic i2c_ccgx_ucsi spi_intel_pci nvme_common i2c_i801 e1000e ahci rtsx_pci spi_intel i2c_smbus macsec xhci_hcd libahci wmi
[  143.156601]  pinctrl_alderlake
[  143.156610] CR2: ff2c3744a37f7cff
[  143.156612] ---[ end trace 0000000000000000 ]---
[  143.316004] RIP: 0010:kfpu_begin+0x31/0xa0 [zcommon]
[  143.316030] Code: 3f 48 89 e5 fa 0f 1f 44 00 00 48 8b 15 88 89 00 00 65 8b 05 6d d5 78 3f 48 98 48 8b 0c c2 0f 1f 44 00 00 b8 ff ff ff ff 89 c2 <0f> c7 29 5d 31 c0 31 d2 31 c9 31 f6 31 ff c3 cc cc cc cc 0f 1f 44
[  143.316034] RSP: 0018:ff69853287b0f930 EFLAGS: 00010082
[  143.316037] RAX: 00000000ffffffff RBX: ff2c3744fd314000 RCX: ff2c3744a37f5000
[  143.316039] RDX: 00000000ffffffff RSI: ff2c3744fd314000 RDI: ff69853287b0fa80
[  143.316041] RBP: ff69853287b0f930 R08: 0000000000000000 R09: 0000000000000000
[  143.316043] R10: 0000000000000000 R11: 0000000000000000 R12: ff2c3744fd315000
[  143.316044] R13: ff69853287b0fa80 R14: 0000000000001000 R15: 0000000000000000
[  143.316046] FS:  0000000000000000(0000) GS:ff2c3753cfe00000(0000) knlGS:0000000000000000
[  143.316048] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  143.316050] CR2: ff2c3744a37f7cff CR3: 000000010e620002 CR4: 0000000000773ef0
[  143.316052] PKRU: 55555554
[  143.316053] note: z_wr_iss[631] exited with irqs disabled
[  143.316074] note: z_wr_iss[631] exited with preempt_count 1
Complete dmesg: https://pastebin.com/BVSJQrYL
Other tests: https://pastebin.com/jdVH7Yy0 and https://pastebin.com/v4Pk2BHQ
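As a side note, the `#PF` lines in the trace can be checked by hand. The sketch below decodes the error code using the standard x86 page-fault bit layout, tests the writable bit of the PTE value printed in the trace, and computes how far the faulting address (CR2) lies from the value in RCX. Treating RCX as the base address of the faulting write is an assumption on my part; I have not verified it against a disassembly. The function name `decode_pf_error_code` is hypothetical.

```python
# Low bits of the x86 page-fault error code: (mask, meaning if set, meaning if clear).
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor mode"),
    (0x08, "reserved bit set in page tables", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code):
    """Return human-readable descriptions for a #PF error code."""
    out = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            out.append(if_set)
        elif if_clear is not None:
            out.append(if_clear)
    return out


# Values copied from the oops above.
ERROR_CODE = 0x0003
CR2 = 0xFF2C3744A37F7CFF
RCX = 0xFF2C3744A37F5000
PTE = 0x80000001237F7161

print(decode_pf_error_code(ERROR_CODE))
print("PTE writable:", bool(PTE & 0x2))  # bit 1 of a PTE is the R/W flag
print("CR2 - RCX =", hex(CR2 - RCX))
```

The decoded bits agree with the kernel's own summary (supervisor write access, permissions violation): the page is mapped but its PTE is not writable, and the faulting address is 0x2cff bytes above the value in RCX.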

In the BIOS (which is up to date), all relevant virtualization options are already enabled: VT, VT for Direct I/O, TXT (tried with and without), and Pre-Boot DMA protection + OS kernel DMA support (tried with and without). I also tried an option to "limit memory to less than 1 TB" (I have 64 GB) because it is supposed to improve compatibility with some PCIe adapters, but no luck.

I also tested PVE 7 on this computer to check whether that would work and, although there was no kernel oops, I was not able to launch a VM at all. A QEMU error prevented the start, and I did not spend much more time on it since it was just a test, but it made me suppose that the problem is somehow linked to the (rather new) hardware configuration.

Any ideas? What can I do?

Thanks.
 
Looking at my call trace, it totally looks like it.
I was not aware of that issue, thanks!
I'll try on Thursday, but I'm optimistic.

Edit: It did indeed work, thank you!
 