I went ahead and updated to PVE 9 even though I knew it ships with kernel 6.14, which is where I had this problem before (I posted about it in a different thread). Back then I was using patched drivers (I'm using a Tesla P4) because the 16.9 version of the drivers didn't support newer kernels.
I saw that 16.11 supported kernel 6.14 so I went ahead and updated. I am using the vgpu unlock from PolloLoco if that matters, but I was using it with 6.8 and never had an issue.
Whenever I start up a VM that has a vGPU assigned, I get the following (multiple times). It says "WARNING", so I don't know whether this is an issue or not. It looks scary, but things seem to be working: I can see inside the VM and it is usable.
uname -r
6.14.8-2-pve
To be clear, I had this when I was running PVE 8 with kernels 6.11 and 6.14, back when I had to patch the NVIDIA drivers to support those kernels (I tried with 16.8 and 16.9), and I have it now with kernel 6.14 and NVIDIA 16.11 (no patch needed since it "supports" that kernel).
Code:
[ 305.923429] ------------[ cut here ]------------
[ 305.923452] WARNING: CPU: 9 PID: 2925 at ./include/linux/rwsem.h:85 remap_pfn_range_internal+0x47b/0x590
[ 305.923472] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace netfs veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter sctp ip6_udp_tunnel udp_tunnel nf_tables softdog sunrpc binfmt_misc bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample nfnetlink_log nvidia_vgpu_vfio(OE) intel_rapl_msr amd_atl intel_rapl_common amd64_edac edac_mce_amd kvm_amd ipmi_ssif nvidia(POE) kvm polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd ast rapl pcspkr mdev acpi_ipmi ccp k10temp ipmi_si ipmi_devintf ipmi_msghandler joydev input_leds mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq mlx4_ib ib_uverbs ib_core mlx4_en hid_generic usbmouse usbkbd usbhid hid igb i2c_algo_bit nvme xhci_pci dca ahci
[ 305.923543] mlx4_core nvme_core libahci xhci_hcd nvme_auth i2c_piix4 ptdma i2c_smbus
[ 305.926272] CPU: 9 UID: 0 PID: 2925 Comm: CPU 3/KVM Tainted: P OE 6.14.8-2-pve #1
[ 305.926544] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 305.926813] Hardware name: Supermicro Super Server/H11SSL-NC, BIOS 3.2 01/24/2025
[ 305.927077] RIP: 0010:remap_pfn_range_internal+0x47b/0x590
[ 305.927337] Code: 5d 41 5e 41 5f 5d 31 d2 31 c9 31 f6 31 ff 45 31 c0 45 31 c9 45 31 d2 45 31 db c3 cc cc cc cc 0f 0b be ea ff ff ff eb b8 0f 0b <0f> 0b e9 0b fc ff ff 48 8b 7d a8 4c 89 fa 4c 89 ee 4c 89 55 b0 4c
[ 305.927871] RSP: 0018:ffffadb4425ff238 EFLAGS: 00010246
[ 305.928139] RAX: 00000000280200fb RBX: 000073aa9fe01000 RCX: 0000000000001000
[ 305.928412] RDX: 0000000000000000 RSI: 000073aa9fe00000 RDI: ffff8b5c6d0dc678
[ 305.928684] RBP: ffffadb4425ff2e8 R08: 8000000000000037 R09: 0000000000000000
[ 305.928964] R10: ffff8b5c5578c200 R11: 0000000000000000 R12: ffff8b5c6d0dc678
[ 305.929238] R13: 000000002000fdf1 R14: 8000000000000037 R15: 000073aa9fe00000
[ 305.929513] FS: 000073aabe7fd6c0(0000) GS:ffff8b7c0fa80000(0000) knlGS:0000000000000000
[ 305.929797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 305.930077] CR2: 00007f9e7b3630d8 CR3: 0000001053f98000 CR4: 0000000000350ef0
[ 305.930361] Call Trace:
[ 305.930642] <TASK>
[ 305.930933] ? pat_pagerange_is_ram+0x7a/0xa0
[ 305.931216] ? memtype_lookup+0x3b/0x70
[ 305.931498] ? lookup_memtype+0xd1/0xf0
[ 305.931785] remap_pfn_range+0x5c/0xb0
[ 305.932068] ? up+0x56/0xa0
[ 305.932351] vgpu_mmio_fault_wrapper+0x1f4/0x2e0 [nvidia_vgpu_vfio]
[ 305.932638] __do_fault+0x3a/0x190
[ 305.932934] do_fault+0x11a/0x4e0
[ 305.933215] __handle_mm_fault+0x841/0x1040
[ 305.933492] handle_mm_fault+0x10e/0x350
[ 305.933765] __get_user_pages+0x86e/0x1540
[ 305.934033] get_user_pages_unlocked+0xe7/0x360
[ 305.934309] hva_to_pfn+0x373/0x520 [kvm]
[ 305.934626] ? __kvm_io_bus_write+0x2d/0xc0 [kvm]
[ 305.934944] ? kvm_io_bus_write+0x54/0x90 [kvm]
[ 305.935252] kvm_follow_pfn+0x91/0xf0 [kvm]
[ 305.935560] __kvm_faultin_pfn+0x5c/0x90 [kvm]
[ 305.935873] kvm_mmu_faultin_pfn+0x1af/0x6f0 [kvm]
[ 305.936203] kvm_tdp_page_fault+0x8e/0xe0 [kvm]
[ 305.936518] kvm_mmu_do_page_fault+0x244/0x280 [kvm]
[ 305.936832] kvm_mmu_page_fault+0x86/0x630 [kvm]
[ 305.937132] ? x86_emulate_instruction+0x42d/0x7a0 [kvm]
[ 305.937437] ? timerqueue_add+0x72/0xe0
[ 305.937688] ? enqueue_hrtimer+0x4d/0xb0
[ 305.937938] npf_interception+0xb9/0x190 [kvm_amd]
[ 305.938180] svm_invoke_exit_handler+0x11e/0x150 [kvm_amd]
[ 305.938417] svm_handle_exit+0x17d/0x220 [kvm_amd]
[ 305.938646] ? svm_vcpu_run+0x481/0x8d0 [kvm_amd]
[ 305.938876] vcpu_enter_guest+0x372/0x1630 [kvm]
[ 305.939134] ? kvm_arch_vcpu_load+0xaa/0x290 [kvm]
[ 305.939386] ? restore_fpregs_from_fpstate+0x3d/0xd0
[ 305.939596] ? fpu_swap_kvm_fpstate+0x80/0x120
[ 305.939805] kvm_arch_vcpu_ioctl_run+0x1b2/0x730 [kvm]
[ 305.940048] ? finish_task_switch.isra.0+0x9c/0x340
[ 305.940249] kvm_vcpu_ioctl+0x139/0xaa0 [kvm]
[ 305.940480] ? lock_timer_base+0x73/0xa0
[ 305.940665] ? __timer_delete_sync+0x86/0xf0
[ 305.940851] __x64_sys_ioctl+0xa4/0xe0
[ 305.941030] x64_sys_call+0x1053/0x2310
[ 305.941207] do_syscall_64+0x7e/0x170
[ 305.941384] ? nv_wait_for_plugin_completion.constprop.0+0xa9/0x150 [nvidia_vgpu_vfio]
[ 305.941565] ? up+0x56/0xa0
[ 305.941745] ? nv_vgpu_vfio_access+0x179/0x450 [nvidia_vgpu_vfio]
[ 305.941931] ? nv_vgpu_vfio_write+0xb4/0x140 [nvidia_vgpu_vfio]
[ 305.942109] ? nv_vfio_mdev_write+0x2b/0x40 [nvidia_vgpu_vfio]
[ 305.942286] ? vfio_device_fops_write+0x27/0x50 [vfio]
[ 305.942460] ? vfs_write+0x107/0x460
[ 305.942631] ? vfio_device_fops_read+0x27/0x50 [vfio]
[ 305.942810] ? __rseq_handle_notify_resume+0x9e/0x4a0
[ 305.942984] ? kvm_arch_vcpu_ioctl_run+0x41f/0x730 [kvm]
[ 305.943202] ? arch_exit_to_user_mode_prepare.isra.0+0xc8/0xd0
[ 305.943380] ? syscall_exit_to_user_mode+0x38/0x1d0
[ 305.943558] ? do_syscall_64+0x8a/0x170
[ 305.943741] ? syscall_exit_to_user_mode+0x38/0x1d0
[ 305.943915] ? do_syscall_64+0x8a/0x170
[ 305.944090] ? fire_user_return_notifiers+0x34/0x70
[ 305.944267] ? arch_exit_to_user_mode_prepare.isra.0+0x22/0xd0
[ 305.944445] ? syscall_exit_to_user_mode+0x38/0x1d0
[ 305.944622] ? do_syscall_64+0x8a/0x170
[ 305.944805] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 305.944986] RIP: 0033:0x73aed260b8db
[ 305.945164] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[ 305.945548] RSP: 002b:000073aabe7f8b30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 305.945753] RAX: ffffffffffffffda RBX: 0000569fefd992a0 RCX: 000073aed260b8db
[ 305.945954] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000037
[ 305.946156] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[ 305.946358] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 305.946558] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 305.946763] </TASK>
[ 305.946959] ---[ end trace 0000000000000000 ]---
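In case it helps anyone compare their own logs against this one, here is a minimal Python sketch that pulls out the two things I look at in a trace like the above: where the WARNING fired, and the first non-speculative call-trace frame that belongs to a loadable module (lines starting with `?` are skipped). This is my own helper, not part of any NVIDIA or Proxmox tooling; the function name `summarize_warnings` is made up, and it assumes the dmesg format shown above.

```python
import re

# Leading "[ 305.923452] " style timestamp that dmesg prints on each line.
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Header line: "WARNING: CPU: <n> PID: <n> at <file:line> <symbol+off/len>".
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+)")
# A definite (no leading "?") frame tagged with a module, e.g.
# "vgpu_mmio_fault_wrapper+0x1f4/0x2e0 [nvidia_vgpu_vfio]".
FRAME_RE = re.compile(r"(\S+\+0x[0-9a-f]+/0x[0-9a-f]+) \[(\w+)\]\s*$")


def summarize_warnings(log_text):
    """Return one dict per WARNING block found in a dmesg dump."""
    results = []
    current = None
    for raw in log_text.splitlines():
        line = TS_RE.sub("", raw.strip())
        header = WARNING_RE.search(line)
        if header:
            current = {
                "site": header.group(1),      # source file and line number
                "symbol": header.group(2),    # function where the check fired
                "module_frame": None,         # first definite module frame
            }
            results.append(current)
            continue
        if current is not None and current["module_frame"] is None:
            frame = FRAME_RE.match(line)
            if frame:
                current["module_frame"] = (frame.group(1), frame.group(2))
        if "end trace" in line:
            current = None
    return results
```

Saving the output of `dmesg` to a file and passing its contents to `summarize_warnings(...)` returns a list whose length is the number of WARNING blocks, which makes it easy to check whether the count grows with each VM start.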