Trouble compiling vGPU driver kernel module, Proxmox 7.4

Reimern

New Member
Sep 3, 2024
Hey everyone,

First time here. I'm in way deeper than I understand, and I need help. I'll give as much information as I can; sorry if I'm missing something that would help.

I'm trying to set up Proxmox as a hypervisor on my 1U Supermicro 1028GR-TR server with two Tesla K80 GPUs. Below is my step-by-step progress so far and where I'm stuck.

I started with a fresh install of Proxmox 7.4 (I'm having issues with 8.2 too) on my server and changed my repositories to pve-no-subscription.
I then updated and upgraded the system and installed the dependencies I think I need:

apt -y install python3 python3-pip git build-essential pve-headers dkms jq
pip3 install frida
git clone https://github.com/DualCoder/vgpu_unlock
wget http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
chmod -R +x vgpu_unlock
dpkg -i mdevctl_0.81-1_all.deb

I have made sure that IOMMU is enabled, blacklisted the NVIDIA drivers, and set the VFIO modules to load at boot.
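
For anyone wanting to double-check those three prerequisites, here is a minimal sketch of how they could be verified from the shell. The function names are made up for illustration, the `intel_iommu=on` flag assumes an Intel board, and the file locations and module names in the usage comments (`nouveau`, `vfio_pci`) are the ones commonly seen in Proxmox 7 guides, not something confirmed for this host.

```shell
#!/bin/sh
# Hypothetical helpers. Each takes the file to inspect as an argument so it
# can be pointed at /proc/cmdline, /etc/modprobe.d/*.conf, /etc/modules, etc.

# Succeeds if the kernel command line stored in file $1 contains intel_iommu=on.
iommu_enabled() {
    grep -Eq '(^|[[:space:]])intel_iommu=on([[:space:]]|$)' "$1"
}

# Succeeds if module $2 is blacklisted in modprobe config file $1.
is_blacklisted() {
    grep -Eq "^[[:space:]]*blacklist[[:space:]]+$2[[:space:]]*$" "$1"
}

# Succeeds if module $2 is listed on its own line in modules file $1.
loads_at_boot() {
    grep -Eq "^[[:space:]]*$2[[:space:]]*$" "$1"
}

# Example usage on a real host (paths and module names are assumptions):
#   iommu_enabled /proc/cmdline                         && echo "IOMMU flag set"
#   is_blacklisted /etc/modprobe.d/blacklist.conf nouveau && echo "nouveau blacklisted"
#   loads_at_boot /etc/modules vfio_pci                 && echo "vfio_pci loads at boot"
```

These only confirm that the configuration files say what is expected; whether the IOMMU is actually active still has to be read from the boot messages after a reboot.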

I am now trying to compile the kernel module for the NVIDIA driver. I am using NVIDIA-Linux-x86_64-470.223.02-vgpu-kvm because the 5xx-series drivers no longer support the Tesla K80 GPUs.

After making the .run file executable and running it with --dkms, the build gets stuck at 5% and I run into these errors:

image_2024-09-03_102925281.png

image_2024-09-03_095003447.png

Code:
DKMS make.log for nvidia-470.223.02 for kernel 5.15.158-2-pve (x86_64)
Tue Sep  3 09:36:15 EDT 2024
make[1]: Entering directory '/usr/src/linux-headers-5.15.158-2-pve'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (Debian 10.2.1-6) 10.2.1 20210110
  You are using:           cc (Debian 10.2.1-6) 10.2.1 20210110
  SYMLINK /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kernel.o
 CONFTEST: hash__remap_4k_pfn
 CONFTEST: set_pages_uc
 CONFTEST: list_is_first
 CONFTEST: set_memory_uc
 CONFTEST: set_memory_array_uc
 CONFTEST: set_pages_array_uc
 CONFTEST: acquire_console_sem
 CONFTEST: console_lock
 CONFTEST: ioremap_cache
 CONFTEST: ioremap_wc
 CONFTEST: acpi_walk_namespace
 CONFTEST: sg_alloc_table
 CONFTEST: pci_get_domain_bus_and_slot
 CONFTEST: get_num_physpages
 CONFTEST: efi_enabled
 CONFTEST: pde_data
 CONFTEST: PDE_DATA
 CONFTEST: proc_remove
 CONFTEST: pm_vt_switch_required
 CONFTEST: xen_ioemu_inject_msi
 CONFTEST: phys_to_dma
 CONFTEST: get_dma_ops
 CONFTEST: dma_attr_macros
 CONFTEST: dma_map_page_attrs
 CONFTEST: write_cr4
 CONFTEST: of_get_property
 CONFTEST: of_find_node_by_phandle
 CONFTEST: of_node_to_nid
 CONFTEST: pnv_pci_get_npu_dev
 CONFTEST: of_get_ibm_chip_id
 CONFTEST: node_end_pfn
 CONFTEST: pci_bus_address
 CONFTEST: pci_stop_and_remove_bus_device
 CONFTEST: pci_remove_bus_device
 CONFTEST: register_cpu_notifier
 CONFTEST: cpuhp_setup_state
 CONFTEST: dma_map_resource
 CONFTEST: backlight_device_register
 CONFTEST: get_backlight_device_by_name
 CONFTEST: timer_setup
 CONFTEST: pci_enable_msix_range
 CONFTEST: kernel_read_has_pointer_pos_arg
 CONFTEST: kernel_write
 CONFTEST: kthread_create_on_node
 CONFTEST: of_find_matching_node
 CONFTEST: dev_is_pci
 CONFTEST: dma_direct_map_resource
 CONFTEST: tegra_get_platform
 CONFTEST: tegra_bpmp_send_receive
 CONFTEST: flush_cache_all
 CONFTEST: vmf_insert_pfn
 CONFTEST: jiffies_to_timespec
 CONFTEST: ktime_get_raw_ts64
 CONFTEST: ktime_get_real_ts64
 CONFTEST: full_name_hash
 CONFTEST: hlist_for_each_entry
 CONFTEST: pci_enable_atomic_ops_to_root
 CONFTEST: vga_tryget
 CONFTEST: pgprot_decrypted
 CONFTEST: cc_mkdec
 CONFTEST: iterate_fd
 CONFTEST: seq_read_iter
 CONFTEST: sg_page_iter_page
 CONFTEST: unsafe_follow_pfn
 CONFTEST: drm_gem_object_get
 CONFTEST: drm_gem_object_put_unlocked
 CONFTEST: set_close_on_exec
 CONFTEST: dma_set_coherent_mask
 CONFTEST: acpi_bus_get_device
 CONFTEST: get_task_ioprio
 CONFTEST: vfio_register_notifier
 CONFTEST: mdev_parent_dev
 CONFTEST: mdev_dev
 CONFTEST: mdev_get_type_group_id
 CONFTEST: mdev_uuid
 CONFTEST: mdev_from_dev
 CONFTEST: mdev_set_iommu_device
 CONFTEST: pci_irq_vector_helpers
 CONFTEST: kvmalloc
 CONFTEST: is_export_symbol_gpl_of_node_to_nid
 CONFTEST: is_export_symbol_gpl_sme_active
 CONFTEST: is_export_symbol_present_swiotlb_map_sg_attrs
 CONFTEST: is_export_symbol_present_swiotlb_dma_ops
 CONFTEST: is_export_symbol_present___close_fd
 CONFTEST: is_export_symbol_present_close_fd
 CONFTEST: is_export_symbol_present_get_unused_fd
 CONFTEST: is_export_symbol_present_get_unused_fd_flags
 CONFTEST: is_export_symbol_present_nvhost_get_default_device
 CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_byte_offset
 CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_aperture
 CONFTEST: is_export_symbol_present_tegra_dce_register_ipc_client
 CONFTEST: is_export_symbol_present_tegra_dce_unregister_ipc_client
 CONFTEST: is_export_symbol_present_tegra_dce_client_ipc_send_recv
 CONFTEST: is_export_symbol_present_dram_clk_to_mc_clk
 CONFTEST: is_export_symbol_present_get_dram_num_channels
 CONFTEST: is_export_symbol_present_tegra_dram_types
 CONFTEST: is_export_symbol_present_screen_info
 CONFTEST: acpi_op_remove
 CONFTEST: file_operations
 CONFTEST: file_inode
 CONFTEST: kuid_t
 CONFTEST: dma_ops
 CONFTEST: swiotlb_dma_ops
 CONFTEST: noncoherent_swiotlb_dma_ops
 CONFTEST: vm_fault_has_address
 CONFTEST: backlight_properties_type
 CONFTEST: vm_insert_pfn_prot
 CONFTEST: vmf_insert_pfn_prot
 CONFTEST: address_space_init_once
 CONFTEST: vm_ops_fault_removed_vma_arg
 CONFTEST: vmbus_channel_has_ringbuffer_page
 CONFTEST: device_driver_of_match_table
 CONFTEST: device_of_node
 CONFTEST: node_states_n_memory
 CONFTEST: kmem_cache_has_kobj_remove_work
 CONFTEST: sysfs_slab_unlink
 CONFTEST: proc_ops
 CONFTEST: timespec64
 CONFTEST: vmalloc_has_pgprot_t_arg
 CONFTEST: acpi_fadt_low_power_s0
 CONFTEST: mm_has_mmap_lock
 CONFTEST: pci_channel_state
 CONFTEST: num_registered_fb
 CONFTEST: vm_area_struct_has_const_vm_flags
 CONFTEST: mdev_parent
 CONFTEST: vfio_info_add_capability_has_cap_type_id_arg
 CONFTEST: vfio_device_gfx_plane_info
 CONFTEST: vfio_device_migration_info
 CONFTEST: vm_fault_t
 CONFTEST: vfio_device_migration_has_start_pfn
 CONFTEST: mdev_parent_ops_has_open_device
 CONFTEST: dom0_kernel_present
 CONFTEST: nvidia_vgpu_kvm_build
 CONFTEST: nvidia_grid_build
 CONFTEST: nvidia_grid_csp_build
 CONFTEST: get_user_pages
 CONFTEST: get_user_pages_remote
 CONFTEST: pm_runtime_available
 CONFTEST: pci_class_multimedia_hd_audio
 CONFTEST: drm_available
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:963: warning: "IMPORT_SGT_STUBS_NEEDED" redefined
  963 | #define IMPORT_SGT_STUBS_NEEDED 0
      |
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:957: note: this is the location of the previous definition
  957 | #define IMPORT_SGT_STUBS_NEEDED 1
      |
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c: In function 'nv_encode_caching':
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:348:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
  348 |             if (NV_ALLOW_CACHING(memory_type))
      |                ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:351:9: note: here
  351 |         default:
      |         ^~~~~~~
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nv-pci-table.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vfio_vgpu_get_attach_device':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:739:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
  739 | }
      | ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vgpu_dev_ioctl':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:356:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
  356 | }
      | ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c: In function 'nv_vgpu_vfio_open':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c:2070:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 2070 | }
      | ^
ld -r -o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o 
/var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
  LD [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia.o
  LD [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio.o
  MODPOST /var/lib/dkms/nvidia/470.223.02/build/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/470.223.02/build/Module.symvers] Error 1
make[2]: *** Deleting file '/var/lib/dkms/nvidia/470.223.02/build/Module.symvers'
make[1]: *** [Makefile:1830: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.158-2-pve'
make: *** [Makefile:80: modules] Error 2

image_2024-09-03_095532369.png

image_2024-09-03_095555349.png

Code:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue Sep  3 09:36:07 2024
installer version: 470.223.02

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer
    -dkms

Using: nvidia-installer ncurses v6 user interface
-> Detected 56 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 470.223.02.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (470.223.02):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 470.223.02 -k 5.15.158-2-pve`:
Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.158-2-pve IGNORE_CC_MISMATCH='' modules.....(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.158-2-pve (x86_64)
Consult /var/lib/dkms/nvidia/470.223.02/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.
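
The make.log is long, and nearly all of it is CONFTEST and CC output. Below is a small sketch for pulling out just the lines that explain the failure. The two function names are hypothetical, and the path in the usage comment is the one the installer prints above.

```shell
#!/bin/sh
# Hypothetical helpers for skimming a DKMS make.log passed as $1.

# Print only the lines that report a hard failure: modpost/installer
# "ERROR:" lines and make's "*** ... Error N" lines.
dkms_fatal_lines() {
    grep -E '^ERROR:|\*\*\* .*Error [0-9]+' "$1"
}

# Print the name of every GPL-only symbol that modpost complained about.
gpl_only_symbols() {
    sed -n "s/.*uses GPL-only symbol '\([^']*\)'.*/\1/p" "$1"
}

# Example usage on the host:
#   dkms_fatal_lines /var/lib/dkms/nvidia/470.223.02/build/make.log
#   gpl_only_symbols /var/lib/dkms/nvidia/470.223.02/build/make.log
```

On the log above, this leaves the modpost line about `rcu_read_unlock_strict` as the actual failure: the proprietary nvidia.ko is refused because it references a symbol that this kernel marks as GPL-only. The compiler, fall-through, and frame-size messages earlier in the log are warnings and are not what stops the build.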

Any help would be appreciated.

Thank you.
 
I was able to fix the issue by installing an earlier kernel, 5.13.19-6-pve, together with its pve-headers, and pinning it in GRUB to make it the default kernel. This allowed me to install through DKMS, and things are working swimmingly now.
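
For anyone landing here with the same error, here is a rough sketch of what that fix might look like as commands. Only the kernel version comes from this thread. The package names (`pve-kernel-<version>`, `pve-headers-<version>`) and the use of `proxmox-boot-tool kernel pin` are assumptions about a Proxmox 7.x host; setting `GRUB_DEFAULT` in /etc/default/grub and running `update-grub` is the other common way to pin. The `run` wrapper and `pin_older_kernel` are hypothetical names, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install an older kernel plus matching headers, then make it the boot default.
# Package names and the pin command are assumptions, see the note above.
pin_older_kernel() {
    kver=$1
    run apt install -y "pve-kernel-${kver}" "pve-headers-${kver}"
    run proxmox-boot-tool kernel pin "${kver}"
}

pin_older_kernel 5.13.19-6-pve
```

After a reboot, `uname -r` should report the pinned version before the NVIDIA installer is run again with --dkms, since DKMS builds against the headers of the kernel it is told to target.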
 
