Trouble compiling vGPU driver kernel module, Proxmox 7.4

Reimern

Sep 3, 2024
Hey everyone,

First time here, and I'm in much deeper than I understand, so I need help. I'll give as much information as I can; apologies if I'm missing something that would help.

I'm trying to set up a Proxmox hypervisor on my 1U server, a Supermicro 1028GR-TR with two Tesla K80 GPUs. Below is my step-by-step progress so far and where I'm stuck.

I started with a fresh install of Proxmox 7.4 on my server (I'm having issues with 8.2 too) and switched my repositories to pve-no-subscription.
I then updated, upgraded, and installed the dependencies I think I need:

apt -y install python3 python3-pip git build-essential pve-headers dkms jq
pip3 install frida
git clone https://github.com/DualCoder/vgpu_unlock
wget http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
chmod -R +x vgpu_unlock
dpkg -i mdevctl_0.81-1_all.deb

I have also made sure that IOMMU is enabled, blacklisted the NVIDIA drivers, and set the VFIO modules to load at boot.
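
For anyone retracing these steps, here is a minimal sketch of how the three settings above could be double-checked on the host. It rests on assumptions that are not in the post: an Intel CPU (so the flag is intel_iommu=on), nouveau as the blacklisted driver, and the usual VFIO module names. The function names and the --report switch are hypothetical.

```shell
#!/bin/sh
# Sketch: host sanity checks before building the vGPU driver.
# Assumptions (not from the post): Intel CPU, so the flag is intel_iommu=on;
# the blacklisted driver is nouveau; VFIO module names as listed below.

# Succeeds if the kernel command line in $1 contains intel_iommu=on.
cmdline_has_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if module $1 is listed in the lsmod-style text $2.
module_listed() {
    printf '%s\n' "$2" | grep -q "^$1[[:space:]]"
}

# Print one status line per check, reading live state from the host.
report() {
    if cmdline_has_iommu "$(cat /proc/cmdline)"; then
        echo "intel_iommu=on: present"
    else
        echo "intel_iommu=on: missing"
    fi
    mods=$(lsmod)
    for m in vfio vfio_pci vfio_iommu_type1; do
        if module_listed "$m" "$mods"; then
            echo "$m: loaded"
        else
            echo "$m: not loaded"
        fi
    done
    if module_listed nouveau "$mods"; then
        echo "nouveau: loaded (blacklist not in effect)"
    else
        echo "nouveau: not loaded"
    fi
}

# Run the live checks only when asked, e.g.: sh check-host.sh --report
if [ "${1:-}" = "--report" ]; then
    report
fi
```

Saved as, say, check-host.sh and run with --report after a reboot, it prints one line per check. Anything reported as missing or not loaded is worth fixing before touching the driver installer.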

I am now trying to compile the kernel module for the NVIDIA driver. I am using NVIDIA-Linux-x86_64-470.223.02-vgpu-kvm, because the 5xx driver series no longer supports the Tesla K80 GPUs.

After making the .run file executable, I tried to compile with --dkms, but the build gets stuck at 5% and I run into these errors:

Code:
DKMS make.log for nvidia-470.223.02 for kernel 5.15.158-2-pve (x86_64)
Tue Sep  3 09:36:15 EDT 2024
make[1]: Entering directory '/usr/src/linux-headers-5.15.158-2-pve'
warning: the compiler differs from the one used to build the kernel
  The kernel was built by: gcc (Debian 10.2.1-6) 10.2.1 20210110
  You are using:           cc (Debian 10.2.1-6) 10.2.1 20210110
  SYMLINK /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kernel.o
 CONFTEST: hash__remap_4k_pfn
 CONFTEST: set_pages_uc
 CONFTEST: list_is_first
 CONFTEST: set_memory_uc
 CONFTEST: set_memory_array_uc
 CONFTEST: set_pages_array_uc
 CONFTEST: acquire_console_sem
 CONFTEST: console_lock
 CONFTEST: ioremap_cache
 CONFTEST: ioremap_wc
 CONFTEST: acpi_walk_namespace
 CONFTEST: sg_alloc_table
 CONFTEST: pci_get_domain_bus_and_slot
 CONFTEST: get_num_physpages
 CONFTEST: efi_enabled
 CONFTEST: pde_data
 CONFTEST: PDE_DATA
 CONFTEST: proc_remove
 CONFTEST: pm_vt_switch_required
 CONFTEST: xen_ioemu_inject_msi
 CONFTEST: phys_to_dma
 CONFTEST: get_dma_ops
 CONFTEST: dma_attr_macros
 CONFTEST: dma_map_page_attrs
 CONFTEST: write_cr4
 CONFTEST: of_get_property
 CONFTEST: of_find_node_by_phandle
 CONFTEST: of_node_to_nid
 CONFTEST: pnv_pci_get_npu_dev
 CONFTEST: of_get_ibm_chip_id
 CONFTEST: node_end_pfn
 CONFTEST: pci_bus_address
 CONFTEST: pci_stop_and_remove_bus_device
 CONFTEST: pci_remove_bus_device
 CONFTEST: register_cpu_notifier
 CONFTEST: cpuhp_setup_state
 CONFTEST: dma_map_resource
 CONFTEST: backlight_device_register
 CONFTEST: get_backlight_device_by_name
 CONFTEST: timer_setup
 CONFTEST: pci_enable_msix_range
 CONFTEST: kernel_read_has_pointer_pos_arg
 CONFTEST: kernel_write
 CONFTEST: kthread_create_on_node
 CONFTEST: of_find_matching_node
 CONFTEST: dev_is_pci
 CONFTEST: dma_direct_map_resource
 CONFTEST: tegra_get_platform
 CONFTEST: tegra_bpmp_send_receive
 CONFTEST: flush_cache_all
 CONFTEST: vmf_insert_pfn
 CONFTEST: jiffies_to_timespec
 CONFTEST: ktime_get_raw_ts64
 CONFTEST: ktime_get_real_ts64
 CONFTEST: full_name_hash
 CONFTEST: hlist_for_each_entry
 CONFTEST: pci_enable_atomic_ops_to_root
 CONFTEST: vga_tryget
 CONFTEST: pgprot_decrypted
 CONFTEST: cc_mkdec
 CONFTEST: iterate_fd
 CONFTEST: seq_read_iter
 CONFTEST: sg_page_iter_page
 CONFTEST: unsafe_follow_pfn
 CONFTEST: drm_gem_object_get
 CONFTEST: drm_gem_object_put_unlocked
 CONFTEST: set_close_on_exec
 CONFTEST: dma_set_coherent_mask
 CONFTEST: acpi_bus_get_device
 CONFTEST: get_task_ioprio
 CONFTEST: vfio_register_notifier
 CONFTEST: mdev_parent_dev
 CONFTEST: mdev_dev
 CONFTEST: mdev_get_type_group_id
 CONFTEST: mdev_uuid
 CONFTEST: mdev_from_dev
 CONFTEST: mdev_set_iommu_device
 CONFTEST: pci_irq_vector_helpers
 CONFTEST: kvmalloc
 CONFTEST: is_export_symbol_gpl_of_node_to_nid
 CONFTEST: is_export_symbol_gpl_sme_active
 CONFTEST: is_export_symbol_present_swiotlb_map_sg_attrs
 CONFTEST: is_export_symbol_present_swiotlb_dma_ops
 CONFTEST: is_export_symbol_present___close_fd
 CONFTEST: is_export_symbol_present_close_fd
 CONFTEST: is_export_symbol_present_get_unused_fd
 CONFTEST: is_export_symbol_present_get_unused_fd_flags
 CONFTEST: is_export_symbol_present_nvhost_get_default_device
 CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_byte_offset
 CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_aperture
 CONFTEST: is_export_symbol_present_tegra_dce_register_ipc_client
 CONFTEST: is_export_symbol_present_tegra_dce_unregister_ipc_client
 CONFTEST: is_export_symbol_present_tegra_dce_client_ipc_send_recv
 CONFTEST: is_export_symbol_present_dram_clk_to_mc_clk
 CONFTEST: is_export_symbol_present_get_dram_num_channels
 CONFTEST: is_export_symbol_present_tegra_dram_types
 CONFTEST: is_export_symbol_present_screen_info
 CONFTEST: acpi_op_remove
 CONFTEST: file_operations
 CONFTEST: file_inode
 CONFTEST: kuid_t
 CONFTEST: dma_ops
 CONFTEST: swiotlb_dma_ops
 CONFTEST: noncoherent_swiotlb_dma_ops
 CONFTEST: vm_fault_has_address
 CONFTEST: backlight_properties_type
 CONFTEST: vm_insert_pfn_prot
 CONFTEST: vmf_insert_pfn_prot
 CONFTEST: address_space_init_once
 CONFTEST: vm_ops_fault_removed_vma_arg
 CONFTEST: vmbus_channel_has_ringbuffer_page
 CONFTEST: device_driver_of_match_table
 CONFTEST: device_of_node
 CONFTEST: node_states_n_memory
 CONFTEST: kmem_cache_has_kobj_remove_work
 CONFTEST: sysfs_slab_unlink
 CONFTEST: proc_ops
 CONFTEST: timespec64
 CONFTEST: vmalloc_has_pgprot_t_arg
 CONFTEST: acpi_fadt_low_power_s0
 CONFTEST: mm_has_mmap_lock
 CONFTEST: pci_channel_state
 CONFTEST: num_registered_fb
 CONFTEST: vm_area_struct_has_const_vm_flags
 CONFTEST: mdev_parent
 CONFTEST: vfio_info_add_capability_has_cap_type_id_arg
 CONFTEST: vfio_device_gfx_plane_info
 CONFTEST: vfio_device_migration_info
 CONFTEST: vm_fault_t
 CONFTEST: vfio_device_migration_has_start_pfn
 CONFTEST: mdev_parent_ops_has_open_device
 CONFTEST: dom0_kernel_present
 CONFTEST: nvidia_vgpu_kvm_build
 CONFTEST: nvidia_grid_build
 CONFTEST: nvidia_grid_csp_build
 CONFTEST: get_user_pages
 CONFTEST: get_user_pages_remote
 CONFTEST: pm_runtime_available
 CONFTEST: pci_class_multimedia_hd_audio
 CONFTEST: drm_available
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:963: warning: "IMPORT_SGT_STUBS_NEEDED" redefined
  963 | #define IMPORT_SGT_STUBS_NEEDED 0
      |
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:957: note: this is the location of the previous definition
  957 | #define IMPORT_SGT_STUBS_NEEDED 1
      |
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c: In function 'nv_encode_caching':
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:348:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
  348 |             if (NV_ALLOW_CACHING(memory_type))
      |                ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:351:9: note: here
  351 |         default:
      |         ^~~~~~~
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.o
  CC [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nv-pci-table.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vfio_vgpu_get_attach_device':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:739:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
  739 | }
      | ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vgpu_dev_ioctl':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:356:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
  356 | }
      | ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c: In function 'nv_vgpu_vfio_open':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c:2070:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
 2070 | }
      | ^
ld -r -o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o 
/var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
  LD [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia.o
  LD [M]  /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio.o
  MODPOST /var/lib/dkms/nvidia/470.223.02/build/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/470.223.02/build/Module.symvers] Error 1
make[2]: *** Deleting file '/var/lib/dkms/nvidia/470.223.02/build/Module.symvers'
make[1]: *** [Makefile:1830: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.158-2-pve'
make: *** [Makefile:80: modules] Error 2
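
Most of the log above is noise: the CONFTEST lines and compiler warnings do not stop the build, while the single modpost ERROR line does. As a small sketch for pulling just that line's details out of a DKMS make.log, something like the following could be used. The function name gpl_only_symbols is hypothetical, and the default path is simply the one this build wrote its log to.

```shell
#!/bin/sh
# Sketch: pull the module name and symbol out of modpost "GPL-only symbol" errors.
# The default path is where this DKMS build wrote its log; pass another as $1.

# Print "<module> <symbol>" for every GPL-only symbol error in the file $1.
gpl_only_symbols() {
    sed -n "s/^ERROR: modpost: GPL-incompatible module \([^ ]*\) uses GPL-only symbol '\([^']*\)'.*/\1 \2/p" "$1"
}

LOG="${1:-/var/lib/dkms/nvidia/470.223.02/build/make.log}"
if [ -r "$LOG" ]; then
    gpl_only_symbols "$LOG"
fi
```

Run against the log above, this prints `nvidia.ko rcu_read_unlock_strict`, which is the pair worth searching for when looking for a matching kernel or driver combination.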

Code:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue Sep  3 09:36:07 2024
installer version: 470.223.02

PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

nvidia-installer command line:
    ./nvidia-installer
    -dkms

Using: nvidia-installer ncurses v6 user interface
-> Detected 56 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 470.223.02.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (470.223.02):
   executing: '/usr/sbin/ldconfig'...
   executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 470.223.02 -k 5.15.158-2-pve`:
Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.158-2-pve IGNORE_CC_MISMATCH='' modules.....(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.158-2-pve (x86_64)
Consult /var/lib/dkms/nvidia/470.223.02/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.

Any help would be appreciated.

Thank you.
 
I was able to fix the issue by installing an earlier kernel and its pve-headers, version 5.13.19-6-pve, and pinning it in GRUB as the default boot kernel. This allowed me to install through DKMS, and things are working swimmingly now.
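
For anyone hitting the same modpost error, the fix could be sketched roughly as below. The exact commands used are not given in the post, so this is an assumption-laden outline: the package names follow the Proxmox 7 pattern pve-kernel-<version> / pve-headers-<version>, and the pin step uses proxmox-boot-tool kernel pin, which should be available on 7.4 (setting GRUB_DEFAULT in /etc/default/grub and running update-grub is the manual alternative). The script only prints the steps rather than running them, since they need root and a reboot in the middle. The plan function and KVER variable are hypothetical names.

```shell
#!/bin/sh
# Sketch: print the steps for booting an older Proxmox kernel and rebuilding the driver.
# Assumptions (not confirmed by the post): PVE 7 package naming, and that
# "proxmox-boot-tool kernel pin" is how the kernel gets pinned.
KVER="${KVER:-5.13.19-6-pve}"

# Print the command sequence for kernel version $1, one command per line.
plan() {
    echo "apt install pve-kernel-$1 pve-headers-$1"
    echo "proxmox-boot-tool kernel pin $1"
    echo "reboot"
    echo "uname -r   # should now print $1"
    echo "./NVIDIA-Linux-x86_64-470.223.02-vgpu-kvm.run --dkms"
}

plan "$KVER"
```

Setting KVER to another version prints the same sequence for that kernel. The uname -r check after the reboot matters because DKMS builds against the running kernel by default.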