Hey everyone,
First time here. I'm in way over my head and need help. I'll try to give as much information as I can; sorry if I'm missing something that would help.
I'm trying to set up a hypervisor with Proxmox on my 1U Supermicro 1028GR-TR server with two Tesla K80 GPUs. Below is my step-by-step progress so far and where I'm stuck.
I started with a fresh install of Proxmox 7.4 (I'm having issues with 8.2 too) and changed my repositories to pve-no-subscription.
I have installed the dependencies that I think I need:
apt -y install python3 python3-pip git build-essential pve-headers dkms jq
pip3 install frida
git clone https://github.com/DualCoder/vgpu_unlock
wget http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
chmod -R +x vgpu_unlock
dpkg -i mdevctl_0.81-1_all.deb
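DKMS builds against the header tree for one specific kernel release, so it can help to confirm that the tree for the running kernel is actually present. A minimal sketch, assuming the Proxmox VE 7 package naming `pve-headers-<release>` and the `/usr/src/linux-headers-<release>` location that appears in the make.log; the function names are made up for illustration:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any Proxmox tool.

# Print the header package name for a kernel release (default: running kernel).
# Assumption: Proxmox VE 7 naming, pve-headers-<release>.
headers_pkg() {
    echo "pve-headers-${1:-$(uname -r)}"
}

# Succeed if the header tree for that release exists under the given
# source root (default: /usr/src).
headers_present() {
    [ -d "${2:-/usr/src}/linux-headers-${1:-$(uname -r)}" ]
}
```

Calling `headers_present || apt install "$(headers_pkg)"` would then install the matching headers only when they are missing.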
I have also made sure that IOMMU is enabled, blacklisted the NVIDIA drivers, and set the VFIO modules to load at boot.
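A rough way to double-check those two settings after a reboot is sketched below. It assumes an Intel host (so the flag to look for is `intel_iommu=on`) and that VFIO is built as loadable modules; the function names are hypothetical, and the file arguments default to the live `/proc` files:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Succeed if the kernel command line contains intel_iommu=on.
# Assumption: Intel platform; pass another file to check a saved cmdline.
iommu_flag_set() {
    grep -qw 'intel_iommu=on' "${1:-/proc/cmdline}"
}

# Print every VFIO module from the list below that is NOT loaded.
# /proc/modules starts each line with the module name followed by a space.
missing_vfio_modules() {
    for mod in vfio vfio_pci vfio_iommu_type1; do
        grep -q "^${mod} " "${1:-/proc/modules}" || echo "$mod"
    done
}
```

If `iommu_flag_set` succeeds and `missing_vfio_modules` prints nothing, both settings took effect on the running system.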
I am now trying to build the NVIDIA kernel module. I am using NVIDIA-Linux-x86_64-470.223.02-vgpu-kvm because the 5xx drivers no longer support the Tesla K80 GPUs.
After making the .run file executable and running it with --dkms, the install gets stuck at 5% and fails with the errors below.
Code:
DKMS make.log for nvidia-470.223.02 for kernel 5.15.158-2-pve (x86_64)
Tue Sep 3 09:36:15 EDT 2024
make[1]: Entering directory '/usr/src/linux-headers-5.15.158-2-pve'
warning: the compiler differs from the one used to build the kernel
The kernel was built by: gcc (Debian 10.2.1-6) 10.2.1 20210110
You are using: cc (Debian 10.2.1-6) 10.2.1 20210110
SYMLINK /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kernel.o
CONFTEST: hash__remap_4k_pfn
CONFTEST: set_pages_uc
CONFTEST: list_is_first
CONFTEST: set_memory_uc
CONFTEST: set_memory_array_uc
CONFTEST: set_pages_array_uc
CONFTEST: acquire_console_sem
CONFTEST: console_lock
CONFTEST: ioremap_cache
CONFTEST: ioremap_wc
CONFTEST: acpi_walk_namespace
CONFTEST: sg_alloc_table
CONFTEST: pci_get_domain_bus_and_slot
CONFTEST: get_num_physpages
CONFTEST: efi_enabled
CONFTEST: pde_data
CONFTEST: PDE_DATA
CONFTEST: proc_remove
CONFTEST: pm_vt_switch_required
CONFTEST: xen_ioemu_inject_msi
CONFTEST: phys_to_dma
CONFTEST: get_dma_ops
CONFTEST: dma_attr_macros
CONFTEST: dma_map_page_attrs
CONFTEST: write_cr4
CONFTEST: of_get_property
CONFTEST: of_find_node_by_phandle
CONFTEST: of_node_to_nid
CONFTEST: pnv_pci_get_npu_dev
CONFTEST: of_get_ibm_chip_id
CONFTEST: node_end_pfn
CONFTEST: pci_bus_address
CONFTEST: pci_stop_and_remove_bus_device
CONFTEST: pci_remove_bus_device
CONFTEST: register_cpu_notifier
CONFTEST: cpuhp_setup_state
CONFTEST: dma_map_resource
CONFTEST: backlight_device_register
CONFTEST: get_backlight_device_by_name
CONFTEST: timer_setup
CONFTEST: pci_enable_msix_range
CONFTEST: kernel_read_has_pointer_pos_arg
CONFTEST: kernel_write
CONFTEST: kthread_create_on_node
CONFTEST: of_find_matching_node
CONFTEST: dev_is_pci
CONFTEST: dma_direct_map_resource
CONFTEST: tegra_get_platform
CONFTEST: tegra_bpmp_send_receive
CONFTEST: flush_cache_all
CONFTEST: vmf_insert_pfn
CONFTEST: jiffies_to_timespec
CONFTEST: ktime_get_raw_ts64
CONFTEST: ktime_get_real_ts64
CONFTEST: full_name_hash
CONFTEST: hlist_for_each_entry
CONFTEST: pci_enable_atomic_ops_to_root
CONFTEST: vga_tryget
CONFTEST: pgprot_decrypted
CONFTEST: cc_mkdec
CONFTEST: iterate_fd
CONFTEST: seq_read_iter
CONFTEST: sg_page_iter_page
CONFTEST: unsafe_follow_pfn
CONFTEST: drm_gem_object_get
CONFTEST: drm_gem_object_put_unlocked
CONFTEST: set_close_on_exec
CONFTEST: dma_set_coherent_mask
CONFTEST: acpi_bus_get_device
CONFTEST: get_task_ioprio
CONFTEST: vfio_register_notifier
CONFTEST: mdev_parent_dev
CONFTEST: mdev_dev
CONFTEST: mdev_get_type_group_id
CONFTEST: mdev_uuid
CONFTEST: mdev_from_dev
CONFTEST: mdev_set_iommu_device
CONFTEST: pci_irq_vector_helpers
CONFTEST: kvmalloc
CONFTEST: is_export_symbol_gpl_of_node_to_nid
CONFTEST: is_export_symbol_gpl_sme_active
CONFTEST: is_export_symbol_present_swiotlb_map_sg_attrs
CONFTEST: is_export_symbol_present_swiotlb_dma_ops
CONFTEST: is_export_symbol_present___close_fd
CONFTEST: is_export_symbol_present_close_fd
CONFTEST: is_export_symbol_present_get_unused_fd
CONFTEST: is_export_symbol_present_get_unused_fd_flags
CONFTEST: is_export_symbol_present_nvhost_get_default_device
CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_byte_offset
CONFTEST: is_export_symbol_present_nvhost_syncpt_unit_interface_get_aperture
CONFTEST: is_export_symbol_present_tegra_dce_register_ipc_client
CONFTEST: is_export_symbol_present_tegra_dce_unregister_ipc_client
CONFTEST: is_export_symbol_present_tegra_dce_client_ipc_send_recv
CONFTEST: is_export_symbol_present_dram_clk_to_mc_clk
CONFTEST: is_export_symbol_present_get_dram_num_channels
CONFTEST: is_export_symbol_present_tegra_dram_types
CONFTEST: is_export_symbol_present_screen_info
CONFTEST: acpi_op_remove
CONFTEST: file_operations
CONFTEST: file_inode
CONFTEST: kuid_t
CONFTEST: dma_ops
CONFTEST: swiotlb_dma_ops
CONFTEST: noncoherent_swiotlb_dma_ops
CONFTEST: vm_fault_has_address
CONFTEST: backlight_properties_type
CONFTEST: vm_insert_pfn_prot
CONFTEST: vmf_insert_pfn_prot
CONFTEST: address_space_init_once
CONFTEST: vm_ops_fault_removed_vma_arg
CONFTEST: vmbus_channel_has_ringbuffer_page
CONFTEST: device_driver_of_match_table
CONFTEST: device_of_node
CONFTEST: node_states_n_memory
CONFTEST: kmem_cache_has_kobj_remove_work
CONFTEST: sysfs_slab_unlink
CONFTEST: proc_ops
CONFTEST: timespec64
CONFTEST: vmalloc_has_pgprot_t_arg
CONFTEST: acpi_fadt_low_power_s0
CONFTEST: mm_has_mmap_lock
CONFTEST: pci_channel_state
CONFTEST: num_registered_fb
CONFTEST: vm_area_struct_has_const_vm_flags
CONFTEST: mdev_parent
CONFTEST: vfio_info_add_capability_has_cap_type_id_arg
CONFTEST: vfio_device_gfx_plane_info
CONFTEST: vfio_device_migration_info
CONFTEST: vm_fault_t
CONFTEST: vfio_device_migration_has_start_pfn
CONFTEST: mdev_parent_ops_has_open_device
CONFTEST: dom0_kernel_present
CONFTEST: nvidia_vgpu_kvm_build
CONFTEST: nvidia_grid_build
CONFTEST: nvidia_grid_csp_build
CONFTEST: get_user_pages
CONFTEST: get_user_pages_remote
CONFTEST: pm_runtime_available
CONFTEST: pci_class_multimedia_hd_audio
CONFTEST: drm_available
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:963: warning: "IMPORT_SGT_STUBS_NEEDED" redefined
963 | #define IMPORT_SGT_STUBS_NEEDED 0
|
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.c:957: note: this is the location of the previous definition
957 | #define IMPORT_SGT_STUBS_NEEDED 1
|
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c: In function 'nv_encode_caching':
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:348:16: warning: this statement may fall through [-Wimplicit-fallthrough=]
348 | if (NV_ALLOW_CACHING(memory_type))
| ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.c:351:9: note: here
351 | default:
| ^~~~~~~
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.o
CC [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nv-pci-table.o
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vfio_vgpu_get_attach_device':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:739:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
739 | }
| ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c: In function 'nv_vgpu_dev_ioctl':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/vgpu-devices.c:356:1: warning: the frame size of 1120 bytes is larger than 1024 bytes [-Wframe-larger-than=]
356 | }
| ^
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c: In function 'nv_vgpu_vfio_open':
/var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio/nvidia-vgpu-vfio.c:2070:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
2070 | }
| ^
ld -r -o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-acpi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-cray.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-dma.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-i2c.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-mmap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-p2p.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pat.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-procfs-utils.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vm.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vtophys.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-mlock.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-pci.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-registry.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/os-usermap.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-modeset-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-pci-table.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-kthread-q.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-memdbg.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-ibmnpu.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-report-err.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-rsync.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-msi.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-caps.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-frontend.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv_uvm_interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nv-vgpu-vfio-interface.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_linux.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/nvlink_caps.o 
/var/lib/dkms/nvidia/470.223.02/build/nvidia/linux_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/procfs_nvswitch.o /var/lib/dkms/nvidia/470.223.02/build/nvidia/i2c_nvswitch.o
LD [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia.o
LD [M] /var/lib/dkms/nvidia/470.223.02/build/nvidia-vgpu-vfio.o
MODPOST /var/lib/dkms/nvidia/470.223.02/build/Module.symvers
ERROR: modpost: GPL-incompatible module nvidia.ko uses GPL-only symbol 'rcu_read_unlock_strict'
make[2]: *** [scripts/Makefile.modpost:133: /var/lib/dkms/nvidia/470.223.02/build/Module.symvers] Error 1
make[2]: *** Deleting file '/var/lib/dkms/nvidia/470.223.02/build/Module.symvers'
make[1]: *** [Makefile:1830: modules] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-5.15.158-2-pve'
make: *** [Makefile:80: modules] Error 2
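Most of that make.log is CONFTEST probes and compiler warnings; the build only stops at the modpost line near the end. A small sketch for pulling out just the hard failures from a log like this, where `build_errors` is a hypothetical helper name and the patterns are based only on the lines shown above:

```shell
#!/bin/sh
# Sketch only: build_errors is a hypothetical helper.

# Print the lines of a make.log that report a hard failure:
# lines starting with "ERROR:" and make's "*** ... Error N" lines.
# Compiler warnings and notes are skipped.
build_errors() {
    grep -E '^ERROR:|\*\*\* .*Error [0-9]+' "$1"
}
```

For this build it would be called as `build_errors /var/lib/dkms/nvidia/470.223.02/build/make.log`.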
Code:
nvidia-installer log file '/var/log/nvidia-installer.log'
creation time: Tue Sep 3 09:36:07 2024
installer version: 470.223.02
PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
nvidia-installer command line:
./nvidia-installer
-dkms
Using: nvidia-installer ncurses v6 user interface
-> Detected 56 CPUs online; setting concurrency level to 32.
-> Installing NVIDIA driver version 470.223.02.
-> Would you like to register the kernel module sources with DKMS? This will allow DKMS to automatically build a new module, if you install a different kernel later. (Answer: Yes)
-> Searching for conflicting files:
-> done.
-> Installing 'NVIDIA Accelerated Graphics Driver for Linux-x86_64' (470.223.02):
executing: '/usr/sbin/ldconfig'...
executing: '/usr/bin/systemctl daemon-reload'...
-> done.
-> Driver file installation is complete.
-> Installing DKMS kernel module:
ERROR: Failed to run `/usr/sbin/dkms build -m nvidia -v 470.223.02 -k 5.15.158-2-pve`:
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j32 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.15.158-2-pve IGNORE_CC_MISMATCH='' modules.....(bad exit status: 2)
Error! Bad return status for module build on kernel: 5.15.158-2-pve (x86_64)
Consult /var/lib/dkms/nvidia/470.223.02/build/make.log for more information.
-> error.
ERROR: Failed to install the kernel module through DKMS. No kernel module was installed; please try installing again without DKMS, or check the DKMS logs for more information.
Any help would be appreciated.
Thank you.