NVIDIA vGPU + Grid on Proxmox 8

petrij98

Feb 18, 2024
Hi all! I've begun the unimaginably painful journey of setting up NVIDIA GRID on my Proxmox 8 homelab so I can split my Tesla P40's 24 GB of memory among three of my virtual machines (8 GB per VM). I followed CraftComputing's guide and the official Proxmox vGPU documentation, but I've run into a problem installing the NVIDIA driver on the host.

I downloaded version 16.5 of NVIDIA's vGPU driver package (535.161.05 for Linux KVM), copied it to the Proxmox host, made it executable and ran it as root. When the progress bar reaches 97%, the installer fails with:

ERROR: An error occurred while performing the step: "Building kernel modules". See /var/log/nvidia-installer.log for details.

I've tried fresh reboots and re-checked my hardware and software prerequisites, and I'm completely stumped on how to move forward. I'm hoping this is an easy fix, but I haven't found any troubleshooting tips or guides that call out this issue. Does anyone have any pointers? I've attached my nvidia-installer.log for context.
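The installer's summary message rarely says what actually broke; the compiler error is buried in the log. A minimal sketch for pulling it out, assuming only grep and head (the function name show_build_errors is hypothetical, and the default path is the one the installer message points to):

```shell
# Print the first N lines of an nvidia-installer log that look like
# compiler or installer errors, with their line numbers in the log.
# show_build_errors is a hypothetical helper name.
show_build_errors() {
    local log="${1:-/var/log/nvidia-installer.log}"
    local max="${2:-20}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 1
    fi
    # -n: prefix line numbers so the surrounding context can be found;
    # -i: match "error", "Error" and "ERROR" alike.
    grep -n -i -E 'error|fatal' "$log" | head -n "$max"
}
```

Run it as show_build_errors, or show_build_errors /var/log/nvidia-installer.log 40 for more matches. The first hit is usually the one worth quoting when asking for help.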
 

535.161.05 won't compile against the 6.8 kernel; try 550.90.05.
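The advice above boils down to a kernel-version check. A rough sketch of that rule, based only on the reports in this thread (535.161.05 / vGPU 16.5 fails to build on 6.8 kernels, 550.90.05 / vGPU 17.3 builds); it is not an official compatibility matrix, and suggest_vgpu_branch is a hypothetical name:

```shell
# Heuristic from this thread only: on 6.8+ kernels the 535 branch is
# reported not to build, while 550.90.05 is reported to build.
# suggest_vgpu_branch is a hypothetical helper name.
suggest_vgpu_branch() {
    local rel="${1:-$(uname -r)}"   # e.g. 6.8.8-2-pve
    local major minor rest
    major="${rel%%.*}"              # "6"
    rest="${rel#*.}"                # "8.8-2-pve"
    minor="${rest%%[!0-9]*}"        # "8"
    if [ "$major" -gt 6 ] || { [ "$major" -eq 6 ] && [ "$minor" -ge 8 ]; }; then
        echo "550"          # 535.161.05 reportedly fails to build here
    else
        echo "535-or-550"   # older kernels are not covered by these reports
    fi
}
```

With no argument it checks the running kernel via uname -r.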
I can confirm that on Proxmox 8.2.4 with kernel 6.8.8-2-pve, 550.90.05 compiles successfully and nvidia-smi reports the card as working, but

the mdevctl types output is empty, even though it's an A40, which is vGPU-capable.
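On SR-IOV cards like the A40, vGPU types are exposed per virtual function (VF), so an empty mdevctl types usually means either no VFs exist yet or the VFs expose nothing. NVIDIA's vGPU manager enables the VFs with /usr/lib/nvidia/sriov-manage -e ALL (assumed to be installed by the 550 package, as on other KVM hosts). The sketch below then walks the standard sysfs layout (virtfnN symlinks on the physical function, mdev_supported_types/ on mdev-capable devices). The nvidia/creatable_vgpu_types path is an assumption: newer driver/kernel combinations reportedly use a vendor-specific VFIO layout instead of mdev, so treat that branch as unverified. list_vf_mdev_types and SYSFS_PCI are hypothetical names.

```shell
# List the SR-IOV VFs of one physical GPU and what each one exposes.
# SYSFS_PCI can be overridden (e.g. for testing); the helper is hypothetical.
list_vf_mdev_types() {
    local pf="$1"                                   # e.g. 0000:81:00.0
    local base="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local link vf
    for link in "$base/$pf"/virtfn*; do
        # an unmatched glob stays literal, so this also catches "no VFs"
        [ -e "$link" ] || { echo "no VFs enabled on $pf" >&2; return 1; }
        vf="$(basename "$(readlink -f "$link")")"
        if [ -d "$base/$vf/mdev_supported_types" ]; then
            # mdev framework: one sub-directory per vGPU type
            echo "$vf: mdev $(ls "$base/$vf/mdev_supported_types" | paste -sd' ' -)"
        elif [ -r "$base/$vf/nvidia/creatable_vgpu_types" ]; then
            # assumed vendor-specific VFIO layout (unverified)
            echo "$vf: vendor-vfio"
        else
            echo "$vf: none"
        fi
    done
}
```

For example, list_vf_mdev_types 0000:81:00.0 after running sriov-manage.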
 
Did you copy vgpuConfig.xml from the 535 package to /usr/share/nvidia/vgpu? Have you rebooted after that?
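A sketch of that copy step with a backup of the current file, assuming the older .run package has already been unpacked (the installer's --extract-only option does this). The file's location inside the package is not assumed, so the helper searches for it; install_vgpu_config is a hypothetical name and the default destination is the directory used in this thread.

```shell
# Copy vgpuConfig.xml from an extracted older vGPU package into the
# driver's config directory, keeping a backup of the existing file.
# install_vgpu_config is a hypothetical helper.
install_vgpu_config() {
    local src_dir="$1"                             # extracted older package
    local dest_dir="${2:-/usr/share/nvidia/vgpu}"
    local src
    src="$(find "$src_dir" -name vgpuConfig.xml -type f | head -n 1)"
    if [ -z "$src" ]; then
        echo "vgpuConfig.xml not found under $src_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir"
    if [ -f "$dest_dir/vgpuConfig.xml" ]; then
        # keep the newer driver's copy so the change can be undone
        cp -p "$dest_dir/vgpuConfig.xml" "$dest_dir/vgpuConfig.xml.bak"
    fi
    cp "$src" "$dest_dir/vgpuConfig.xml"
    echo "installed $src -> $dest_dir/vgpuConfig.xml; reboot to apply"
}
```

Restoring vgpuConfig.xml.bak undoes the change if the older profiles don't help.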
 
Oh yes, I did all of that... no luck.

But I have an AMD EPYC server, so the IOMMU is enabled by default; I still added amd_iommu=pt to the kernel command line anyway.

However, after enabling the VFs I get write errors.
If I check the IOMMU:

dmesg | grep -e DMAR -e IOMMU
[ 3.467959] pci 0000:c0:00.2: AMD-Vi: IOMMU performance counters supported
[ 3.478973] pci 0000:80:00.2: AMD-Vi: IOMMU performance counters supported
[ 3.489280] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 3.499356] pci 0000:40:00.2: AMD-Vi: IOMMU performance counters supported
[ 3.510055] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 3.510070] perf/amd_iommu: Detected AMD IOMMU #1 (2 banks, 4 counters/bank).
[ 3.510084] perf/amd_iommu: Detected AMD IOMMU #2 (2 banks, 4 counters/bank).
[ 3.510098] perf/amd_iommu: Detected AMD IOMMU #3 (2 banks, 4 counters/bank).
[ 212.079616] NVRM: Aborting probe for VF 0000:81:00.4 since IOMMU is not present on the system.
[ 212.130687] NVRM: Aborting probe for VF 0000:81:00.5 since IOMMU is not present on the system.
[ 212.180933] NVRM: Aborting probe for VF 0000:81:00.6 since IOMMU is not present on the system.
[ 212.231022] NVRM: Aborting probe for VF 0000:81:00.7 since IOMMU is not present on the system.
[ 212.280967] NVRM: Aborting probe for VF 0000:81:01.0 since IOMMU is not present on the system.
[ 212.331030] NVRM: Aborting probe for VF 0000:81:01.1 since IOMMU is not present on the system.
[ 212.381150] NVRM: Aborting probe for VF 0000:81:01.2 since IOMMU is not present on the system.
[ 212.431210] NVRM: Aborting probe for VF 0000:81:01.3 since IOMMU is not present on the system.
[ 212.481321] NVRM: Aborting probe for VF 0000:81:01.4 since IOMMU is not present on the system.
[ 212.531320] NVRM: Aborting probe for VF 0000:81:01.5 since IOMMU is not present on the system.
[ 212.581441] NVRM: Aborting probe for VF 0000:81:01.6 since IOMMU is not present on the system.
[ 212.631506] NVRM: Aborting probe for VF 0000:81:01.7 since IOMMU is not present on the system.
[ 212.681585] NVRM: Aborting probe for VF 0000:81:02.0 since IOMMU is not present on the system.
[ 212.731677] NVRM: Aborting probe for VF 0000:81:02.1 since IOMMU is not present on the system.
[ 212.781746] NVRM: Aborting probe for VF 0000:81:02.2 since IOMMU is not present on the system.
[ 212.831899] NVRM: Aborting probe for VF 0000:81:02.3 since IOMMU is not present on the system.
[ 212.881961] NVRM: Aborting probe for VF 0000:81:02.4 since IOMMU is not present on the system.
[ 212.932134] NVRM: Aborting probe for VF 0000:81:02.5 since IOMMU is not present on the system.
[ 212.982137] NVRM: Aborting probe for VF 0000:81:02.6 since IOMMU is not present on the system.
[ 213.032310] NVRM: Aborting probe for VF 0000:81:02.7 since IOMMU is not present on the system.
[ 213.082465] NVRM: Aborting probe for VF 0000:81:03.0 since IOMMU is not present on the system.
[ 213.132459] NVRM: Aborting probe for VF 0000:81:03.1 since IOMMU is not present on the system.
[ 213.182583] NVRM: Aborting probe for VF 0000:81:03.2 since IOMMU is not present on the system.
[ 213.232648] NVRM: Aborting probe for VF 0000:81:03.3 since IOMMU is not present on the system.
[ 213.282693] NVRM: Aborting probe for VF 0000:81:03.4 since IOMMU is not present on the system.
[ 213.332955] NVRM: Aborting probe for VF 0000:81:03.5 since IOMMU is not present on the system.
[ 213.382973] NVRM: Aborting probe for VF 0000:81:03.6 since IOMMU is not present on the system.
[ 213.433166] NVRM: Aborting probe for VF 0000:81:03.7 since IOMMU is not present on the system.
[ 213.483282] NVRM: Aborting probe for VF 0000:81:04.0 since IOMMU is not present on the system.
[ 213.533369] NVRM: Aborting probe for VF 0000:81:04.1 since IOMMU is not present on the system.
[ 213.583545] NVRM: Aborting probe for VF 0000:81:04.2 since IOMMU is not present on the system.
[ 213.633593] NVRM: Aborting probe for VF 0000:81:04.3 since IOMMU is not present on the system.
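The log shows the AMD IOMMU being detected while NVRM claims it is not present, so it helps to check whether the kernel actually created IOMMU groups. A minimal sketch with two hypothetical helpers: one counts the NVRM abort lines in kernel log text on stdin, the other succeeds only if /sys/kernel/iommu_groups is non-empty (i.e. an IOMMU driver is active). Worth checking too: the kernel's parameter documentation lists pt under iommu=, not under amd_iommu=.

```shell
# count_aborted_vf_probes: count the NVRM lines seen above in log text
# read from stdin (e.g. piped from dmesg).
count_aborted_vf_probes() {
    grep -c 'NVRM: Aborting probe for VF .* since IOMMU is not present'
}

# iommu_groups_present: true only if the kernel created IOMMU groups.
# IOMMU_GROUPS can be overridden for testing; both names are hypothetical.
iommu_groups_present() {
    local dir="${IOMMU_GROUPS:-/sys/kernel/iommu_groups}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]
}
```

For example: dmesg | count_aborted_vf_probes, and iommu_groups_present && echo "IOMMU groups exist".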


I'm planning on going back to kernel 6.5.
 
I'm trying to install the vGPU driver on Proxmox 8.2.4, but it always fails with: An error occurred while performing the step: "Building kernel modules".
I have tried NVIDIA driver versions 13.10, 16.5 and 17.1; all have the same issue.
 

Try the latest one, 17.3 (550.90.05).
 
Hello guys,

Hope that someone can help me. I have set up ZFS. I added intel_iommu=on and iommu=pt to /etc/kernel/cmdline, and I added the following to /etc/modules:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

I also disabled the nouveau driver.
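The module list and the nouveau blacklist can be applied idempotently so re-running a setup script doesn't duplicate lines. A sketch assuming Debian's /etc/modules and a blacklist file name of /etc/modprobe.d/blacklist-nouveau.conf (any *.conf name in that directory should do); both paths can be overridden, and the helper names are hypothetical. vfio_virqfd is left out on the assumption, per the Proxmox passthrough documentation, that it was merged into vfio in kernel 6.2.

```shell
# Append a line to a file only if that exact line is not already there.
append_once() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Add the VFIO modules and blacklist nouveau (hypothetical helper).
setup_vfio_modules() {
    local modules_file="${1:-/etc/modules}"
    local blacklist_file="${2:-/etc/modprobe.d/blacklist-nouveau.conf}"
    local m
    # vfio_virqfd is assumed unnecessary on 6.2+ kernels (merged into vfio)
    for m in vfio vfio_iommu_type1 vfio_pci; do
        append_once "$modules_file" "$m"
    done
    append_once "$blacklist_file" "blacklist nouveau"
    echo "now run: update-initramfs -u -k all, then reboot"
}
```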


I ran update-initramfs and rebooted. Then I installed the newest NVIDIA driver, 17.3, downloaded the older driver for my NVIDIA Tesla P40, copied its vgpuConfig.xml to /usr/share/nvidia/vgpu, and rebooted again.
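On a ZFS install booted through proxmox-boot-tool, edits to /etc/kernel/cmdline only take effect after proxmox-boot-tool refresh and a reboot, so it's worth checking that the running kernel really got the flags. A minimal sketch that compares whole words in /proc/cmdline; missing_cmdline_flags and CMDLINE_FILE are hypothetical names.

```shell
# Report which of the given flags are absent from the running kernel's
# command line. Returns 1 if any flag is missing.
missing_cmdline_flags() {
    local cmdline flag missing=0
    # pad with spaces so every flag can be matched as a whole word
    cmdline=" $(cat "${CMDLINE_FILE:-/proc/cmdline}") "
    for flag in "$@"; do
        case "$cmdline" in
            *" $flag "*) ;;                              # present
            *) echo "missing: $flag"; missing=1 ;;
        esac
    done
    return "$missing"
}
```

For example: missing_cmdline_flags intel_iommu=on iommu=pt || echo "run proxmox-boot-tool refresh and reboot".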

nvidia-smi shows the card, but nvidia-smi vgpu reports that there is no card that supports vGPU. I have already set the GPU mode with NVIDIA's displaymodeselector using the following command: ./displaymodeselector --gpumode, and selected option 1, which disables the physical display.

If I run nvidia-smi -q, I see that the vGPU mode is N/A.

./displaymodeselector --gpumode

NVIDIA Display Mode Selector Utility (Version 1.67.0)
Copyright (C) 2015-2021, NVIDIA Corporation. All Rights Reserved.


WARNING: This operation updates the firmware on the board and could make
the device unusable if your host system lacks the necessary support.

Are you sure you want to continue?
Press 'y' to confirm (any other key to abort):
y
Select a number:
<0> physical_display_enabled_256MB_bar1
<1> physical_display_disabled -> selected this
<2> physical_display_enabled_8GB_bar1

Select a number (ESC to quit):

nvidia-smi vgpu
No supported devices in vGPU mode

Can somebody help me? Here is the nvidia-smi -q output:
==============NVSMI LOG==============

Timestamp : Wed Aug 21 08:18:54 2024
Driver Version : 550.90.05
CUDA Version : Not Found

Attached GPUs : 1
GPU 00000000:86:00.0
Product Name : Tesla P40
Product Brand : Tesla
Product Architecture : Pascal
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Addressing Mode : N/A
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0421118016877
GPU UUID : GPU-cac66e41-766f-6aba-dbd4-e9634800a7d6
Minor Number : 0
VBIOS Version : 86.02.23.00.00
MultiGPU Board : No
Board ID : 0x8600
Board Part Number : 699-2G610-0200-100
GPU Part Number : 1B38-895-A1
FRU Part Number : N/A
Module ID : 1
Inforom Version
Image Version : G610.0200.00.03
OEM Object : 1.1
ECC Object : 4.1
Power Management Object : N/A
Inforom BBX Object Flush
Latest Timestamp : N/A
Latest Duration : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU C2C Mode : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
vGPU Heterogeneous Mode : N/A
GPU Reset Status
Reset Required : No
Drain and Reset Recommended : N/A
GSP Firmware Version : N/A
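The fields that matter in the output above are "Virtualization Mode : None" and "Host VGPU Mode : N/A"; on a host where the vGPU manager has taken over the card, Virtualization Mode is expected to read something other than None (reportedly "Host VGPU"). A small sketch for extracting one field from nvidia-smi -q text on stdin, anchored on the key so the "GPU Virtualization Mode" section header is not mistaken for the field; nvsmi_field is a hypothetical name.

```shell
# Print the value of one "Key : Value" field from nvidia-smi -q text.
nvsmi_field() {
    local key="$1"
    awk -v k="$key" '
        {
            line = $0
            sub(/^[[:space:]]+/, "", line)          # drop indentation
            if (index(line, k) == 1) {
                rest = substr(line, length(k) + 1)
                if (rest ~ /^[[:space:]]*:/) {      # key must be followed by ":"
                    sub(/^[[:space:]]*:[[:space:]]*/, "", rest)
                    print rest
                    exit
                }
            }
        }'
}
```

For example: nvidia-smi -q | nvsmi_field 'Virtualization Mode'.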
 
