Check support Nvidia L40 with Proxmox VE 8.2

watcharapong

New Member
Jun 28, 2024
We need to use Proxmox VE 8.2 with an NVIDIA AI Enterprise license.
Can Proxmox support the NVIDIA L40, either with full GPU passthrough or with vGPU?
 
I'm not sure whether it will be the same with the L40, but I managed to make it work with an A40. I tried to follow the instructions, but unfortunately they only worked for passing through the physical GPU, i.e. the entire NVIDIA A40, to a single VM. To make vGPU work and use profiles such as NVIDIA A40-1Q, 2Q ... 12Q, I had to switch to kernel 6.2.16-20-pve.
 
I'm trying to install the Nvidia Linux KVM driver version 550.90.05 on a fresh installation of Proxmox 8.2.2. However, when I install the driver and run the command nvidia-smi, I get the following error:

>> NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

When I check the NVIDIA daemon, it appears to be dead, and I see the following messages:

>> nvidia-vgpud[1130]: error: failed to allocate client: 59
>> nvidia-vgpud[1130]: error: failed to read pGPU information: 9
>> nvidia-vgpud[1130]: error: failed to send vGPU configuration info to RM: 9
>> systemd[1]: nvidia-vgpud.service: Deactivated successfully.
>> systemd[1]: Finished nvidia-vgpud.service - NVIDIA vGPU Daemon.

Do you have any idea what might be causing this issue? My goal is to be able to use vGPU profiles.

My kernel version is 6.8.12-1-pve
 
I never managed to make it work with the latest kernel, so if that is what you are trying, I don't think it is going to work. Apparently the next version of the NVIDIA driver (coming in October) should be compatible with the latest kernel, but let's wait and see.
If you don't want to wait, you can try the following, which worked for me (I hope I didn't miss anything important):

Uninstall nvidia driver and remove existing proxmox headers

Install proxmox-kernel-6.5
Install dkms libc6-dev proxmox-headers-6.5

Use proxmox-boot-tool to pin kernel 6.5

Reboot
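
For anyone scripting the kernel steps above, here is a rough sketch. The package names are the ones from the post. The helper name latest_65_kernel is made up for illustration, and it assumes that "proxmox-boot-tool kernel list" prints each installed version on its own line in the form 6.5.x-y-pve. Check the real output on your host before relying on it.

```shell
# Hypothetical helper: read "proxmox-boot-tool kernel list" output on stdin
# and print the newest 6.5 kernel version found. Returns non-zero if no
# 6.5 kernel is listed.
latest_65_kernel() {
    local newest
    newest=$(awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^6\.5\.[0-9]+-[0-9]+-pve$/) print $i
    }' | sort -V | tail -n 1)
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Intended use on the host as root (not executed by this snippet):
#   apt install proxmox-kernel-6.5 proxmox-headers-6.5 dkms libc6-dev
#   proxmox-boot-tool kernel pin "$(proxmox-boot-tool kernel list | latest_65_kernel)"
#   reboot
```

Picking the version from the list rather than hard-coding it avoids pinning a kernel that is not actually installed.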

Edit /etc/default/grub and add intel_iommu=on to the GRUB_CMDLINE_LINUX_DEFAULT variable (this parameter is for Intel hosts)

update-grub
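
The GRUB edit can also be done non-interactively. This is only a sketch: add_grub_param is a hypothetical helper name, it assumes the variable sits on a single double-quoted line, and it treats the parameter as a plain token (no slashes or regex characters).

```shell
# Hypothetical helper: append a kernel parameter to the
# GRUB_CMDLINE_LINUX_DEFAULT line of the given file, unless that line
# already contains it. Uses GNU sed -i, which is what Proxmox ships.
add_grub_param() {
    local file="$1" param="$2"
    # Already present: nothing to do.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Intended use on the host as root (not executed by this snippet):
#   add_grub_param /etc/default/grub intel_iommu=on
#   update-grub
```

Running it twice leaves the file unchanged the second time, so it is safe to keep in a setup script.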

Edit /etc/modules to include following modules:
vfio
vfio_iommu_type1
vfio_pci

update-initramfs -u -k all
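
If you would rather not edit /etc/modules by hand, something like the following should work. ensure_modules is a hypothetical helper, and the sketch assumes the file ends with a newline, as it normally does.

```shell
# Hypothetical helper: make sure each named module appears on its own line
# in the given file, appending only the ones that are missing.
ensure_modules() {
    local file="$1" mod
    shift
    for mod in "$@"; do
        # -x matches whole lines only, -F treats the name literally.
        grep -qxF "$mod" "$file" 2>/dev/null || printf '%s\n' "$mod" >> "$file"
    done
}

# Intended use on the host as root (not executed by this snippet):
#   ensure_modules /etc/modules vfio vfio_iommu_type1 vfio_pci
#   update-initramfs -u -k all
```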

Reboot

To check:
lsmod | grep vfio
response should include the three modules listed above

dmesg | grep -e DMAR -e IOMMU -e AMD-Vi
response should display that IOMMU, Directed I/O or Interrupt Remapping is enabled
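
The module check can be automated instead of reading the lsmod output by eye. A minimal sketch, assuming the usual lsmod layout with the module name in the first column; missing_modules is a made-up name.

```shell
# Hypothetical helper: read lsmod output on stdin and report which of the
# named modules are not loaded. Returns non-zero if any are missing.
missing_modules() {
    local loaded mod rc=0
    # First column of lsmod is the module name; the header line is harmless.
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        if ! printf '%s\n' "$loaded" | grep -qxF "$mod"; then
            printf 'missing: %s\n' "$mod"
            rc=1
        fi
    done
    return "$rc"
}

# Intended use on the host (not executed by this snippet):
#   lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci && echo "vfio OK"
```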

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

Reboot

Install nvidia host driver

Disable ECC (unless you really need it)
nvidia-smi -e 0
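
Since the ECC change only takes effect after the reboot, it can help to confirm that it is actually pending. A sketch, assuming your driver's nvidia-smi supports the ecc.mode.current and ecc.mode.pending query fields and prints them comma-separated; ecc_pending_changes is a hypothetical helper name.

```shell
# Hypothetical helper: read lines of "index, current, pending" ECC modes on
# stdin and print one line per GPU whose ECC mode will change on reboot.
ecc_pending_changes() {
    awk -F',' '{
        # Trim the spaces nvidia-smi puts after each comma.
        for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
        if ($2 != $3)
            printf "GPU %s: ECC %s now, %s after reboot\n", $1, $2, $3
    }'
}

# Intended use on the host (not executed by this snippet):
#   nvidia-smi --query-gpu=index,ecc.mode.current,ecc.mode.pending \
#       --format=csv,noheader | ecc_pending_changes
```

No output means nothing is pending: either ECC was already disabled or the change did not register.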

Reboot

Finally, enable SR-IOV (after host reboot)

/usr/lib/nvidia/sriov-manage -e ALL -r 1
 
