[TUTORIAL] PVE 8.22 / Kernel 6.8 and NVidia vGPU

Hey! I stumbled on this thread yesterday night :) And indeed, this provided a lot of the answers needed.

I managed to get the vGPU working on Ubuntu, and partially on Win11. It works on Win11, but just one time, then it goes into infinite boot loading... Trying to figure that out is again not trivial, as Windows does not give a clear log of what is going on during boot...

https://forum.proxmox.com/threads/vgpu-with-nvidia-on-kernel-6-8.150840/page-2#post-694982

The issue was that I am using professional-grade and quite recent GPUs (L40S), which use the Vendor Specific VFIO Framework instead of the "normal" mdev framework. That newer framework is a pain, and works quite differently from the previous one.

At least, now it seems to go somewhere, but given that it's not yet working in Win11, I hold my breath :)

Thanks again for all the efforts and input, it really helped me better understand how all this works!
 
I'm experiencing a significant increase in memory usage on the PVE host when using PVE with Nvidia P4 and vGPU. On average, each virtual machine with a 1Q vGPU results in an additional 4GB of memory consumption.

Code:
CPU(s) 32 x Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz (2 Sockets)
Kernel Version Linux 6.8.12-1-pve (2024-08-05T16:17Z)
Boot Mode Legacy BIOS
Manager Version pve-manager/8.2.2/9355359cd7afbae
 
Hello everyone,

Until now I was just a silent reader, but I have finally run into some issues. I own a P4, which is recognized and was working with the 16.x drivers, but I couldn't get the profile override up and running. I also tried the 17.x drivers (patched), which worked as well. But mdevctl types always shows the P40 profiles, not the P4 profiles. I tried the vgpuConfig.xml from 16.4 and 16.6, with the same result: I only see the P40 profiles. I strictly followed the instructions in the polloloco guide.

Any ideas what else I can try to get the right profiles for the P4?

Regards,
Harald
 
I am also trying to get the new 17.3 kvm driver working on the new kernel, also using the Tesla P4

I ran
Code:
./NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm-custom.run --dkms -m=kernel --no-drm
Then I copied the vgpuConfig.xml over and rebooted, and it worked perfectly. (It would not work without --no-drm.)

mdevctl types returns the usual profile listing.

But now my issue is that my LXCs report an API mismatch and can no longer run nvidia-smi or access the card. I tried both 535 and 550 inside the LXCs; neither works now.

But VMs are working fine with vGPU profiles.
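
An API mismatch inside a container usually means the NVIDIA user-space libraries in the LXC are not the exact same version as the kernel module loaded on the host. Below is a minimal sketch for comparing the two version strings. The function name is hypothetical, and reading the host version from /sys/module/nvidia/version is an assumption about the host setup; the container version is whatever driver package was installed inside the LXC.

```shell
#!/usr/bin/env bash
# Compare the host kernel module version with the user-space driver
# version installed in a container. They have to be identical.
versions_match() {
    local host_ver="$1" guest_ver="$2"
    if [ "$host_ver" = "$guest_ver" ]; then
        echo "match: $host_ver"
    else
        echo "mismatch: host=$host_ver container=$guest_ver"
        return 1
    fi
}

# On a real host (assumption: the nvidia module is loaded):
#   versions_match "$(cat /sys/module/nvidia/version)" "535.183.01"
# Illustrative call with made-up container version:
versions_match 550.90.05 535.183.01 || true
```

If the versions differ, installing the matching user-space driver inside the container (without its kernel module) is the usual direction to look in, but treat that as a starting point rather than a confirmed fix.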

@Shogun1978 seeing the profiles for the P40 just means the unlock patch is active, and your vGPU profiles should function fine. That is what my P4 shows up as too, and everything is perfectly functional; there's nothing wrong with seeing that.

 
Actually, I'm using a Quadro M1000M on a ThinkPad P50 laptop (Proxmox with a desktop).

On the host I get:
1729752645473.png

But my host desktop lags. Also, which driver do I have to install in the Windows guest VM?

Also, mdevctl types gives this:

1729752735183.png


Why am I getting Available instances: 0?

Update:

./NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm-custom.run --dkms -m=kernel --no-drm

This fixed both the lagging and the available instances.

Now I need to know which driver to install in the Windows 10 guest VM.
 
Hello, maybe this helps someone: https://wvthoog.nl/proxmox-7-vgpu-v2/ and the newer version: https://wvthoog.nl/proxmox-vgpu-v3/
This guy makes it simple with a script that installs what is needed. It uses kernel 6.5.
I tested 4x Win11 VMs with my Tesla T4; with this script it works out of the box.
 
It didn't work, so I removed all the changes. Actually, I want to use a Windows VM on my Proxmox workstation, but my HDMI output port is damaged, so I can't connect any HDMI display. How am I going to pass the GPU through and use it for gaming?
 
Looking over the scripts, they appear to be for the standard drivers, not the KVM drivers for vGPU.

Your output ports do not really have anything to do with passthrough. But if your HDMI port is damaged and you need it for your monitor/TV, you can get a cheap DVI to HDMI adapter off eBay/Amazon or something and just use your DVI port instead.
 
So if I'm getting Code 43, it's caused by something else, not the port.

Okay, but the script (v3) says my card is not supported.

So I tried this one: https://forum.proxmox.com/threads/pve-8-22-kernel-6-8-and-nvidia-vgpu.147039/unread?new=1

It does split my GPU into vGPUs, but which driver do I need to download in the guest for it to work? My GPU is the M1000M.

When I try to download it through GeForce Experience, it says Windows is not compatible with this version.


And again, the same goes for direct passthrough:
1. If my GPU and its audio device are in the same IOMMU group, it gives the same error.
2. If they are in different groups and I pass through only the card, not the audio, the driver installs but gives Code 43.

Maybe my laptop just doesn't work for passthrough.
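
To check which devices share an IOMMU group with the GPU, the groups can be read straight from sysfs. Here is a minimal sketch, assuming the standard /sys/kernel/iommu_groups layout; the function name is hypothetical, and the optional argument only exists so the function can be pointed at a different directory.

```shell
#!/usr/bin/env bash
# Print one line per PCI device with the IOMMU group it belongs to.
# $1 (optional): root of the group tree, defaults to the sysfs path.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # no groups: IOMMU probably disabled
        group="${dev%/devices/*}"          # strip "/devices/<pci address>"
        group="${group##*/}"               # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}

list_iommu_groups
```

If the GPU and its audio function show up under the same group number, they have to be passed through together. Piping an address from the output into lspci -nns shows what each device is.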
 
Try these:
https://gitlab.com/polloloco/vgpu-proxmox - vgpu unlock
https://git.collinwebdesigns.de/oscar.krause/fastapi-dls - licensing

You also need the KVM driver. The first link explains how to obtain it from NVIDIA, or just search online for “nvidia kvm archive” and you should be able to find a download of the KVM package somewhere. It should include the roughly 50 MB KVM host version of the driver plus the appropriate driver packages for the guests.

Also check the output of mdevctl types and nvidia-smi vgpu to see the status of the vGPU profiles and which GPUs are enabled for vGPU. You can do that now to see whether it is currently enabled and you just need a different guest driver, or whether it's not working and you need to try the tutorials above.

Also, full passthrough without mdev/vGPU should work regardless of vGPU support. It's probably just not the right driver or something.
 
Full passthrough isn't working in either a Windows or a Linux guest.

NVRM: failed to copy vbios to system memory.
NVRM: RmInitAdapter failed!
NVRM: rm_init_adapter failed for device bearing minor number 0

This is the error I got in the Linux guest (Pop!_OS).

Also, for vGPU that guide is the only one I'm using; see the guide I have been following and the output after that.

mdevctl types

does return the output for the different NVIDIA profiles.

It's just that nvclean reports no device found inside Windows.

Maybe I should try vGPU with a Linux guest and check whether it works there.
 

I don't have a supported card; I'm currently using a 1050 Ti with this patch: https://github.com/VGPU-Community-Drivers/vGPU-Unlock-patcher

# git clone --recursive https://github.com/VGPU-Community-Drivers/vGPU-Unlock-patcher
# cd vGPU-Unlock-patcher
# wget somewhere.NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm.run
# ./patch.sh vgpu-kvm
# cd NVIDIA-Linux-x86_64-550.90.05-vgpu-kvm-patched
# ./nvidia-installer --dkms -m=kernel

That installs the full driver including DRM (no need for --no-drm). Then I have the mediated devices needed for vGPU:

Code:
# mdevctl  types
0000:10:00.0
  nvidia-1024
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1Q
    Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=3840x2160, max_instance=4
  nvidia-2048
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=3840x2160, max_instance=2
  nvidia-4096
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=3840x2160, max_instance=1
  nvidia-512
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-0.5Q
    Description: num_heads=4, frl_config=60, framebuffer=512M, max_resolution=3840x2160, max_instance=8
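
To see at a glance which profiles can still be created, a listing like the one above can be filtered down to the types with free instances. This is a minimal sketch that assumes the mdevctl types layout shown in this post; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Read `mdevctl types` output on stdin and print
# "<type> <available instances> <name>" for every type with instances left.
available_profiles() {
    awk '
        /^[ \t]*nvidia-/       { type = $1 }
        /Available instances:/ { n = $3 + 0 }
        /Name:/                { if (n > 0) { sub(/^ *Name: /, ""); print type, n, $0 } }
    '
}

# Sample taken from the listing above; on a host: mdevctl types | available_profiles
available_profiles <<'EOF'
0000:10:00.0
  nvidia-1024
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1Q
  nvidia-512
    Available instances: 4
    Device API: vfio-pci
    Name: GRID P40-0.5Q
EOF
```

With the sample input this prints only the nvidia-512 line, which matches the four 488MiB vgpu processes in the nvidia-smi output.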

Code:
# nvidia-smi
Sat Aug 10 10:22:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.05              Driver Version: 550.90.05      CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     On  |   00000000:10:00.0 Off |                  N/A |
| 45%   25C    P8             N/A /   75W |    1973MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2842    C+G   vgpu                                          488MiB |
|    0   N/A  N/A      2907    C+G   vgpu                                          488MiB |
|    0   N/A  N/A      2977    C+G   vgpu                                          488MiB |
|    0   N/A  N/A      3047    C+G   vgpu                                          488MiB |
+-----------------------------------------------------------------------------------------+

It's a small card, so I created my own profiles with 512 MB of RAM. Also, I add the device to the VM through the GUI, not by editing the .conf.


Once I forgot to add the MDev type and got those strange BSODs you're talking about. About IOMMU: make sure you're using GRUB; if not, the correct file is /etc/kernel/cmdline and the update is done through # proxmox-boot-tool refresh.
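
Whichever bootloader is in use, the running kernel's command line shows whether the IOMMU options actually took effect after the reboot. A minimal sketch follows; the function name is hypothetical, and the flag names it looks for (intel_iommu, amd_iommu, iommu) are an assumption based on what typical passthrough guides set, not something stated in this post.

```shell
#!/usr/bin/env bash
# Print every IOMMU-related option found in a kernel command line string.
iommu_flags() {
    printf '%s\n' "$1" | tr ' ' '\n' | grep -E '^(intel_iommu|amd_iommu|iommu)=' || true
}

# Check the kernel that is currently running:
iommu_flags "$(cat /proc/cmdline)"
```

Empty output means none of those options are on the active command line. On a GRUB system they would go into /etc/default/grub followed by update-grub; on systemd-boot, into /etc/kernel/cmdline followed by proxmox-boot-tool refresh.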
How did you create your own profile with the 512MB Ram?
 
NVIDIA vGPU Guide

You usually do so with vGPU overrides in a file at /etc/vgpu_unlock/profile_override.toml

Code:
framebuffer = 0x1A000000
framebuffer_reservation = 0x6000000

is 512 MB.

[profile.nvidia-#] or [vm.#] is used to specify which profile or VM config to override. If you just follow the guide above where necessary, it will get you all sorted out.
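
The two hex values above are byte counts, and together they add up to the profile size: 0x1A000000 is 416 MiB and 0x6000000 is 96 MiB, for 512 MiB in total. A small sketch to check the sum for other value pairs (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Total profile size in MiB for a framebuffer / framebuffer_reservation pair.
# Both arguments are byte counts; hex (0x...) is accepted.
profile_total_mib() {
    echo $(( ($1 + $2) / 1024 / 1024 ))
}

profile_total_mib 0x1A000000 0x6000000   # prints 512
```

For other sizes, take the values from the guide rather than guessing the split between framebuffer and reservation; this only verifies that a given pair adds up to the size you expect.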
 
