Issues with nVidia vGPU since upgrading to kernel 6.5.13-5-pve

kesawi

New Member
Jan 20, 2024
Brisbane, Australia
Since upgrading from 6.5.13-3-pve to 6.5.13-5-pve I've noticed issues with the VMs that my GTX1660Ti is passed through to as a vGPU.

I run Plex in a Docker container within an Ubuntu 22.04.4 VM. When I start a media file that requires hardware transcoding within Windows, it makes several attempts to start the GPU process. If I change the playback settings during playback, it attempts to restart the transcoding process and then fails (refer to this thread on the Plex forums for more details).

I also run an Xpenology VM and use Deep Video Analytics for people and vehicle detection under Synology Surveillance Station. Typically after 48-72 hours the DVA tasks stop working (i.e. they don't detect any further events), even though the GPU processes still show up in the output of the nvidia-smi command.

I'm using the nVidia gridd drivers v535.161.05 and followed the instructions at https://gitlab.com/polloloco/vgpu-proxmox to get my card working with vGPU.

I've tried uninstalling and reinstalling the nVidia drivers, but with no change.
 
I'm in the same boat, but I want to ask you about the services.
I'm using Proxmox 8.1 with an Nvidia Tesla P4 (it supports GRID out of the box and has vGPU support, at least in the Win10 drivers).

Following the instructions at https://gitlab.com/polloloco/vgpu-proxmox, this is where I am:
1) Creating the config for the nvidia-vgpud and nvidia-vgpu-mgr services. In https://github.com/wvthoog/proxmox-vgpu-installer this step finishes by enabling those services, but on Proxmox 8.1 I get an error that the services were not enabled because /etc/systemd/system/nvidia-vgpud.service and /etc/systemd/system/nvidia-vgpu-mgr.service do not exist. I tried writing a simple service template and enabling it. As I understand it, the services should set the environment variable LD_PRELOAD. They ran, but there is no LD_PRELOAD variable in env.
2) I patched driver 535.161.08 as instructed and installed it.
3) nvidia-smi returns an OK result, but "nvidia-smi vgpu" returns "No supported devices in vGPU mode".
4) mdevctl types returns nothing.

So I'm stuck on 1) (all this service should do is set an environment variable, but it doesn't, and it doesn't load any module), and I don't understand why 3) happens.

Can you show me how you fixed 1)? And do you have the LD_PRELOAD variable in env?
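
A quick way to check step 1) is to look at the drop-in files themselves. The sketch below is only an illustration: it assumes the service names and the vgpu_unlock.conf file name used in the polloloco guide, and the helper name check_dropins and its optional directory argument are made up for this example.

```shell
# check_dropins [SYSTEMD_DIR]
# Hypothetical helper: for each vGPU service named in the guide, report
# whether a vgpu_unlock.conf drop-in containing an LD_PRELOAD line exists.
# SYSTEMD_DIR defaults to /etc/systemd/system.
check_dropins() {
    dir="${1:-/etc/systemd/system}"
    rc=0
    for unit in nvidia-vgpud.service nvidia-vgpu-mgr.service; do
        conf="$dir/$unit.d/vgpu_unlock.conf"
        if [ -f "$conf" ] && grep -q '^Environment=LD_PRELOAD=' "$conf"; then
            echo "$unit: drop-in OK"
        else
            echo "$unit: drop-in missing or has no LD_PRELOAD line"
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_dropins with no argument checks /etc/systemd/system. A drop-in only overrides settings of a unit that already exists, so it is also worth running "systemctl cat nvidia-vgpud.service" to see which unit file, if any, systemd has found for the service; as far as I understand the guide, the unit files themselves are expected to come with the host driver install rather than from the drop-in step.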
 
I followed the steps manually at https://gitlab.com/polloloco/vgpu-proxmox and did not use the script at https://github.com/wvthoog/proxmox-vgpu-installer.

Have you tried the following manually?

Bash:
mkdir /etc/systemd/system/{nvidia-vgpud.service.d,nvidia-vgpu-mgr.service.d}
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf
echo -e "[Service]\nEnvironment=LD_PRELOAD=/opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so" > /etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf

If I do printenv from the command line, then I don't see an LD_PRELOAD variable listed.

Both services configured by the commands above are running for me.
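
Not seeing LD_PRELOAD in printenv is expected: an Environment= line in a systemd drop-in applies only to the processes of that service, not to a login shell. To see what a running service actually received, its environment can be read from /proc. This is a minimal sketch; the helper name env_from_environ is made up for illustration.

```shell
# env_from_environ FILE NAME
# Print the value of variable NAME from a NUL-separated environ file such as
# /proc/<pid>/environ. Returns non-zero if NAME is not set in that file.
env_from_environ() {
    tr '\000' '\n' < "$1" |
        awk -v n="$2" '
            index($0, n "=") == 1 { print substr($0, length(n) + 2); found = 1 }
            END { exit !found }'
}

# Usage on the Proxmox host (run as root). MainPID is a standard systemd
# unit property; it is 0 when the service has no running main process.
#   pid=$(systemctl show -p MainPID --value nvidia-vgpu-mgr.service)
#   env_from_environ "/proc/$pid/environ" LD_PRELOAD
```

"systemctl show -p Environment nvidia-vgpu-mgr.service" shows the environment systemd has configured for the unit, whether or not it is running. If the drop-ins were added after the services were already installed, "systemctl daemon-reload" followed by a restart of both services is needed before the change takes effect.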
 
First I found https://gitlab.com/polloloco/vgpu-proxmox and went through it manually, step by step. I already have passthrough working for the AMD 610M iGPU and didn't want to break it with the script.
While experimenting with the service definitions I got a Proxmox boot error and had to reinstall Proxmox )) So after a short search I found the script https://github.com/wvthoog/proxmox-vgpu-installer, and it worked on a clean installation of Proxmox 8.1. I updated the sources and upgraded; as a result the kernel was updated to 6.5.13, and everything works.

uname -a
Linux h340 6.5.13-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) x86_64 GNU/Linux

There are some differences from what I did manually:
1) The script sticks to older drivers. With the P4 I'm limited to the 16.x branch. I tried the latest 535.161.08, patched it, and vGPU didn't work. The script uses 535.104.06, patches it too, and vGPU works.
2) Before reading any articles I tried 551 from the 17.x branch, but it would not install.
3) The script installs an UNSIGNED kernel module, so it is not loaded with Secure Boot enabled. I had to manually reinstall the patched 535.104.06 that the script created, choosing the signed option, then generate a key pair and import it with mokutil. On the next boot I had to complete MOK enrolment of the key pair.
4) In the profile list I have A, B and Q profiles, but I don't see any C profiles.
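
For anyone hitting point 3), the key pair step can look roughly like the sketch below. It is an assumption-laden illustration, not the exact commands used in this thread: the helper name gen_mok_keypair, the file names MOK.priv and MOK.der, the /root/mok directory and the certificate CN are all arbitrary choices.

```shell
# gen_mok_keypair DIR [CN]
# Create DIR/MOK.priv (private key) and DIR/MOK.der (DER-encoded certificate)
# for signing kernel modules. File names and the default CN are arbitrary.
gen_mok_keypair() {
    dir="$1"
    cn="${2:-Local vGPU module signing}"
    mkdir -p "$dir" || return 1
    openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
        -subj "/CN=$cn/" -outform DER \
        -keyout "$dir/MOK.priv" -out "$dir/MOK.der" 2>/dev/null || return 1
    chmod 600 "$dir/MOK.priv"
}

# Usage on the host (run as root):
#   gen_mok_keypair /root/mok
#   mokutil --import /root/mok/MOK.der   # asks for a one-time password
# Reboot and confirm the enrolment in the MOK manager screen. The NVIDIA
# installer can then be pointed at the pair with its
# --module-signing-secret-key and --module-signing-public-key options.
```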

Right now I don't know how to split the Tesla P4 into 2 instances. I want one instance with 4 GB and the other with 3.76 GB. The default option is to split it into 2 GB instances and waste 3.76 GB.
 
I've stopped using vGPU with my nVidia card and am just doing a straight passthrough of the GTX1660Ti to my Xpenology VM. I am using vGPU for the iGPU on my i7-7700K which I'm passing through to my Ubuntu VM for Plex.

When I was using vGPU for my GTX1660Ti, I had options to create Q, A & B profiles. I don't remember whether there were any C profiles. I used a Q profile and could select VRAM sizes in 1GB increments from 1GB to 6GB.

I couldn't mix VRAM sizes, i.e. apply 2GB to one VM and 4GB to another; they all had to be the same size. With the P4 you should be able to split it into two 4GB instances.
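
The equal-size constraint comes down to simple arithmetic. A rough sketch, assuming the nominal VRAM sizes (8192 MB for the Tesla P4, 6144 MB for the GTX1660Ti) and ignoring any framebuffer the driver reserves, so the usable amount reported on the host may be lower; both function names are made up for illustration.

```shell
# instances_per_gpu TOTAL_MB PROFILE_MB
# Number of equal-sized vGPU instances that fit into TOTAL_MB of VRAM.
instances_per_gpu() {
    echo $(( $1 / $2 ))
}

# unused_mb TOTAL_MB PROFILE_MB
# VRAM left over when the card is filled with equal-sized instances.
unused_mb() {
    echo $(( $1 % $2 ))
}

# Examples with nominal sizes:
#   instances_per_gpu 8192 4096   # Tesla P4 with a 4 GB profile
#   unused_mb 6144 4096           # GTX1660Ti with a 4 GB profile
```

With these nominal figures a 4 GB profile fits twice on the P4 with nothing left over, while on a 6 GB card a single 4 GB instance leaves 2 GB that cannot be given to a second, smaller vGPU.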
 
