[SOLVED] vGPU just stopped working randomly (solution includes 6.14, Pascal fixes for 17.5, changing mock P4 to A5500, thanks to GreenDam)

Line 376 in lib.rs is the only one you should have to change; my guess is the second number doesn't align, or you need to manually restart the service.
You can do that with systemctl restart nvidia-vgpu-mgr.service. I have had to use this to get mdevctl types to work before, and if that's the case you usually have to run it after every reboot to get it to work again. You could also try opening an issue on GitHub to see if the person who edited it has a fix.
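If the restart really is needed after every boot, one way to automate it is a small oneshot unit that kicks nvidia-vgpu-mgr once the system is up. This is only a sketch; the unit name vgpu-mgr-kick.service and the ordering are my own assumptions, not anything shipped by the driver package:

```
# /etc/systemd/system/vgpu-mgr-kick.service  (hypothetical name)
# Restarts nvidia-vgpu-mgr once after boot so mdevctl types show up.
[Unit]
Description=Restart nvidia-vgpu-mgr after boot (vGPU unlock workaround)
After=nvidia-vgpu-mgr.service
Requires=nvidia-vgpu-mgr.service

[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl restart nvidia-vgpu-mgr.service

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable vgpu-mgr-kick.service; if the manual restart fixes things for you, this just saves doing it by hand each boot.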
 
Looks like I figured it out.

I uninstalled the 17.5 driver and vgpu-unlock-rs.

I rebooted to start 'clean'.

Installed 17.6 driver NVIDIA-Linux-x86_64-550.163.02-vgpu-kvm-custom.run after patching it with 550.163.02.patch.

Installed vgpu-unlock-rs from https://github.com/mbilker/vgpu_unlock-rs.

Rebooted, and voila, I have P40 profiles :)
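Once the unlock is working, individual profiles can also be fine-tuned through vgpu_unlock-rs's override file. A sketch based on the upstream README; the profile name nvidia-55 and all the values are examples, so match them against what mdevctl types reports on your host:

```
# /etc/vgpu_unlock/profile_override.toml
# Example override for one vGPU type; every key is optional.
[profile.nvidia-55]
num_displays   = 1
display_width  = 1920
display_height = 1080
max_pixels     = 2073600    # 1920 * 1080
cuda_enabled   = 1
frl_enabled    = 60         # frame rate limiter, in FPS
framebuffer    = 0x74000000 # bytes of VRAM handed to the guest
```

After editing the file, restart nvidia-vgpu-mgr.service so the override gets picked up.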

Code:
 nvidia-smi
Wed Nov 12 08:12:24 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.163.02             Driver Version: 550.163.02     CUDA Version: N/A      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA TITAN Xp                On  |   00000000:03:00.0 Off |                  N/A |
| 23%   31C    P8             18W /  250W |      47MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  Quadro P6000                   On  |   00000000:81:00.0 Off |                  Off |
| 26%   43C    P8             19W /  250W |   24434MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    1   N/A  N/A    128883    C+G   vgpu                                         8126MiB |
|    1   N/A  N/A    129032    C+G   vgpu                                         8126MiB |
|    1   N/A  N/A    129049    C+G   vgpu                                         8126MiB |
+-----------------------------------------------------------------------------------------+
 
That is good to hear you got it working. Also, for these new drivers you do not want P4/P40 profiles anymore (if you meant P40 and not T40); you need the T4/T40 profiles or the newer drivers won't install. P4/P40 is limited to the 535 drivers as the maximum supported version.
 
Yeah, I couldn't install the 17.6 client drivers, so I installed the 16.12 client drivers and they work fine. I doubt newer drivers will bring any improvements to these old cards... The 17.6 driver on Proxmox installed fine; the only problem is the client driver.

I might try to be adventurous and see if I can get the unlocker to fake my Pascal cards as a V100 so that newer drivers can be installed, as suggested in the repo:

https://github.com/mbilker/vgpu_unl...632c327e683d642ecaa93#commitcomment-170322337
 
It is good to change it to something that supports the newer drivers. In my experience, Linux can get pretty picky if the client drivers do not match the host exactly. lol

They do bring some improvements, since optimizations apply to any card, as do new features. If you do anything with AI, the system memory fallback setting is in the new drivers but not in 535, along with many improvements for newer games, AI workloads, etc. The only reason the old cards do not work is that NVIDIA changed a few lines of code to remove them from the supported-devices lists and phase them out; nothing is stopping them from doing most of the things newer cards do. Pre-Pascal cards are pretty bad, but Pascal and newer are all fairly comparable aside from tensor cores and built-in AI features; newer cards are just much faster at the same advanced compute tasks and often have more VRAM. As long as you don't mind it being slower, your card can do everything a newer card can, apart from workloads that specifically use tensor cores.
 
I have been able to install the nvidia-gridd guest driver in a RHEL 10 VM after lots of reading and troubleshooting. The X server issue was easy, as RHEL 10 does not even ship Xorg (it was removed in RHEL 10); the hardcoded check in the .run installer was just a false alarm. I also patched the driver using the instructions at gridd-unlock-patcher. I downloaded the client token on the client VM using the following command:

Code:
wget --no-check-certificate -O /etc/nvidia/ClientConfigToken/client_configuration_token_$(date '+%d-%m-%Y-%H-%M-%S').tok https://10.10.1.135/-/client-token


Now the only issue remaining is licensing. Running
Code:
systemctl status nvidia-gridd.service
gives the following errors:

Sep 09 22:14:20 localhost.localdomain nvidia-gridd[3148]: vGPU Software package (0)
Sep 09 22:14:20 localhost.localdomain nvidia-gridd[3148]: Ignore service provider and node-locked licensing
Sep 09 22:14:20 localhost.localdomain nvidia-gridd[3148]: NLS initialized
Sep 09 22:14:20 localhost.localdomain nvidia-gridd[3148]: Acquiring license. (Info: 10.10.1.135; NVIDIA Virtual Applications)
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: Mismatch between client and server with respect to licenses held. Returning the licenses
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: License returned successfully. (Info: 10.10.1.135)
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: Failed to verify signature (error:02000068:rsa routines::bad signature)
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: Failed to verify signature on lease response
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: Failed to validate lease response
Sep 09 22:14:22 localhost.localdomain nvidia-gridd[3148]: Failed to acquire license from 10.10.1.135


I have checked and confirmed that the time zone on both the fastapi license server and client VM is identical. Fastapi-DLS server is running on an lxc container (Ubuntu 24.04). In fact I also recreated the webserver.crt and webserver.key and then patched the original nvidia-gridd file again.

What could be wrong? After so much effort it is still not working.
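For what it's worth, with fastapi-DLS a bad-signature error often means the certificate patched into nvidia-gridd no longer matches the one the server is currently signing with (recreating webserver.crt/webserver.key means re-patching the binary and then re-downloading the token, in that order), while plain clock drift between guest and server tends to show up as lease-validation failures instead. The clock side is cheap to rule out with a helper like this (a sketch; GNU date assumed, the server address in the comment is the one from the log above):

```shell
# drift_seconds DATE_HEADER
# Prints the absolute difference, in seconds, between the local clock
# and an RFC 1123 date string such as the one in an HTTP "Date:" header.
drift_seconds() {
    now=$(date -u +%s)
    srv=$(date -ud "$1" +%s)
    if [ "$now" -ge "$srv" ]; then
        echo $((now - srv))
    else
        echo $((srv - now))
    fi
}

# Fetch the header from the DLS server and compare, e.g.:
#   drift_seconds "$(curl -ksI https://10.10.1.135/ | sed -n 's/^[Dd]ate: //p' | tr -d '\r')"
```

Anything more than a few seconds of drift is worth fixing with NTP before digging into certificates.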
Hi, how exactly do you run gridd-unlock-patcher? Which file should I download?