[SOLVED] vGPU just stopped working randomly (solution includes 6.14, pascal fixes for 17.5, changing mock p4 to A5500 thanks to GreenDam )

I reinstalled the license server as a Docker container this time and it all worked. The VM received a 3-month licence. I am not sure what is going wrong in the LXC container; it may have something to do with OpenSSL or other libraries. The "Failed to verify signature on lease response" error is a cryptographic and configuration problem, and so is "Failed to acquire license from 10.10.1.135".
 
@Randell - FYI, I stopped getting this error when I did a clean install with PVE8. I haven't worked out what the cause is yet, but so far, on a clean install with only the 570 unlocked drivers, that error is gone. I'm now checking whether it has actually made my GPU encoding stable; if it has, I will upgrade to PVE9 again and see whether anything changes.
I believe I only got this error with kernel versions newer than 6.8 (not including 6.8). The driver version doesn't seem to matter: officially supported 16.11, older "patched" 16.x drivers, or 17.x (patched or unpatched).

I have yet to try patching everything with the patched 18-series drivers. I'm considering just finding a Turing-based card so I can use newer drivers without worrying about patching, but since I play around with a 3-node cluster at home, I don't want to buy 3 cards. (And I expect Turing will get dropped before I know it, and I'll be right back here in the next release.)
 
Hi everyone, I'm running a Tesla P4 on PVE9, kernel 6.14.11-2, using non-patched 16.11 (535.261.04) drivers.
What I did was basically run nvidia-uninstall on the last working drivers for PVE8, keep all the vGPU shenanigans untouched except for recompiling the unlock repo, and install the unpatched 16.11 drivers (I wasn't able to patch them on any kernel I have installed).
Simply ./NVIDIA.......run, reboot, et voilà!
The P4 is shown as a T4 in mediated devices, but everything works fine as before as T4-8Q on a Win10 machine, on which I successfully updated the guest drivers as well.
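
For anyone who wants to check what the host reports under mediated devices, here is a minimal sketch that walks the kernel's mdev sysfs tree and prints each supported type with its name. It assumes the driver exposes its types under /sys/class/mdev_bus (the standard mdev layout); the function name list_mdev_types is made up for this example, and newer driver/kernel combinations may not expose vGPU types there at all.

```python
from pathlib import Path


def list_mdev_types(root="/sys/class/mdev_bus"):
    """Return (pci_address, type_id, name) tuples for every mdev type under root.

    Assumes the standard layout:
    <root>/<pci_address>/mdev_supported_types/<type_id>/name
    """
    results = []
    base = Path(root)
    if not base.is_dir():
        # No mdev-capable device registered (or the path differs on this host).
        return results
    for dev in sorted(base.iterdir()):
        types_dir = dev / "mdev_supported_types"
        if not types_dir.is_dir():
            continue
        for type_dir in sorted(types_dir.iterdir()):
            name_file = type_dir / "name"
            name = name_file.read_text().strip() if name_file.is_file() else ""
            results.append((dev.name, type_dir.name, name))
    return results


if __name__ == "__main__":
    for pci, type_id, name in list_mdev_types():
        print(pci, type_id, name)
```

On a setup like the one described above, the names printed would be expected to show T4 profiles rather than P4 ones, but that depends on the unlock configuration.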

I also have a P40 but I use it on a direct passthrough to a Linux VM, so it's not shown on nvidia-smi output. Lmk if you want me to investigate more on that card as well.
 

Attachments

  • Screenshot (93).png

@zenowl77

Hello, are you able to use the override_profiles with your setup? I run Proxmox 9 with a 1060 6GB GPU and the 18.4 driver on host and guest. It all works, but without overrides, so the frame rate is limited to 45fps or the resolution is 1280x1024.
It worked fine on the 6.8 kernel with 16.5 drivers, but now it ignores the override file. What am I missing?
 

Yes, it works fine for me. I use it with a 1080p monitor via remote desktop every day.

Here is the exact override config I use for that VM most of the time:
Code:
[vm.128]
display_width = 3840
display_height = 2160
max_pixels = 8294400
cuda_enabled = 1
frl_enabled = 0
framebuffer = 0x1DC000000
framebuffer_reservation = 0x24000000 # framebuffer + reservation = 0x200000000 (8 GiB)
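
The numbers in that override are tied together by simple arithmetic: max_pixels is width times height, and framebuffer plus framebuffer_reservation add up to the total VRAM size for the profile. A small sketch for recomputing them when changing resolution or VRAM size is below. The helper names are made up for this example, and the reservation is taken as an input because the right value for other sizes is not covered in this thread.

```python
GIB = 1024 ** 3


def max_pixels(width, height):
    """Pixel budget for a single display of the given resolution."""
    return width * height


def framebuffer_for(total_bytes, reservation_bytes):
    """Usable framebuffer so that framebuffer + reservation == total."""
    if not 0 < reservation_bytes < total_bytes:
        raise ValueError("reservation must be positive and smaller than total")
    return total_bytes - reservation_bytes


if __name__ == "__main__":
    # Values taken from the override posted above.
    width, height = 3840, 2160
    total = 8 * GIB
    reservation = 0x24000000

    print(f"display_width = {width}")
    print(f"display_height = {height}")
    print(f"max_pixels = {max_pixels(width, height)}")
    print(f"framebuffer = 0x{framebuffer_for(total, reservation):X}")
    print(f"framebuffer_reservation = 0x{reservation:X}")
```

Run as-is, this reproduces the max_pixels and framebuffer values from the config above, which is a quick sanity check before editing the file for a different VM.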