[TUTORIAL] NVIDIA vGPU on Proxmox VE 7.x

The links for these vGPU drivers are not publicly available, but you can download them if you have the proper NVIDIA GRID licenses. If you do have them and still can't find the drivers, I would ask NVIDIA for support.
 
I got the driver up and running, enabled IOMMU, etc.

I installed mdevctl too, but no types are listed:

root@pve:/# mdevctl types
root@pve:/#
root@pve:/# nvidia-smi vgpu
Fri Dec 2 15:12:34 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03             Driver Version: 510.108.03                |
|---------------------------------+------------------------------+------------+
| GPU  Name                       | Bus-Id                       | GPU-Util   |
|      vGPU ID     Name           | VM ID     VM Name            | vGPU-Util  |
|=================================+==============================+============|
|   0  NVIDIA RTX A5000           | 00000000:01:00.0             |   0%       |
+---------------------------------+------------------------------+------------+
root@pve:/#

root@pve:/# lspci -d 10de:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2231 (rev a1)
01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)

I am not getting the list of virtual functions.

This is as far as I got. Can anyone guide me further?

Thanks in advance
 
It seems you did not disable the display mode (see the wiki article from the first post); otherwise there would be no audio device.
Also, you must enable SR-IOV on each boot (that's also explained in the wiki article).
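
Both points can be checked from the host shell. The following is only a minimal sketch: the helper name `has_audio_function` is made up for illustration, and the `sriov-manage` path is the one the NVIDIA vGPU host driver normally installs, so treat it as an assumption for your driver version.

```shell
#!/bin/sh
# Hypothetical helper: reads `lspci -d 10de:` output on stdin and succeeds
# if an NVIDIA audio function is listed, which hints that display mode is
# still enabled on the card.
has_audio_function() {
    grep -qi 'audio device'
}

# Only query real hardware when lspci is available on this machine.
if command -v lspci >/dev/null 2>&1; then
    if lspci -d 10de: | has_audio_function; then
        echo "audio function present: display mode is probably still enabled"
    else
        echo "no NVIDIA audio function listed"
    fi
fi

# SR-IOV has to be enabled again after every boot. Assumed path, as shipped
# by the vGPU host driver; uncomment on the Proxmox host:
# /usr/lib/nvidia/sriov-manage -e ALL
```

Once SR-IOV is enabled, `lspci -d 10de:` should list the virtual functions and `mdevctl types` should no longer come back empty.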
 
got the driver up and running, enabled iommu etc...

Hello, could you share this archive: NVIDIA-GRID-Linux-KVM-510.108.03-513.91.zip?
 
Where? It's simple. Just make an account on the NVIDIA site; it's available for free download. I was looking in the wrong place earlier. It's on the 2nd or 3rd page.
I applied for an NVIDIA Enterprise Account twice and have not received an answer, so I cannot log in and download the necessary driver.
 
Hi all.

I have one small question. We are using one GPU card in our system, and all of its instances are currently in use. Will the number of instances increase when we add one more GPU card?

Best regards,
 
It seems you did not disable the display mode (see the wiki article from the first post); otherwise there would be no audio device.
Also, you must enable SR-IOV on each boot (that's also explained in the wiki article).

What is the "display mode" exactly? I don't see much reference to it in the article. I'm using Ubuntu 22.04 Server with a P2200 and getting no output from:

Code:
nvidia-smi vgpu
No supported devices in vGPU mode

or

Code:
mdevctl types
 
I can't install the NVIDIA vGPU driver. This is the error message I get:

Bash:
ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.
-> Kernel module load error: No such device
-> Kernel messages:
[ 2620.739636]  ? do_syscall_64+0x69/0xc0
[ 2620.739639]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
[ 2620.739642] RIP: 0033:0x7fc2afe4aa66
[ 2620.739645] Code: 7c 24 08 e8 7c 0f f9 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 44 89 cf 89 44 24 08 e8 a6 0f f9 ff 8b 44
[ 2620.739647] RSP: 002b:00007ffc28e4b350 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
[ 2620.739650] RAX: 0000000000000000 RBX: 0000558845c401a0 RCX: 00007fc2afe4aa66
[ 2620.739652] RDX: 00007ffc28e4b370 RSI: 00000000000000a3 RDI: 0000558847c0a660
[ 2620.739654] RBP: 00007ffc28e4b3dc R08: 0000000000000008 R09: 0000000000000000
[ 2620.739656] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffc28e4b370
[ 2620.739658] R13: 0000558845c401a0 R14: 00007ffc28e4b3e0 R15: 0000000000000000
[ 2620.739662]  </TASK>
[ 6027.229412] nvidia-nvlink: Nvlink Core is being initialized, major device number 509

[ 6027.231269] nvidia 0000:07:00.0: can't change power state from D3cold to D0 (config space inaccessible)
[ 6027.231466] NVRM: This is a 64-bit BAR mapped above 4GB by the system
               NVRM: BIOS or the Linux kernel, but the PCI bridge
               NVRM: immediately upstream of this GPU does not define
               NVRM: a matching prefetchable memory window.
[ 6027.231470] NVRM: This may be due to a known Linux kernel bug.  Please
               NVRM: see the README section on 64-bit BARs for additional
               NVRM: information.
[ 6027.231480] nvidia: probe of 0000:07:00.0 failed with error -1
[ 6027.231509] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 6027.231511] NVRM: None of the NVIDIA devices were initialized.
[ 6027.231666] nvidia-nvlink: Unregistered Nvlink Core, major device number 509
ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.


Any suggestions?
 
Can't install nvidia vGPU driver. This is what I have as error message.

Hi,
the error you get might indicate that you have blacklisted the `nvidia` kernel module. What's the output of modprobe nvidia?
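
To rule out a blacklist entry quickly, something like the following can be run on the host. It is a rough sketch only: `is_blacklisted` is a made-up helper name, and it only looks at `blacklist` lines in `*.conf` files under `/etc/modprobe.d`, not at other ways a module can be disabled.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any *.conf file in the given modprobe.d
# directory contains a "blacklist MODULE" line.
# Usage: is_blacklisted MODULE [DIR]
is_blacklisted() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    # Anchor the match at the end of the name so that "nvidiafb" does not
    # count as a match for "nvidia"; a trailing comment is allowed.
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module}[[:space:]]*(#.*)?\$" "$dir"/*.conf
}

if is_blacklisted nvidia; then
    echo "nvidia is blacklisted in /etc/modprobe.d"
else
    echo "no blacklist entry for nvidia found"
fi
```

If no entry is found, `modprobe nvidia` followed by `dmesg | tail` is still the more direct test, since it shows the actual load error.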
 
root@pve:~# modprobe nvidia
modprobe: FATAL: Module nvidia not found in directory /lib/modules/5.15.107-2-pve

blacklist.conf's content:

blacklist radeon
blacklist nouveau
blacklist nvidiafb

Thank you for your assistance.
 
What is the "display mode" exactly? I don't see much reference to it in the article. I'm using Ubuntu 22.04 Server with a P2200 and getting no output from:
Display mode refers to a feature some cards have (e.g. the RTX A5000 is such a card): they run either as 'normal' GPUs without vGPU functions, or in vGPU mode without display output.

But since your P2200 is not a vGPU-capable card anyway, this does not matter?


Can't install nvidia vGPU driver. This is what I have as error message.

Can you post the content of the /var/log/nvidia-installer.log file too? The output of 'pveversion -v' would also be interesting.
 
But since your P2200 is not a vGPU-capable card anyway, this does not matter?
This card is vGPU capable actually, maybe not "supported". I was able to get it running. I also have a P4 and a T4 so let's try and be helpful instead of writing things off as "does not matter"...
 
This card is vGPU capable actually, maybe not "supported". I was able to get it running. I also have a P4 and a T4 so let's try and be helpful instead of writing things off as "does not matter"...
I hope you can understand that we cannot really support "unofficial" ways to get vGPU running (though I think I know what you mean ;) ), but I did already answer your question regarding display mode.

It was not really clear what you meant by:
I'm using Ubuntu 22.04 Server with a P2200 and getting no output from:
Ubuntu as a guest? Where did you run the commands?
 
can you post the content of the /var/log/nvidia-installer.log file too? also the output of 'pveversion -v' would be interesting
Here is the /var/log/nvidia-installer.log file (attached).

pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.107-1-pve: 5.15.107-1
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1


Thank you for the assistance.
 

Attachment: nvidia-installer.log (20 KB)

OK, the module builds fine and only fails to load... Do you have the nouveau module loaded? What kind of card is it? What's the rest of the hardware?
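
For the nouveau question, a small check along these lines may help. It is just a sketch under assumptions: `module_loaded` is a made-up helper that reads a `/proc/modules`-style listing, and the `lspci -nnk` call simply shows which kernel driver, if any, is bound to each NVIDIA device.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named module appears as the first
# field of a line in a /proc/modules-style listing.
# Usage: module_loaded MODULE [FILE]
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}" 2>/dev/null
}

if module_loaded nouveau; then
    echo "nouveau is loaded and may be holding the GPU"
else
    echo "nouveau is not loaded"
fi

# Show vendor/device IDs and the bound kernel driver for NVIDIA devices.
if command -v lspci >/dev/null 2>&1; then
    lspci -nnk -d 10de: || echo "lspci could not query the PCI bus"
fi
```

The `lspci -nnk` output also answers the "what kind of card is it" question, since it prints the device ID alongside the driver in use.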
 
