Trying to understand nVidia vGPU

proxwolfe

Well-Known Member
Jun 20, 2020
Hi,

so far, I have only used PCIe passthrough to assign a single graphics card to a single VM. That works well as long as the CPU supports IOMMU, which isn't the case with my budget Xeon E3-12xx servers.

But I understand there is now another way: vGPU, where I can assign (parts of) the same GPU to more than one VM without necessarily relying on IOMMU. This, however, is only supposed to work with a limited number of NVIDIA's professional GPUs. Recently I was able to acquire an affordable RTX A5000, which, I believe, is among them.

Alas, there is another hurdle: if I understand it correctly, this functionality is only available under a licensing scheme that isn't open to end users. And even if I were able to obtain a license, it would probably be prohibitively expensive.

But, theoretically, if I manage to obtain such a license, I should be able to use my GPU in more than one VM at the same time (well, I understand it is time-sliced, so not exactly at the same time, but you get what I mean), right? What confuses me a bit is that NVIDIA only mentions app virtualization for the A5000. So I am wondering whether there are any further limitations I am not yet aware of.

Thanks!
 
IIRC, NVIDIA offers two different technologies for partitioning a GPU:

- The first is vGPU, which exposes virtual PCIe (mediated) devices that you can then assign to your VMs. Here is the detailed guide: https://docs.nvidia.com/grid/latest/grid-vgpu-release-notes-generic-linux-kvm/index.html.
- The other is Multi-Instance GPU (MIG), which is available on fewer models but is far easier to install and configure (it doesn't require any additional software, only the standard drivers). MIG, which is tied to CUDA, partitions your GPU into separate instances, each with a UUID as a handle, and then lets CUDA *see* only some of them. It does little (maybe nothing?) at the PCIe level.
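For the vGPU route on a Proxmox host, once the NVIDIA host driver is installed, the available profiles show up as mediated device (mdev) types and can be attached with `qm set`. A minimal sketch; the PCI bus address `0000:01:00.0`, the VM ID `100`, and the profile name `nvidia-660` are assumptions you would replace with your own values:

```shell
# List the vGPU profiles (mdev types) the host driver exposes for this card.
# The bus address is an example; find yours with: lspci | grep -i nvidia
ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/

# Assign one vGPU profile to VM 100. The profile name is illustrative;
# pick one reported by the command above (or via the PVE GUI).
qm set 100 -hostpci0 0000:01:00.0,mdev=nvidia-660
```

The same mdev can only be created once, but several VMs can each get their own instance from the same physical card, which is what makes it shareable.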

But both are hard to reach. One requires an additional license plus software; the other requires really expensive hardware (A100 and later). Both may also be limited in what they can be used for. I haven't dug in too much, but for example, if your workload doesn't involve CUDA, the MIG approach may not be suitable for you.
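On MIG-capable hardware the whole flow goes through `nvidia-smi`. A rough sketch, assuming an A100 40GB at GPU index 0; the profile ID and the UUID shown are illustrative placeholders:

```shell
# Enable MIG mode on GPU 0 (takes effect once the GPU is idle,
# or after a host reboot).
nvidia-smi -i 0 -mig 1

# Create two GPU instances with the 3g.20gb profile (ID 9 on an
# A100 40GB) and their default compute instances (-C).
nvidia-smi mig -cgi 9,9 -C

# List the resulting MIG devices and their UUIDs.
nvidia-smi -L

# Point a CUDA application at a single instance via its UUID
# (the UUID here is a placeholder, not a real value).
CUDA_VISIBLE_DEVICES=MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx ./my_cuda_app
```

This matches the point above: the partitioning is visible to CUDA through the UUIDs, not through separate PCIe devices.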
 
