NVIDIA MIG (Multi-Instance GPU) on Proxmox

Thank you @dcapak

This is not vGPU though, it's MIG, a newish (introduced in 2020) technology from NVIDIA.

And I can confirm that creating a virtual GPU with MIG does work correctly on the Proxmox command line. But after that, nothing else can be done because (unlike with vGPU instances, which Proxmox can attach to a VM) Proxmox isn't aware the MIG instances exist, so we can't assign them to a VM.
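For reference, the host-side MIG setup is roughly the following (a sketch based on NVIDIA's MIG documentation; the available profile names depend on the card):

Code:
# enable MIG mode on GPU 0 (the GPU must be idle; a reset may be required)
nvidia-smi -i 0 -mig 1
# list the GPU instance profiles this card offers
nvidia-smi mig -lgip
# create a 4g.20gb GPU instance plus its default compute instance
nvidia-smi mig -cgi 4g.20gb -C
# verify that the MIG device now shows up
nvidia-smi -L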
 
ah ok, sorry i only skimmed the linked article.

after looking a bit more it seems that the default deployment is via device nodes in '/dev'? so it should
simply be possible to bind mount those into containers? (there is no mention of qemu/kvm/vms on that page, so i assume vms will not work with MIG)
 
Sorry for taking so long, but I've been trying a lot of different things to get this to work, and nothing works except using Ubuntu or SUSE on bare metal.

What do you mean by bind mounting the /dev device into containers? I was under the impression that bind mounts in Proxmox were only for LXC. Can I use this in a VM?
 
so it should simply be possible to bind mount those into containers? (there is no mention of qemu/kvm/vms on that page, so i assume vms will not work with MIG)
 
I was under the impression that bind mounts in Proxmox were only for LXC. Can I use this in a VM?
yes exactly, the docs do not mention any use of vms, so i guess that will not work, but they do mention containers here: https://docs.nvidia.com/datacenter/tesla/mig-user-guide/index.html#cuda-containers
although i did not read through the whole documentation, and they only mention their custom docker toolset, my educated guess is that they pass through the relevant device nodes generated in /dev, which the toolkit in the container can then use
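for an lxc container on pve, a minimal sketch of that (untested; the device names and major numbers are assumptions, check 'ls -l /dev/nvidia*' on the host) could look like this in /etc/pve/lxc/<vmid>.conf:

Code:
# allow the container to access the nvidia character devices
# (major 195 is typical for /dev/nvidia*, verify with ls -l /dev/nvidia*)
lxc.cgroup2.devices.allow: c 195:* rwm
# bind mount the device nodes into the container
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
# with MIG, the per-instance capability nodes live under /dev/nvidia-caps;
# their major number is allocated dynamically, so check ls -l /dev/nvidia-caps
# and add a matching lxc.cgroup2.devices.allow line for it as well
lxc.mount.entry: /dev/nvidia-caps dev/nvidia-caps none bind,optional,create=dir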
 
also, you can of course pass through the whole gpu into a vm and use ubuntu/opensuse in there

further edit: it seems those gpus also support 'vgpu', but AFAICT that requires a different driver which is not freely available (only to subscribers of nvidia licenses AFAIK)
 
I can use full passthrough, yes, and that works, but it really defeats the purpose since the GPU can then only be assigned to a single VM instead of several.

vGPU is supposed to work, and I tried with the subscriber NVIDIA drivers, but `mdevctl` complains it can't find any device, so I can't even create a vGPU as a temporary workaround.

For reference, creating a MIG virtualized instance works by creating an SR-IOV device. It even shows its UUID when queried. But on Proxmox, it doesn't show up under `/sys/bus/mdev/devices/` like it does in SUSE, for instance.

If it did show up under the mdev devices, this would be much easier to pull off with some hacking (and to properly add to the Proxmox interface in the future).
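For reference, these are the checks I'm referring to (standard mdevctl/sysfs queries; on Proxmox both come back empty for me):

Code:
# list the mediated device types the driver exposes
mdevctl types
# list any mediated device instances that exist
ls /sys/bus/mdev/devices/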

P.S.: SUSE has a nice guide about it all: https://documentation.suse.com/sles/15-SP3/html/SLES-all/article-nvidia-vgpu.html
 
vGPU is supposed to work, and I tried with the subscriber NVIDIA drivers, but `mdevctl` complains it can't find any device, so I can't even create a vGPU as a temporary workaround.
i guess this is the problem, even the suse docs want to create a vgpu with the conventional method (that we already support -> mdev)

For reference, creating a MIG virtualized instance works by creating an SR-IOV device. It even shows its UUID when queried. But on Proxmox, it doesn't show up under `/sys/bus/mdev/devices/` like it does in SUSE, for instance.
how did you create that instance?
 
sorry for the late answer..

where do you see the vgpu then? in the sysfs ?

if you want to use vgpus in pve, there has to be a pci device that exposes the 'mediated devices' in sysfs. you can then select this device in the pve gui as a pci device, and the 'mediated device' dropdown should give you the available models. pve will then create a new instance (via sysfs) on vm start and clean it up on vm stop
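roughly what happens under the hood (the pci address and type name below are just placeholders, they depend on the card/driver):

Code:
# a pci device that supports mediated devices exposes its types here
ls /sys/bus/pci/devices/0000:01:00.4/mdev_supported_types/
# on vm start, pve creates an instance by writing a uuid into the type's 'create' node
echo "$(uuidgen)" > /sys/bus/pci/devices/0000:01:00.4/mdev_supported_types/nvidia-474/create
# the new instance then appears under
ls /sys/bus/mdev/devices/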
 
No, I don't see it anywhere in the system under Proxmox.
It's `nvidia-smi` that shows it, with the command `nvidia-smi -L`.

For instance:
Code:
> nvidia-smi -L
GPU 0: NVIDIA A100-PCIE-40GB (UUID: GPU-ee14e29d-dd5b-2e8e-eeaf-9d3debd10788)
 MIG 4g.20gb     Device  0: (UUID: MIG-fed03f85-fd95-581b-837f-d582496d0260)

On SUSE this shows up under /sys/bus/mdev/devices (e.g. /sys/bus/mdev/devices/fed03f85-fd95-581b-837f-d582496d0260). In Ubuntu I don't remember exactly where. But on Proxmox I can't find any such device.
 
Right, you can do that, but you don't need to. This is for the case where you want to create a MIG device and then divide it further into vGPU device(s).

But are you saying that would be the workaround? I tried to use mdevctl directly on the MIG instance (which didn't work); maybe I'm missing the SR-IOV step there.
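For completeness, the SR-IOV step I mean is the one from NVIDIA's vGPU documentation, where a helper script shipped with the vGPU host driver enables the virtual functions first (I haven't verified this on Proxmox):

Code:
# enable the SR-IOV virtual functions on all supported GPUs
# (this script comes with the licensed vGPU host driver, not the regular datacenter driver)
/usr/lib/nvidia/sriov-manage -e ALL
# the virtual functions should then show up as extra PCI functions on the card
lspci -d 10de: -nn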
 
@jbssm did you manage to get this working?
I have an NVIDIA A100 80 GB card. I got it working in passthrough, but same as you, the mdevctl types are not loaded in Proxmox.
 
Hello! I am actively working on this for a project at my employment. There are a lot of misunderstandings in this thread. First, the A100 and A30 support MIG (Multi-Instance GPU), which is not GRID. MIG partitions the A100 and A30 into smaller contexts. GRID uses vGPU technology that creates PCI devices which can be passed through to VMs. I'm not certain that MIG utilizes SR-IOV in any capacity. The instances of the graphics card are exposed via /dev. To be honest, I don't believe GRID uses SR-IOV either; it uses software-mediated devices, i.e. not hardware virtualization like the Intel Xe or AMD PRO GPUs do.
I believe that MIG will not work with virtual machines, but will require containers, as mentioned in this thread. This is due to the devices not being exposed as hardware, like they are with GRID. GRID is a licensed feature and comes at a price.
I don't wish to encourage license-circumventing software, but if you are looking for vGPU to work with Proxmox, as a purely educational endeavour, vgpu-unlock can unlock software-mediated devices on consumer-grade hardware. The last time I researched this, for the project I'm currently on, I could not get an RTX 3080 to work, but I was able to see a drop-down list of all supported mediated devices within the Proxmox user interface. I could not get the cards to be recognized; however, in the course of getting the A100s passed into my VMs, I discovered that the VM UEFI BIOS for Proxmox is bugged and I could not get the devices to work. I would like to circle back to this and find out whether I can get vGPU with mdevctl devices to function with SeaBIOS vs OVMF, which is what seems to be bugged. For the record, the bug is that the NVIDIA kmod will not recognize the card being passed in, regardless of whether it's a full device or a virtual GPU.
I hope to report back for anyone who is stuck on this, or at least fill in the missing pieces in this thread; however, this will likely be a couple of months' project. The answer may still be to use containers only, as outlined in the aforementioned article.
*I should clarify that the Ampere architecture does support SR-IOV, which is presumably also the reason vgpu-unlock doesn't yet support Ampere devices. I was working with 1080s as well as 3080s for this.
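For anyone taking the container route in the meantime, the NVIDIA container toolkit can pin a container to a single MIG instance by its UUID. A sketch, assuming Docker with the toolkit installed; the image tag is only an example and the UUID is the one from the earlier nvidia-smi -L output:

Code:
# list the MIG devices the driver exposes
nvidia-smi -L
# run a CUDA container against one MIG instance, addressed by its UUID
docker run --rm --gpus '"device=MIG-fed03f85-fd95-581b-837f-d582496d0260"' \
    nvidia/cuda:11.0-base nvidia-smi -L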
 