NVIDIA MIG (Multi-Instance GPU) on Proxmox

Someone. Please! I'm not smart enough to figure this out myself. Same for AMD SEV-SNP which I know works with this technology
 
I ended up buying an Intel B50 and running SR-IOV, and I now have 8 glorious virtual GPUs for my VMs.
 
We recently tried the MIG instances.
There's just no way to use them with VMs, but with LXC containers they seem to work, preferably with the NVIDIA Container Toolkit, which makes it simpler to mount them into the container's config using the MIG instance IDs.
Using MIG with VMs would require the paid NVIDIA AI Enterprise license for the vGPU feature, but that is not yet officially supported with Proxmox and the H200 card, so I'm not sure whether it would work.
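
For anyone going the LXC route, here is a minimal sketch for collecting the MIG instance IDs mentioned above. It assumes `nvidia-smi -L` prints each MIG device on its own line with a `(UUID: MIG-...)` suffix, which is what current drivers do as far as I know. The helper name `list_mig_uuids` is made up for illustration and is not part of any NVIDIA tool.

```shell
# list_mig_uuids: read `nvidia-smi -L` output on stdin and print one MIG UUID per line.
# Hypothetical helper; it only parses text and never touches the GPU itself.
list_mig_uuids() {
    # Keep only lines that contain "(UUID: MIG-...)" and print the UUID part.
    sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# Usage on the host (MIG instances must already exist):
#   nvidia-smi -L | list_mig_uuids
```

The printed UUIDs are the identifiers the container toolkit works with. How they end up in a given container's config depends on your setup, so treat this as a starting point only.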
 
Hello guys,

Are there any updates on MIG with Proxmox?
I tried to set it up with Proxmox 9.1 and kernel 6.17. I installed the driver as described at https://forum.proxmox.com/threads/proxmox-9-1-nvidia-drivers-desktop-gui.178308/

I was able to slice my A100 80GB into two 40GB instances.
But I am stuck trying to assign these slices to a VM. The system still sees only a single PCI device.
I can see the slices with nvidia-smi, but no devices show up under Mapped Devices when adding a new PCI device in the UI.
There is also no /sys/class/mdev_bus/ directory at all.
What is the proper process for assigning NVIDIA MIG slices to a VM in Proxmox?

As far as I understand, to use MIG with VMs I need to enable sriov_numvfs.

But in my case I got:

Code:
root@kourouna:~# lspci -nn | grep -i nvidia
61:00.0 3D controller [0302]: NVIDIA Corporation GA100 [A100 PCIe 80GB] [10de:20b5] (rev a1)

root@kourouna:~# ls "/sys/bus/pci/devices/0000:61:00.0/" | grep sriov
sriov_drivers_autoprobe
sriov_numvfs
sriov_offset
sriov_stride
sriov_totalvfs
sriov_vf_device
sriov_vf_total_msix

root@kourouna:~# echo 1 > "/sys/bus/pci/devices/0000:61:00.0/sriov_numvfs"
-bash: echo: write error: No such file or directory

root@kourouna:~# cat "/sys/bus/pci/devices/0000:61:00.0/sriov_numvfs"
0
I do not understand what is wrong here. I cannot write to this file, and it looks like that is why I do not have any virtual PCI devices.
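
A small sketch that may help narrow this down: it prints what the kernel reports for SR-IOV on one device, plus the driver currently bound to it. The function name `sriov_report` is hypothetical; the sysfs file names are the ones from the listing above. It does not explain the write error on its own. One possible cause, in line with the earlier reply about vGPU licensing, is that the installed driver does not create virtual functions, but that is an assumption and not something this thread confirms.

```shell
# sriov_report DEV_DIR: summarize SR-IOV state for one PCI device's sysfs directory.
# Hypothetical helper; it only reads sysfs files and changes nothing.
sriov_report() {
    dev=$1
    # Devices without SR-IOV capability do not expose sriov_totalvfs at all.
    if [ ! -f "$dev/sriov_totalvfs" ]; then
        echo "sriov=unsupported"
        return 1
    fi
    total=$(cat "$dev/sriov_totalvfs")
    current=$(cat "$dev/sriov_numvfs")
    # The "driver" entry is a symlink to the bound driver, if there is one.
    if [ -L "$dev/driver" ]; then
        driver=$(basename "$(readlink "$dev/driver")")
    else
        driver=none
    fi
    echo "totalvfs=$total numvfs=$current driver=$driver"
}

# Usage:
#   sriov_report /sys/bus/pci/devices/0000:61:00.0
```

Posting that one line of output along with the driver version would make it easier for others to tell whether VF creation is expected to work on this setup.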
 