NVIDIA vGPU mdev setup not working (as per wiki)

proxwolfe

Well-Known Member
Jun 20, 2020
Hi,

I am trying to set up an NVIDIA A5000 as a vGPU, following this wiki article: https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE_7.x

I got to the point where I enabled SR-IOV and can list the virtual functions via lspci.

But then, under guest configuration, the wiki says I should pick an mdev type in the GUI, and for me that selection menu remains empty. My GUI differentiates between a mapped device and a raw device. The mdev selection menu is shown to the right of the mapped device radio button but, as said, remains empty. In the selection menu for the raw device, I now see a number of virtual devices but, unlike in the wiki, without any information about their individual configuration (such as RAM or max. instances).

Is this behaviour normal (because the GUI has changed compared to the wiki), or have I misconfigured something and need to change something to get to the point the wiki describes?

Thanks!
 
So I just tried to pass one of the vGPU devices to a VM but I get this error message:

Code:
TASK ERROR: Cannot bind 0000:02:00.0 to vfio

Interestingly, after this error, mdevctl types returns nothing. Before the error, it returns a list of mdevs.

So something is off. But what?
 
I've bumped into this issue too.
In a previous version of Proxmox, the mdev type dropdown showed all mdev types supported by the device, as per https://pve.proxmox.com/wiki/NVIDIA_vGPU_on_Proxmox_VE#Guest_Configuration

Code:
# cat /sys/bus/pci/devices/0000\:c7\:00.4/mdev_supported_types/nvidia-710/name
NVIDIA A16-2B
# cat /sys/bus/pci/devices/0000\:c7\:00.4/mdev_supported_types/nvidia-710/description
num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8
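
For reference, the description string is just a comma-separated list of key=value pairs, so it is easy to pick apart in a script. Below is a minimal sketch, assuming the format always looks like the sample above; the function name parseMdevDescription is made up for illustration and is not part of Proxmox or the NVIDIA tooling.

```javascript
// Hypothetical helper (not part of Proxmox): parse an mdev type description
// such as "num_heads=4, frl_config=45, framebuffer=2048M, ..." into an object.
function parseMdevDescription(description) {
    const result = {};
    for (const part of description.trim().split(',')) {
        const idx = part.indexOf('=');
        if (idx === -1) {
            continue; // skip fragments that are not key=value
        }
        const key = part.slice(0, idx).trim();
        const value = part.slice(idx + 1).trim();
        result[key] = value; // values are kept as strings, e.g. "2048M"
    }
    return result;
}

// Sample string copied from the sysfs output above.
const info = parseMdevDescription(
    'num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=8'
);
console.log(info.framebuffer, info.max_instance);
```

Values are left as strings on purpose, since fields like framebuffer carry a unit suffix.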

I'd love to submit a fix for this if it's a lesser-used feature, but as a beginner to the project I find it a bit tricky to locate where this is handled, which is fine. The dropdown appears to be defined here: https://github.com/proxmox/pve-mana...228d003/www/manager6/form/MDevSelector.js#L80
And it calls: https://proxmox:8006/api2/json/nodes/node-01/hardware/pci/0000:c7:00.4/mdev

And that returns:
Code:
{
    "data": [
        {
            "description": "num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=8\n",
            "available": 0,
            "name": "NVIDIA A16-2Q",
            "type": "nvidia-712"
        },
        {
            "available": 0,
            "description": "num_heads=1, frl_config=60, framebuffer=16384M, max_resolution=1280x1024, max_instance=1\n",
            "type": "nvidia-720",
            "name": "NVIDIA A16-16A"
        },
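
To make a response like this easier to scan, here is a rough sketch that turns the "data" array into one line per type. It assumes only the fields visible above (type, name, available); summarizeMdevTypes is a hypothetical name, not something from pve-manager. Note that both entries shown report "available": 0; this thread does not establish whether that matters for the dropdown.

```javascript
// Hypothetical helper: turn the "data" array of the /mdev response into
// short labels, keeping the "available" count visible.
function summarizeMdevTypes(response) {
    const types = Array.isArray(response.data) ? response.data : [];
    return types
        .slice() // copy so the caller's array is not reordered
        .sort((a, b) => a.type.localeCompare(b.type))
        .map((t) => `${t.type} (${t.name}): ${t.available} available`);
}

// Sample input copied from the two entries shown above (descriptions omitted).
const sample = {
    data: [
        { available: 0, name: 'NVIDIA A16-2Q', type: 'nvidia-712' },
        { available: 0, name: 'NVIDIA A16-16A', type: 'nvidia-720' },
    ],
};
console.log(summarizeMdevTypes(sample).join('\n'));
```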

It's a convenience factor for sure, but editing /etc/pve/qemu-server/<machine-id>.conf is fine too until I figure this one out.
I'll also note that
Code:
# mdevctl types --dumpjson
works perfectly fine too; all the different mdev types show up in the CLI.

I think I've started to figure out the code, although Perl and I are at best crisis buddies.
But the /mdev endpoint appears to be located here: https://github.com/proxmox/pve-mana...e28228d003/PVE/API2/Hardware/PCI.pm#L172-L174

And it calls: https://github.com/proxmox/pve-comm...0d526a3da4a7d1cf52/src/PVE/SysFSTools.pm#L155

Which is then used somewhere here (and there are probably a few steps in between that I haven't found or poked at yet):
And the JavaScript in the frontend is assuming: https://github.com/proxmox/pve-mana...4e8504bec02a/www/manager6/qemu/PCIEdit.js#L63

Where `pciDev.data.mdev` won't exist because `pciDev.data` is actually:

Code:
{
    "checks": [],
    "id": "A16-VGPU",
    "digest": "7ffa7e4a371321d77",
    "map": [
        "id=10de:25b6,iommugroup=133,node=node-01,path=0000:c7:00.4,subsystem-id=10de:0000"
    ],
    "type": "pci"
}
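
To illustrate the point: the PCI path is only available inside the "map" strings of a mapping record, and there is no top-level mdev key to test. A minimal sketch of pulling the path out follows, assuming each entry is a plain comma-separated key=value list like the one above; parseMappingEntry is a hypothetical helper, not the function pve-manager actually uses.

```javascript
// Hypothetical helper: split one "map" entry of a PCI resource mapping
// ("id=...,iommugroup=...,path=...") into its key/value properties.
function parseMappingEntry(entry) {
    const props = {};
    for (const part of entry.split(',')) {
        const idx = part.indexOf('=');
        if (idx > 0) {
            // Split on the first "=" only; values such as "10de:25b6" stay intact.
            props[part.slice(0, idx)] = part.slice(idx + 1);
        }
    }
    return props;
}

// Mapping record as shown above; note there is no "mdev" key on it.
const mapping = {
    id: 'A16-VGPU',
    map: ['id=10de:25b6,iommugroup=133,node=node-01,path=0000:c7:00.4,subsystem-id=10de:0000'],
    type: 'pci',
};

const paths = mapping.map.map((entry) => parseMappingEntry(entry).path);
console.log(paths);
console.log('mdev' in mapping);
```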

And that appears to be because this is never called: https://github.com/proxmox/pve-mana...dee/www/manager6/form/MDevSelector.js#L76-L83

Which in turn is because me.pciid isn't set here: https://github.com/proxmox/pve-mana...dee/www/manager6/form/MDevSelector.js#L94-L96
 
This is by no means perfect, but at least it gets the dropdown working again on pve-manager/8.1.4/ec5affc9e41f1d79:
pvemanagerlib.js
Diff:
--- pvemanagerlib.js.orig    2024-03-19 23:20:55.108008038 +0100
+++ pvemanagerlib.js    2024-03-19 23:25:43.843702312 +0100
@@ -6436,21 +6436,44 @@
         type: 'proxmox',
         url: '/api2/json/nodes/' + me.nodename + '/hardware/pci/' + me.pciid + '/mdev',
     });
-    me.store.load();
+    me.store.load({
+                callback: (recs, op, success) => me.processResponse(recs, op, success)
+        });
+    },
+
+    processResponse: function(records, _op, success) {
+        let me = this;
+        if (!success) {
+            return;
+        }
+
+        let store = me.getStore();
+
+        let recs = []
+        records.forEach((rec) => {
+            recs.push(rec)
+        });
+
+        me.setDisabled(false)
+
+        me.suspendEvent('change');
+        me.setSelection();
+        me.setSelection(recs);
+        me.resumeEvent('change');
     },

     initComponent: function() {
-    var me = this;
+        var me = this;

-    if (!me.nodename) {
-        throw 'no node name specified';
-    }
+        if (!me.nodename) {
+            throw 'no node name specified';
+        }

         me.callParent();

-    if (me.pciid) {
-        me.setPciID(me.pciid, true);
-    }
+        if (me.pciid) {
+            me.setPciID(me.pciid, true);
+        }
     },
 });

@@ -32472,7 +32495,6 @@
         return;
         }

-
         let deviceId = `${device.vendor}:${device.device}`.replace(/0x/g, '');
         let subId = `${device.subsystem_vendor}:${device.subsystem_device}`.replace(/0x/g, '');

@@ -50112,6 +50134,8 @@
         }
         }

+            mdevfield.setPciID(path);
+
         if (pciDev.data.mdev) {
         mdevfield.setPciID(path);
         }
 
Did anyone find a solution to this? I am stuck at the same point, with no luck getting any further.