Failed to start second VM with vGPU enabled

drono

After following https://gitlab.com/polloloco/vgpu-proxmox along with https://wvthoog.nl/proxmox-7-vgpu-v2/ and https://www.youtube.com/watch?v=cPrOoeMxzu0&ab_channel=CraftComputing,
I've got two cloned VMs with the same settings.

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/5be2a059-3ad4-45b9-8515-92c6d7081b8b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b81,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11A0' -uuid 5be2a059-3ad4-45b9-8515-92c6d7081b8b

and

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/4550976a-6ab8-48c5-9825-4cc578fb91fa,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b81,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11A0' -uuid 4550976a-6ab8-48c5-9825-4cc578fb91fa

After a Proxmox reboot I type in the CLI:

mdevctl start -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50
mdevctl start -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50

Then I'm able to start either of the two VMs and the GPU works fine there. If I try to start the second VM I get:

kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/5be2a059-3ad4-45b9-8515-92c6d7081b8b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1c31: vfio 5be2a059-3ad4-45b9-8515-92c6d7081b8b: error getting device from group 26: Connection timed out
Verify all devices in group 26 are bound to vfio-<bus> or pci-stub and not already in use
stopping swtpm instance (pid 3445) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1

The weird thing is that if I stop the first VM and try to get it running again, I receive the same error but with a different group number, so I have to reboot and start over.
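For anyone debugging the same error, a hedged sketch of checks that might narrow it down; the IOMMU group number and PCI address below are taken from the output above, so adjust them to your own setup:

Code:
# list the currently active and defined mediated devices
mdevctl list
# show which devices the kernel placed in IOMMU group 26 (the group from the error)
ls /sys/kernel/iommu_groups/26/devices
# check which driver the GPU is currently bound to
lspci -nnk -s 08:00.0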
 

After shutting down the VM, I would try:

Code:
mdevctl stop -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b
mdevctl stop -u 4550976a-6ab8-48c5-9825-4cc578fb91fa

& then

mdevctl start -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50
mdevctl start -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50
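If rebooting is only needed to recreate the mdevs, one possible addition (a sketch, assuming the same UUIDs, parent address and type as above) is to define them persistently so they come back automatically at boot:

Code:
# define the mdevs persistently and mark them to start automatically
mdevctl define -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50 --auto
mdevctl define -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50 --auto
# verify the persistent definitions
mdevctl list -d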


Note: I don't have any system/operation similar to yours, I'm just trying to imagine what I would do based on this GitHub.
(BTW, I'm also not sure why you are using two device IDs, one for each VM, since you will only be able to use one at a time anyway, AFAIK. I guess you know what you are doing.)
 
Thanks for the reply.
I've tried mdevctl stop -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b and for some reason it doesn't help.
Anyway, that's not the main issue. The main problem is that I can't run two VMs with vGPU passed through at the same time.
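One quick check that might help here (a sketch, assuming the card sits at 0000:08:00.0 as in the args above): with the first nvidia-50 mdev created, see whether the card still reports capacity for a second one.

Code:
# how many additional nvidia-50 vGPU instances can still be created on this card
cat /sys/bus/pci/devices/0000:08:00.0/mdev_supported_types/nvidia-50/available_instances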
 
You cannot give the same (v)GPU to two VMs at the same time. In order to have two VMs with passed-through physical GPUs running at the same time, you need two GPUs in your server.
 
You can have two (or more) vGPUs set up on/from your card, but this depends on enough video memory being available. Also, both vGPUs must (and will) have the same properties/size. In such a (working) scenario it should be possible to give two VMs a vGPU each.

I don't know the --type nvidia-50 that the OP refers to, but I guess he knows what it refers to and that he has the resources for two such vGPUs.

EDIT: I forgot to add: AFAIK, to make the above work, you'd probably have to set the VM's Display to "none" (in the Proxmox GUI under Hardware), otherwise the Windows VM will use the GPU itself, making it impossible to use the two vGPUs separately.
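For reference, a hedged CLI equivalent of that GUI setting; <vmid> is a placeholder for the actual VM ID:

Code:
# set the VM's emulated display to "none" so the guest only sees the passed-through vGPU
qm set <vmid> --vga none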
 
The problem was that I just needed to get some fresh air. After a short walk I managed to get it working.

What I changed: instead of overriding a profile in /etc/vgpu_unlock/profile_override.toml, I decided to find a stock profile that best fits my needs.
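For anyone looking to do the same, a sketch of how the available stock profiles could be listed in order to pick a fitting one (assuming the card is at 0000:08:00.0, as in the args above):

Code:
# list all mdev types the card supports, with name and description (framebuffer, resolution, max instances)
mdevctl types -p 0000:08:00.0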
 
