Failed to start second VM with vGPU enabled

drono

New Member
May 29, 2024
After following https://gitlab.com/polloloco/vgpu-proxmox along with https://wvthoog.nl/proxmox-7-vgpu-v2/ and https://www.youtube.com/watch?v=cPrOoeMxzu0&ab_channel=CraftComputing
I've got two cloned VMs with the same settings.

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/5be2a059-3ad4-45b9-8515-92c6d7081b8b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b81,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11A0' -uuid 5be2a059-3ad4-45b9-8515-92c6d7081b8b

and

args: -device 'vfio-pci,sysfsdev=/sys/bus/mdev/devices/4550976a-6ab8-48c5-9825-4cc578fb91fa,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1b81,x-pci-sub-vendor-id=0x10de,x-pci-sub-device-id=0x11A0' -uuid 4550976a-6ab8-48c5-9825-4cc578fb91fa

After a Proxmox reboot I type in the CLI:

mdevctl start -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50
mdevctl start -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50
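
(Side note: if I'm not mistaken, mdevctl can also persist these devices so they come back automatically after a reboot instead of being retyped every time; this is just a sketch using the same UUIDs, parent address and type as above:)

Code:
# Store both mdevs persistently and mark them for automatic start (--auto),
# so "mdevctl start" does not have to be run by hand after every reboot
mdevctl define -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50 --auto
mdevctl define -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50 --auto

# Confirm the definitions were recorded
mdevctl list --defined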

Then I'm able to start either of the two VMs, and the GPU works fine there. But if I try to start the second VM I get:

kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/5be2a059-3ad4-45b9-8515-92c6d7081b8b,display=off,id=hostpci0.0,bus=ich9-pcie-port-1,addr=0x0.0,x-pci-vendor-id=0x10de,x-pci-device-id=0x1c31: vfio 5be2a059-3ad4-45b9-8515-92c6d7081b8b: error getting device from group 26: Connection timed out
Verify all devices in group 26 are bound to vfio-<bus> or pci-stub and not already in use
stopping swtpm instance (pid 3445) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1

The weird thing is that if I stop the first VM and try to get it running again, I receive the same error but with a different group number, so I have to reboot and start over.
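
For the record, these are the kinds of checks that should show what is (still) holding the devices before a reboot (a rough sketch using the same UUIDs as above; fuser comes from the psmisc package):

Code:
# Active mediated devices as seen by mdevctl
mdevctl list

# Which VFIO/IOMMU group each mdev ended up in
ls -l /sys/bus/mdev/devices/5be2a059-3ad4-45b9-8515-92c6d7081b8b/iommu_group
ls -l /sys/bus/mdev/devices/4550976a-6ab8-48c5-9825-4cc578fb91fa/iommu_group

# Whether a leftover QEMU process still has one of the VFIO group devices open
fuser -v /dev/vfio/*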
 


After shutting down the VM, I would try:

Code:
mdevctl stop -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b
mdevctl stop -u 4550976a-6ab8-48c5-9825-4cc578fb91fa

& then

mdevctl start -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b -p 0000:08:00.0 --type nvidia-50
mdevctl start -u 4550976a-6ab8-48c5-9825-4cc578fb91fa -p 0000:08:00.0 --type nvidia-50


Note: I don't have any system/setup similar to yours; I'm just trying to imagine what I would do based on that GitHub guide.
(BTW, I'm also not sure why you are using two device IDs, one for each VM, since AFAIK you will only be able to use one at a time. I guess you know what you are doing.)
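
If the stop/start round-trip works, a quick mdevctl list afterwards should show both UUIDs active again on the card (not specific to your setup, just a generic check):

Code:
# Both UUIDs should show up as running mediated devices on 0000:08:00.0
mdevctl list -p 0000:08:00.0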
 
Thanks for the reply.
I've tried mdevctl stop -u 5be2a059-3ad4-45b9-8515-92c6d7081b8b and for some reason it doesn't help.
Anyway, that's not the main issue. The main problem is that I can't run two VMs with a vGPU passed through at the same time.
 
You cannot give the same (v)GPU to two VMs at the same time. In order to have two VMs with passed-through physical GPUs running at the same time, you need two GPUs in your server.
 
You can have two (or more) vGPUs set up on / from your card, but this depends on having enough video memory available. Also, both vGPUs must (and will) have the same properties/size. In such a (working) scenario it should be possible to give two VMs a vGPU each.

I don't know the --type nvidia-50 that the OP refers to, but I guess he knows what it refers to and that he has the resources for two such vGPUs.

EDIT: I forgot to add: AFAIK, to make the above work you'd probably have to set (in the Proxmox GUI) the VM's Hardware > Display to none, otherwise the Windows VM will use the GPU itself, making it impossible to use the two vGPUs separately.
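
If someone wants to verify the card really has room for two such instances before starting the second VM, something along these lines should do it (a sketch, assuming the OP's parent address 0000:08:00.0 and the standard mdev sysfs layout):

Code:
# Show the supported vGPU types on the card together with the remaining
# "Available instances" count
mdevctl types -p 0000:08:00.0

# Or read the counter for the chosen type directly; it should still be >= 1
# after creating the first mdev, otherwise the second one cannot be created
cat /sys/class/mdev_bus/0000:08:00.0/mdev_supported_types/nvidia-50/available_instances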
 
The problem was that I just needed to get some fresh air. After a short walk I managed to get it working.

What I changed: instead of overriding a profile in /etc/vgpu_unlock/profile_override.toml, I decided to find an existing one that best fits my needs.
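
For reference, the driver exposes a short description for every profile (framebuffer size, maximum resolution, number of instances), so comparing them is mostly a matter of reading sysfs; a rough sketch, assuming the same parent address 0000:08:00.0 as above:

Code:
# List every supported vGPU profile with its name and description so the
# framebuffer sizes and resolutions can be compared side by side
for t in /sys/class/mdev_bus/0000:08:00.0/mdev_supported_types/*; do
    echo "== $(basename "$t"): $(cat "$t/name" 2>/dev/null)"
    cat "$t/description" 2>/dev/null
done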
 
