Passthrough failed

mclordo

New Member
Jun 25, 2023
I tried to set up a Tesla P4 as a vGPU, roughly following this guide.

Unlocking the GPU worked, and I can list all vGPU profiles with mdevctl types.
Then I added the PCI device to my VM (102), selected it as shown in the following picture (default MDev 285), and started the VM.

1687673466104.png

After a brief moment the VM start fails with the following log:
Code:
kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000000-0000-0000-0000-000000000102,id=hostpci0,bus=pci.0,addr=0x10: vfio 00000000-0000-0000-0000-000000000102: error getting device from group 75: Input/output error
Verify all devices in group 75 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

System-Log:
Code:
[ 4359.019970] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: Adding to iommu group 75
[ 4359.019975] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: MDEV: group_id = 75
[ 4359.701417] device tap102i1 entered promiscuous mode
[ 4359.733316] vmbr0: port 4(fwpr102p1) entered blocking state
[ 4359.733321] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4359.733399] device fwpr102p1 entered promiscuous mode
[ 4359.734405] vmbr0: port 4(fwpr102p1) entered blocking state
[ 4359.734408] vmbr0: port 4(fwpr102p1) entered forwarding state
[ 4359.788026] fwbr102i1: port 1(fwln102i1) entered blocking state
[ 4359.788031] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4359.788114] device fwln102i1 entered promiscuous mode
[ 4359.788179] fwbr102i1: port 1(fwln102i1) entered blocking state
[ 4359.788181] fwbr102i1: port 1(fwln102i1) entered forwarding state
[ 4359.794332] fwbr102i1: port 2(tap102i1) entered blocking state
[ 4359.794336] fwbr102i1: port 2(tap102i1) entered disabled state
[ 4359.794423] fwbr102i1: port 2(tap102i1) entered blocking state
[ 4359.794425] fwbr102i1: port 2(tap102i1) entered forwarding state
[ 4359.875403] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: start failed. status: 0x0
[ 4360.035734] fwbr102i1: port 2(tap102i1) entered disabled state
[ 4360.072927] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4360.072988] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4360.075004] device fwln102i1 left promiscuous mode
[ 4360.075007] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4360.118588] device fwpr102p1 left promiscuous mode
[ 4360.118591] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4370.388906] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: Removing from iommu group 75
[ 4370.388920] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: MDEV: detaching iommu

My question now is: why does it use the wrong IOMMU group? The card is in group 73, but the system uses group 75.
Did I miss something?
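
Regarding the group numbers: the system log above shows the mediated device (the UUID ending in 102) being added to IOMMU group 75 by nvidia-vgpu-vfio. A mediated device is normally placed in a group of its own, separate from the physical card, so 75 versus 73 may not be the actual fault. A minimal sketch for checking which device sits in which group, assuming the standard /sys/kernel/iommu_groups layout (the function name list_iommu_groups and its optional root argument are made up for illustration):

```shell
#!/bin/sh
# Print "<iommu group> <device>" for every device found under a
# sysfs-style root. The root defaults to /sys.
# list_iommu_groups is a hypothetical helper, not part of Proxmox or
# the NVIDIA driver.
list_iommu_groups() {
    root="${1:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$dev" ] || [ -L "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<name>"
        group="${group##*/}"        # keep only the group number
        printf '%s %s\n' "$group" "${dev##*/}"
    done | sort -n
}
```

Calling list_iommu_groups without an argument reads the live /sys tree. While the VM is starting, the mdev UUID should then show up next to its own group number, and the card's PCI address next to the card's group.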
 
Hi @mclordo, I tried the same guide in the form of the original guide by polloloco (https://gitlab.com/polloloco/vgpu-proxmox), but the new driver versions are handled with patches instead of editing separate files. I also have a P4, on Proxmox 8.1.

In the video, the blogger edits several files by hand. The latest version of the patches doesn't provide separate C files. The new patches touch many more files and do some binary patching.

I patched the drivers (the latest supported version is 535.161.08) and installed them as signed kernel modules (an additional step with mokutil and enrolling the MOK keys on boot), and compiled the Rust code into a .so, but I failed to load this .so. I also tried the unpatched version, and it fails to create a vGPU too.
The guide has several fake services that load that .so:
I use nvidia-vgpud.service.d/vgpu_unlock.config and nvidia-vgpu-mgr.service.d/vgpu_unlock.config. They have the same content, but I can't run "systemctl enable nvidia-vgpu-mgr" or "systemctl enable nvidia-vgpud" because there are no .service unit files for them. Without those, the system never touches these directories, so the .so is not loaded.
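
For context on why nothing gets loaded: a "<unit>.service.d" directory only holds overrides for a unit that already exists, and systemd only reads files in it whose names end in ".conf". A minimal sketch that flags override files systemd would skip; check_dropins is a hypothetical helper name, and the default directory /etc/systemd/system is an assumption about where the files were placed:

```shell
#!/bin/sh
# For each "<unit>.service.d" directory under the given systemd directory,
# report whether a file in it would be read ("ok", name ends in .conf)
# or skipped ("ignored", any other name).
# check_dropins is a hypothetical helper, not a systemd command.
check_dropins() {
    dir="${1:-/etc/systemd/system}"
    for f in "$dir"/*.service.d/*; do
        [ -f "$f" ] || continue     # glob matched nothing, or not a file
        case "$f" in
            *.conf) printf 'ok      %s\n' "${f#"$dir"/}" ;;
            *)      printf 'ignored %s\n' "${f#"$dir"/}" ;;
        esac
    done
}
```

Separately, "systemctl cat nvidia-vgpud.service" prints the unit file together with any overrides that were applied. If it reports that the unit does not exist, the .service files themselves are missing, which would point at the driver installation rather than at the override files.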

How did you do this step with the services? Did you enable them, or do they stay as they are? Did you apply the patch, or is patching not needed, as polloloco suggested?

My result right now:
"nvidia-smi vgpu" says that the P4 doesn't support vGPU. In "nvidia-smi -q", the vGPU section mentions "Nvidia Virtual Application support". mdevctl doesn't return any options for vGPU.

I feel that I'm close to the end, but I need a fresh example, because I don't see any blog posts about the P4 on Proxmox.
 
