Passthrough failed

mclordo · Jun 25, 2023

I tried to set up a Tesla P4 as a vGPU, roughly following this guide.

Unlocking the GPU worked, and I'm able to list all vGPU profiles with mdevctl types.
Then I added the PCI device to my VM (102), selected it as shown in the following picture (default MDev 285), and started the VM.

After a brief moment, it crashes with the following log:

Code:

kvm: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000000-0000-0000-0000-000000000102,id=hostpci0,bus=pci.0,addr=0x10: vfio 00000000-0000-0000-0000-000000000102: error getting device from group 75: Input/output error
Verify all devices in group 75 are bound to vfio-<bus> or pci-stub and not already in use
TASK ERROR: start failed: QEMU exited with code 1

System log:

Code:

[ 4359.019970] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: Adding to iommu group 75
[ 4359.019975] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: MDEV: group_id = 75
[ 4359.701417] device tap102i1 entered promiscuous mode
[ 4359.733316] vmbr0: port 4(fwpr102p1) entered blocking state
[ 4359.733321] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4359.733399] device fwpr102p1 entered promiscuous mode
[ 4359.734405] vmbr0: port 4(fwpr102p1) entered blocking state
[ 4359.734408] vmbr0: port 4(fwpr102p1) entered forwarding state
[ 4359.788026] fwbr102i1: port 1(fwln102i1) entered blocking state
[ 4359.788031] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4359.788114] device fwln102i1 entered promiscuous mode
[ 4359.788179] fwbr102i1: port 1(fwln102i1) entered blocking state
[ 4359.788181] fwbr102i1: port 1(fwln102i1) entered forwarding state
[ 4359.794332] fwbr102i1: port 2(tap102i1) entered blocking state
[ 4359.794336] fwbr102i1: port 2(tap102i1) entered disabled state
[ 4359.794423] fwbr102i1: port 2(tap102i1) entered blocking state
[ 4359.794425] fwbr102i1: port 2(tap102i1) entered forwarding state
[ 4359.875403] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: start failed. status: 0x0
[ 4360.035734] fwbr102i1: port 2(tap102i1) entered disabled state
[ 4360.072927] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4360.072988] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4360.075004] device fwln102i1 left promiscuous mode
[ 4360.075007] fwbr102i1: port 1(fwln102i1) entered disabled state
[ 4360.118588] device fwpr102p1 left promiscuous mode
[ 4360.118591] vmbr0: port 4(fwpr102p1) entered disabled state
[ 4370.388906] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: Removing from iommu group 75
[ 4370.388920] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: MDEV: detaching iommu
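
Most of the lines above are bridge and tap state changes for the VM's network interface. A minimal Python sketch for pulling out only the nvidia-vgpu-vfio entries follows. The function names and the SAMPLE constant are made up for illustration, and the pattern has only been checked against the two line shapes that appear in this log.

```python
import re

# Matches both spellings seen in the log above:
#   "nvidia-vgpu-vfio <uuid>: ..." and "[nvidia-vgpu-vfio] <uuid>: ..."
VGPU_LINE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[?nvidia-vgpu-vfio\]?\s+"
    r"(?P<uuid>[0-9a-f-]{36}):\s+(?P<msg>.*)"
)

# Three lines copied from the system log in this post.
SAMPLE = """\
[ 4359.019975] nvidia-vgpu-vfio 00000000-0000-0000-0000-000000000102: MDEV: group_id = 75
[ 4359.701417] device tap102i1 entered promiscuous mode
[ 4359.875403] [nvidia-vgpu-vfio] 00000000-0000-0000-0000-000000000102: start failed. status: 0x0
"""


def vgpu_events(log_text):
    """Return (timestamp, uuid, message) for every nvidia-vgpu-vfio line."""
    events = []
    for line in log_text.splitlines():
        match = VGPU_LINE.search(line)
        if match:
            events.append(
                (float(match.group("ts")), match.group("uuid"), match.group("msg"))
            )
    return events


def mdev_group(events):
    """Return the group number from the first 'group_id = N' message, or None."""
    for _, _, message in events:
        match = re.search(r"group_id\s*=\s*(\d+)", message)
        if match:
            return int(match.group(1))
    return None


events = vgpu_events(SAMPLE)
for timestamp, uuid, message in events:
    print(timestamp, uuid, message)
print("mdev IOMMU group:", mdev_group(events))
```

To use it on a live host, the output of dmesg could be passed to vgpu_events() in place of SAMPLE.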

My question now is: why does it use the wrong IOMMU group? The card is in group 73, but the system uses group 75.
Did I miss something?
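
For comparing the two group numbers, here is a rough Python sketch that reads the kernel's IOMMU group layout from sysfs. It assumes the usual /sys/kernel/iommu_groups/<number>/devices/ layout; the function names are made up for illustration. The log above shows the mediated device (the UUID ending in 102) itself being added to group 75, so a number that differs from the physical card's group may simply mean the mdev device received a group of its own. That reading is an assumption and is not confirmed anywhere in this thread.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted device names inside it."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or sysfs is not available here
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def group_of(device, groups):
    """Return the group number that contains `device`, or None if not found."""
    for number, devices in groups.items():
        if device in devices:
            return number
    return None


if __name__ == "__main__":
    # Print every group with its devices, lowest group number first.
    for number, devices in sorted(iommu_groups().items()):
        print(number, ", ".join(devices))
```

Calling group_of() once with the card's PCI address and once with the mdev UUID shows whether the two really sit in different groups while the VM is starting.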

mclordo · Nov 21, 2023

The error is resolved with Proxmox v8.0.3.

FancyBee · Apr 20, 2024

Hi @mclordo, I tried the same approach using the original guide by polloloco (https://gitlab.com/polloloco/vgpu-proxmox), but the newer drivers are handled with patches instead of editing separate files by hand. I also have a P4, on Proxmox 8.1.

In the video, the blogger edits several files by hand. The latest version of the patches doesn't provide separate C files; the new patches touch many more files and also do some binary patching.

I patched the drivers (the latest supported version is 535.161.08) and installed them as signed kernel modules (an additional step with mokutil and enrolling the MOK keys on boot), then compiled the Rust code into a .so, but I failed to load this .so. I also tried the unpatched version, and it fails to create a vGPU too.
The guide has several "fake" service entries that load that .so library:
nvidia-vgpud.service.d/vgpu_unlock.config and nvidia-vgpu-mgr.service.d/vgpu_unlock.config. Both have the same content, but I can't run "systemctl enable nvidia-vgpu-mgr" or "systemctl enable nvidia-vgpud" because there are no .service files for them. Without those, the system never touches the override files, so the .so is not loaded.
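
A rough way to check what systemd can actually see is sketched below in Python. Two systemd facts are behind it: a drop-in directory such as nvidia-vgpud.service.d/ only modifies a unit that already exists, and only files whose names end in .conf are read from such a directory. The search paths are assumptions for a Debian-based host, and check_unit is a made-up helper name, not part of any tool.

```python
from pathlib import Path

# Assumed unit-file locations; adjust for the host in question.
# On Debian, /lib is usually a symlink to /usr/lib, so a unit may be listed twice.
UNIT_DIRS = (
    "/etc/systemd/system",
    "/run/systemd/system",
    "/usr/lib/systemd/system",
    "/lib/systemd/system",
)


def check_unit(unit, unit_dirs=UNIT_DIRS, dropin_root="/etc/systemd/system"):
    """Report where `unit` is defined and which drop-in files systemd would read."""
    unit_files = [
        str(Path(directory) / unit)
        for directory in unit_dirs
        if (Path(directory) / unit).is_file()
    ]
    dropin_dir = Path(dropin_root) / (unit + ".d")
    entries = sorted(p.name for p in dropin_dir.iterdir()) if dropin_dir.is_dir() else []
    return {
        "unit_files": unit_files,  # empty list: nothing to enable or override
        "dropins_read": [name for name in entries if name.endswith(".conf")],
        "dropins_ignored": [name for name in entries if not name.endswith(".conf")],
    }


for unit in ("nvidia-vgpud.service", "nvidia-vgpu-mgr.service"):
    print(unit, check_unit(unit))
```

If unit_files comes back empty, the service files themselves are missing, which would point at the driver installation and not at the override files. A file named vgpu_unlock.config would show up under dropins_ignored.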

How did you do this step with the services? Did you enable them, or do they stay as they are? Did you apply the patch, or, as polloloco suggested, is patching not needed?

My result right now:
"nvidia-smi vgpu" says that the P4 doesn't support vGPU. In "nvidia-smi -q", the vGPU section mentions "Nvidia Virtual Application support". mdevctl doesn't return any vGPU options.

I feel that I'm close to the end, but I need a fresh example, because I can't find any blog posts about the P4 on Proxmox.
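
When mdevctl returns nothing, it can help to look directly at the sysfs tree that, as far as I know, it reads from. The Python sketch below assumes the standard kernel mdev layout, /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/, with name and available_instances files in each type directory; mdev_types is a made-up helper name.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped content of a sysfs attribute, or None if it is absent."""
    return path.read_text().strip() if path.is_file() else None


def mdev_types(root="/sys/class/mdev_bus"):
    """Map each mdev-capable parent device to the types it advertises."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # no driver has registered an mdev parent device
    for parent in base.iterdir():
        types_dir = parent / "mdev_supported_types"
        if not types_dir.is_dir():
            continue
        result[parent.name] = {
            type_dir.name: {
                "name": read_attr(type_dir / "name"),
                "available_instances": read_attr(type_dir / "available_instances"),
            }
            for type_dir in types_dir.iterdir()
        }
    return result


for parent, types in mdev_types().items():
    for type_id, info in sorted(types.items()):
        print(parent, type_id, info["name"], info["available_instances"])
```

No output at all would suggest that the driver never registered the card as an mdev parent, which fits the "nvidia-smi vgpu" result above.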

Patrick Simpson · Apr 22, 2024

I am having very similar issues and would love some guidance here...
