Quadro P400 Passthrough

Hafnernuss

Member
Aug 11, 2021
Hi,

I know there are a lot of posts about (non-working) PCIe passthrough, and I am pretty sure I have read all of them. However, I cannot seem to get it to work with my Quadro P400.

System spec:

Asus Crosshair V Formula
AMD FX-8350
Quadro P400
16 GB Ram
Standard SATA SSDs

The Quadro is the only GPU in the system. (I have read multiple times that this might be an issue; others say it isn't...)

Proxmox version:
Code:
pve-manager/7.0-8/b1dbf562 (running kernel: 5.11.22-1-pve)

Since the Crosshair V Formula (and, in general, many boards with this chipset) is known to have a buggy BIOS, a workaround has to be applied.
Here is my /etc/default/grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=fullflush iommu=pt pcie_acs_override=downstream ivrs_ioapic[9]=00:14.0 ivrs_ioapic[10]=00:00.1 video=efifb:off"

Most people seem to use amd_iommu=on, but according to the documentation this doesn't seem to be a valid value.

This is the content of my /etc/modprobe.d/blacklist.conf file:
Code:
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

And the content of my /etc/modules file:
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

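To confirm that these modules were actually loaded after a reboot, a minimal sketch that compares `lsmod` output against the list above. The function name `missing_modules` is hypothetical. Modules compiled into the kernel do not show up in `lsmod`, so an entry reported as missing is only a hint, not proof.

```bash
#!/usr/bin/env bash
# missing_modules MODULE...   (hypothetical helper)
# Reads `lsmod` output on stdin and prints every MODULE given as an argument
# that is not listed as currently loaded.
missing_modules() {
  local loaded m
  # Skip the "Module Size Used by" header line and keep only the first column.
  loaded=$(awk 'NR > 1 { print $1 }')
  for m in "$@"; do
    grep -qx -- "$m" <<< "$loaded" || echo "$m"
  done
}
```

Usage: `lsmod | missing_modules vfio vfio_iommu_type1 vfio_pci vfio_virqfd` prints nothing if all four are loaded as modules.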
This is the output of dmesg | grep "AMD-Vi":
Code:
root@proxmox001:~# dmesg | grep "AMD-Vi"
[    2.050945] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    2.050947] AMD-Vi: Interrupt remapping enabled
[    2.051047] AMD-Vi: IO/TLB flush on unmap enabled
root@proxmox001:~#

Which.... seems fine?

The output of lspci -v (other devices removed for brevity):
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation GP107GL [Quadro P400]
        Flags: bus master, fast devsel, latency 0, IRQ 65, NUMA node 0, IOMMU group 16
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at e000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau

01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
        Subsystem: NVIDIA Corporation GP107GL High Definition Audio Controller
        Flags: bus master, fast devsel, latency 0, IRQ 66, NUMA node 0, IOMMU group 16
        Memory at fe080000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

This seems to confirm that 1) the GPU is detected correctly and 2) the vfio-pci driver is in use.
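The same driver check can also be done straight from sysfs, which is handy in scripts. Below is a minimal sketch, assuming the standard `/sys/bus/pci/devices/<address>/driver` symlink. The function name `bound_driver` and its optional second argument, which points it at a fake sysfs tree for testing, are hypothetical.

```bash
#!/usr/bin/env bash
# bound_driver ADDRESS [SYSFS_ROOT]   (hypothetical helper)
# Prints the kernel driver bound to a PCI function by resolving its sysfs
# "driver" symlink; prints "none" and returns 1 if no driver is bound.
# SYSFS_ROOT defaults to /sys and is only meant to be overridden for testing.
bound_driver() {
  local link="${2:-/sys}/bus/pci/devices/$1/driver"
  if [[ ! -e $link ]]; then
    echo "none"
    return 1
  fi
  basename "$(readlink -f "$link")"
}
```

Usage: `bound_driver 0000:01:00.0` and `bound_driver 0000:01:00.1` should both print `vfio-pci` if the binding shown by lspci above holds.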

At this point, I think I am done hardware- and host-wise, and that everything is working? Is there anything else I could do to check whether the issue is related to my hardware or to my Proxmox config?

I have set up a Windows 10 Pro (N) 21H1 VM with the following config:

Code:
bios: ovmf
boot: order=ide0;ide2;net0
cores: 2
cpu: host
efidisk0: SSD1_1TB:101/vm-101-disk-1.qcow2,size=128K
hostpci0: 0000:01:00,pcie=1,x-vga=1
ide0: SSD1_1TB:101/vm-101-disk-0.qcow2,size=32G
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 2048
name: GPUtest
net0: e1000=7A:F0:1D:85:D4:E0,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=304d1841-932d-41ad-b7fa-9f6517ecb5ac
sockets: 1
vga: virtio
vmgenid: 6fb8131a-7df9-425b-af5d-77f1d95f8760

The setup went smoothly. I installed NVIDIA driver 450.x and got a blue screen related to nvlddmkm.sys (System Service Exception). I have read that this might be an issue with MSI support, but MSI is already enabled in my VM.
Booting with "Primary GPU" unchecked works, but then I get a Code 43 in the Device Manager for the Quadro P400.
Strangely, in the Device Manager I also see a PCI Device (bus 6, device 3) with a Code 28, and I have no idea what that could be.

I would greatly appreciate any suggestions you might have :)
 
Most people seem to use amd_iommu=on, but this doesn't seem to be valid according to the documentation.
This is true. I believe it comes from 'intel_iommu', where '=on' is both valid and often necessary. However, as far as I can tell, the correct solution is to simply not add it; 'fullflush' does something different. The 'iommu=pt' option should be enough to enable it. `dmesg | grep iommu` should confirm it: if you see lines like "Adding to iommu group X", it is working.
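To pull the group number for one specific device out of that log, here is a minimal sketch. The function name `iommu_group_of` is hypothetical. It relies only on the "pci <address>: Adding to iommu group N" line format quoted above.

```bash
#!/usr/bin/env bash
# iommu_group_of ADDRESS   (hypothetical helper)
# Reads kernel log text (e.g. `dmesg` output) on stdin and prints the IOMMU
# group number logged for the given PCI address, if there is one.
iommu_group_of() {
  # The group number is the last whitespace-separated field of the line.
  awk -v dev="pci $1:" '/Adding to iommu group/ && index($0, dev) { print $NF }'
}
```

Usage: `dmesg | iommu_group_of 0000:01:00.0`. Empty output means the device was never added to a group. For this card, the lspci output above already reports group 16 for both functions.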

The setup went smoothly. I installed NVIDIA driver 450.x and got a blue screen related to nvlddmkm.sys (System Service Exception). I have read that this might be an issue with MSI support, but MSI is already enabled in my VM.
Booting with "Primary GPU" unchecked works, but then I get a Code 43 in the Device Manager for the Quadro P400.
This would indicate to me that there is broken or limited UEFI or legacy mode support... Have you tried using the "romfile" option with a dumped or downloaded clean vBIOS? See our wiki for an example.
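For reference, here is a sketch of how that could look for the VM config posted above. It assumes the ROM file has been copied to /usr/share/kvm/, which is the directory Proxmox VE resolves `romfile` against. The file name `quadro-p400.rom` is hypothetical.

```
hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=quadro-p400.rom
```

The same change can be made from the host shell with `qm set 101 --hostpci0 0000:01:00,pcie=1,x-vga=1,romfile=quadro-p400.rom`. This only tells QEMU which ROM image to present to the guest. Nothing is written to the card itself.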

Since the Crosshair V Formula (and, in general, many boards with this chipset) is known to have a buggy BIOS, a workaround has to be applied.
Here is my /etc/default/grub:
Make sure the command line is applied correctly by checking /proc/cmdline after a reboot. Otherwise, my personal guess is that the broken IOMMU support might be at fault. I have never encountered this before, but it certainly doesn't look promising...
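For the /proc/cmdline check, here is a minimal sketch that reports which of the expected parameters are missing from the running kernel's command line. The function name `missing_params` is hypothetical. Each parameter is matched literally as a whole word, so the brackets in `ivrs_ioapic[9]=...` are not treated as glob patterns.

```bash
#!/usr/bin/env bash
# missing_params CMDLINE PARAM...   (hypothetical helper)
# Prints each PARAM that does not appear as a whole, space-separated word in
# CMDLINE, and returns 1 if at least one is missing.
missing_params() {
  local cmdline=" $1 " param missing=0
  shift
  for param in "$@"; do
    # The quoted part of the pattern is matched literally, not as a glob.
    if [[ $cmdline != *" $param "* ]]; then
      echo "$param"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, quoting the bracketed values so the calling shell does not glob them: `missing_params "$(cat /proc/cmdline)" iommu=pt pcie_acs_override=downstream 'ivrs_ioapic[9]=00:14.0' 'ivrs_ioapic[10]=00:00.1' video=efifb:off`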
 
Hi Stefan,

Thanks for the quick response!
I will remove the fullflush command and just use iommu=pt.
Maybe the wiki needs an update, because it also says to use amd_iommu=on.

This is the output of dmesg | grep iommu:
Code:
root@proxmox001:~# dmesg | grep iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=fullflush iommu=pt pcie_acs_override=downstream ivrs_ioapic[9]=00:14.0 ivrs_ioapic[10]=00:00.1 video=efifb:off
[    0.062796] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet amd_iommu=fullflush iommu=pt pcie_acs_override=downstream ivrs_ioapic[9]=00:14.0 ivrs_ioapic[10]=00:00.1 video=efifb:off
[    0.396169] iommu: Default domain type: Passthrough (set via kernel command line)
[    2.001907] pci 0000:00:00.0: Adding to iommu group 0
[    2.001921] pci 0000:00:02.0: Adding to iommu group 1
[    2.001933] pci 0000:00:04.0: Adding to iommu group 2
[    2.001945] pci 0000:00:05.0: Adding to iommu group 3
[    2.001956] pci 0000:00:06.0: Adding to iommu group 4
[    2.001967] pci 0000:00:07.0: Adding to iommu group 5
[    2.001980] pci 0000:00:09.0: Adding to iommu group 6
[    2.001991] pci 0000:00:11.0: Adding to iommu group 7
[    2.002012] pci 0000:00:12.0: Adding to iommu group 8
[    2.002024] pci 0000:00:12.2: Adding to iommu group 8
[    2.002048] pci 0000:00:13.0: Adding to iommu group 9
[    2.002060] pci 0000:00:13.2: Adding to iommu group 9
[    2.002072] pci 0000:00:14.0: Adding to iommu group 10
[    2.002084] pci 0000:00:14.2: Adding to iommu group 11
[    2.002095] pci 0000:00:14.3: Adding to iommu group 12
[    2.002108] pci 0000:00:14.4: Adding to iommu group 13
[    2.002120] pci 0000:00:14.5: Adding to iommu group 14
[    2.002140] pci 0000:00:16.0: Adding to iommu group 15
[    2.002153] pci 0000:00:16.2: Adding to iommu group 15
[    2.002183] pci 0000:01:00.0: Adding to iommu group 16
[    2.002202] pci 0000:01:00.1: Adding to iommu group 16
[    2.002215] pci 0000:02:00.0: Adding to iommu group 17
[    2.002227] pci 0000:03:00.0: Adding to iommu group 18
[    2.002239] pci 0000:04:00.0: Adding to iommu group 19
[    2.002253] pci 0000:05:00.0: Adding to iommu group 20
[    2.002266] pci 0000:06:00.0: Adding to iommu group 21

So this seems fine?

This would indicate to me that there is broken or limited UEFI or legacy mode support... Have you tried using the "romfile" option with a dumped or downloaded clean vBIOS? See our wiki for example.

I never considered that it might be a matter of limited support. Could I even install an EFI VM on a machine where EFI is not working properly on the host? Is there any way to check for that on the host machine? I do not seem to get any errors.

In the BIOS I have enabled the following settings: SVM and IOMMU. IOMMU Mode is set to 64MB (the alternative is OFF), and Initial Graphic Adapter is set to PCI/PEG (the alternative is PEG/PCI). However, there seem to be no EFI-related settings.

Thanks for the tip about the vBIOS. I tried this rom-parser tool, and the output is worrisome:

Code:
Valid ROM signature found @0h, PCIR offset 170h
        PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 1cb3, class: 030000
        PCIR: revision 0, vendor revision: 1
Error, ran off the end

I tried moving the card to another PCIe slot, but as soon as I do that, I can no longer connect to the machine over the network. (The login screen is shown on a connected monitor, but the Ethernet LEDs are all off.)

Greetings from Graz :)
 
Maybe the wiki needs an update, because it also says to use amd_iommu=on.
I'll keep it in mind, but it also doesn't hurt: I checked the kernel source, and any unsupported argument is simply ignored. I think it might actually help to keep it in there. It's easier to say "use intel_iommu=on or amd_iommu=on depending on your chip" than to add more exceptions for things you only need to do in certain scenarios :)

I never considered that it might be a matter of limited support. Could I even install an EFI VM on a machine where EFI is not working properly on the host? Is there any way to check for that on the host machine? I do not seem to get any errors.
Sorry, let me rephrase that: maybe your GPU has broken UEFI or legacy VGA support. This is relevant for passthrough VMs, as the device is attached at a very low level. It depends on the vBIOS (v for video), which is distinct from the regular BIOS.

For the regular guest and host BIOS you can of course use whatever you want, although UEFI is recommended for passthrough.

Thanks for the tip about the vBIOS. I tried this rom-parser tool, and the output is worrisome:
You can try to dump the BIOS while the card is in a clean state, though that usually requires a second GPU to be installed temporarily and used as the boot GPU, so that the NVIDIA card doesn't get initialized.
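On a Linux host, the usual way to read the ROM is through the device's sysfs `rom` attribute. Writing `1` to it enables reading, and writing `0` disables it again. Below is a minimal sketch, to be run as root on the host. The function name `dump_rom` and its optional sysfs-root argument, which exists only for testing, are hypothetical. As noted above, the result is only a clean image if the host has not initialized the card.

```bash
#!/usr/bin/env bash
# dump_rom ADDRESS OUTFILE [SYSFS_ROOT]   (hypothetical helper)
# Copies a PCI device's expansion ROM through the sysfs "rom" attribute.
# SYSFS_ROOT defaults to /sys and is only meant to be overridden for testing.
dump_rom() {
  local rom="${3:-/sys}/bus/pci/devices/$1/rom" out=$2 status=0
  if [[ ! -e $rom ]]; then
    echo "no ROM attribute for $1" >&2
    return 1
  fi
  echo 1 > "$rom"                   # enable reading of the ROM
  cat "$rom" > "$out" || status=$?  # copy it out, remembering any failure
  echo 0 > "$rom"                   # disable reading again in either case
  return "$status"
}
```

Usage: `dump_rom 0000:01:00.0 /root/p400.rom`, then check the file with rom-parser before using it as a `romfile`.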

You might also be able to find a ROM dump online, at your own risk of course. techpowerup has a collection: https://www.techpowerup.com/vgabios/
 
Could the output of the rom-parser tool be confirmation that my GPU's vBIOS is broken? I found a BIOS on techpowerup. It's not verified, but all the specs match. I'm willing to risk it (it's only for testing purposes anyway).

Just so I understand correctly: do I have to flash this BIOS onto the actual GPU, or is it only needed for passing the card to the VM? I would guess it has to be flashed onto the real card, right?

regards,

Philipp
 
Could the output of the rom-parser tool be confirmation that my GPU's vBIOS is broken? I found a BIOS on techpowerup. It's not verified, but all the specs match. I'm willing to risk it (it's only for testing purposes anyway).
Maybe. The rom-parser tool is not perfect either, but it could be a hint.

Just so I understand correctly: do I have to flash this BIOS onto the actual GPU, or is it only needed for passing the card to the VM? I would guess it has to be flashed onto the real card, right?
Oh no, not at all. Read through the article I linked in my first post: you just need to put the parameter into the VM config. This does not permanently alter your card. There are also hardware safeguards, so unless the vBIOS and/or the card is entirely broken, it shouldn't be possible to damage anything either.

The reason it's still a risk is that you're running potentially untrusted code with system privileges inside your VM. It can't escape to the host, but if you use the unverified techpowerup ROM, you may want to be cautious about putting important data into the guest.
 
Tried it; unfortunately nothing changed (BSOD as soon as I enable "Primary GPU" for the VM).

I have to stop for today, but I will try again in a few days. Maybe I will try it with a clean and fresh install...
In the meantime, thanks for all the valuable input!
 
Hi,
I am able to boot the VM and connect via RDP, but I still get Code 43. The latest dmesg output is:
Code:
root@proxmox001:~# dmesg | grep -i vfio
[    4.741904] VFIO - User Level meta-driver version: 0.3
[    4.749689] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    4.766409] vfio_pci: add [10de:1cb3[ffffffff:ffffffff]] class 0x000000/00000000
[    4.786392] vfio_pci: add [10de:0fb9[ffffffff:ffffffff]] class 0x000000/00000000
[   71.628708] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   71.630402] vfio-pci 0000:01:00.0: No more image in the PCI ROM
[ 1474.101508] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1474.103010] vfio-pci 0000:01:00.0: No more image in the PCI ROM

I was able to dump the (real) vBIOS with nvflash and checked it with rom-parser. Oddly enough, this seems okay?

Code:
root@proxmox001:~/rom-parser# ./rom-parser /home/nvflash/x64/original.rom
Valid ROM signature found @a00h, PCIR offset 170h
        PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 1cb3, class: 030000
        PCIR: revision 0, vendor revision: 1
Valid ROM signature found @fa00h, PCIR offset 1ch
        PCIR: type 3 (EFI), vendor: 10de, device: 1cb3, class: 030000
        PCIR: revision 3, vendor revision: 0
                EFI: Signature Valid, Subsystem: Boot, Machine: X64
        Last image

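rom-parser starts by reporting where it finds the 0x55AA option-ROM signature: @0h for the earlier dump, @a00h for this nvflash dump. Below is a minimal sketch of only that first step, assuming images start on 512-byte boundaries (both offsets here are multiples of 512). It does not validate the PCIR structures the way rom-parser does. The function name `find_rom_signature` is hypothetical.

```bash
#!/usr/bin/env bash
# find_rom_signature FILE   (hypothetical helper)
# Prints the offset (in hex, rom-parser style) of the first 0x55 0xAA
# option-ROM signature found on a 512-byte boundary; returns 1 if none.
find_rom_signature() {
  local file=$1 size offset bytes
  size=$(wc -c < "$file")
  for (( offset = 0; offset + 2 <= size; offset += 512 )); do
    # Read the two bytes at this offset as hex, e.g. "55aa".
    bytes=$(od -An -tx1 -v -j "$offset" -N 2 "$file" | tr -d ' \n')
    if [[ $bytes == 55aa ]]; then
      printf '%xh\n' "$offset"
      return 0
    fi
  done
  echo "no ROM signature found" >&2
  return 1
}
```

Usage: `find_rom_signature /home/nvflash/x64/original.rom`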
Is there any point in trying SeaBIOS?
 
I am able to boot the VM and connect via RDP, but I still get Code 43. The latest dmesg output is:
The output looks normal; nothing out of the ordinary. I'd say you should look more into troubleshooting the guest. Are you using the correct NVIDIA driver? (I'm not sure, but maybe Quadros need a special one?) Does it work if you boot Windows bare-metal?

Is there any point in trying SeaBIOS?
This is all somewhat experimental technology; if you have the time, there's a point in trying almost everything ;)
 
I am using the following driver:
471.68-quadro-rtx-desktop-notebook-win10-win11-64bit-international-whql

I haven't tried it on a bare-metal machine yet (not many spare parts around...)

But I tried it in an Ubuntu VM... and things are looking better?
Code:
test@test-Standard-PC-Q35-ICH9-2009:~$ sudo nvidia-smi
[sudo] password for test:
Mon Aug 16 20:05:40 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P400         Off  | 00000000:01:00.0  On |                  N/A |
| 34%   41C    P8    N/A /  N/A |    140MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       712      G   /usr/lib/xorg/Xorg                 40MiB |
|    0   N/A  N/A      1005      G   /usr/bin/gnome-shell               97MiB |
+-----------------------------------------------------------------------------+
test@test-Standard-PC-Q35-ICH9-2009:~$

At least no errors? And when I connect a monitor to one of the P400's three DP ports, I actually see the Ubuntu screen... This really makes me think that everything on the host (including the hardware) is working fine, and that the issue is indeed with my Windows VM or the drivers there. Maybe it's because I'm using Windows 10 N (!). I will try a "normal" Win10 tomorrow.
 
I tried it with a "normal" Windows 10 Pro, however, I could not get it to work. I am out of ideas, especially since I think that hostwise everything is working correctly, and I do not know what I could change in the windows guest.
 
Hi !
I'm not familiar with Quadro GPUs, but I was struggling with an old GT640 giving me the infamous Code 43 until I managed to properly extract the ROM file from my GPU.
Have a look at this thread; the extraction method given there worked for me (afterwards I had to add the GOP to the ROM file using GOPUpd).
 
Hi, thanks for the suggestion. I dumped the ROM file exactly as described and ran GOPupd on it, also as described. However, the VM fails to even POST when using the resulting ROM. Is there a way to find out what happened? Are there any logs?

regards
 
dmesg might help to understand what's going on, I'm not sure.

Also, did you try the dumped ROM before using GopUpd?

GopUpd is only necessary if your original GPU ROM doesn't have an EFI GOP (usually quite old models), in order to add one. That might not be the case for your Quadro, I don't know... GopUpd tells you whether your ROM file has the GOP before you decide to patch it.
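You can also read from the image headers themselves whether a ROM already contains an EFI image. Below is a rough sketch based on the PCI expansion ROM layout: a 0x55AA signature, a pointer at offset 0x18 to the "PCIR" structure, an image length at PCIR+0x10 in 512-byte units, the code type at PCIR+0x14 (0 = x86, 3 = EFI), and a last-image flag in bit 7 of PCIR+0x15. This is the same information rom-parser prints as "type 0 (x86 PC-AT)" or "type 3 (EFI)", but the sketch checks far less than rom-parser does. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# read_le FILE OFFSET COUNT   (hypothetical helper)
# Prints COUNT bytes at OFFSET as an unsigned little-endian integer.
read_le() {
  local -a b
  local i value=0
  read -r -a b <<< "$(od -An -tu1 -v -j "$2" -N "$3" "$1")"
  for (( i = ${#b[@]} - 1; i >= 0; i-- )); do
    value=$(( value * 256 + b[i] ))
  done
  echo "$value"
}

# rom_code_types FILE [START]   (hypothetical helper)
# Walks the chained images of a PCI option ROM and prints one line per image:
# "<offset in hex>h <code type>". START defaults to 0 and may be given in hex.
rom_code_types() {
  local file=$1 offset=$(( ${2:-0} )) pcir length type indicator
  while :; do
    if [[ $(od -An -tx1 -v -j "$offset" -N 2 "$file" | tr -d ' \n') != 55aa ]]; then
      echo "no 0x55AA signature at offset $(printf '%x' "$offset")h" >&2
      return 1
    fi
    # Offset 0x18 of the image header points to the "PCIR" data structure.
    pcir=$(( offset + $(read_le "$file" $(( offset + 0x18 )) 2) ))
    if [[ $(od -An -tx1 -v -j "$pcir" -N 4 "$file" | tr -d ' \n') != 50434952 ]]; then
      echo "missing PCIR structure for image at $(printf '%x' "$offset")h" >&2
      return 1
    fi
    length=$(read_le "$file" $(( pcir + 0x10 )) 2)     # in 512-byte units
    type=$(read_le "$file" $(( pcir + 0x14 )) 1)       # 0 = x86, 3 = EFI
    indicator=$(read_le "$file" $(( pcir + 0x15 )) 1)  # bit 7 = last image
    printf '%xh %d\n' "$offset" "$type"
    if (( indicator & 0x80 )); then
      return 0
    fi
    if (( length == 0 )); then
      echo "zero image length, stopping" >&2
      return 1
    fi
    offset=$(( offset + length * 512 ))
  done
}
```

Usage: `rom_code_types dump.rom`. If the first image does not start at the beginning of the file, pass its offset as the second argument, e.g. `0xa00` for a dump where rom-parser reports the first signature @a00h. A line ending in ` 3` means an EFI image (GOP) is already present.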
 
dmesg output (the same as before):
Code:
root@proxmox001:~# dmesg | grep -i vfio
[    4.741904] VFIO - User Level meta-driver version: 0.3
[    4.749689] vfio-pci 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[    4.766409] vfio_pci: add [10de:1cb3[ffffffff:ffffffff]] class 0x000000/00000000
[    4.786392] vfio_pci: add [10de:0fb9[ffffffff:ffffffff]] class 0x000000/00000000
[   71.628708] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   71.630402] vfio-pci 0000:01:00.0: No more image in the PCI ROM
[ 1474.101508] vfio-pci 0000:01:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[ 1474.103010] vfio-pci 0000:01:00.0: No more image in the PCI ROM

I tried the pure dump before; I can boot with it, but Code 43 remains. GOPupd tells me that no EFI signature is found, and I patched the ROM accordingly (with identifier 7, as that stands for GP1xx (Pascal) GPUs).

With the GOPupd vBIOS, the VM does not seem to POST, but I have no idea how to check the actual error.
 
I don't have many ideas to help you, but are you sure you followed the procedure in the link I gave you to extract the ROM? It requires first installing the GPU as a secondary GPU in the VM, meaning that you need a working primary GPU for that VM. Then you should be able to boot, install the driver for the secondary GPU, and make sure it works without errors. Then you shut down the VM and extract the ROM of the secondary GPU from the host. Optionally, run GopUpd on it if necessary...
 
Not exactly, as I currently don't have a second GPU I could install in the machine (yes, really). I guess that may be a problem, although I can confirm that I have disabled the GPU for use outside the VM altogether. I will see if I can find a second GPU for testing purposes.
 
@Hafnernuss

I know this is not Windows, but have you looked at this PCIe passthrough example? https://www.youtube.com/watch?v=-HCzLhnNf-A

I tried this alternative and it kind of worked for me. The issue I am having is that the GPU fans stop spinning. Plex does transcode using the GPU, but the fans not spinning makes me worry about whether this solution is viable in the long term.

It did work for me without the need to do the ROM parsing.

In any case, I hope this helps.

PS: I also do not have a second GPU, so troubleshooting was a struggle when I could not see the console output and the NIC stopped functioning. Not fun.
 
Hi @Hafnernuss, did you get it to work in the end? I have the same GPU, and passthrough was working fine on Proxmox 6.x. After I updated to Proxmox 7.x, passthrough stopped working with Code 43. It would be great if you could share any further progress you have made. Thanks!