PCI passthrough machines (error code 1)

dabaers

Hey all, back for some more help!

I have two VMs I am trying to spin up with PCI passthrough. Both fail to start with "QEMU exited with code 1".

VM-win

agent: 1
args: -cpu 'host, +kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,kernel_irqchip=on'
balloon: 0
bios: ovmf
boot: order=scsi0;scsi2;scsi3
cores: 3
cpu: host,hidden=1,flags=+pcid
efidisk0: vmstore:114/vm-114-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:03:00,pcie=1,x-vga=1
machine: pc-q35-6.1
memory: 2048
meta: creation-qemu=6.1.0,ctime=1650122941
name: pcipassthrough
numa: 1
ostype: win10
scsi0: vmstore:114/vm-114-disk-0.qcow2,size=40G
scsi2: proxmoxstore:iso/Windows-10.iso,media=cdrom,size=4449344K
scsi3: proxmoxstore:iso/virtio-win-0.1.208.iso,media=cdrom,size=543390K
scsihw: virtio-scsi-pci
smbios1: uuid=41adc4b4-8be9-40ec-a17c-d1e34acebbbf
sockets: 1
vmgenid: 0fad61cf-1017-4063-95c7-7863b6cea15d

VM-pfsense

balloon: 0
bios: ovmf
boot: order=scsi0
cores: 1
cpu: host,hidden=1,flags=+pcid
efidisk0: vmstore:113/vm-113-disk-1.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci0: 0000:05:00
hostpci1: 0000:05:00
kvm: 1
machine: q35
memory: 512
meta: creation-qemu=6.1.0,ctime=1650076535
name: pfsense
numa: 0
ostype: l26
scsi0: vmstore:113/vm-113-disk-0.qcow2,size=40G
scsihw: virtio-scsi-pci
smbios1: uuid=8171eed0-1d62-45e1-b1da-1159382af3b1
sockets: 1
vmgenid: 978bcede-55e5-4cd7-9101-7b3139d7dd5f

Both show the same error when trying to start. Drivers have been blacklisted, and IOMMU and VT-d are working. I have also enabled interrupt remapping. I'm sure it's a misconfiguration problem and not a hardware problem; I'm just not sure where to look or what to adjust anymore. I have also attempted:

echo 1 > /sys/module/kvm/parameters/ignore_msrs
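
I assume the persistent equivalent would be a kvm module option, e.g. in a file like /etc/modprobe.d/kvm.conf (not yet tested on my side):

options kvm ignore_msrs=1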

There is some additional information about my PCI passthrough troubles in this earlier thread:
https://forum.proxmox.com/threads/pci-passthrough-issue.108176/

As always, thank you for the help!
 
Remove the args: -cpu 'host, +kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,kernel_irqchip=on' line, because that is already taken care of by x-vga=1. Please don't blindly copy this stuff from others; in principle, everything can be set up via the Proxmox web GUI.

Don't use flags=+pcid, because AMD Ryzen does not support that (and the option in the Proxmox web GUI even tells you that it is for Intel). Also, there is no need to use hidden=1 on the cpu: lines. And kvm: 1 is really not necessary.

hostpci0: 0000:05:00 together with hostpci1: 0000:05:00 will give an error. You are trying to pass through the same multi-function device (with all its functions) twice. Remove the second line; all functions (sub-devices) are already passed through by the first line.
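
For illustration, something like this single entry should be enough for the pfSense VM (the pcie=1 flag is just my assumption, since it is a q35 machine):

hostpci0: 0000:05:00,pcie=1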

Did you check your IOMMU groups to make sure the 03:00.* and 05:00.* devices don't have other devices in their group (besides PCI bridges)? You can get a nice overview of the groups with:

for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU group %s ' "$n"
    lspci -nns "${d##*/}"
done

Please give more specific information about the errors (if the things above don't fix it). Is there anything in the Tasks log, or does journalctl -b 0 show something around the time you try to start the VMs?
 
Just to clarify, it is an Intel system. I have been able to get an Ubuntu instance running now with PCI passthrough, but the graphics card does not show up at 03:00.0 or 03:00.1 by name. When I run hwinfo in the terminal, it sees a VGA adapter. RDP worked until I tried to install the NVIDIA drivers in the VM. Now the VM appears to be dead: it doesn't ping and RDP doesn't work.
 
Sorry, I must have mistakenly looked at another person's post. My remark about pcid is therefore wrong.
PCI(e) devices have different IDs inside the VM; that is normal. If you can show the (textual) output of lspci inside the Ubuntu VM, I can double-check.
Can you tell what change fixed the error and got you to successfully start the VM?
RDP to Ubuntu or Windows? Which NVIDIA drivers? Possibly you need to save the GPU BIOS to a file and patch it. There are some topics here about getting NVIDIA GPUs to work in Windows VMs, but I have no experience with NVIDIA and Windows.
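
In case the VBIOS route is needed: a rough sketch of dumping the ROM on the Proxmox host through sysfs (the address 0000:03:00.0 is taken from your VM config above; the card must not be in use by any driver or VM while doing this):

# on the Proxmox host, as root
cd /sys/bus/pci/devices/0000:03:00.0
echo 1 > rom            # make the ROM readable
cat rom > /root/vbios.rom
echo 0 > rom            # lock it again

The resulting file could then be patched and passed to the VM with the romfile= option of hostpci0, but as said, I have no first-hand experience with this.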
 
All good! Here is the information for the downed machine, which did work until the driver update:

Ubuntu 20.04
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 3
efidisk0: vmstore:113/vm-113-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hostpci1: 0000:03:00,pcie=1,x-vga=1
machine: q35
memory: 2048
meta: creation-qemu=6.1.0,ctime=1650138121
name: pci-test
net0: virtio=0E:FF:91:21:34:12,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: vmstore:113/vm-113-disk-1.qcow2,size=50G
scsihw: virtio-scsi-pci
smbios1: uuid=27ade06a-a254-45c2-9d55-4615d2af108b
sockets: 1
vmgenid: 45d12fbc-43e8-4417-b568-19aea7d6cec2

This is the machine that I can no longer access with RDP. I downloaded an RDP client for Ubuntu and it worked until I attempted to switch the drivers from nouveau to NVIDIA 510. This machine is a clone of a template, so I spun up a new one to get the lspci output below:

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:1a.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #4 (rev 03)
00:1a.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #5 (rev 03)
00:1a.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #6 (rev 03)
00:1a.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #2 (rev 03)
00:1b.0 Audio device: Intel Corporation 82801I (ICH9 Family) HD Audio Controller (rev 03)
00:1c.0 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.1 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.2 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1c.3 PCI bridge: Red Hat, Inc. QEMU PCIe Root port
00:1d.0 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #1 (rev 03)
00:1d.1 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #2 (rev 03)
00:1d.2 USB controller: Intel Corporation 82801I (ICH9 Family) USB UHCI Controller #3 (rev 03)
00:1d.7 USB controller: Intel Corporation 82801I (ICH9 Family) USB2 EHCI Controller #1 (rev 03)
00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev 92)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode] (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1fb2 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10fa (rev a1)
05:01.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:02.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:03.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
05:04.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
06:05.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI
06:08.0 Communication controller: Red Hat, Inc. Virtio console
06:12.0 Ethernet controller: Red Hat, Inc. Virtio network device

To give you some context that may help: all PCI devices I need to pass through will go to either Ubuntu or pfSense at this time. My main workload is video transcoding, with a Tdarr server deployed for H.264-to-H.265 jobs. The VM does not have a regular user, as all gaming and workstation loads are handled by a Hyper-V server.

As for the machine that got the driver update, I have not been able to access it to recover it. That is fine, as it was only a test bench for GPU passthrough.
 
Passthrough appears to work. Does the Ubuntu VM show something on an attached physical monitor? I'm sorry, but I can't really help with NVIDIA drivers and have no experience fixing reset issues with NVIDIA GPUs. I don't expect the Quadro to have issues, though, as running it inside a VM should be supported by NVIDIA.
Please make sure the Quadro is not used during POST or boot of the Proxmox host, and prevent host drivers from touching it by early-binding it to vfio-pci. Does the Supermicro motherboard have an ASPEED 2000 graphics chip or something similar that can be used instead?
 
I appreciate all the help so far! As for the board, it does have integrated graphics; I'm pretty sure it is an ASPEED 2000. For vfio-pci, do you mean dropping that into /etc/modules, or somewhere else?
 
For vfio-pci, do you mean dropping that into /etc/modules, or somewhere else?
Yes, roughly: put options vfio-pci ids=... in a .conf file under /etc/modprobe.d/ (not in /etc/modules itself), using the IDs of the GPU and all its sub-functions, which you can find with lspci -nn. Here is an example. But you still need to make sure the system POST/boot screen does not use the Quadro (via the motherboard BIOS?).
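
For illustration, a minimal sketch of what such a file could contain, assuming the vendor:device IDs seen inside the VM (10de:1fb2 for the GPU, 10de:10fa for its audio function) are the same as what lspci -nn reports on the host; please verify them on the host first:

# /etc/modprobe.d/vfio.conf (file name is just an example)
# bind the Quadro and its audio function to vfio-pci before any other driver can claim them
options vfio-pci ids=10de:1fb2,10de:10fa
# make sure the usual GPU/audio drivers only load after vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci

Afterwards run update-initramfs -u and reboot; lspci -nnk -s 03:00 on the host should then show "Kernel driver in use: vfio-pci" for both functions.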
 
