Hi All,
First-time poster here looking for some assistance. Please be gentle.
Okay, so before I go bald trying to solve this issue, I thought I would ask this helpful community for some advice.
For some time now, I have been trying to get GPU passthrough working for my Ubuntu 22.04 (6.2.0-32-generic) VM. The reason is that I have a Jellyfin server running for which I would like to enable HW transcoding.
Following some very helpful guides online, I had managed to get this working for a time. However, after updating the Ubuntu VM to the newest kernel, I was presented with the error message:
pci_hp_register failed with error - 16
Of course, as you often do, I immediately took to Google in search of an answer. I stumbled upon some threads on here which mentioned that disabling ACPI support for the VM would solve the issue. Having tried this, though, it seems that with ACPI support disabled the VM doesn't detect the GPU at all.
So of course you will be wondering what my setup looks like:
PVE version 8 (only recently updated as part of troubleshooting; it was on PVE 7.4 before)
Contents of /etc/default/grub:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init"
GRUB_CMDLINE_LINUX=""
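On a GRUB-booted Proxmox host, edits to /etc/default/grub generally only take effect after running update-grub and rebooting, so it can be worth confirming that the running kernel actually sees the parameters. A minimal sketch, assuming a POSIX shell; has_param is a hypothetical helper name chosen for illustration, not part of any tool:

```shell
#!/bin/sh
# has_param (hypothetical helper): succeed if parameter $2 appears as a
# whole, space-separated word in the command line string $1.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the running kernel's command line; fall back to an empty string
# if /proc/cmdline is not readable.
cmdline=$(cat /proc/cmdline 2>/dev/null || true)

# Parameters taken from the GRUB_CMDLINE_LINUX_DEFAULT line above.
for p in intel_iommu=on iommu=pt vfio-pci.ids=1002:67df,1002:aaf0; do
    if has_param "$cmdline" "$p"; then
        echo "present: $p"
    else
        echo "MISSING: $p"
    fi
done
```

The match is on whole words, so a partial string such as iommu would not be reported as present just because iommu=pt is.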
Output of dmesg | grep -E "DMAR|IOMMU":
[ 0.000000] Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
[ 0.008870] ACPI: DMAR 0x00000000BB8051A0 0000A8 (v01 A M I OEMDMAR 00000001 INTL 00000001)
[ 0.008893] ACPI: Reserving DMAR table memory at [mem 0xbb8051a0-0xbb805247]
[ 0.125064] DMAR: IOMMU enabled
[ 0.339221] DMAR: Host address width 46
[ 0.339223] DMAR: DRHD base: 0x000000ef844000 flags: 0x1
[ 0.339231] DMAR: dmar0: reg_base_addr ef844000 ver 1:0 cap d2078c106f0462 ecap f020ff
[ 0.339234] DMAR: RMRR base: 0x000000bb6c1000 end: 0x000000bb6ecfff
[ 0.339236] DMAR: ATSR flags: 0x0
[ 0.339239] DMAR-IR: IOAPIC id 0 under DRHD base 0xef844000 IOMMU 0
[ 0.339241] DMAR-IR: IOAPIC id 2 under DRHD base 0xef844000 IOMMU 0
[ 0.339243] DMAR-IR: HPET id 0 under DRHD base 0xef844000
[ 0.339244] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[ 0.339625] DMAR-IR: Enabled IRQ remapping in x2apic mode
[ 1.273386] DMAR: [Firmware Bug]: RMRR entry for device 0b:00.0 is broken - applying workaround
[ 1.273403] DMAR: No SATC found
[ 1.273405] DMAR: dmar0: Using Queued invalidation
[ 1.276184] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 68.980977] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 68.980984] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x2500000014, qw1 = 0x0
[ 68.981030] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005219c
[ 68.981048] DMAR: Invalidation Time-out Error (ITE) cleared
[ 68.981051] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 68.981052] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x2600000014, qw1 = 0x0
[ 68.981083] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000521a4
[ 68.981100] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.337534] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.337543] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x3500000014, qw1 = 0x0
[ 69.337592] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000522f4
[ 69.337610] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.337613] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.337614] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x3600000014, qw1 = 0x0
[ 69.337646] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000522fc
[ 69.337662] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.617155] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.617163] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1a00000014, qw1 = 0x0
[ 69.617180] DMAR: DRHD: handling fault status reg 60
[ 69.617194] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005237c
[ 69.617244] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.617246] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.617247] DMAR: QI HEAD: Context-cache Invalidation qw0 = 0x40000010031, qw1 = 0x0
[ 69.617277] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052384
[ 69.617294] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.619105] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.619108] DMAR: QI HEAD: Context-cache Invalidation qw0 = 0x40000010031, qw1 = 0x0
[ 69.619145] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052394
[ 69.619162] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.619165] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.619166] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x100e2, qw1 = 0x0
[ 69.619195] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005239c
[ 69.619211] DMAR: Invalidation Completion Error (ICE) cleared
[ 69.621004] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 69.621008] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x0
[ 69.621046] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000523ac
[ 69.621064] DMAR: Invalidation Time-out Error (ITE) cleared
[ 69.621066] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 69.621067] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200f2, qw1 = 0x1000
[ 69.621096] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000523b4
[ 69.621112] DMAR: Invalidation Completion Error (ICE) cleared
[ 72.233929] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 72.233936] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 72.233944] DMAR: DRHD: handling fault status reg 60
[ 72.233968] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052044
[ 72.234014] DMAR: Invalidation Time-out Error (ITE) cleared
[ 72.234017] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 72.234018] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x3ffff001
[ 72.234049] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005204c
[ 72.234065] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.381120] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.381122] DMAR: DRHD: handling fault status reg 60
[ 78.381126] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.381127] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1400000014, qw1 = 0x0
[ 78.381202] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005209c
[ 78.381220] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.381223] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.381224] DMAR: QI HEAD: Interrupt Entry Cache Invalidation qw0 = 0x1400000014, qw1 = 0x0
[ 78.381255] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520a4
[ 78.381271] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.875845] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.875848] DMAR: DRHD: handling fault status reg 60
[ 78.875854] DMAR: DRHD: handling fault status reg 60
[ 78.875880] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0xffff001
[ 78.875925] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520e4
[ 78.875943] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.875946] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.875947] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 78.875975] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520ec
[ 78.875991] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.875999] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.876000] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200e2, qw1 = 0x0
[ 78.876000] DMAR: DRHD: handling fault status reg 40
[ 78.876005] DMAR: DRHD: handling fault status reg 60
[ 78.876014] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520ec
[ 78.876067] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.876069] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.876070] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0x3ffff001
[ 78.876100] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1000520f4
[ 78.876117] DMAR: Invalidation Completion Error (ICE) cleared
[ 78.884622] DMAR: VT-d detected Invalidation Time-out Error: SID 0
[ 78.884627] DMAR: QI HEAD: Device-TLB Invalidation qw0 = 0x40000000003, qw1 = 0xef001
[ 78.884645] DMAR: DRHD: handling fault status reg 60
[ 78.884653] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x100052124
[ 78.884702] DMAR: Invalidation Time-out Error (ITE) cleared
[ 78.884705] DMAR: VT-d detected Invalidation Completion Error: SID 0
[ 78.884706] DMAR: QI HEAD: IOTLB Invalidation qw0 = 0x200f2, qw1 = 0x100008
[ 78.884734] DMAR: QI PRIOR: Invalidation Wait qw0 = 0x200000025, qw1 = 0x10005212c
[ 78.885714] DMAR: Invalidation Completion Error (ICE) cleared
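Since the first line of this output warns that the ACS override is active, it may help to see how the devices are actually grouped. A minimal sketch for listing IOMMU groups from sysfs, assuming a POSIX shell and the usual /sys/kernel/iommu_groups layout; list_iommu_groups is a hypothetical helper name, and the optional argument exists only so the function can be pointed at a different directory:

```shell
#!/bin/sh
# list_iommu_groups (hypothetical helper): print one line per device,
# prefixed with the IOMMU group it belongs to.
# $1 = optional root directory, defaults to /sys/kernel/iommu_groups.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for d in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$d" ] || continue
        g=${d#"$root"/}   # strip the root prefix
        g=${g%%/*}        # keep only the group number
        printf 'group %s: %s\n' "$g" "${d##*/}"
    done
}

list_iommu_groups
```

With pcie_acs_override set, the groups shown are the ones the kernel was told to assume, which is not necessarily the isolation the hardware provides.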
root@pve:~# dmesg | grep -i vfio
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init
[ 0.124977] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-12-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction video=vesa:off vfio-pci.ids=1002:67df,1002:aaf0 kvm.ignore_msrs=1 initcall_blacklist=sysfb_init
[ 14.557923] VFIO - User Level meta-driver version: 0.3
[ 14.571592] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 14.571842] vfio_pci: add [1002:67df[ffffffff:ffffffff]] class 0x000000/00000000
[ 14.596201] vfio_pci: add [1002:aaf0[ffffffff:ffffffff]] class 0x000000/00000000
[ 21.047516] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
[ 21.050760] vfio-pci 0000:04:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[ 63.422566] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 63.422574] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 63.434447] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 63.434455] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x28e4
[ 63.434458] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 63.458469] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 68.516604] vfio-pci 0000:04:00.0: enabling device (0400 -> 0403)
[ 68.516756] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 68.516760] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 68.528986] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 68.528994] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2a3c
[ 68.528997] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 68.553289] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 68.557765] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 68.558008] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 68.558182] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x370
[ 68.606361] vfio-pci 0000:04:00.1: enabling device (0100 -> 0102)
[ 68.742390] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 68.742399] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 68.742554] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 68.742560] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x2a40
[ 68.742562] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 68.766441] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
[ 1925.654093] vfio-pci 0000:04:00.0: AMD_POLARIS10: version 1.1
[ 1925.654102] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing pre-reset
[ 1925.654274] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing reset
[ 1925.654280] vfio-pci 0000:04:00.0: AMD_POLARIS10: CLOCK_CNTL: 0x0, PC: 0x28dc
[ 1925.654282] vfio-pci 0000:04:00.0: AMD_POLARIS10: performing post-reset
[ 1925.678492] vfio-pci 0000:04:00.0: AMD_POLARIS10: reset result = 0
Contents of /etc/modprobe.d/vfio.conf:
# Make vfio-pci a pre-dependency of the usual video modules
softdep amdgpu pre: vfio-pci
softdep radeon pre: vfio-pci
# Have vfio-pci grab the XFX 580 device IDs on boot
options vfio-pci ids=1002:67df,1002:aaf0 disable_vga=1
options vfio-pci ids=1002:67df,1002:aaf0
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidiafb pre: vfio-pci
softdep nvidia_drm pre: vfio-pci
softdep drm pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci
softdep snd_hda_codec_hdmi pre: vfio-pci
softdep i915 pre: vfio-pci
softdep snd_hda_codec_hdmi pre: vfio-pci
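Several softdep lines in this file appear more than once. A minimal sketch for spotting repeated directives in a modprobe.d file, assuming a POSIX shell with grep, sort and uniq available; dup_lines is a hypothetical helper name:

```shell
#!/bin/sh
# dup_lines (hypothetical helper): print each line of file $1 that occurs
# more than once, ignoring comment lines and blank lines.
dup_lines() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | sort | uniq -d
}

conf=/etc/modprobe.d/vfio.conf
if [ -r "$conf" ]; then
    dup_lines "$conf"
fi
```

This only reports exact repeats; it would not flag the two options vfio-pci lines, which differ in their arguments.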
Contents of /etc/modprobe.d/blacklist.conf:
blacklist radeon
blacklist amdgpu
blacklist snd_hda_intel
root@pve:~# lshw -c display
*-display
description: VGA compatible controller
product: GK106 [GeForce GTX 660]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:05:00.0
logical name: /dev/fb0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom fb
configuration: depth=32 driver=nouveau latency=0 resolution=2560,1440
resources: irq:58 memory:ee000000-eeffffff memory:d8000000-dfffffff memory:e0000000-e1ffffff ioport:a000(size=128) memory:c0000-dffff
*-display
description: VGA compatible controller
product: Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:04:00.0
version: e1
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller cap_list rom
configuration: driver=vfio-pci latency=0
resources: irq:28 memory:c0000000-cfffffff memory:d0000000-d01fffff ioport:b000(size=256) memory:ef700000-ef73ffff memory:ef740000-ef75ffff
From what I can tell, this seems to be the consensus for this type of setup online.
The VM is set up as follows:
Curiously, when the VM is running I lose the ability to run lshw and lspci; they seem to freeze the SSH and VNC console sessions.
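When lshw and lspci hang, the driver binding can still be checked by reading the driver symlink in sysfs, which, as far as I know, does not require reading the device's config space. A minimal sketch, assuming a POSIX shell and the PCI addresses from the logs above; bound_driver is a hypothetical helper name, and its second argument exists only so a different directory can be substituted:

```shell
#!/bin/sh
# bound_driver (hypothetical helper): print the name of the kernel driver
# bound to a PCI device, or "none" if no driver is bound.
# $1 = PCI address, $2 = optional device root (default /sys/bus/pci/devices).
bound_driver() {
    link="${2:-/sys/bus/pci/devices}/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# GPU and its HDMI audio function, as seen in the vfio-pci log lines.
for dev in 0000:04:00.0 0000:04:00.1; do
    printf '%s -> %s\n' "$dev" "$(bound_driver "$dev")"
done
```

For passthrough to work, both functions would be expected to report vfio-pci while the VM is running.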
Contents of /dev/dri:
card0 card1 render128
I am fairly new to this sysadmin stuff, so any and all help would be appreciated!
If I have dumb questions following your probably very smart answers, that is because I am in fact dumb.