Intel Arc A770 PCI passthrough to W11 stuck at PCIe x1 1.1 @ x1 1.1

VioHar

Feb 6, 2025
I'm trying to figure out why the A770 that I'm passing through to a W11 VM seems to be stuck at PCIe x1 speeds. I've checked this with:
  • GPU-Z
  • HWiNFO
  • Intel Graphics Software
All of them report the link as x1:
  • GPU-Z
    • Initially shows "PCIe x1 1.1 @ x1 1.1"
    • Then switches to just "PCI-Express"
    • Even when running the load test via the "?" icon
  • HWiNFO
    • PCIe v1.1 x1 (2.5 GT/s) @ x1 (2.5 GT/s)
  • Intel Graphics Software
    • PCIe -1 x-1
To rule out a reporting error, I also ran 3DMark Time Spy, which scores ~9000 instead of the expected ~14000. That further suggests the GPU is being bottlenecked.

The overall system specs are:
  • Ryzen 9 7900X3D
  • MSI MAG B650 TOMAHAWK WIFI
    • Latest BIOS version: 7D75v1L
  • 64GB G.Skill Flare X5 DDR5-5200 CL36 Dual Kit
  • ASRock ARC A770 Phantom Gaming OC
    • Intel Graphics Driver Version 32.0.101.6557
  • 2x 2TB Samsung 970 Evo Plus M.2
  • 1x 2TB Samsung 990 Evo Plus M.2
On the MB I have enabled ReBAR with Above 4G, and I've tried each of the following settings:
  • ASPM Control for CPU PCIe
    • Disabled
    • Auto
    • L0 Entry
    • L1 Entry
    • L0s And L1 Entry
  • Integrated Graphics
    • Force
    • UMA Auto
  • PCI_E1 Lanes Configuration
    • Auto
    • x8+x8
  • PCIe Link Speed
    • Auto
    • Gen4
I've also tried the second PCIe slot on the MB, with the same results.

The Windows 11 VM is configured as follows:
Code:
args: -cpu host,-hypervisor,kvm=off -smbios type=0,vendor="American Megatrends International LLC.",version=1.D0,date=12/15/2023
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 12
cpu: host,hidden=1
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hookscript: local:snippets/intel-dGPU-hookscript.sh
hostpci0: 0000:03:00,pcie=1,x-vga=1
hostpci1: 0000:04:00.0,pcie=1
hostpci2: 0000:13:00.0,pcie=1
machine: pc-q35-9.0
memory: 32768
meta: creation-qemu=8.1.5,ctime=1707422170
name: W11
net0: e1000=00:11:75:CC:C8:8B,bridge=vmbr0,firewall=1
numa: 1
ostype: win11
scsi0: vms-ssd-2:vm-100-disk-0,cache=writethrough,iothread=1,size=1T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=ed7e948c-a8c6-fb17-a892-d843ae50ead7,manufacturer=TWljcm8tU3RhciBJbnRlcm5hdGlvbmFsIENvLiwgTHRkLg==,product=TVMtN0Q3NQ==,version=MS4w,base64=1
sockets: 1
tpmstate0: local-lvm:vm-100-disk-1,size=4M,version=v2.0
vmgenid: 6254df7a-8852-4f08-8c39-6afc3466915e
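
For readers who want to see what the base64-encoded smbios1 fields in the config above contain, here is a minimal sketch that decodes them. The helper name decode_smbios_field is made up for illustration; the three values are copied from the smbios1 line, and it assumes a base64 tool that accepts -d.

```shell
#!/bin/bash
# Decode one base64-encoded smbios1 field (the config sets base64=1).
# decode_smbios_field is a hypothetical helper name, not a Proxmox tool.
decode_smbios_field() {
    printf '%s' "$1" | base64 -d
}

# Values copied verbatim from the smbios1 line of the VM config.
for field in \
    "manufacturer=TWljcm8tU3RhciBJbnRlcm5hdGlvbmFsIENvLiwgTHRkLg==" \
    "product=TVMtN0Q3NQ==" \
    "version=MS4w"
do
    key="${field%%=*}"     # text before the first '='
    value="${field#*=}"    # text after the first '=' (keeps base64 padding)
    printf '%s: %s\n' "$key" "$(decode_smbios_field "$value")"
done
```

The product field decodes to MS-7D75 and the version field to 1.0.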

The PCI devices are as follows:
  • 0000:03:00 => A770 GPU
    • VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
  • 0000:04:00.0 => A770 Audio
    • Audio device: Intel Corporation DG2 Audio Controller
  • 0000:13:00.0 => 990 Evo Plus M.2
I've blacklisted the GPU drivers in /etc/modprobe.d/pve-blacklist.conf as follows:
Code:
blacklist nvidiafb
blacklist nouveau
blacklist nvidia*
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
blacklist xe

I also ran update-initramfs -u and then rebooted the machine to apply the blacklist when I set this all up.
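
One way to confirm that the blacklist had the intended effect is to look at which kernel driver is bound to each passed-through function. The following is only a sketch: bound_driver is a hypothetical helper, and the optional second argument exists so the function can be pointed at a different directory than the real sysfs tree.

```shell
#!/bin/bash
# Print the kernel driver bound to a PCI function, or "none" if unbound.
# $1: PCI address, $2: sysfs device root (assumed default: /sys/bus/pci/devices).
bound_driver() {
    local dev="${2:-/sys/bus/pci/devices}/$1"
    if [ -L "$dev/driver" ]; then
        # The "driver" entry is a symlink to the driver's sysfs directory.
        basename "$(readlink -f "$dev/driver")"
    else
        echo "none"
    fi
}

# Addresses taken from the hostpci lines of the VM config.
for bdf in 0000:03:00.0 0000:04:00.0; do
    echo "$bdf -> $(bound_driver "$bdf")"
done
```

On a host set up as described, the GPU function would be expected to show vfio-pci here, which matches the "Kernel driver in use" line in the lspci output.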

My /etc/default/grub contains the following line (and I rebuilt the GRUB config after adding it):
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt vfio-pci.ids=8086:56a0 initcall_blacklist=sysfb_init pcie_aspm=off pci=pcie_bus_perf"
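
To check that these parameters actually reached the running kernel, the booted command line can be compared against the list above. This is a rough sketch with a hypothetical helper, has_kernel_param. If a host boots through systemd-boot rather than GRUB, edits to /etc/default/grub would not show up here at all.

```shell
#!/bin/bash
# Return 0 if a parameter appears as a whole word on the kernel command line.
# $1: parameter, $2: command-line file (assumed default: /proc/cmdline).
has_kernel_param() {
    local param="$1" file="${2:-/proc/cmdline}" word
    for word in $(cat "$file" 2>/dev/null); do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Parameters copied from the GRUB_CMDLINE_LINUX_DEFAULT line.
for p in iommu=pt vfio-pci.ids=8086:56a0 initcall_blacklist=sysfb_init \
         pcie_aspm=off pci=pcie_bus_perf; do
    if has_kernel_param "$p"; then
        echo "present: $p"
    else
        echo "MISSING: $p"
    fi
done
```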

The hookscript contains the following (based on this thread). It prevents the host from locking up due to the flr / bus errors that occur when the system tries to reset the A770, and it pins the VM to specific CPU cores:
Code:
#!/bin/bash
set -o errexit -o pipefail -o nounset

# located in /var/lib/vz/snippets/intel-dGPU-hookscript.sh or your 'snippets' location.
# Add to VM via: qm set VMID --hookscript local:snippets/intel-dGPU-hookscript.sh

# Do not modify these variables (set by Proxmox when calling the script)
vmId="$1"
runPhase="$2"
echo "Running $runPhase on VM=$vmId"

case "$runPhase" in
    pre-start)
        # Clear the reset methods before each start of the VM, to prevent the PVE host locking up.
        # The flr and bus methods don't work and re-appear after reboots.
        # The bus method may still be attempted in subsequent VM starts, even if reset_method was already cleared.
        echo "VM=$vmId - $runPhase : Clearing Intel dGPU and audio device reset_methods."
        echo > /sys/bus/pci/devices/0000:03:00.0/reset_method
        echo > /sys/bus/pci/devices/0000:04:00.0/reset_method

        # will appear as the following in journalctl:
        #  kernel: vfio-pci 0000:03:00.0: All device reset methods disabled by user
        #  kernel: vfio-pci 0000:04:00.0: All device reset methods disabled by user
    ;;

    post-start)
        main_pid="$(< /run/qemu-server/$vmId.pid)"
        taskset --cpu-list --all-tasks --pid "0-9" "$main_pid"
        echo "VM=$vmId - $runPhase : Pinned all tasks of PID $main_pid to CPUs 0-9"
      ;;

    pre-stop)
        # placeholder .
        echo "VM=$vmId - $runPhase : No Action."
      ;;
    post-stop)
        # placeholder .
        echo "VM=$vmId - $runPhase : No Action."
      ;;
    *)
      echo "Unknown run phase \"$runPhase\"!"
      ;;
esac
echo "Finished $runPhase on VM=$vmId"
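
To confirm that the pinning in the post-start phase took effect, the allowed-CPU list of every QEMU thread can be read back from /proc. This is a sketch only: task_affinities is a hypothetical helper, and whether logical CPUs 0-9 are the intended cores depends on the host's CPU numbering, which is not shown in this post.

```shell
#!/bin/bash
# Print "<tid> <allowed CPU list>" for every task (thread) of a process.
# $1: PID, $2: proc root (assumed default: /proc).
task_affinities() {
    local pid="$1" proc="${2:-/proc}" status
    for status in "$proc/$pid"/task/*/status; do
        # Skip unmatched globs and threads that exited in the meantime.
        [ -r "$status" ] || continue
        printf '%s %s\n' \
            "$(basename "$(dirname "$status")")" \
            "$(awk '/^Cpus_allowed_list:/ {print $2}' "$status")"
    done
}

# Example: inspect this shell itself. On the host, pass the PID stored in
# /run/qemu-server/<vmid>.pid instead.
task_affinities "$$"
```

Any thread whose list differs from 0-9 was not covered by the taskset call.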

This is the output of lspci -nnv -s 0000:03:00.0:
Code:
03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A770] [8086:56a0] (rev 08) (prog-if 00 [VGA controller])
    Subsystem: ASRock Incorporation DG2 [Arc A770] [1849:6010]
    Flags: bus master, fast devsel, latency 0, IRQ 159, IOMMU group 16
    Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
    Memory at f800000000 (64-bit, prefetchable) [size=16G]
    Expansion ROM at f5000000 [disabled] [size=2M]
    Capabilities: [40] Vendor Specific Information: Len=0c <?>
    Capabilities: [70] Express Endpoint, MSI 00
    Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
    Capabilities: [d0] Power Management version 3
    Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
    Capabilities: [420] Physical Resizable BAR
    Capabilities: [400] Latency Tolerance Reporting
    Kernel driver in use: vfio-pci
    Kernel modules: i915, xe

This is the output of lspci -vvv -s 0000:03:00.0:
Code:
03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
    Subsystem: ASRock Incorporation DG2 [Arc A770]
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin ? routed to IRQ 159
    IOMMU group: 16
    Region 0: Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
    Region 2: Memory at f800000000 (64-bit, prefetchable) [size=16G]
    Expansion ROM at f5000000 [disabled] [size=2M]
    Capabilities: [40] Vendor Specific Information: Len=0c <?>
    Capabilities: [70] Express (v2) Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
            ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
        LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
            ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 2.5GT/s, Width x1
            TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
        DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
             10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
             EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
             FRS- TPHComp- ExtTPHComp-
             AtomicOpsCap: 32bit- 64bit- 128bitCAS-
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
             AtomicOpsCtl: ReqEn-
        LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
        LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
             Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
             EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
             Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
        Address: 00000000fee00000  Data: 0000
        Masking: 00000000  Pending: 00000000
    Capabilities: [d0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
        Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
        ARICap:    MFVC- ACS-, Next Function: 0
        ARICtl:    MFVC- ACS-, Function Group: 0
    Capabilities: [420 v1] Physical Resizable BAR
        BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
    Capabilities: [400 v1] Latency Tolerance Reporting
        Max snoop latency: 1048576ns
        Max no snoop latency: 1048576ns
    Kernel driver in use: vfio-pci
    Kernel modules: i915, xe
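
A possible further check on the host is to read the negotiated link of every port above the GPU function, not only the endpoint itself. The sketch below assumes the function sits behind one or more bridges (the topology is not shown in this post, so that is an assumption) and uses the sysfs link attributes current_link_speed, current_link_width, max_link_speed and max_link_width. The helper name link_chain is made up for illustration.

```shell
#!/bin/bash
# Print negotiated and maximum link speed/width for a PCI function and for
# every PCIe port above it.
# $1: PCI address, $2: sysfs device root (assumed default: /sys/bus/pci/devices).
link_chain() {
    local dir attr
    dir="$(readlink -f "${2:-/sys/bus/pci/devices}/$1")" || return 1
    # Walk upward for as long as the directory exposes PCIe link attributes.
    while [ -r "$dir/current_link_speed" ]; do
        printf '%s:' "$(basename "$dir")"
        for attr in current_link_speed current_link_width \
                    max_link_speed max_link_width; do
            printf ' %s=%s' "$attr" "$(cat "$dir/$attr" 2>/dev/null)"
        done
        printf '\n'
        dir="$(dirname "$dir")"
    done
}

link_chain 0000:03:00.0
```

If an upstream port reported the full speed and width while the endpoint still showed 2.5 GT/s x1, the endpoint value might be a reporting quirk rather than the real link, although that alone would not explain the lower Time Spy score.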

I have also tried putting the A770 into another bare-metal Windows system, where it correctly uses PCIe 4.0 x16. When I put my old GTX 1070 Ti into the Proxmox VM, it also runs at PCIe x16. In addition, I've tried downgrading the BIOS to older versions and using older Arc drivers (with Display Driver Uninstaller in between to make sure leftovers from other driver versions don't interfere).

Does anyone have an idea as to what I might be missing here or what I could further check/try out?