I'm trying to figure out why the A770 that I'm passing through to a Windows 11 VM seems to be stuck at PCIe x1 speeds. I've checked this via:
- GPU-Z
  - Initially shows "PCIe x1 1.1 @ x1 1.1"
  - Then switches to just "PCI-Express"
  - Same result even when running the load test via the "?" icon
- HWiNFO
  - PCIe v1.1 x1 (2.5 GT/s) @ x1 (2.5 GT/s)
- Intel Graphics Software
  - PCIe -1 x-1
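To see whether the x1 / 2.5 GT/s reading also exists on the host side (and not only inside the guest), the link state can be read straight from sysfs on the Proxmox host. A minimal sketch, assuming the standard Linux current_link_speed, current_link_width, max_link_speed and max_link_width attributes; pcie_link_status and the SYSFS_ROOT override are hypothetical names added for illustration:

```shell
#!/bin/bash
# Sketch: print the negotiated and maximum PCIe link of one device,
# read from the standard sysfs link attributes.
# pcie_link_status and SYSFS_ROOT are illustrative names (assumptions),
# not part of any existing tool.
set -o errexit -o pipefail -o nounset

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # override only for testing

pcie_link_status() {
    local bdf="$1"
    local dev="$SYSFS_ROOT/bus/pci/devices/$bdf"
    # Each attribute holds a single value, e.g. "2.5 GT/s PCIe" or "1".
    printf '%s: current %s x%s, max %s x%s\n' "$bdf" \
        "$(< "$dev/current_link_speed")" "$(< "$dev/current_link_width")" \
        "$(< "$dev/max_link_speed")" "$(< "$dev/max_link_width")"
}
```

On the host this would be called as pcie_link_status 0000:03:00.0; the LnkCap/LnkSta lines of lspci -vvv report the same information.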
The overall system specs are:
- Ryzen 9 7900X3D
- MSI MAG B650 TOMAHAWK WIFI
  - Latest BIOS version: 7D75v1L
- 64GB G.Skill Flare X5 DDR5-5200 CL36 Dual Kit
- ASRock ARC A770 Phantom Gaming OC
  - Intel Graphics Driver Version 32.0.101.6557
- 2x 2TB Samsung 970 Evo Plus M.2
- 1x 2TB Samsung 990 Evo Plus M.2
BIOS settings I've tried:
- ASPM Control for CPU PCIe
  - Disabled
  - Auto
  - L0 Entry
  - L1 Entry
  - L0s And L1 Entry
- Integrated Graphics
  - Force
  - UMA Auto
- PCI_E1 Lanes Configuration
  - Auto
  - x8+x8
- PCIe Link Speed
  - Auto
  - Gen4
The Windows 11 VM is configured as follows:
Code:
args: -cpu host,-hypervisor,kvm=off -smbios type=0,vendor="American Megatrends International LLC.",version=1.D0,date=12/15/2023
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 12
cpu: host,hidden=1
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hookscript: local:snippets/intel-dGPU-hookscript.sh
hostpci0: 0000:03:00,pcie=1,x-vga=1
hostpci1: 0000:04:00.0,pcie=1
hostpci2: 0000:13:00.0,pcie=1
machine: pc-q35-9.0
memory: 32768
meta: creation-qemu=8.1.5,ctime=1707422170
name: W11
net0: e1000=00:11:75:CC:C8:8B,bridge=vmbr0,firewall=1
numa: 1
ostype: win11
scsi0: vms-ssd-2:vm-100-disk-0,cache=writethrough,iothread=1,size=1T,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=ed7e948c-a8c6-fb17-a892-d843ae50ead7,manufacturer=TWljcm8tU3RhciBJbnRlcm5hdGlvbmFsIENvLiwgTHRkLg==,product=TVMtN0Q3NQ==,version=MS4w,base64=1
sockets: 1
tpmstate0: local-lvm:vm-100-disk-1,size=4M,version=v2.0
vmgenid: 6254df7a-8852-4f08-8c39-6afc3466915e
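Proxmox only places a passthrough device on a PCI Express port when pcie=1 is set (which requires the q35 machine type), so it can be handy to list the hostpci entries mechanically. A small sketch that parses a VM config file; list_hostpci is a hypothetical helper, and it only understands the simple "hostpciN: <address>,<opt>=<val>,..." form used in the config above:

```shell
#!/bin/bash
# Sketch: list the hostpciN entries of a Proxmox VM config and report
# whether each one is attached with pcie=1.
# list_hostpci is a hypothetical helper; it only handles the simple
# "hostpciN: <address>,<opt>=<val>,..." form.
set -o errexit -o pipefail -o nounset

list_hostpci() {
    local conf="$1"
    local re='^(hostpci[0-9]+):[[:space:]]*(.*)$'
    local line key value addr opts pcie
    # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n "$line" ]]; do
        [[ "$line" =~ $re ]] || continue
        key="${BASH_REMATCH[1]}"
        value="${BASH_REMATCH[2]}"
        addr="${value%%,*}"        # first field: host PCI address
        opts="${value#"$addr"}"    # rest: ",opt=val,..." (may be empty)
        pcie="no"
        if [[ ",$opts," == *",pcie=1,"* ]]; then
            pcie="yes"
        fi
        printf '%s %s pcie=%s\n' "$key" "$addr" "$pcie"
    done < "$conf"
}
```

Usage on the host would be list_hostpci /etc/pve/qemu-server/100.conf (VM ID 100, going by the disk names above).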
The PCI devices are as follows:
- 0000:03:00 => A770 GPU
  - VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
- 0000:04:00.0 => A770 Audio
  - Audio device: Intel Corporation DG2 Audio Controller
- 0000:13:00.0 => 990 Evo Plus M.2
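lspci on the host reports the GPU in IOMMU group 16. Since the GPU and its audio function sit on different buses (03 and 04), a quick way to see what else shares each device's group is to follow the iommu_group symlink the kernel creates under /sys/bus/pci/devices. A minimal sketch; iommu_group_members and SYSFS_ROOT are hypothetical names:

```shell
#!/bin/bash
# Sketch: show the IOMMU group of a PCI device and every device in it,
# using the iommu_group symlink the kernel creates in sysfs.
# iommu_group_members and SYSFS_ROOT are illustrative (assumptions).
set -o errexit -o pipefail -o nounset

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # override only for testing

iommu_group_members() {
    local bdf="$1"
    local group_dir member
    # The symlink resolves to .../kernel/iommu_groups/<N>
    group_dir="$(readlink -f "$SYSFS_ROOT/bus/pci/devices/$bdf/iommu_group")"
    printf 'group %s:' "$(basename "$group_dir")"
    for member in "$group_dir"/devices/*; do
        [[ -e "$member" || -L "$member" ]] || continue   # skip an unmatched glob
        printf ' %s' "$(basename "$member")"
    done
    printf '\n'
}
```

On the host: iommu_group_members 0000:03:00.0 and iommu_group_members 0000:04:00.0.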
I've set up /etc/modprobe.d/pve-blacklist.conf as follows:
Code:
blacklist nvidiafb
blacklist nouveau
blacklist nvidia*
blacklist snd_hda_intel
blacklist snd_hda_codec_hdmi
blacklist i915
blacklist xe
When I set all this up, I also ran update-initramfs -u and then rebooted the machine to apply the changes.
My /etc/default/grub contains the following line (I rebuilt the GRUB configuration after adding it):
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt vfio-pci.ids=8086:56a0 initcall_blacklist=sysfb_init pcie_aspm=off pci=pcie_bus_perf"
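After rebooting, it's worth confirming that both the blacklist and the kernel parameters actually took effect. On Proxmox this matters in particular because hosts booted via systemd-boot read their command line from /etc/kernel/cmdline rather than /etc/default/grub. A sketch under those assumptions; check_cmdline, bound_driver and SYSFS_ROOT are hypothetical names, and lspci -k shows the same driver information:

```shell
#!/bin/bash
# Sketch: after a reboot, confirm that the expected kernel parameters are
# active and check which driver each passthrough device is bound to.
# check_cmdline, bound_driver and SYSFS_ROOT are illustrative names.
set -o errexit -o pipefail -o nounset

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # override only for testing

# Returns non-zero if any expected parameter is missing from the cmdline file.
check_cmdline() {
    local cmdline_file="$1"; shift
    local cmdline param missing=0
    cmdline=" $(< "$cmdline_file") "   # pad so every token is space-delimited
    for param in "$@"; do
        if [[ "$cmdline" == *" $param "* ]]; then
            printf 'present: %s\n' "$param"
        else
            printf 'MISSING: %s\n' "$param"
            missing=1
        fi
    done
    return "$missing"
}

# Prints the driver bound to each device, read from its sysfs "driver" symlink.
bound_driver() {
    local bdf link
    for bdf in "$@"; do
        link="$SYSFS_ROOT/bus/pci/devices/$bdf/driver"
        if [[ -L "$link" ]]; then
            printf '%s %s\n' "$bdf" "$(basename "$(readlink "$link")")"
        else
            printf '%s (no driver)\n' "$bdf"
        fi
    done
}
```

On the host: check_cmdline /proc/cmdline iommu=pt vfio-pci.ids=8086:56a0 pcie_aspm=off pci=pcie_bus_perf, then bound_driver 0000:03:00.0 0000:04:00.0.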
The hookscript contains the following (based on this thread). It prevents the host from locking up due to the FLR / bus reset errors when the system tries to reset the A770, and pins the VM to specific CPU cores:
Code:
#!/bin/bash
set -o errexit -o pipefail -o nounset
# located in /var/lib/vz/snippets/intel-dGPU-hookscript.sh or your 'snippets' location.
# Add to VM via: qm set VMID --hookscript local:snippets/intel-dGPU-hookscript.sh
# Do not modify these variables (set by Proxmox when calling the script)
vmId="$1"
runPhase="$2"
echo "Running $runPhase on VM=$vmId"
case "$runPhase" in
    pre-start)
        # Clear the reset methods before each start of the VM, to prevent the PVE host locking up.
        # The flr and bus methods don't work and reappear after reboots.
        # The bus method may still be attempted on subsequent VM starts, even if reset_method was already cleared.
        echo "VM=$vmId - $runPhase : Clearing Intel dGPU and audio device reset_methods."
        echo > /sys/bus/pci/devices/0000:03:00.0/reset_method
        echo > /sys/bus/pci/devices/0000:04:00.0/reset_method
        # will appear as the following in journalctl:
        # kernel: vfio-pci 0000:03:00.0: All device reset methods disabled by user
        # kernel: vfio-pci 0000:04:00.0: All device reset methods disabled by user
        ;;
    post-start)
        main_pid="$(< "/run/qemu-server/$vmId.pid")"
        # Pin every thread of the QEMU process to host CPUs 0-9
        taskset --cpu-list --all-tasks --pid "0-9" "$main_pid"
        echo "VM=$vmId - $runPhase : Pinned all tasks of PID $main_pid to CPUs 0-9"
        ;;
    pre-stop)
        # placeholder
        echo "VM=$vmId - $runPhase : No Action."
        ;;
    post-stop)
        # placeholder
        echo "VM=$vmId - $runPhase : No Action."
        ;;
    *)
        echo "Unknown run phase \"$runPhase\"!"
        ;;
esac
echo "Finished $runPhase on VM=$vmId"
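To check that the pre-start phase did its job, the remaining reset methods can be read back from the same reset_method files the hookscript writes to. A sketch, assuming (as the journal messages quoted in the script suggest) that a cleared file reads back empty; reset_methods and SYSFS_ROOT are hypothetical names:

```shell
#!/bin/bash
# Sketch: show the reset methods currently enabled for each device, to confirm
# the pre-start hook cleared them. An empty file is reported as "<none>".
# reset_methods and SYSFS_ROOT are illustrative names (assumptions).
set -o errexit -o pipefail -o nounset

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # override only for testing

reset_methods() {
    local bdf methods
    for bdf in "$@"; do
        methods="$(< "$SYSFS_ROOT/bus/pci/devices/$bdf/reset_method")"
        printf '%s: %s\n' "$bdf" "${methods:-<none>}"
    done
}
```

Run it on the host after starting the VM: reset_methods 0000:03:00.0 0000:04:00.0. The CPU pinning can be checked separately with taskset -cp <pid>, using the PID from /run/qemu-server/100.pid.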
This is the output of lspci -nnv -s 0000:03:00.0 on the host:
Code:
03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A770] [8086:56a0] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation DG2 [Arc A770] [1849:6010]
	Flags: bus master, fast devsel, latency 0, IRQ 159, IOMMU group 16
	Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
	Memory at f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at f5000000 [disabled] [size=2M]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express Endpoint, MSI 00
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
	Capabilities: [d0] Power Management version 3
	Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [420] Physical Resizable BAR
	Capabilities: [400] Latency Tolerance Reporting
	Kernel driver in use: vfio-pci
	Kernel modules: i915, xe
This is the output of lspci -vvv -s 0000:03:00.0 on the host:
Code:
03:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
	Subsystem: ASRock Incorporation DG2 [Arc A770]
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin ? routed to IRQ 159
	IOMMU group: 16
	Region 0: Memory at f4000000 (64-bit, non-prefetchable) [size=16M]
	Region 2: Memory at f800000000 (64-bit, prefetchable) [size=16G]
	Expansion ROM at f5000000 [disabled] [size=2M]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W
		DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta: Speed 2.5GT/s, Width x1
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
			10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
			EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			FRS- TPHComp- ExtTPHComp-
			AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			AtomicOpsCtl: ReqEn-
		LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
			EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
		Address: 00000000fee00000 Data: 0000
		Masking: 00000000 Pending: 00000000
	Capabilities: [d0] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap: MFVC- ACS-, Next Function: 0
		ARICtl: MFVC- ACS-, Function Group: 0
	Capabilities: [420 v1] Physical Resizable BAR
		BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
	Capabilities: [400 v1] Latency Tolerance Reporting
		Max snoop latency: 1048576ns
		Max no snoop latency: 1048576ns
	Kernel driver in use: vfio-pci
	Kernel modules: i915, xe
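The LnkCap/LnkSta lines above describe only the 03:00.0 function itself. Because the card's GPU (bus 03) and audio (bus 04) functions sit on different buses, there must be at least one bridge between them and the CPU's root port, and each hop negotiates its own link. A sketch that walks the host's sysfs device path upward and prints every hop's link, so the width and speed negotiated with the slot can be compared against the endpoint's x1 2.5 GT/s; link_chain and SYSFS_ROOT are hypothetical names:

```shell
#!/bin/bash
# Sketch: starting at one PCI function, walk up the sysfs device tree toward
# the root complex and print the negotiated and maximum link of every hop.
# link_chain and SYSFS_ROOT are illustrative names (assumptions).
set -o errexit -o pipefail -o nounset

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # override only for testing

link_chain() {
    local bdf="$1"
    local bdf_re='^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$'
    local dir name f complete
    dir="$(readlink -f "$SYSFS_ROOT/bus/pci/devices/$bdf")"
    # Stop once the path component is no longer a PCI address (e.g. "pci0000:00").
    while name="$(basename "$dir")"; [[ "$name" =~ $bdf_re ]]; do
        complete=yes
        for f in current_link_speed current_link_width max_link_speed max_link_width; do
            [[ -r "$dir/$f" ]] || complete=no
        done
        if [[ "$complete" == yes ]]; then
            printf '%s: %s x%s (max %s x%s)\n' "$name" \
                "$(< "$dir/current_link_speed")" "$(< "$dir/current_link_width")" \
                "$(< "$dir/max_link_speed")" "$(< "$dir/max_link_width")"
        else
            printf '%s: no link attributes\n' "$name"
        fi
        dir="$(dirname "$dir")"
    done
}
```

On the host: link_chain 0000:03:00.0, and the same for 0000:04:00.0. The hop closest to the root port is the one whose link should reflect the slot.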
I have also tried putting the A770 into another bare-metal Windows system, where it correctly runs at PCIe 4.0 x16. When I put my old GTX 1070 Ti into the Proxmox VM, it also runs at PCIe x16. I've also tried downgrading the BIOS to older versions and using older Arc drivers (using Display Driver Uninstaller to make sure leftovers from different driver versions weren't interfering).
Does anyone have an idea as to what I might be missing here or what I could further check/try out?