Hello,
I have been struggling with this issue for a while and am unable to make further progress. I cannot find anyone else reporting a similar issue.
In my server I have an Arc A380 that I want to upgrade to an Arc Pro B50. The A380 works perfectly and is passed through to a VM without any issues. My configuration files are listed below.
/etc/modules
Code:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
/etc/modprobe.d/pve-blacklist.conf
Code:
blacklist nouveau
blacklist nvidiafb
blacklist radeon
blacklist i915
blacklist xe
blacklist snd_hda_intel
/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=0e:00.0,0f:00.0
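Note on the line above: as far as I know, the `ids=` option of vfio-pci matches on vendor:device ID pairs (for example `8086:e212`, which is what the kernel reports for the B50 in the journalctl output further down), not on bus addresses such as `0e:00.0`. Below is a minimal Python sketch that looks up those IDs in sysfs for a given address. The helper name `pci_ids` and the `sysfs_root` parameter are only for illustration.

```python
from pathlib import Path


def pci_ids(address: str, sysfs_root: str = "/sys/bus/pci/devices") -> str:
    """Return 'vendor:device' in hex for a PCI address such as '0000:0e:00.0'."""
    dev = Path(sysfs_root) / address
    # sysfs exposes each ID as a single line such as "0x8086"
    vendor = int((dev / "vendor").read_text().strip(), 16)
    device = int((dev / "device").read_text().strip(), 16)
    return f"{vendor:04x}:{device:04x}"


if __name__ == "__main__":
    # Addresses taken from the post; adjust for other systems.
    for addr in ("0000:0e:00.0", "0000:0f:00.0"):
        try:
            print(addr, pci_ids(addr))
        except (OSError, ValueError) as exc:
            print(addr, "could not be read:", exc)
```

The printed pairs are the values that would go into `options vfio-pci ids=...`, comma-separated.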
/etc/default/grub
Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`( . /etc/os-release && echo ${NAME} )`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX=""
lspci -vvv
Code:
0e:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05) (prog-if 00 [VGA controller])
Subsystem: Device 172f:3943
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 289
IOMMU group: 33
Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at 7c00000000 (64-bit, prefetchable) [size=8G]
Expansion ROM at fb000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee00000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 8GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Kernel driver in use: vfio-pci
Kernel modules: i915, xe
As you can see, Resizable BAR is enabled on the A380, so my BIOS settings are correct for it. My motherboard is a Gigabyte X570 Aorus Ultra with the latest BIOS installed, and I have confirmed these settings: CSM = Disabled, Above 4G Decode = Enabled, Resizable BAR = Auto, IOMMU = Enabled.
The first thing I did was install the B50 in a Windows computer (with a Gigabyte B550M Gaming X WIFI motherboard) and perform all firmware updates. When I install the B50 in my Proxmox server, things get very weird. I have tried this on all three kernels available on my machine:
- 6.17.13-7-pve
- 7.0.0-3-pve
- 7.0.2-2-pve
With /etc/modprobe.d/pve-blacklist.conf, /etc/modprobe.d/vfio.conf and /etc/default/grub as shown above, I observe the following behavior:
First, display output freezes immediately after "Starting pvestatd.service" is shown on screen.
lspci -vvv
Code:
0e:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1114
!!! Unknown header type 7f
IOMMU group: 33
Region 0: Memory at d0000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at c0000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at fbe00000 [disabled] [size=2M]
Kernel modules: xe
0f:00.0 Audio device: Intel Corporation Device e2f7
Subsystem: Intel Corporation Device 1114
!!! Unknown header type 7f
IOMMU group: 34
Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16K]
Kernel modules: snd_hda_intel
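Note: `Unknown header type 7f` is what lspci prints when the header-type byte in config space reads back as 0xff, which usually means the device is not answering config reads at all. The same all-ones pattern shows up in the xe forcewake errors further down (0xFFFFFFFF). Below is a rough Python sketch to check this directly, assuming the standard sysfs file `/sys/bus/pci/devices/<address>/config`. The function name is hypothetical.

```python
from pathlib import Path


def config_reads_all_ones(config_path: str, length: int = 64) -> bool:
    """True if the first `length` bytes of PCI config space are all 0xFF."""
    data = Path(config_path).read_bytes()[:length]
    # An empty read is treated as "unknown", not as all-ones.
    return len(data) > 0 and all(b == 0xFF for b in data)


if __name__ == "__main__":
    path = "/sys/bus/pci/devices/0000:0e:00.0/config"  # address from the post
    try:
        print(path, "all ones:", config_reads_all_ones(path))
    except OSError as exc:
        print("could not read", path, "-", exc)
```

A result of `True` would be consistent with the card having dropped off the bus or being stuck in a low-power state, but it does not say which.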
dmesg | grep 0e:00.0 shows no output.
journalctl | grep 0e:00.0
Code:
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: [8086:e212] type 00 class 0x030000 PCIe Endpoint
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem 0xe0000000-0xe0ffffff 64bit pref]
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem 0xd0000000-0xdfffffff 64bit pref]
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: ROM [mem 0xfbe00000-0xfbffffff pref]
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: PME# supported from D0 D3hot
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem 0x00000000-0x00ffffff 64bit pref]
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem 0x00000000-0x01ffffff 64bit pref]: contains BAR 0 for 2 VFs
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem 0x00000000-0x1ffffffff 64bit pref]
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem 0x00000000-0x3ffffffff 64bit pref]: contains BAR 2 for 2 VFs
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: vgaarb: setting as boot VGA device
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: vgaarb: bridge control possible
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: can't assign; no space
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: failed to assign
May 09 21:51:48 pve kernel: pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space
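To put the sizes in those messages into perspective, here is a small Python sketch that extracts the hexadecimal `mem size` values from the "can't assign; no space" lines and converts them to binary units. The regular expression and function names are my own, written against the log format shown above.

```python
import re

# Matches e.g. "VF BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space"
PATTERN = re.compile(
    r"((?:VF )?BAR \d+) \[mem size (0x[0-9a-fA-F]+)[^\]]*\]: can't assign; no space"
)


def human(n: float) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if n < 1024 or unit == "TiB":
            return f"{n:g} {unit}"
        n /= 1024


def failed_bars(log_text: str) -> dict:
    """Map each BAR name that failed assignment to its requested size in bytes."""
    sizes = {}
    for match in PATTERN.finditer(log_text):
        sizes[match.group(1)] = int(match.group(2), 16)
    return sizes


if __name__ == "__main__":
    sample = (
        "pci 0000:0e:00.0: VF BAR 2 [mem size 0x400000000 64bit pref]: can't assign; no space\n"
        "pci 0000:0e:00.0: VF BAR 0 [mem size 0x02000000 64bit pref]: can't assign; no space\n"
        "pci 0000:0e:00.0: BAR 2 [mem size 0x10000000 64bit pref]: can't assign; no space\n"
        "pci 0000:0e:00.0: BAR 0 [mem size 0x01000000 64bit pref]: can't assign; no space\n"
    )
    for name, size in failed_bars(sample).items():
        print(f"{name}: {human(size)}")
```

For the journal above this works out to 16 GiB for VF BAR 2, 32 MiB for VF BAR 0, 256 MiB for BAR 2 and 16 MiB for BAR 0.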
I then modified my config files as shown below to try the SR-IOV route instead of passthrough, and encountered different errors.
/etc/modprobe.d/pve-blacklist.conf
Code:
blacklist nouveau
blacklist nvidiafb
blacklist radeon
/etc/modprobe.d/vfio.conf is now blank.
Again, display output freezes immediately after "Starting pvestatd.service" is shown on screen.
lspci -vvv is unchanged
Code:
0e:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Intel Graphics] (prog-if 00 [VGA controller])
Subsystem: Intel Corporation Device 1114
!!! Unknown header type 7f
IOMMU group: 33
Region 0: Memory at d0000000 (64-bit, prefetchable) [size=16M]
Region 2: Memory at c0000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at fbe00000 [disabled] [size=2M]
Kernel modules: xe
0f:00.0 Audio device: Intel Corporation Device e2f7
Subsystem: Intel Corporation Device 1114
!!! Unknown header type 7f
IOMMU group: 34
Region 0: Memory at fc000000 (64-bit, non-prefetchable) [size=16K]
Kernel modules: snd_hda_intel
dmesg | grep 0e:00.0
Code:
[ 12.166759] xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
[ 12.191572] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 12.192714] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 13.254961] xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
[ 13.275326] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 13.276219] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 14.343133] xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
[ 14.364154] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 14.364992] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 15.430547] xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
[ 15.430781] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 15.430784] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 15.430899] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 15.430905] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 15.431899] xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
[ 15.431902] xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
[ 15.432644] xe 0000:0e:00.0: [drm] *ERROR* [CRTC:150:pipe A] DSB 0 is busy
[ 25.542264] xe 0000:0e:00.0: [drm] *ERROR* [CRTC:150:pipe A] flip_done timed out
[ 25.543305] xe 0000:0e:00.0: [drm] *ERROR* [CRTC:150:pipe A] DSB 0 timed out waiting for idle (current head=0xfedaffff, head=0xfedaffff, tail=0xfedaffff)
journalctl | grep 0e:00.0
Code:
May 09 22:08:46 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:47 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:47 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:47 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:48 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:48 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:48 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:49 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:49 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:49 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:50 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:50 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:50 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:51 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:51 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:51 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
May 09 22:08:52 pve kernel: xe 0000:0e:00.0: [drm] vblank wait timed out on crtc 0
May 09 22:08:52 pve kernel: xe 0000:0e:00.0: [drm] *ERROR* Tile0: GT0: Force wake domain 0: wake. MMIO unreliable (forcewake register returns 0xFFFFFFFF)!
May 09 22:08:52 pve kernel: xe 0000:0e:00.0: [drm] Tile0: GT0: Forcewake domain 0x1 failed to acknowledge awake request
I am unsure what all these errors mean, and searching for them has been fruitless. I have not found anyone else who has had issues swapping from a working Arc A-series GPU to an Arc B-series GPU. I am not sure whether this is a hardware compatibility issue, a configuration issue, or a software/driver issue. Any help would be greatly appreciated.