Intel Arc B50 passthrough causing system hang on Ice Lake system

woods14

Jun 30, 2023
Hello,
I recently purchased the following server and ran into an issue passing an Arc B50 through to a VM: starting the VM hangs the whole system. Has anyone else run into something similar?

System Specs:
CPU: Intel(R) Xeon(R) Platinum 8360Y
GPU: OEM Arc B50
Motherboard: X12SPL-LN4F
BIOS Date: 07/11/2025 Ver 2.4
Kernel: Linux 6.17.2-1-pve
OS: Proxmox Virtual Environment 9.0.11

Troubleshooting steps so far:
Passed through an AMD Radeon 6600 on the Ice Lake system, which worked as expected.
Passed through the Arc B50 on an AMD Ryzen 7 5800X system, which worked as expected (with virtual functions).
Toggled the NUMA and VT-d settings in the BIOS.
Added intel_iommu=on and iommu=pt to the GRUB kernel arguments.
Tried passing through both the whole card and a VF, with no success.
Tried setting the VM memory to 4 GB, 8 GB and 16 GB.
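GRUB arguments only take effect if the host actually boots through GRUB (Proxmox VE can boot via systemd-boot instead, for example on UEFI with a ZFS root), so it may be worth confirming what the running kernel received. A minimal Python sketch, assuming the only parameters of interest are the two listed above; it does not handle quoted values containing spaces:

```python
from pathlib import Path

# The two parameters from the troubleshooting list above.
REQUIRED_PARAMS = {"intel_iommu": "on", "iommu": "pt"}


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into key -> value; bare flags map to ""."""
    params: dict[str, str] = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value  # if a key repeats, keep the last occurrence
    return params


def missing_iommu_params(cmdline: str) -> list[str]:
    """Return each required key=value pair that is absent or set differently."""
    params = parse_cmdline(cmdline)
    return [f"{key}={value}" for key, value in REQUIRED_PARAMS.items()
            if params.get(key) != value]


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # what the running kernel was booted with
    if cmdline_file.exists():
        missing = missing_iommu_params(cmdline_file.read_text())
        print("IOMMU parameters active" if not missing else f"missing: {missing}")
```

An empty result only shows the parameters reached the kernel; it says nothing about whether the VT-d hardware behaves correctly afterwards.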

When starting the VM, the following lines appear in the journal:

Code:
Oct 22 11:38:12 pve750 kernel: VFIO - User Level meta-driver version: 0.3
Oct 22 11:38:12 pve750 kernel: vfio-pci 0000:53:00.4: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Oct 22 11:38:12 pve750 kernel: vfio-pci 0000:53:00.4: resetting
Oct 22 11:38:12 pve750 kernel: xe 0000:53:00.0: [drm] GT0: PF: VF4 FLR
Oct 22 11:38:12 pve750 kernel: xe 0000:53:00.0: [drm] GT1: PF: VF4 FLR
Oct 22 11:38:12 pve750 kernel: vfio-pci 0000:53:00.4: reset done
Oct 22 11:38:12 pve750 systemd[1]: Created slice qemu.slice - Slice /qemu.
Oct 22 11:38:12 pve750 systemd[1]: Started 101.scope.
Oct 22 11:38:12 pve750 kernel: audit: type=1400 audit(1761151092.851:123): apparmor="DENIED" operation="capable" class="cap" profile="swtpm" pid=2586 comm="swtpm" capability=21  capname="sys_admin"
Oct 22 11:38:13 pve750 kernel: tap101i0: entered promiscuous mode
Oct 22 11:38:13 pve750 kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Oct 22 11:38:13 pve750 kernel: vmbr0: port 2(fwpr101p0) entered disabled state
Oct 22 11:38:13 pve750 kernel: fwpr101p0: entered allmulticast mode
Oct 22 11:38:13 pve750 kernel: fwpr101p0: entered promiscuous mode
Oct 22 11:38:13 pve750 kernel: vmbr0: port 2(fwpr101p0) entered blocking state
Oct 22 11:38:13 pve750 kernel: vmbr0: port 2(fwpr101p0) entered forwarding state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 1(fwln101i0) entered disabled state
Oct 22 11:38:13 pve750 kernel: fwln101i0: entered allmulticast mode
Oct 22 11:38:13 pve750 kernel: fwln101i0: entered promiscuous mode
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 1(fwln101i0) entered blocking state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 1(fwln101i0) entered forwarding state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 2(tap101i0) entered disabled state
Oct 22 11:38:13 pve750 kernel: tap101i0: entered allmulticast mode
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 2(tap101i0) entered blocking state
Oct 22 11:38:13 pve750 kernel: fwbr101i0: port 2(tap101i0) entered forwarding state
Oct 22 11:38:13 pve750 kernel: vfio-pci 0000:53:00.4: enabling device (0000 -> 0002)
Oct 22 11:38:13 pve750 kernel: vfio-pci 0000:53:00.4: resetting
Oct 22 11:38:13 pve750 kernel: xe 0000:53:00.0: [drm] GT0: PF: VF4 FLR
Oct 22 11:38:13 pve750 kernel: xe 0000:53:00.0: [drm] GT1: PF: VF4 FLR
Oct 22 11:38:13 pve750 kernel: vfio-pci 0000:53:00.4: reset done
Oct 22 11:38:13 pve750 kernel: DMAR: VT-d detected Invalidation Time-out Error: SID 0
Oct 22 11:38:13 pve750 kernel: DMAR: QI HEAD: Invalidation Wait qw0 = 0x200000025, qw1 = 0x1008c94d8
Oct 22 11:38:13 pve750 kernel: DMAR: DRHD: handling fault status reg 40
Oct 22 11:38:13 pve750 kernel: DMAR: QI PRIOR: Device-TLB Invalidation qw0 = 0x5300530400000003, qw1 = 0x1fff001
Oct 22 11:38:13 pve750 kernel: DMAR: Invalidation Time-out Error (ITE) cleared
Oct 22 11:38:13 pve750 kernel: DMAR: DRHD: handling fault status reg 20
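The QI PRIOR line appears to be the descriptor queued just before the invalidation wait that timed out. A minimal Python sketch to decode it, assuming the field layout of the Linux intel-iommu QI_DEV_IOTLB_* macros (source-id in bits 47:32, queue depth in bits 20:16, parent PF source-id split across bits 15:12 and 63:52) and the Linux DMA_FSTS_* names for the fault status bits; only the low type nibble is checked:

```python
# Field layout assumed from the Linux intel-iommu QI_DEV_IOTLB_* macros.
DEV_TLB_INV_TYPE = 0x3  # low 4 bits of qw0 (QI_DIOTLB_TYPE)

# DMAR fault status register bits, as named by the Linux DMA_FSTS_* macros.
FSTS_BITS = {
    0: "PFO (primary fault overflow)",
    1: "PPF (primary pending fault)",
    4: "IQE (invalidation queue error)",
    5: "ICE (invalidation completion error)",
    6: "ITE (invalidation time-out error)",
}


def bdf(sid: int) -> str:
    """Format a 16-bit PCI source ID as bus:device.function."""
    return f"{sid >> 8:02x}:{(sid >> 3) & 0x1F:02x}.{sid & 0x7}"


def decode_dev_tlb_qw0(qw0: int) -> dict:
    """Decode qw0 of a device-TLB invalidation descriptor."""
    if (qw0 & 0xF) != DEV_TLB_INV_TYPE:
        raise ValueError(f"unexpected descriptor type {qw0 & 0xF:#x}")
    sid = (qw0 >> 32) & 0xFFFF  # device whose ATS cache is being invalidated
    qdep = (qw0 >> 16) & 0x1F  # invalidation queue depth field
    pfsid = ((qw0 >> 12) & 0xF) | (((qw0 >> 52) & 0xFFF) << 4)  # parent PF
    return {"sid": bdf(sid), "pfsid": bdf(pfsid), "qdep": qdep}


def decode_fsts(value: int) -> list:
    """List the named bits set in a 'handling fault status reg' value (hex)."""
    return [name for bit, name in FSTS_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    print(decode_dev_tlb_qw0(0x5300530400000003))
    print(decode_fsts(0x40), decode_fsts(0x20))
```

With the values from the journal, the source-id decodes to 53:00.4 (the VF handed to vfio-pci) and the parent to 53:00.0 (the PF bound to xe), and the two fault status values decode to ITE and ICE respectively. That is consistent with, though not proof of, the VF's device-TLB (ATS) invalidation never completing.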

This eventually leads to:

Code:
Oct 22 12:26:35 pve750 kernel: watchdog: BUG: soft lockup - CPU#20 stuck for 756s! [(udev-worker):2789]
Oct 22 12:26:35 pve750 kernel: Modules linked in: veth vfio_pci vfio_pci_core vfio_iommu_type1 vfio iommufd tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables sunrpc bonding tls softdog binfmt_misc nfnetlink_log snd_hda_codec_intelhdmi snd_hda_codec_hdmi mei_gsc_proxy pmt_telemetry pmt_discovery pmt_crashlog mtd_intel_dg pmt_class mei_gsc intel_rapl_msr intel_rapl_common sch_fq_codel intel_uncore_frequency intel_uncore_frequency_common i10nm_edac skx_edac_common nfit xe x86_pkg_temp_thermal intel_powerclamp gpu_sched coretemp drm_gpuvm drm_gpusvm_helper drm_buddy drm_ttm_helper ipmi_ssif ttm drm_exec drm_suballoc_helper kvm_intel drm_display_helper cec snd_hda_intel snd_hda_codec kvm snd_hda_core dax_hmem acpi_power_meter snd_intel_dspcfg cxl_acpi snd_intel_sdw_acpi cxl_port irqbypass snd_hwdep polyval_clmulni snd_pcm ghash_clmulni_intel cxl_core cmdlinepart rc_core snd_timer ipmi_si aesni_intel acpi_ipmi isst_if_mbox_pci isst_if_mmio intel_th_gth spi_nor video
Oct 22 12:26:35 pve750 kernel: snd rapl ipmi_devintf fwctl mei_me joydev input_leds intel_th_pci intel_cstate einj pcspkr soundcore wmi mtd ast isst_if_common mei intel_pch_thermal intel_th intel_vsec ipmi_msghandler acpi_pad mac_hid zfs(PO) spl(O) vhost_net vhost vhost_iotlb tap efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 xfs btrfs blake2b_generic xor raid6_pq rndis_host cdc_ether usbnet mii hid_generic usbmouse usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme nvme_core xhci_pci i2c_i801 igb nvme_keyring i2c_mux spi_intel_pci ahci ioatdma xhci_hcd i2c_algo_bit nvme_auth i2c_smbus spi_intel libahci dca
Oct 22 12:26:35 pve750 kernel: CPU: 20 UID: 0 PID: 2789 Comm: (udev-worker) Tainted: P W O L 6.17.2-1-pve #1 PREEMPT(voluntary)
Oct 22 12:26:35 pve750 kernel: Tainted: [P]=PROPRIETARY_MODULE, [W]=WARN, [O]=OOT_MODULE, [L]=SOFTLOCKUP
Oct 22 12:26:35 pve750 kernel: Hardware name: Supermicro Super Server/X12SPL-F, BIOS 2.4 07/11/2025
Oct 22 12:26:35 pve750 kernel: RIP: 0010:smp_call_function_many_cond+0x148/0x520
Oct 22 12:26:35 pve750 kernel: Code: 08 48 63 d0 e8 a9 b9 65 00 3b 05 d3 b5 49 02 73 26 48 63 d0 49 8b 75 00 48 03 34 d5 00 a1 34 98 8b 56 08 83 e2 01 74 0a f3 90 <8b> 4e 08 83 e1 01 75 f6 83 c0 01 eb c0 48 83 c4 48 5b 41 5c 41 5d
Oct 22 12:26:35 pve750 kernel: RSP: 0018:ff57a5d3ce46b9b8 EFLAGS: 00000202
Oct 22 12:26:35 pve750 kernel: RAX: 0000000000000039 RBX: 0000000000000246 RCX: 0000000000000001
Oct 22 12:26:35 pve750 kernel: RDX: 0000000000000001 RSI: ff2098713fabba40 RDI: 0000000000000000
Oct 22 12:26:35 pve750 kernel: RBP: ff57a5d3ce46ba28 R08: 0000000000000000 R09: 0000000000000000
Oct 22 12:26:35 pve750 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Oct 22 12:26:35 pve750 kernel: R13: ff2098713e833c40 R14: 0000000000000014 R15: 0000000000000014
Oct 22 12:26:35 pve750 kernel: FS: 0000000000000000(0000) GS:ff209871a5386000(0000) knlGS:0000000000000000
Oct 22 12:26:35 pve750 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 22 12:26:35 pve750 kernel: CR2: 000071f799bc6001 CR3: 000000037f23a003 CR4: 0000000000773ef0
Oct 22 12:26:35 pve750 kernel: PKRU: 55555554
Oct 22 12:26:35 pve750 kernel: Call Trace:
Oct 22 12:26:35 pve750 kernel:
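Because the host locks up, these messages are easiest to collect from the previous boot afterwards, assuming the journal is persistent (for example `journalctl -k -b -1 --no-pager > kernel.log`). A minimal Python sketch that picks the DMAR, FLR, vfio reset and soft-lockup lines out of such a saved log; the file name and pattern set are illustrative choices, not anything Proxmox provides:

```python
import re
import sys

# Patterns for the kinds of messages quoted above; adjust as needed.
PATTERNS = {
    "dmar": re.compile(r"\bDMAR: "),
    "flr": re.compile(r"PF: VF\d+ FLR"),
    "vfio_reset": re.compile(r"vfio-pci \S+: reset"),
    "soft_lockup": re.compile(r"soft lockup - CPU#(\d+) stuck for (\d+)s"),
}


def classify(lines):
    """Return (category, line) for every line matching one of PATTERNS."""
    hits = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((name, line.rstrip("\n")))
                break  # first matching category wins
    return hits


def lockup_details(line):
    """Extract (cpu, seconds) from a soft-lockup watchdog line, else None."""
    match = PATTERNS["soft_lockup"].search(line)
    return (int(match.group(1)), int(match.group(2))) if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 filter_kernel_log.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for name, line in classify(log):
            print(f"[{name}] {line}")
```

The broad DMAR pattern also catches the boot-time IOMMU initialisation lines, which can be useful for comparing the Intel and AMD hosts.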

Another possibly relevant detail: I see the following message when starting the VM:

Code:
kvm: -device vfio-pci,host=0000:53:00.4,id=hostpci0,bus=ich9-pcie-port-1,addr=0x0: IGD device 0000:53:00.4 is unsupported in legacy mode, try SandyBridge or newer

I only see this on the Intel system, not on my AMD system.
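The IGD message suggests QEMU is treating the VF as an Intel integrated GPU. An unverified assumption is that this happens because the function reports Intel's vendor ID (0x8086) together with a VGA class code; the vgaarb line in the first log already suggests the kernel handles 53:00.4 as a VGA device. A minimal Python sketch to compare what each host reports through sysfs (the default BDF comes from the logs above; `describe` and `read_sysfs` are hypothetical helper names):

```python
import sys
from pathlib import Path

INTEL_VENDOR_ID = 0x8086
VGA_CLASS = 0x0300  # PCI base class 0x03 (display), subclass 0x00 (VGA)


def describe(vendor_text: str, class_text: str) -> dict:
    """Interpret the contents of a device's sysfs 'vendor' and 'class' files."""
    vendor = int(vendor_text, 16)  # e.g. "0x8086\n"
    pci_class = int(class_text, 16)  # 24-bit class code, e.g. "0x030000\n"
    return {
        "vendor": f"{vendor:#06x}",
        "class": f"{pci_class:#08x}",
        "intel_vga": vendor == INTEL_VENDOR_ID and (pci_class >> 8) == VGA_CLASS,
    }


def read_sysfs(bdf: str) -> dict:
    """Read vendor and class for a full BDF such as 0000:53:00.4."""
    dev = Path("/sys/bus/pci/devices") / bdf
    return describe((dev / "vendor").read_text(), (dev / "class").read_text())


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "0000:53:00.4"
    if (Path("/sys/bus/pci/devices") / target).exists():
        print(target, read_sysfs(target))
```

If both hosts report the same vendor and class for the VF, the difference in QEMU's behaviour would have to come from somewhere else, such as the QEMU version or the VM configuration.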

I am wondering whether anyone can reproduce this or offer some insight into what is going wrong on my Ice Lake system. Thanks!