Troubleshooting: Intel A310 GPU/Audio passthrough to VM works; GPU crashes after VM shutdown, reboot needed

y00

Dec 12, 2025
I started using Proxmox a few days ago and have put many hours into trying to get my Intel A310 GPU to work properly for passthrough to a VM or, to be more precise, to get it handed back to the host after the VM is shut down. I am a Linux novice as well, but I learned a lot these past days. I read many threads here, on level1techs, reddit, and some other sites, but wasn't able to solve my problem. Huge props to leesteken for helping out so many people with PVE.

Problem description:
I can pass through my dGPU (Intel A310) and its audio device (they sit on separate buses, 03:00.0 and 04:00.0) to a Windows 11 VM and everything works as expected. As soon as I power off the VM, the GPU/VGA part of the device isn't handed back correctly; only the audio part is. The GPU then can't be used for anything and I have to reboot the host.

03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A310] [8086:56a6] (rev 05) (prog-if 00 [VGA controller])
Subsystem: Device [172f:4240]
Flags: bus master, fast devsel, latency 0, IRQ 105, IOMMU group 16
Memory at dd000000 (64-bit, non-prefetchable) [size=16M]
Memory at fb00000000 (64-bit, prefetchable) [size=4G]
Expansion ROM at de000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Len=0c <?>
Capabilities: [70] Express Endpoint, IntMsgNum 0
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Capabilities: [d0] Power Management version 3
Capabilities: [100] Alternative Routing-ID Interpretation (ARI)
Capabilities: [420] Physical Resizable BAR
Capabilities: [400] Latency Tolerance Reporting
Kernel driver in use: i915
Kernel modules: i915, xe
04:00.0 Audio device [0403]: Intel Corporation DG2 Audio Controller [8086:4f92]
Subsystem: Device [172f:4240]
Flags: bus master, fast devsel, latency 0, IRQ 109, IOMMU group 17
Memory at de300000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 3
Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, IntMsgNum 0
Capabilities: [100] Latency Tolerance Reporting
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
#dmesg
[ 2098.601643] vfio-pci 0000:03:00.0: not ready 1023ms after FLR; waiting
[ 2099.689379] vfio-pci 0000:03:00.0: not ready 2047ms after FLR; waiting
[ 2101.801403] vfio-pci 0000:03:00.0: not ready 4095ms after FLR; waiting
[ 2106.153452] vfio-pci 0000:03:00.0: not ready 8191ms after FLR; waiting
[ 2114.857906] vfio-pci 0000:03:00.0: not ready 16383ms after FLR; waiting
[ 2131.753986] vfio-pci 0000:03:00.0: not ready 32767ms after FLR; waiting
[ 2168.106024] vfio-pci 0000:03:00.0: not ready 65535ms after FLR; giving up
[ 2168.106321] pcieport 0000:02:01.0: unlocked secondary bus reset via: pci_reset_bus_function+0x152/0x170
[ 2169.322270] vfio-pci 0000:03:00.0: not ready 1023ms after bus reset; giving up
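The log above shows the kernel first trying a function-level reset (FLR) and then falling back to a secondary bus reset. As a minimal sketch, the reset methods the kernel lists for each function could be printed like this. It assumes the kernel exposes the per-device `reset_method` attribute in sysfs; `show_reset_methods` and `SYSFS_PCI` are names made up for illustration, not part of PVE.

```shell
#!/bin/bash
# Sketch: print the reset methods the kernel lists for PCI functions.
# SYSFS_PCI and show_reset_methods are illustrative names, not PVE tooling.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

show_reset_methods() {
    local bdf file
    for bdf in "$@"; do
        file="$SYSFS_PCI/$bdf/reset_method"
        if [ -r "$file" ]; then
            # The attribute holds a space-separated list of method names.
            echo "$bdf: $(cat "$file")"
        else
            echo "$bdf: reset_method not available"
        fi
    done
}
```

Calling `show_reset_methods 0000:03:00.0 0000:04:00.0` on the host would print one line per function, which makes it easy to compare the VGA and audio parts.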

#lspci
03:00.0 VGA compatible controller [0300]: Intel Corporation DG2 [Arc A310] [8086:56a6] (rev 05) (prog-if 00 [VGA controller])
Subsystem: Device [172f:4240]
!!! Unknown header type 7f
Memory at dd000000 (64-bit, non-prefetchable) [size=16M]
Memory at fb00000000 (64-bit, prefetchable) [size=4G]
Expansion ROM at de000000 [disabled] [size=2M]
Kernel driver in use: vfio-pci
Kernel modules: i915, xe
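lspci prints "Unknown header type 7f" when the configuration space of a device reads back as all ones, which usually means the device no longer responds on the bus. A minimal sketch of checking for that state from a script follows. It reads the vendor ID from the sysfs `config` file; `pci_responds` and `SYSFS_PCI` are illustrative names and the return codes are an arbitrary choice.

```shell
#!/bin/bash
# Sketch: report whether a PCI function still answers config-space reads.
# Returns 0 if it responds, 1 if it reads back all ones, 2 if it is missing.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

pci_responds() {
    local cfg="$SYSFS_PCI/$1/config" vendor
    [ -r "$cfg" ] || return 2
    # The first two bytes of config space hold the vendor ID.
    vendor=$(od -An -tx1 -N2 "$cfg" | tr -d ' \n')
    # All ones (ffff) is what a read from a non-responding device returns.
    [ -n "$vendor" ] && [ "$vendor" != "ffff" ]
}
```

For example, `pci_responds 0000:03:00.0 || echo "GPU is not responding"` could be run after the VM has stopped, before trying to rebind any driver.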

Interesting observations:
  • The problem only appears when I pass through both the VGA and the audio parts of the A310. If I pass through only the VGA part, I can shut down the VM without losing access to the GPU on the host. Going without sound isn't an option though, since I want to use this VM for display output on my TV.
  • Like many noobs do, I ran apt upgrade at some point and thought that might have broken something. Now I think it's more likely that I had only passed through the VGA part before the upgrade, so the problem simply hadn't shown up yet. I haven't done a fresh install yet.
  • When I pass through the GPU (VGA and audio) with the manually added args, I can use noVNC from the Proxmox GUI. When I pass it through via the Proxmox GUI (hostpci), noVNC doesn't connect and the output is only shown through the dGPU's HDMI port on my monitor.
What I tried to solve the problem:
Because of my above observations, I tried some things around the audio device (04:00.0):
  • Added a vfio bind option to modprobe.d/vfio.conf (options vfio-pci ids=8086:4f92), then reverted it because it didn't help. Also blacklisted the snd-hda-intel driver via pve-blacklist.conf, and reverted that as well. (Each time incl. update-initramfs -u -k all and a reboot.)
  • Added snd-hda-intel.conf with options snd-hda-intel enable_msi=1, just to try it out, because I thought it might do something (really tired already).
  • Used the following args in the VM config instead of the GUI PCI passthrough options: args: -device vfio-pci,host=03:00.0,addr=10.0,multifunction=on,x-vga=on,rombar=0 -device vfio-pci,host=04:00.0,addr=10.1
  • Used a hookscript to do the binding/unbinding, since it doesn't happen automatically with args instead of hostpci0/1.
  • Tried different combinations of the PCI passthrough options in the PVE GUI (ROM-Bar on/off, pcie on/off, all functions on/off, mapped/raw) for both devices.
  • Added iommu=pt to the kernel command line in GRUB, and also tried without it. Ran update-grub each time, then rebooted.
  • Added the vfio modules to /etc/modules.
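Since several of the steps above were added and reverted, it can help to confirm what is actually active on the running host. The following is a minimal sketch, assuming the vfio drivers are built as modules (they show up as vfio, vfio_iommu_type1 and vfio_pci in the module list of the kernel trace further down). The function names and the `CMDLINE_FILE`/`MODULES_FILE` variables are made up for illustration.

```shell
#!/bin/bash
# Sketch: confirm that the kernel command line and loaded modules match
# the settings described above. File locations can be overridden for testing.
CMDLINE_FILE="${CMDLINE_FILE:-/proc/cmdline}"
MODULES_FILE="${MODULES_FILE:-/proc/modules}"

has_kernel_param() {
    # True if the exact parameter (e.g. iommu=pt) is on the command line.
    tr ' ' '\n' < "$CMDLINE_FILE" | grep -qx -- "$1"
}

module_loaded() {
    # /proc/modules lists one module per line, name first, with underscores.
    grep -q "^$1 " "$MODULES_FILE"
}

report_passthrough_setup() {
    local item
    if has_kernel_param "iommu=pt"; then
        echo "iommu=pt: set"
    else
        echo "iommu=pt: not set"
    fi
    for item in vfio vfio_iommu_type1 vfio_pci; do
        if module_loaded "$item"; then
            echo "$item: loaded"
        else
            echo "$item: not loaded"
        fi
    done
}
```

Running `report_passthrough_setup` after a reboot prints one line per setting, so a reverted change that is still in effect (or a missing one) stands out.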
What I didn't try, and why:
  • Isolating the GPU from the host through early binding to vfio-pci and disabling VGA. I still want to use the GPU for other tasks on the host, and the Windows VM isn't running all the time.

agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 6
cpu: host
efidisk0: VMCT:vm-100-disk-0,efitype=4m,ms-cert=2023,pre-enrolled-keys=1,size=4M
hostpci0: 0000:03:00,pcie=1,x-vga=1
hostpci1: 0000:04:00,pcie=1
machine: pc-q35-10.1
memory: 8192
meta: creation-qemu=10.1.2,ctime=1765058652
name: W11
net0: virtio=BC:24:11:25:AD:7F,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: VMCT:vm-100-disk-1,cache=writeback,discard=on,iothread=1,size=120G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3eb11987-4c1f-4cfc-856e-e91803dfda2f
sockets: 1
tpmstate0: VMCT:vm-100-disk-2,size=4M,version=v2.0
usb0: host=046d:c332
usb1: host=046d:c333
usb2: mapping=BTdongle
vmgenid: ad6a1c9a-bcbb-40ef-a72c-958792358ca8

# START 100 cfg
agent: 1
args: -device vfio-pci,host=03:00.0,addr=10.0,multifunction=on,x-vga=on,rombar=0 -device vfio-pci,host=04:00.0,addr=10.1
balloon: 0
bios: ovmf
boot: order=scsi0;net0
cores: 6
cpu: host
efidisk0: VMCT:vm-100-disk-0,efitype=4m,ms-cert=2023,pre-enrolled-keys=1,size=4M
hookscript: local:snippets/gpu-hook.sh
machine: pc-q35-10.1
memory: 8192
meta: creation-qemu=10.1.2,ctime=1765058652
name: W11
net0: virtio=BC:24:11:25:AD:7F,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: VMCT:vm-100-disk-1,cache=writeback,discard=on,iothread=1,size=120G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3eb11987-4c1f-4cfc-856e-e91803dfda2f
sockets: 1
tpmstate0: VMCT:vm-100-disk-2,size=4M,version=v2.0
usb0: host=046d:c332
usb1: host=046d:c333
usb2: mapping=BTdongle
vmgenid: ad6a1c9a-bcbb-40ef-a72c-958792358ca8
# END 100 cfg

nano /var/lib/vz/snippets/gpu-hook.sh

#start Hookscript
#!/bin/bash

vmid="$1"
phase="$2"

if [ "$phase" == "pre-start" ]; then
    echo "Hook: Unbinding GPU from host drivers..."

    # 1. Unbind Audio from Host
    # Use 'true' to ignore errors if it's already unbound
    echo "0000:04:00.0" > /sys/bus/pci/drivers/snd_hda_intel/unbind 2>/dev/null || true
    echo "vfio-pci" > /sys/bus/pci/devices/0000:04:00.0/driver_override
    echo "0000:04:00.0" > /sys/bus/pci/drivers/vfio-pci/bind

    # 2. Unbind GPU from Host (i915)
    echo "0000:03:00.0" > /sys/bus/pci/drivers/i915/unbind 2>/dev/null || true
    echo "vfio-pci" > /sys/bus/pci/devices/0000:03:00.0/driver_override
    echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/bind

    echo "Hook: GPU ready for VM."
fi

if [ "$phase" == "post-stop" ]; then
    echo "Hook: Rebinding GPU to host drivers..."

    # 1. Unbind GPU from VFIO
    echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null || true
    echo "i915" > /sys/bus/pci/devices/0000:03:00.0/driver_override
    echo "0000:03:00.0" > /sys/bus/pci/drivers/i915/bind

    # 2. Unbind Audio from VFIO
    echo "0000:04:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null || true
    echo "snd_hda_intel" > /sys/bus/pci/devices/0000:04:00.0/driver_override
    echo "0000:04:00.0" > /sys/bus/pci/drivers/snd_hda_intel/bind

    echo "Hook: GPU returned to host."
fi
# end hookscript

chmod +x /var/lib/vz/snippets/gpu-hook.sh
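
The hookscript above pins each function to a named driver through driver_override, and the override stays set afterwards. A possible variant for the post-stop phase is to clear the override and let the kernel probe for a matching driver again. This is only a sketch under that assumption: `release_to_host` and `SYSFS_BUS` are hypothetical names, and it cannot help if the device itself no longer responds after the reset.

```shell
#!/bin/bash
# Sketch: hand a PCI function back to whichever host driver matches it.
# SYSFS_BUS and release_to_host are illustrative names.
SYSFS_BUS="${SYSFS_BUS:-/sys/bus/pci}"

release_to_host() {
    local bdf="$1" dev="$SYSFS_BUS/devices/$1"
    [ -d "$dev" ] || { echo "$bdf: no such device" >&2; return 1; }
    # Detach from the current driver (vfio-pci after the VM stops), if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$bdf" > "$dev/driver/unbind"
    fi
    # A bare newline clears driver_override so normal ID matching applies.
    echo > "$dev/driver_override"
    # Ask the PCI core to probe drivers for this function again.
    echo "$bdf" > "$SYSFS_BUS/drivers_probe"
}
```

In a hookscript this would be called as `release_to_host 0000:04:00.0` and `release_to_host 0000:03:00.0` in the post-stop branch, in place of the explicit bind lines.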

[ 398.920156] i915 0000:03:00.0: [drm] i915 raw-wakerefs=1 wakelocks=1 on cleanup
[ 398.920174] WARNING: CPU: 7 PID: 6581 at drivers/gpu/drm/i915/intel_runtime_pm.c:445 intel_runtime_pm_driver_release+0x81/0xa0 [i915]
[ 398.920232] Modules linked in: tcp_diag inet_diag snd_seq snd_seq_device cfg80211 veth ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables sunrpc binfmt_misc mei_hdcp mei_pxp bonding tls nfnetlink_log mei_gsc mei_me mei xe intel_vsec drm_gpuvm drm_gpusvm_helper snd_hda_codec_intelhdmi amdgpu ipmi_ssif snd_hda_codec_atihdmi amd_atl intel_rapl_msr snd_hda_codec_hdmi i915 intel_rapl_common amdxcp drm_panel_backlight_quirks amd64_edac snd_hda_intel edac_mce_amd gpu_sched snd_hda_codec drm_ttm_helper kvm_amd snd_hda_core drm_buddy drm_exec snd_intel_dspcfg ttm btusb kvm drm_suballoc_helper snd_intel_sdw_acpi drm_display_helper btrtl snd_hwdep polyval_clmulni ghash_clmulni_intel btintel snd_pcm cec aesni_intel acpi_ipmi btbcm ipmi_si snd_timer sch_fq_codel ftdi_sio rapl rc_core pcspkr ast btmtk ipmi_devintf joydev snd input_leds eeepc_wmi bluetooth wmi_bmof k10temp spd5118 i2c_algo_bit soundcore usbserial ccp ipmi_msghandler mac_hid zfs(PO) spl(O) msr vhost_net vhost
[ 398.920271] vhost_iotlb tap vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq hid_generic usbkbd usbmouse dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio cdc_ether usbnet usbhid mii hid mfd_aaeon asus_wmi nvme sparse_keymap nvme_core xhci_pci i2c_piix4 platform_profile ahci nvme_keyring video i2c_smbus xhci_hcd igc libahci nvme_auth wmi gpio_amdpt
[ 398.920295] CPU: 7 UID: 0 PID: 6581 Comm: bash Tainted: P O 6.17.4-1-pve #1 PREEMPT(voluntary)
[ 398.920298] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[ 398.920299] Hardware name: ASUS System Product Name/Pro WS B850M-ACE SE, BIOS 1804 11/12/2025
[ 398.920300] RIP: 0010:intel_runtime_pm_driver_release+0x81/0xa0 [i915]
[ 398.920341] Code: ff 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 47 14 29 fb 45 89 e8 44 89 e1 48 89 da 48 89 c6 48 c7 c7 10 d3 c7 c1 e8 bf 24 7d fa <0f> 0b 5b 41 5c 41 5d 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 c3
[ 398.920343] RSP: 0018:ffffce87e05dfa10 EFLAGS: 00010246
[ 398.920344] RAX: 0000000000000000 RBX: ffff8b92818b6490 RCX: 0000000000000000
[ 398.920345] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 398.920346] RBP: ffffce87e05dfa28 R08: 0000000000000000 R09: 0000000000000000
[ 398.920347] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 398.920347] R13: 0000000000000001 R14: ffff8b92821de0c8 R15: ffff8b92821de390
[ 398.920348] FS: 000078b8872bc740(0000) GS:ffff8b99ffd06000(0000) knlGS:0000000000000000
[ 398.920349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 398.920350] CR2: 000000c00022c008 CR3: 00000001a3527000 CR4: 0000000000f50ef0
[ 398.920351] PKRU: 55555554
[ 398.920352] Call Trace:
[ 398.920353] <TASK>
[ 398.920355] i915_driver_release+0x76/0x90 [i915]
[ 398.920393] devm_drm_dev_init_release+0x5e/0x90
[ 398.920397] devm_action_release+0x12/0x30
[ 398.920398] release_nodes+0x3a/0xd0
[ 398.920400] devres_release_all+0x94/0x100
[ 398.920403] device_unbind_cleanup+0x12/0x90
[ 398.920405] device_release_driver_internal+0x22b/0x270
[ 398.920406] device_driver_detach+0x14/0x20
[ 398.920407] unbind_store+0xac/0xc0
[ 398.920409] drv_attr_store+0x21/0x50
[ 398.920410] sysfs_kf_write+0x6f/0x90
[ 398.920413] kernfs_fop_write_iter+0x15e/0x210
[ 398.920415] vfs_write+0x271/0x490
[ 398.920417] ksys_write+0x6f/0xf0
[ 398.920419] __x64_sys_write+0x19/0x30
[ 398.920420] x64_sys_call+0x79/0x2330
[ 398.920422] do_syscall_64+0x80/0xa30
[ 398.920424] ? __handle_mm_fault+0xb55/0xfd0
[ 398.920427] ? count_memcg_events+0xd7/0x1a0
[ 398.920429] ? handle_mm_fault+0x254/0x370
[ 398.920431] ? do_user_addr_fault+0x2f8/0x830
[ 398.920433] ? irqentry_exit_to_user_mode+0x2e/0x290
[ 398.920436] ? irqentry_exit+0x43/0x50
[ 398.920437] ? exc_page_fault+0x90/0x1b0
[ 398.920439] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 398.920440] RIP: 0033:0x78b88734e687
[ 398.920442] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
[ 398.920443] RSP: 002b:00007fff439d4bc0 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 398.920444] RAX: ffffffffffffffda RBX: 000078b8872bc740 RCX: 000078b88734e687
[ 398.920445] RDX: 000000000000000c RSI: 000055c8180b1a80 RDI: 0000000000000001
[ 398.920445] RBP: 000055c8180b1a80 R08: 0000000000000000 R09: 0000000000000000
[ 398.920446] R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000c
[ 398.920447] R13: 000078b8874a75c0 R14: 000078b8874a4e80 R15: 0000000000000002
[ 398.920448] </TASK>

Any and all help/ideas are appreciated. I might go with a fresh install next, but I don't know if it's even worth it. I set up my server following the Perfect Media Server tutorial with MergerFS/SnapRAID on the host and documented all steps and code. Reinstalling should be fine, but I would still like to avoid it.

System info: latest PVE/kernel on an AMD Ryzen 9600X and an Asus Pro WS B850M-ACE SE, with the Intel A310 dGPU in Pciex16_1 and an ASM1166 in Pciex16(X4)_2.
 