[SOLVED] After starting a virtual machine using PCIe passthrough, stopping it prevents the amdgpu driver from binding to a specific iGPU.

uzumo

Apr 5, 2025
After starting a virtual machine using PCIe passthrough, stopping it prevents the amdgpu driver from binding to a specific iGPU.

Background

I was using a hookscript on a Ryzen 7 7700 so that the VM could be started again after it had been shut down.

This worked well and avoided the hassle of blacklisting and early binding.

However, I became curious when I heard that other users with Ryzen iGPUs could not do the same thing, so I bought an 8700G (Radeon 780M) to try it out and found that it did not work.

Verification Results

Testing showed that an error occurs while the driver is being re-bound, so the bind fails.

This issue occurs only on the 8700G, not on the 7700, and everything works normally until the vfio-pci driver has been bound, so I'm struggling to figure out what the problem is.

I'd like to know whether this should be considered a CPU-specific issue that cannot be fixed, or whether anyone knows of a possible solution.

Error
Code:
Nov 23 23:13:06 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Nov 23 23:13:13 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x15BF 0x1849:0x35BF 0x06).
Nov 23 23:13:13 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio base: 0xF4F00000
Nov 23 23:13:13 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio size: 524288
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 0 <soc21_common>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 3 <psp>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 4 <smu>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 5 <dm>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Fetched VBIOS from VFCT
Nov 23 23:13:17 pve2 kernel: amdgpu: ATOM BIOS: 113-PHXGENERIC-001
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: vgaarb: deactivate vga console
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: VRAM: 512M 0x0000008000000000 - 0x000000801FFFFFFF (512M used)
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
Nov 23 23:13:17 pve2 kernel: [drm] Detected VRAM RAM=512M, BAR=512M
Nov 23 23:13:17 pve2 kernel: [drm] RAM width 64bits DDR5
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 512M of VRAM memory ready
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 15585M of GTT memory ready.
Nov 23 23:13:17 pve2 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Nov 23 23:13:17 pve2 kernel: [drm] PCIE GART of 512M enabled (table at 0x000000801FD00000).
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x08005300
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 22
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: psp reg (0x16080) wait timed out, mask: 8000ffff, read: 30000 exp: 80000000
Nov 23 23:13:17 pve2 kernel: [drm:psp_v13_0_4_ring_create [amdgpu]] *ERROR* Failed to wait for trust OS ready for ring creation
Nov 23 23:13:17 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: PSP create ring failed!
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: psp reg (0x16080) wait timed out, mask: 8000ffff, read: 30000 exp: 80000000
Nov 23 23:13:18 pve2 kernel: [drm:psp_v13_0_4_ring_destroy [amdgpu]] *ERROR* Fail to stop psp ring
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: PSP firmware loading failed
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: hw_init of IP block <psp> failed -22
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu_device_ip_init failed
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Fatal error during GPU init
Nov 23 23:13:18 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Nov 24 03:45:09 pve2 kernel: ------------[ cut here ]------------
Nov 24 03:45:09 pve2 kernel: WARNING: CPU: 10 PID: 2378 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xbe/0xe0 [amdgpu]
Nov 24 03:45:09 pve2 kernel: Modules linked in: tcp_diag inet_diag cmac nls_utf8 cifs cifs_arc4 nls_ucs2_utils rdma_cm iw_cm ib_cm cifs_md4 netfs ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter nf_tables sunrpc nfnetlink_cttimeout bonding openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log binfmt_misc amd_atl mt7921e intel_rapl_msr amdgpu snd_hda_codec_alc662 intel_rapl_common mt7921_common snd_hda_codec_realtek_lib snd_hda_codec_atihdmi snd_hda_codec_generic mt792x_lib snd_hda_codec_hdmi mt76_connac_lib edac_mce_amd amdxcp mt76 drm_panel_backlight_quirks snd_hda_intel drm_buddy kvm_amd drm_ttm_helper snd_hda_codec ttm snd_hda_core btusb drm_exec drm_suballoc_helper mac80211 kvm btrtl snd_intel_dspcfg drm_display_helper btintel snd_intel_sdw_acpi btbcm snd_hwdep snd_pcm cec btmtk polyval_clmulni rc_core snd_timer cfg80211 ghash_clmulni_intel aesni_intel amdxdna snd i2c_algo_bit rapl wmi_bmof bluetooth spd5118 pcspkr ccp gpu_sched
Nov 24 03:45:09 pve2 kernel:  mlx5_fwctl soundcore k10temp libarc4 fwctl input_leds joydev mac_hid sch_fq_codel vhost_net vhost vhost_iotlb tap nct6775 nct6775_core hwmon_vid vfio_pci vfio_pci_core irqbypass vfio_iommu_type1 vfio iommufd nvme_tcp nvme_fabrics efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic xor raid6_pq mlx5_ib ib_uverbs macsec ib_core hid_generic usbmouse usbkbd usbhid hid uas usb_storage mlx5_core mlxfw xhci_pci psample tls nvme i2c_piix4 ahci i2c_smbus xhci_hcd libahci pci_hyperv_intf r8169 nvme_core realtek nvme_keyring nvme_auth video wmi gpio_amdpt
Nov 24 03:45:09 pve2 kernel: CPU: 10 UID: 0 PID: 2378 Comm: bash Tainted: P           O        6.17.2-1-pve #1 PREEMPT(voluntary)
Nov 24 03:45:09 pve2 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
Nov 24 03:45:09 pve2 kernel: Hardware name: ASRock B850M-X WiFi R2.0/B850M-X WiFi R2.0, BIOS 3.40 08/27/2025
Nov 24 03:45:09 pve2 kernel: RIP: 0010:amdgpu_irq_put+0xbe/0xe0 [amdgpu]
Nov 24 03:45:09 pve2 kernel: Code: fc ff ff 84 c0 75 87 eb 23 44 89 ea 48 89 de 4c 89 e7 e8 45 fd ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 fd 93 df c1 <0f> 0b b8 ea ff ff ff eb af b8 fe ff ff ff eb a8 e9 d5 51 68 00 66
Nov 24 03:45:09 pve2 kernel: RSP: 0018:ffffd0204165f830 EFLAGS: 00010246
Nov 24 03:45:09 pve2 kernel: RAX: 0000000000000000 RBX: ffff8ec8d0aa5a18 RCX: 0000000000000000
Nov 24 03:45:09 pve2 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Nov 24 03:45:09 pve2 kernel: RBP: ffffd0204165f850 R08: 0000000000000000 R09: 0000000000000000
Nov 24 03:45:09 pve2 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8ec8d0a80000
Nov 24 03:45:09 pve2 kernel: R13: 0000000000000000 R14: 0000000000000001 R15: ffff8ec8d0a98bc8
Nov 24 03:45:09 pve2 kernel: FS:  000076f410a40740(0000) GS:ffff8ecb39286000(0000) knlGS:0000000000000000
Nov 24 03:45:09 pve2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 24 03:45:09 pve2 kernel: CR2: 00007d742b2230e0 CR3: 0000000169e2d000 CR4: 0000000000f50ef0
Nov 24 03:45:09 pve2 kernel: PKRU: 55555554
Nov 24 03:45:09 pve2 kernel: Call Trace:
Nov 24 03:45:09 pve2 kernel:  <TASK>
Nov 24 03:45:09 pve2 kernel:  amdgpu_fence_driver_hw_fini+0x115/0x150 [amdgpu]
Nov 24 03:45:09 pve2 kernel:  amdgpu_device_fini_hw+0xef/0x3a1 [amdgpu]
Nov 24 03:45:09 pve2 kernel:  ? blocking_notifier_chain_unregister+0x38/0x70
Nov 24 03:45:09 pve2 kernel:  amdgpu_driver_unload_kms+0x4f/0x60 [amdgpu]
Nov 24 03:45:09 pve2 kernel:  amdgpu_driver_load_kms.cold+0x19/0x2f [amdgpu]
Nov 24 03:45:09 pve2 kernel:  amdgpu_pci_probe+0x1f6/0x4c0 [amdgpu]
Nov 24 03:45:09 pve2 kernel:  local_pci_probe+0x44/0xa0
Nov 24 03:45:09 pve2 kernel:  pci_device_probe+0xe9/0x280
Nov 24 03:45:09 pve2 kernel:  really_probe+0xf6/0x370
Nov 24 03:45:09 pve2 kernel:  ? pm_runtime_barrier+0x55/0xa0
Nov 24 03:45:09 pve2 kernel:  __driver_probe_device+0x8c/0x140
Nov 24 03:45:09 pve2 kernel:  device_driver_attach+0x55/0xe0
Nov 24 03:45:09 pve2 kernel:  bind_store+0x77/0xd0
Nov 24 03:45:09 pve2 kernel:  drv_attr_store+0x21/0x50
Nov 24 03:45:09 pve2 kernel:  sysfs_kf_write+0x6f/0x90
Nov 24 03:45:09 pve2 kernel:  kernfs_fop_write_iter+0x15e/0x210
Nov 24 03:45:09 pve2 kernel:  vfs_write+0x271/0x490
Nov 24 03:45:09 pve2 kernel:  ksys_write+0x6f/0xf0
Nov 24 03:45:09 pve2 kernel:  __x64_sys_write+0x19/0x30
Nov 24 03:45:09 pve2 kernel:  x64_sys_call+0x79/0x2330
Nov 24 03:45:09 pve2 kernel:  do_syscall_64+0x80/0xa30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? filp_flush+0x5e/0xb0
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? filp_close+0x1f/0x30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_dup2+0xc2/0x160
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? get_close_on_exec+0x34/0x50
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? ksys_dup3+0x9d/0x120
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? __x64_sys_dup2+0x2e/0xd0
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? x64_sys_call+0x1361/0x2330
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_syscall_64+0xb8/0xa30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? __x64_sys_fcntl+0x97/0x130
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? x64_sys_call+0x1b7a/0x2330
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_syscall_64+0xb8/0xa30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? x64_sys_call+0x1b7a/0x2330
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_syscall_64+0xb8/0xa30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? x64_sys_call+0x1bf2/0x2330
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_syscall_64+0xb8/0xa30
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? handle_mm_fault+0x254/0x370
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? do_user_addr_fault+0x2f8/0x830
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? irqentry_exit_to_user_mode+0x2e/0x290
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? irqentry_exit+0x43/0x50
Nov 24 03:45:09 pve2 kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Nov 24 03:45:09 pve2 kernel:  ? exc_page_fault+0x90/0x1b0
Nov 24 03:45:09 pve2 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Nov 24 03:45:09 pve2 kernel: RIP: 0033:0x76f410ad2687
Nov 24 03:45:09 pve2 kernel: Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
Nov 24 03:45:09 pve2 kernel: RSP: 002b:00007fff045b6850 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
Nov 24 03:45:09 pve2 kernel: RAX: ffffffffffffffda RBX: 000076f410a40740 RCX: 000076f410ad2687
Nov 24 03:45:09 pve2 kernel: RDX: 000000000000000d RSI: 00005d228a8bd140 RDI: 0000000000000001
Nov 24 03:45:09 pve2 kernel: RBP: 00005d228a8bd140 R08: 0000000000000000 R09: 0000000000000000
Nov 24 03:45:09 pve2 kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d
Nov 24 03:45:09 pve2 kernel: R13: 000076f410c2b5c0 R14: 000076f410c28e80 R15: 0000000000000000
Nov 24 03:45:09 pve2 kernel:  </TASK>
Nov 24 03:45:09 pve2 kernel: ---[ end trace 0000000000000000 ]---


The following operations work correctly as long as the vfio-pci driver has not yet been bound.
Code:
echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null
echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
   :
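Before each bind or unbind write, it can help to confirm which driver currently owns the device. The following is a minimal sketch, assuming the standard sysfs layout in which `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver. The function name `current_driver` and the `SYSFS_PCI` override are hypothetical and exist only so the helper can be tried against a fake directory tree.

```shell
# Print the name of the driver bound to a PCI device, or "none".
# SYSFS_PCI is a hypothetical override for testing; by default the
# real sysfs device directory is used.
current_driver() {
    dev="$1"
    link="${SYSFS_PCI:-/sys/bus/pci/devices}/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

With the address from this thread, `current_driver 0000:10:00.0` would be expected to print `amdgpu` after a successful bind and `none` after an unbind.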

Error
Code:
echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null
   :
VM Start & Shutdown
   :
echo "0000:10:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
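Because the commands above discard their error output with `2>/dev/null`, a failed re-bind is only visible in the kernel log. As a rough sketch, the failure could be detected by searching log text for the PSP messages quoted in the error output of this post. The function name `psp_init_failed` is hypothetical, and the two message strings are copied from the log above; other kernels or GPUs may word them differently.

```shell
# Read kernel log text on stdin and succeed (exit 0) if it contains the
# PSP failure messages seen in this thread for the given PCI address.
# psp_init_failed is a hypothetical helper name.
psp_init_failed() {
    dev="$1"
    # Keep only amdgpu lines for this device, then look for either message.
    grep -F -- "amdgpu $dev:" | grep -q -F \
        -e 'PSP firmware loading failed' \
        -e 'hw_init of IP block <psp> failed'
}
```

One way to use it would be `journalctl -k -b | psp_init_failed 0000:10:00.0 && echo "amdgpu re-bind failed"`, run after the final bind.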
 
On the Ryzen 7 7700, the driver can be switched repeatedly with the commands shown below, cycling through amdgpu, vfio-pci, amdgpu, vfio-pci, and so on.

Code:
> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null

Sep 20 13:22:56 pve2 kernel: Console: switching to colour dummy device 80x25
Sep 20 13:22:56 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Sep 20 13:22:56 pve2 kernel: [drm] amdgpu: ttm finalized

> echo "0000:10:00.0" > /sys/bus/pci/drivers/vfio-pci/bind 2>/dev/null

Sep 20 13:23:38 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: deactivate vga console
Sep 20 13:23:38 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none

> echo "0000:10:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

Sep 20 13:24:24 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none

> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null

Sep 20 13:24:58 pve2 kernel: [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x164E 0x1849:0x364E 0xC5).
Sep 20 13:24:58 pve2 kernel: [drm] register mmio base: 0xF4D00000
Sep 20 13:24:58 pve2 kernel: [drm] register mmio size: 524288
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 0 <nv_common>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 1 <gmc_v10_0>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 2 <navi10_ih>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 3 <psp>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 4 <smu>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 5 <dm>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 6 <gfx_v10_0>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 7 <sdma_v5_2>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 8 <vcn_v3_0>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 9 <jpeg_v3_0>
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Fetched VBIOS from VFCT
Sep 20 13:24:58 pve2 kernel: amdgpu: ATOM BIOS: 102-RAPHAEL-008
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: vgaarb: deactivate vga console
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Sep 20 13:24:58 pve2 kernel: [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: VRAM: 16384M 0x000000F400000000 - 0x000000F7FFFFFFFF (16384M used)
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Sep 20 13:24:58 pve2 kernel: [drm] Detected VRAM RAM=16384M, BAR=16384M
Sep 20 13:24:58 pve2 kernel: [drm] RAM width 128bits DDR5
Sep 20 13:24:58 pve2 kernel: [drm] amdgpu: 16384M of VRAM memory ready
Sep 20 13:24:58 pve2 kernel: [drm] amdgpu: 40003M of GTT memory ready.
Sep 20 13:24:58 pve2 kernel: [drm] GART: num cpu pages 262144, num gpu pages 262144
Sep 20 13:24:58 pve2 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F7FFC00000).
Sep 20 13:24:58 pve2 kernel: [drm] Loading DMUB firmware via PSP: version=0x05002800
Sep 20 13:24:58 pve2 kernel: [drm] use_doorbell being set to: [true]
Sep 20 13:24:58 pve2 kernel: [drm] Found VCN firmware Version ENC: 1.33 DEC: 4 VEP: 0 Revision: 6
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0xa00000 from 0xf7fe000000 for PSP TMR
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SMU is initialized successfully!
Sep 20 13:24:58 pve2 kernel: [drm] Seamless boot condition check passed
Sep 20 13:24:58 pve2 kernel: [drm] Display Core v3.2.316 initialized on DCN 3.1.5
Sep 20 13:24:58 pve2 kernel: [drm] DP-HDMI FRL PCON supported
Sep 20 13:24:58 pve2 kernel: [drm] DMUB hardware initialized: version=0x05002800
Sep 20 13:24:58 pve2 kernel: snd_hda_intel 0000:10:00.1: bound 0000:10:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Sep 20 13:24:58 pve2 kernel: [drm] kiq ring mec 2 pipe 1 q 0
Sep 20 13:24:58 pve2 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Sep 20 13:24:58 pve2 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Sep 20 13:24:58 pve2 kernel: amdgpu: Virtual CRAT table created for GPU
Sep 20 13:24:58 pve2 kernel: amdgpu: Topology: Add dGPU node [0x164e:0x1002]
Sep 20 13:24:58 pve2 kernel: kfd kfd: amdgpu: added device 1002:164e
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 2, active_cu_number 2
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.1.0 uses VM inv eng 1 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 4 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 5 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 12 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 13 on hub 0
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Runtime PM not available
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: [drm] Registered 4 planes with drm panic
Sep 20 13:24:58 pve2 kernel: [drm] Initialized amdgpu 3.61.0 for 0000:10:00.0 on minor 0
Sep 20 13:24:58 pve2 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Sep 20 13:24:58 pve2 kernel: Console: switching to colour frame buffer device 240x67
Sep 20 13:24:58 pve2 kernel: amdgpu 0000:10:00.0: [drm] fb0: amdgpudrmfb frame buffer device
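The switching sequence shown for the 7700 can be wrapped in a small helper. This is only a sketch: the function name `switch_driver` and the `SYSFS_DRIVERS` override are hypothetical, and it assumes, as the commands above do, that the target driver already accepts the device when its address is written to `bind`.

```shell
# Move a PCI device from one driver to another through sysfs.
# SYSFS_DRIVERS is a hypothetical override for testing; by default the
# real /sys/bus/pci/drivers directory is used.
switch_driver() {
    dev="$1"; from="$2"; to="$3"
    drivers="${SYSFS_DRIVERS:-/sys/bus/pci/drivers}"
    # Release the device from its current driver.
    echo "$dev" > "$drivers/$from/unbind" || return 1
    # Hand the device to the new driver.
    echo "$dev" > "$drivers/$to/bind" || return 1
}
```

For example, `switch_driver 0000:10:00.0 amdgpu vfio-pci` followed later by `switch_driver 0000:10:00.0 vfio-pci amdgpu`. Unlike the original commands, errors are not redirected to `/dev/null`, so a rejected write is reported and the function returns a non-zero status.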
 
On the 8700G, unbinding and re-binding amdgpu works, but cycling through amdgpu, vfio-pci, amdgpu, vfio-pci is not possible. After unbinding amdgpu, vfio-pci cannot be bound manually; it is only bound during VM startup.

As a result, the subsequent amdgpu bind fails with an error, and the amdgpu driver remains unusable until the Proxmox host is rebooted.

Code:
boot

Nov 24 12:51:52 pve2 kernel: [drm] amdgpu kernel modesetting enabled.
Nov 24 12:51:52 pve2 kernel: amdgpu: Virtual CRAT table created for CPU
Nov 24 12:51:52 pve2 kernel: amdgpu: Topology: Add CPU node
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: enabling device (0006 -> 0007)
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x15BF 0x1849:0x35BF 0x06).
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio base: 0xF4F00000
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio size: 524288
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 0 <soc21_common>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 3 <psp>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 4 <smu>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 5 <dm>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Fetched VBIOS from VFCT
Nov 24 12:51:52 pve2 kernel: amdgpu: ATOM BIOS: 113-PHXGENERIC-001
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: vgaarb: deactivate vga console
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: VRAM: 2048M 0x0000008000000000 - 0x000000807FFFFFFF (2048M used)
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
Nov 24 12:51:52 pve2 kernel: [drm] Detected VRAM RAM=2048M, BAR=2048M
Nov 24 12:51:52 pve2 kernel: [drm] RAM width 64bits DDR5
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 2048M of VRAM memory ready
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 14829M of GTT memory ready.
Nov 24 12:51:52 pve2 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Nov 24 12:51:52 pve2 kernel: [drm] PCIE GART of 512M enabled (table at 0x000000807FD00000).
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x08005300
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 22
Nov 24 12:51:52 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0x4a00000 from 0x8070000000 for PSP TMR
Nov 24 12:51:52 pve2 kernel: mt7921e 0000:06:00.0 wlp6s0: renamed from wlan0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SMU is initialized successfully!
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Display Core v3.2.340 initialized on DCN 3.1.4
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] DP-HDMI FRL PCON supported
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x08005300
Nov 24 12:51:53 pve2 kernel: snd_hda_intel 0000:10:00.1: bound 0000:10:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:51:53 pve2 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Nov 24 12:51:53 pve2 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Nov 24 12:51:53 pve2 kernel: amdgpu: Virtual CRAT table created for GPU
Nov 24 12:51:53 pve2 kernel: amdgpu: Topology: Add dGPU node [0x15bf:0x1002]
Nov 24 12:51:53 pve2 kernel: kfd kfd: amdgpu: added device 1002:15bf
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Runtime PM not available
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: [drm] Registered 4 planes with drm panic
Nov 24 12:51:53 pve2 kernel: [drm] Initialized amdgpu 3.64.0 for 0000:10:00.0 on minor 0
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Failed to setup vendor infoframe on connector HDMI-A-1: -22
Nov 24 12:51:53 pve2 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Nov 24 12:51:53 pve2 kernel: fbcon: Deferring console take-over
Nov 24 12:51:53 pve2 kernel: amdgpu 0000:10:00.0: [drm] fb0: amdgpudrmfb frame buffer device

> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null

Nov 24 12:53:51 pve2 kernel: Console: switching to colour dummy device 80x25
Nov 24 12:53:51 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Nov 24 12:53:51 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: ttm finalized

> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null

Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: initializing kernel modesetting (IP DISCOVERY 0x1002:0x15BF 0x1849:0x35BF 0x06).
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio base: 0xF4F00000
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: register mmio size: 524288
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 0 <soc21_common>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 3 <psp>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 4 <smu>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 5 <dm>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Fetched VBIOS from VFCT
Nov 24 12:54:21 pve2 kernel: amdgpu: ATOM BIOS: 113-PHXGENERIC-001
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: vgaarb: deactivate vga console
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: VRAM: 2048M 0x0000008000000000 - 0x000000807FFFFFFF (2048M used)
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
Nov 24 12:54:21 pve2 kernel: [drm] Detected VRAM RAM=2048M, BAR=2048M
Nov 24 12:54:21 pve2 kernel: [drm] RAM width 64bits DDR5
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 2048M of VRAM memory ready
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: 14829M of GTT memory ready.
Nov 24 12:54:21 pve2 kernel: [drm] GART: num cpu pages 131072, num gpu pages 131072
Nov 24 12:54:21 pve2 kernel: [drm] PCIE GART of 512M enabled (table at 0x000000807FD00000).
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Loading DMUB firmware via PSP: version=0x08005300
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Found VCN firmware Version ENC: 1.24 DEC: 9 VEP: 0 Revision: 22
Nov 24 12:54:21 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: reserve 0x4a00000 from 0x8070000000 for PSP TMR
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SECUREDISPLAY: optional securedisplay ta ucode is not available
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SMU is initialized successfully!
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] Display Core v3.2.340 initialized on DCN 3.1.4
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] DP-HDMI FRL PCON supported
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] DMUB hardware initialized: version=0x08005300
Nov 24 12:54:22 pve2 kernel: snd_hda_intel 0000:10:00.1: bound 0000:10:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: [drm] PSR support 0, DC PSR ver -1, sink PSR ver 0 DPCD caps 0x0 su_y_granularity 0
Nov 24 12:54:22 pve2 kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Nov 24 12:54:22 pve2 kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Nov 24 12:54:22 pve2 kernel: amdgpu: Virtual CRAT table created for GPU
Nov 24 12:54:22 pve2 kernel: amdgpu: Topology: Add dGPU node [0x15bf:0x1002]
Nov 24 12:54:22 pve2 kernel: kfd kfd: amdgpu: added device 1002:15bf
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: SE 1, SH per SE 2, CU per SH 6, active_cu_number 12
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: Runtime PM not available
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: [drm] Registered 4 planes with drm panic
Nov 24 12:54:22 pve2 kernel: [drm] Initialized amdgpu 3.64.0 for 0000:10:00.0 on minor 0
Nov 24 12:54:22 pve2 kernel: amdgpu 0000:10:00.0: [drm] Cannot find any crtc or sizes
Nov 24 12:54:22 pve2 kernel: fbcon: amdgpudrmfb (fb0) is primary device
Nov 24 12:54:23 pve2 kernel: Console: switching to colour frame buffer device 240x67
Nov 24 12:54:23 pve2 kernel: amdgpu 0000:10:00.0: [drm] fb0: amdgpudrmfb frame buffer device

> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null
> echo "0000:10:00.0" > /sys/bus/pci/drivers/vfio-pci/bind 2>/dev/null
> lspci -ks 0000:10:00.0

Nov 24 12:55:36 pve2 kernel: Console: switching to colour dummy device 80x25
Nov 24 12:55:36 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: finishing device.
Nov 24 12:55:36 pve2 kernel: amdgpu 0000:10:00.0: amdgpu: amdgpu: ttm finalized

!!! Writing to vfio-pci/bind produces no log output

10:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Phoenix1 (rev 06)
        Subsystem: ASRock Incorporation Device 35bf
        !!! No kernel driver is in use, and the vfio-pci driver is not bound.
        Kernel modules: amdgpu
 
Code:
> VM Start (It displays correctly)

Nov 24 12:59:42 pve2 kernel: snd_hda_intel 0000:10:00.1: GPU sound probed, but not operational: please add a quirk to driver_denylist
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: deactivate vga console
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: deactivate vga console
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none
Nov 24 12:59:47 pve2 pvedaemon[5327]: error writing '1' to '/sys/bus/pci/devices/0000:10:00.0/reset': Inappropriate ioctl for device
Nov 24 12:59:47 pve2 pvedaemon[5327]: failed to reset PCI device '0000:10:00.0', but trying to continue as not all devices need a reset
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: resetting
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.0: reset done
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.1: resetting
Nov 24 12:59:47 pve2 kernel: vfio-pci 0000:10:00.1: reset done
Nov 24 12:59:49 pve2 kernel: vfio-pci 0000:10:00.0: resetting
Nov 24 12:59:49 pve2 kernel: vfio-pci 0000:10:00.0: reset done
Nov 24 12:59:49 pve2 kernel: vfio-pci 0000:10:00.1: resetting
Nov 24 12:59:49 pve2 kernel: vfio-pci 0000:10:00.1: reset done
Nov 24 12:59:50 pve2 kernel: vfio-pci 0000:10:00.1: resetting
Nov 24 12:59:50 pve2 kernel: vfio-pci 0000:10:00.1: reset done

> echo "0000:10:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

Nov 24 13:02:51 pve2 kernel: vfio-pci 0000:10:00.0: vgaarb: VGA decodes changed: olddecodes=io+mem,decodes=io+mem:owns=none

> echo "0000:10:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null

The error described in post #1 occurs.
 
It functions normally at startup and until vfio-pci is bound, so a failure of the CPU's integrated GPU seems unlikely, and there also appear to be no driver or firmware issues.

Questions

・Why does manually binding vfio-pci succeed on the 7700 but fail on the 8700G?

・Why does using vfio-pci render amdgpu unusable only on the 8700G?

I'd appreciate any advice, no matter how minor.

* The 7700 iGPU, RX 9070 XT, and RX 9070 all work without issues with amdgpu in the same manner.
* If the amdgpu driver can be bound again after the virtual machine shuts down, I believe the system will be in the same state as immediately after Proxmox VE startup. Therefore, I expect that by using hook scripts, the iGPU will function properly during subsequent virtual machine startups.
 
The hardware is different and might be handled differently (at least parts of it) in the amdgpu driver. It might be that the driver inside the VM leaves the GPU in a state that the amdgpu driver currently cannot handle.

Maybe test unbinding amdgpu and rebinding amdgpu (without vfio-pci and without starting/stopping the VM in between) to figure out the least amount of steps to reproduce this.
Maybe then report this issue (amdgpu -> either no driver or vfio-pci or KVM VM run -> amdgpu) to the upstream kernel maintainers at Ubuntu or whoever maintains the open-source amdgpu driver.

You might think that stopping the VM returns everything to the exact situation as before starting the VM, but the actual GPU hardware also has internal state that may interfere if the driver(s) do not handle it properly (which is probably a bug in the driver or possibly the hardware). This is one of the various caveats of PCI(e) passthrough, unfortunately.

EDIT: Sometimes there are known work-arounds to get the hardware to behave and to pass it through to VMs. Passing it back to the host is not that common and I don't know about work-arounds. That's why I suggest testing to see if the problem also occurs without a VM, which would convince people more easily that it is a driver bug.
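
To record which driver owns the device after each step of such a test, a helper along these lines could be used. This is only a sketch: `current_driver` and the `SYSFS_ROOT` variable are hypothetical names, not part of any tool. The only thing assumed from the kernel is the `driver` symlink inside the device's sysfs directory.

```shell
# Sketch: print the driver currently bound to a PCI device, or "none".
# SYSFS_ROOT is a hypothetical override so the function can be tried against
# a fake directory tree; on a real host it defaults to /sys.
current_driver() {
    dev="$1"
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Calling `current_driver 0000:10:00.0` after every unbind/bind step gives a compact record of where the sequence first deviates, which may be easier to attach to a bug report than the full kernel log.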
 
Thank you.

I looked into it and tried it, but this doesn't seem like a simple matter.

・Simple amdgpu - no driver, no driver - amdgpu ... always succeeds

・amdgpu - no driver - vfio-pci (VM start/stop) - no driver - amdgpu fails

Without multiple CPUs to compare, it seems difficult to reliably isolate the cause of the problem, and that is challenging for me.
 
I assumed you could try all this from the Linux command line over SSH from a different system. If not, then it is indeed difficult to do without a display.
Can you test amdgpu -> no driver -> vfio-pci (without VM) -> no driver -> amdgpu? You can bind vfio-pci explicitly yourself, like you (I assume) do with amdgpu.
Does it matter if you use a Windows VM or a Linux VM (booting a Live Ubuntu installer ISO for example)?
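
The sequence suggested above could be sketched with the kernel's `driver_override` mechanism, which restricts a PCI device to one named driver before asking the PCI core to probe it. This is a sketch under assumptions, not the method Proxmox VE itself uses: `rebind_with_override` and `SYSFS_ROOT` are hypothetical names, and whether this behaves differently from writing to `vfio-pci/bind` on the 8700G is untested here.

```shell
# Sketch: hand a PCI device to a named driver via driver_override.
# SYSFS_ROOT is a hypothetical override for dry runs; it defaults to /sys.
rebind_with_override() {
    dev="$1"; driver="$2"
    devdir="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev"
    # Release the device from its current driver, if any.
    if [ -e "$devdir/driver/unbind" ]; then
        printf '%s\n' "$dev" > "$devdir/driver/unbind" || return 1
    fi
    # Restrict matching to the requested driver, then ask the PCI core to probe.
    printf '%s\n' "$driver" > "$devdir/driver_override" || return 1
    printf '%s\n' "$dev" > "${SYSFS_ROOT:-/sys}/bus/pci/drivers_probe" || return 1
}
```

For example, `rebind_with_override 0000:10:00.0 vfio-pci` followed later by `rebind_with_override 0000:10:00.0 amdgpu` would walk through the test without a VM. While the override names vfio-pci, no other driver will match the device, so it has to be changed (or cleared by writing an empty line to `driver_override`) before amdgpu can bind again.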
 
Can you test amdgpu -> no driver -> vfio-pci (without VM) -> no driver -> amdgpu?

I have confirmed that this is possible on a computer with a Ryzen 7 7700, but not on a computer with a Ryzen 7 8700G.

There are no command failures or errors, but the result is that vfio-pci is not bound.

You can bind vfio-pci explicitly yourself, like you (I assume) do with amdgpu

Unbinding amdgpu and starting the VM forces vfio-pci to bind, but this was the only method that successfully bound vfio-pci on the 8700G.

* Early binding and similar methods have not been tested.

After stopping the VM, sequences like `vfio-pci - no driver - vfio-pci - no driver` succeed, but binding `amdgpu` afterwards fails.

* On Ryzen 7 7700, all of these succeed and function normally even after rebooting.
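
The bind and unbind commands quoted earlier in this thread end in `2>/dev/null`, which can hide a write error reported by `echo`. A variant that keeps the error and the exit status visible might look like the following sketch. `sysfs_driver_write` and `SYSFS_ROOT` are hypothetical names introduced for illustration only.

```shell
# Sketch: write a PCI address to a driver's bind/unbind file and report the
# result instead of discarding it. SYSFS_ROOT is a hypothetical override for
# dry runs; it defaults to /sys.
sysfs_driver_write() {
    driver="$1"; action="$2"; dev="$3"
    target="${SYSFS_ROOT:-/sys}/bus/pci/drivers/$driver/$action"
    if printf '%s\n' "$dev" > "$target"; then
        echo "ok: $action $dev ($driver)"
    else
        status=$?
        echo "FAILED: $action $dev ($driver), exit status $status" >&2
        return "$status"
    fi
}
```

Running `sysfs_driver_write vfio-pci bind 0000:10:00.0` on both machines would show whether the write on the 8700G is actually rejected by the kernel or accepted without the driver attaching.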