Hi Mates!
My AMD GPU can't be accessed after the VM powers off.
When I first boot the host and start the VM, the passed-through GPU (a 5700XT in my case) works normally, and I can see the VM's screen on my monitor.
But after I shut down the VM, I see this error:
vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
Because of this, I can't start the VM again until I reboot the host. Trying to start it fails with:
kvm: ../hw/pci/pci.c:1613: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
I have been googling for weeks but can't solve it. Maybe you can offer some ideas and help.
I have tried a lot, but nothing worked:
1. [NOT WORK] Manually release/attach the PCIe device
2. [NOT WORK] Installing vendor-reset on Proxmox - Working around the AMD GPU Reset bug on Proxmox using vendor-reset
3. [NOT WORK] Turn off Resize BAR: Successfully Passthrough Sapphire Pulse RX 6700XT (12GB) to win 11 on Proxmox 7.2 (also fixes error 43 on windows while installing drivers) : r/VFIO (reddit.com)
4. [NOT WORK] Add initcall_blacklist=sysfb_init to the kernel parameters
5. [NOT WORK] Add disable_idle_d3=1 and amdgpu.runpm=0 to the kernel parameters
The problems in those threads looked similar to mine, but I think mine is quite different. What I need to solve is the PCIe device becoming inaccessible after the VM shuts down.
Looking forward to your generous answers.
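For readers unfamiliar with item 1: a manual release/re-attach is typically done through the sysfs PCI interface, by writing to the device's `remove` node and then to the bus `rescan` node. The following is only a minimal sketch of that kind of procedure, not a record of the exact commands that were run. The function name `pci_remove_and_rescan` and the `SYSFS_ROOT` / `RESCAN_DELAY` variables are hypothetical, added so the logic can be dry-run against a scratch directory.

```shell
#!/bin/sh
# Sketch: detach a PCI device and re-enumerate the bus through sysfs.
# SYSFS_ROOT is a hypothetical override for dry runs; on a real host it is /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

pci_remove_and_rescan() {
    dev="$1"   # full PCI address, e.g. 0000:03:00.0
    node="$SYSFS_ROOT/bus/pci/devices/$dev"

    # Refuse to continue if the device is not present in sysfs.
    if [ ! -e "$node/remove" ]; then
        echo "no such device: $dev" >&2
        return 1
    fi

    # Ask the kernel to drop the device, wait briefly, then rescan the bus.
    echo 1 > "$node/remove"
    sleep "${RESCAN_DELAY:-1}"
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}
```

On the host this would be run as root with the VM stopped, for example `pci_remove_and_rescan 0000:03:00.0`. In my case this kind of cycle did not bring the card back.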
Hardware Info:
CPU: i5-10400
Motherboard: Gigabyte B460M AORUS PRO
GPU: XFX 5700XT Ultra & UHD630
BIOS Settings:
BIOS Version: F7, BIOS Date 06/27/2023 (the latest version)
Above 4G Decoding | Enabled |
Resize BAR Support | Disabled |
ErP | Disabled |
CSM Support | Enabled |
Internal Graphics | Enabled |
Platform Power Management -- PEG ASPM -- PCH ASPM -- DMI ASPM | Enabled -- Enabled -- Enabled -- Enabled |
Initial Display Output | IGFX (iGPU) |
RC6 (Render Standby) | Enabled |
PVE Version
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-4-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.3
pve-kernel-6.2.16-4-pve: 6.2.16-5
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph: 17.2.6-pve1+3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.6
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.4
libpve-rs-perl: 0.8.4
libpve-storage-perl: 8.0.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
openvswitch-switch: 3.1.0-2
proxmox-backup-client: 3.0.1-1
proxmox-backup-file-restore: 3.0.1-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.2.0
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.6
pve-cluster: 8.0.2
pve-container: 5.0.4
pve-docs: 8.0.4
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.5
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1
PCIe Passthrough Settings
/etc/modprobe.d/blacklist.conf
blacklist nvidiafb
blacklist nouveau
blacklist nvidia
blacklist radeon
blacklist amdgpu
/etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:1478,1002:1479,1002:731f,1002:ab38 disable_vga=1 disable_idle_d3=1
/etc/default/grub :
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt i915.enable_gvt=1 amdgpu.runpm=0 initcall_blacklist=sysfb_init"
/etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
/etc/modprobe.d/dkms.conf
# Nothing here, just an empty file
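Since several of the attempts above depend on parameters actually being active at runtime, a small check like the following can confirm that the GRUB settings reached the running kernel. This is only a sketch: `cmdline_has` is a hypothetical helper name, and it assumes the kernel command line is exposed at /proc/cmdline, which is standard on Linux.

```shell
#!/bin/sh
# Sketch: test whether an exact token is present on the kernel command line.
# $1 = parameter token, e.g. amdgpu.runpm=0
# $2 = file to read (hypothetical override for testing; defaults to /proc/cmdline)
cmdline_has() {
    # Put one token per line, then require an exact whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -Fxq -- "$1"
}
```

For example, `cmdline_has amdgpu.runpm=0 && echo "runpm is disabled"`. The vfio-pci module options are not part of the kernel command line here; with the module loaded, disable_idle_d3 should be readable under /sys/module/vfio_pci/parameters/ instead.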
lspci -kk
Code:
00:00.0 Host bridge: Intel Corporation Comet Lake-S 6c Host Bridge/DRAM Controller (rev 03)
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Comet Lake-S 6c Host Bridge/DRAM Controller
Kernel driver in use: skl_uncore
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 03)
Subsystem: Gigabyte Technology Co., Ltd 6th-10th Gen Core Processor PCIe Controller (x16)
Kernel driver in use: pcieport
00:02.0 VGA compatible controller: Intel Corporation CometLake-S GT2 [UHD Graphics 630] (rev 03)
DeviceName: Onboard - Video
Subsystem: Gigabyte Technology Co., Ltd CometLake-S GT2 [UHD Graphics 630]
Kernel driver in use: i915
Kernel modules: i915
00:14.0 USB controller: Intel Corporation Comet Lake PCH-V USB Controller
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCH-V USB Controller
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:14.2 Signal processing controller: Intel Corporation Comet Lake PCH-V Thermal Subsystem
DeviceName: Onboard - Other
Subsystem: Intel Corporation Comet Lake PCH-V Thermal Subsystem
00:16.0 Communication controller: Intel Corporation Comet Lake PCH-V HECI Controller
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCH-V HECI Controller
Kernel driver in use: mei_me
Kernel modules: mei_me
00:17.0 SATA controller: Intel Corporation 400 Series Chipset Family SATA AHCI Controller
DeviceName: Onboard - SATA
Subsystem: Gigabyte Technology Co., Ltd 400 Series Chipset Family SATA AHCI Controller
Kernel driver in use: ahci
Kernel modules: ahci
00:1b.0 PCI bridge: Intel Corporation Device a3e9 (rev f0)
Subsystem: Gigabyte Technology Co., Ltd Device 5001
Kernel driver in use: pcieport
00:1b.4 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #21 (rev f0)
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCI Express Root Port
Kernel driver in use: pcieport
00:1c.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port #05 (rev f0)
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCI Express Root Port
Kernel driver in use: pcieport
00:1d.0 PCI bridge: Intel Corporation Comet Lake PCI Express Root Port 9 (rev f0)
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCI Express Root Port 9
Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation B460 Chipset LPC/eSPI Controller
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd B460 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation Cannon Lake PCH Power Management Controller
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Cannon Lake PCH Power Management Controller
00:1f.3 Audio device: Intel Corporation Comet Lake PCH-V cAVS
DeviceName: Onboard - Sound
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCH-V cAVS
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_soc_avs, snd_sof_pci_intel_cnl
00:1f.4 SMBus: Intel Corporation Comet Lake PCH-V SMBus Host Controller
DeviceName: Onboard - Other
Subsystem: Gigabyte Technology Co., Ltd Comet Lake PCH-V SMBus Host Controller
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (12) I219-V
DeviceName: Onboard - Ethernet
Subsystem: Gigabyte Technology Co., Ltd Ethernet Connection (12) I219-V
Kernel driver in use: e1000e
Kernel modules: e1000e
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c1)
Kernel driver in use: pcieport
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
Kernel driver in use: pcieport
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1)
Subsystem: XFX Pine Group Inc. RX 5700 XT RAW II
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
05:00.0 Non-Volatile memory controller: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive (rev 03)
Subsystem: ADATA Technology Co., Ltd. XPG SX8200 Pro PCIe Gen3x4 M.2 2280 Solid State Drive
Kernel driver in use: nvme
Kernel modules: nvme
07:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963
Subsystem: Samsung Electronics Co Ltd SM963 2.5" NVMe PCIe SSD
Kernel driver in use: nvme
Kernel modules: nvme
Full dmesg output: see the attached file
Start the VM with AMD GPU passthrough, then shut it down
[ 247.873508] device tap1502i0 entered promiscuous mode
[ 247.890081] vmbr0: port 2(tap1502i0) entered blocking state
[ 247.890084] vmbr0: port 2(tap1502i0) entered disabled state
[ 247.890174] vmbr0: port 2(tap1502i0) entered blocking state
[ 247.890176] vmbr0: port 2(tap1502i0) entered forwarding state
[ 248.781578] vfio-pci 0000:03:00.0: enabling device (0000 -> 0003)
[ 248.781802] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[ 248.781810] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[ 248.781813] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
[ 248.781814] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[ 248.781815] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[ 248.802828] vfio-pci 0000:03:00.1: enabling device (0000 -> 0002)
[ 282.585658] vmbr0: port 2(tap1502i0) entered disabled state
[ 284.800841] pcieport 0000:02:00.0: Data Link Layer Link Active not set in 1000 msec
[ 284.922295] vfio-pci 0000:03:00.1: Unable to change power state from D0 to D3hot, device inaccessible
[ 284.922990] vfio-pci 0000:03:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Then try to start the VM again
Code:
root@pve:~# qm start 1502
WARN: no efidisk configured! Using temporary efivars disk.
kvm: ../hw/pci/pci.c:1613: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
start failed: QEMU exited with code 1
[ 368.198631] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 368.199455] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 368.202901] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 368.871106] device tap1502i0 entered promiscuous mode
[ 368.889539] vmbr0: port 2(tap1502i0) entered blocking state
[ 368.889542] vmbr0: port 2(tap1502i0) entered disabled state
[ 368.889621] vmbr0: port 2(tap1502i0) entered blocking state
[ 368.889622] vmbr0: port 2(tap1502i0) entered forwarding state
[ 369.756871] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 369.756905] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 369.756965] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 369.759152] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759155] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759156] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
------------A lot of "vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff" output--------------
[ 369.759212] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759213] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759215] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759216] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759218] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759219] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759221] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759222] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759224] vfio-pci 0000:03:00.0: vfio_cap_init: hiding cap 0xff@0xff
[ 369.759226] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0x100
[ 369.759227] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.759228] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.759230] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
------------A lot of " vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc" output--------------
[ 369.760176] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.760177] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.760257] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.760258] vfio-pci 0000:03:00.0: vfio_ecap_init: hiding ecap 0xffff@0xffc
[ 369.790707] vmbr0: port 2(tap1502i0) entered disabled state
[ 369.790862] vmbr0: port 2(tap1502i0) entered disabled state
[ 369.978904] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 369.979869] vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 369.980686] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 369.981348] vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 372.046793] pcieport 0000:02:00.0: Data Link Layer Link Active not set in 1000 msec
[ 372.062717] vfio-pci 0000:03:00.1: Unable to change power state from D3cold to D0, device inaccessible
[ 372.062753] vfio-pci 0000:03:00.0: Unable to change power state from D3cold to D0, device inaccessible