[SOLVED] Radeon 5700 Passthrough VM Restart Error on Proxmox

Fukuringa

Jan 27, 2025
Hello. I'm passing a Radeon 5700 through to a VM on Proxmox.
However, once the VM it is assigned to has been stopped, I cannot start it again without rebooting the host machine, because the following error is displayed.
If you know a way to restart the VM without rebooting the host every time, please let me know.

error writing '1' to '/sys/bus/pci/devices/0000:0a:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:0a:00.0', but trying to continue as not all devices need a reset
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 966489) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1

The VM config is as follows.
agent: 1
bios: ovmf
boot: order=sata0;net0
cores: 8
cpu: host
efidisk0: local-lvm:vm-507-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:0a:00,pcie=1,x-vga=1
machine: pc-q35-9.0
memory: 8192
meta: creation-qemu=9.0.2,ctime=1759991634
name: GPU
net0: virtio=BC:24:11:4F:66:0C,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
sata0: local-lvm:vm-507-disk-1,cache=writethrough,discard=on,size=64G
scsihw: virtio-scsi-single
smbios1: uuid=a9ee5442-17c5-4636-9830-f2011900e183
sockets: 1
tpmstate0: local-lvm:vm-507-disk-2,size=4M,version=v2.0
vmgenid: c359e7e1-ea1d-4540-bb1c-061958158521

I am Japanese and use translation tools, so there may be some awkward phrasing. If you have any questions or need further details, please feel free to contact me.
 
Did you install and enable vendor-reset (https://github.com/gnif/vendor-reset)? If not, you might want to look into it (there are lots of threads on this forum and elsewhere on the internet) for properly resetting Radeon 5700 GPUs.
x-vga=1 (Primary GPU) is for NVidia and should usually not be enabled for AMD GPUs.
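
One way to check whether the module is actually loaded, as a minimal sketch: the kernel lists loaded modules in /proc/modules, where vendor-reset appears under the name `vendor_reset`. The function name `module_loaded` and its optional second argument (an alternative file to read) are made up for this example.

```shell
#!/bin/sh
# Succeed if the named kernel module appears in the loaded-module list.
# $1: module name as the kernel reports it (underscores, e.g. vendor_reset)
# $2: file to read, defaults to /proc/modules (illustrative parameter)
module_loaded() {
    # The first field of each /proc/modules line is the module name.
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example: module_loaded vendor_reset && echo "vendor_reset is loaded"
```

A loaded module alone does not prove that the reset hook is used for the GPU, so the system log is still worth checking when the VM is stopped and started.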
Thank you for your advice.
I installed vendor-reset, but it still hasn't fixed the issue.
I also tried loading it at boot with a command like
`echo vendor-reset >> /etc/modules-load.d/vendor-reset.conf` and then rebooting, but it didn't work.

I would greatly appreciate it if you could tell me how to fix this.
Please let me know if you need any additional information.

root@server05:~# dmesg | grep -i vendor
[ 2.948724] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.08
[ 2.971000] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.08
[ 2.980540] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 6.08
[ 2.980919] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 6.08
[ 3.371537] usb 3-3: New USB device found, idVendor=04e6, idProduct=5810, bcdDevice= 2.00
[ 3.672703] usb 1-3: New USB device found, idVendor=046e, idProduct=5500, bcdDevice= 1.10
[ 3.977331] vendor_reset: loading out-of-tree module taints kernel.
[ 3.977335] vendor_reset: module verification failed: signature and/or required key missing - tainting kernel
[ 4.006981] vendor_reset_hook: installed
[ 4.128873] usb 1-6: New USB device found, idVendor=0511, idProduct=023f, bcdDevice= 1.00

root@server05:~# lspci -nnk -s 0a:00.0
0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Reference RX 5700 XT [1002:0b36]
Kernel driver in use: vfio-pci
Kernel modules: amdgpu


error writing '1' to '/sys/bus/pci/devices/0000:0a:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:0a:00.0', but trying to continue as not all devices need a reset
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 26973) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1
 
I don't think this happens when the VM is rebooted. When you say "restart", do you mean shutting the VM down and then starting it again? That is not the same as a reboot.
* The QEMU process does not change during a reboot, so I would not expect the error there.

I don't have an RX 5700 myself, so I can't test this, but with your current settings, do the following steps resolve the issue for shutdown/startup?

1. Shut down/start up the Proxmox host to restore it to a normal state.
2. I suspect the GPU is bound to vfio-pci early, so `lspci -ks 0000:0a:00.0` should show something like this:

Kernel driver in use: vfio-pci
Kernel modules: amdgpu

3. After the Proxmox host boots up, start and then shut down the VM configured with PCI passthrough.
4. Execute the following commands.

Code:
echo "0000:0a:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
echo "0000:0a:00.1" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

5. Can you start the VM with PCI passthrough configured and boot it without errors?


Alternatively, try the same procedure with a different step 4:

1. Shut down/start up the Proxmox host to restore it to a normal state.
2. I suspect the GPU is bound to vfio-pci early, so `lspci -ks 0000:0a:00.0` should show something like this:

Kernel driver in use: vfio-pci
Kernel modules: amdgpu

3. After the Proxmox host boots up, start and then shut down the VM configured with PCI passthrough.
4. Execute the following commands.

Code:
echo 1 > /sys/bus/pci/devices/0000\:0a\:00.0/remove
echo 1 > /sys/bus/pci/devices/0000\:0a\:00.1/remove
echo 1 > /sys/bus/pci/rescan

5. Can you start the VM with PCI passthrough configured and boot it without errors?
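
The remove/rescan sequence from step 4 could be wrapped roughly like this, so that it also reports whether the device came back after the rescan. This is only a sketch: `pci_remove_rescan` and its optional sysfs-root argument are hypothetical names, and on a real host it has to run as root.

```shell
#!/bin/sh
# Remove one PCI function, rescan the bus, then check that it reappeared.
# $1: PCI address, e.g. 0000:0a:00.0
# $2: sysfs PCI root, defaults to /sys/bus/pci (illustrative parameter)
# Returns 1 if the device is absent beforehand or a write fails,
# and 2 if the device is still absent after the rescan.
pci_remove_rescan() {
    prr_root="${2:-/sys/bus/pci}"
    prr_dev="$prr_root/devices/$1"
    if [ ! -e "$prr_dev/remove" ]; then
        echo "device $1 not present" >&2
        return 1
    fi
    echo 1 > "$prr_dev/remove" || return 1
    echo 1 > "$prr_root/rescan" || return 1
    if [ ! -d "$prr_dev" ]; then
        echo "device $1 did not come back after rescan" >&2
        return 2
    fi
}

# Example: pci_remove_rescan 0000:0a:00.0 && pci_remove_rescan 0000:0a:00.1
```

Unlike the commands above, this rescans once per function rather than once at the end, which should be harmless but is a deviation from the original steps.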


Instead of a simple startup log, I think it's better to save the output of the `journalctl -e` command to a file and attach that file to the thread.
* Please avoid pasting logs directly into the thread. Long logs pasted into a thread are hard to read, and no one will look at them unless they're genuinely interested.
 
I apologize for the inconvenience and thank you very much for your detailed guidance.

To clarify, the issue is that once a VM with the GPU assigned has been shut down, it cannot be started again. Currently, the only way to recover is to reboot the Proxmox host itself.

I tried the methods you suggested. For the first one:

echo "0000:0a:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
echo "0000:0a:00.1" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

I attempted this, but encountered the following error, and the VM still could not start:

error writing '1' to '/sys/bus/pci/devices/0000:0a:00.0/reset': Inappropriate ioctl for device
failed to reset PCI device '0000:0a:00.0', but trying to continue as not all devices need a reset
swtpm_setup: Not overwriting existing state file.
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.
stopping swtpm instance (pid 3572) due to QEMU startup error
TASK ERROR: start failed: QEMU exited with code 1

The relevant logs are summarized in test1_log.txt.


For the second method, the GPU itself was not recognized and the VM could not start either:

TASK ERROR: no PCI device found for '0000:0a:00'

The logs for this attempt are in test2_log.txt.


Please let me know if there is anything you notice or if you need additional information from my side.
 

Did you run the following during the second test?

Code:
echo 1 > /sys/bus/pci/rescan

I expected the device to be re-added by the rescan, but the log does not show that happening.

After executing `echo 1 > /sys/bus/pci/rescan`, verify whether the device exists.

Code:
lspci -ks 0000:0a:00.0
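
The same check can be done from sysfs without lspci, which is handy in a script. In this sketch, `pci_driver_of` and its optional sysfs-root argument are hypothetical names. It prints the bound driver, `none` if the device exists without a driver, or `missing` if the device is not present at all.

```shell
#!/bin/sh
# Print the driver bound to a PCI function, based on its sysfs "driver" symlink.
# $1: PCI address, e.g. 0000:0a:00.0
# $2: sysfs PCI root, defaults to /sys/bus/pci (illustrative parameter)
pci_driver_of() {
    pdo_dev="${2:-/sys/bus/pci}/devices/$1"
    if [ ! -d "$pdo_dev" ]; then
        echo "missing"
        return 1
    fi
    if [ -L "$pdo_dev/driver" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$pdo_dev/driver")"
    else
        echo "none"
    fi
}

# Example: pci_driver_of 0000:0a:00.0
```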
 
Yes, I ran `echo 1 > /sys/bus/pci/rescan` during the second test as well, but nothing was logged.

After removing the device, the GPU is not recognized again unless the Proxmox host itself is restarted.
 
Unfortunately, I don't have an environment that requires vendor-reset, so I'm afraid I can't help further.

* At least, RDNA2 and later GPUs do not need it.

The following seem to cover similar issues, so I suggest trying them one by one.




 
Thank you!
After trying several things, I was able to get it working properly. I really appreciate your help.

For the benefit of others, here’s a summary of what I did based on this thread. Please note that I’m Japanese and used a translation tool, so the wording might be a little off. Also, there might still be mistakes, so use at your own risk.

1. Updated GRUB from:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt video=efifb:off"
to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt pcie_aspm=off pci=noaer"
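to remove outdated parameters and to add pcie_aspm=off (disables PCIe Active State Power Management) and pci=noaer (disables PCIe Advanced Error Reporting) for better PCIe stability on newer kernels.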

2. Moved `vendor-reset` to the top of `/etc/modules`, then ran:
update-grub && update-initramfs -k all -u
and restarted the host.

3. Changed BIOS IOMMU from Auto to Enabled.

4. After booting, before starting the VM, ran:
echo 'device_specific' > /sys/bus/pci/devices/0000:0a:00.0/reset_method
After that, I started the VM, shut it down for testing, and then restarted it. It booted up and worked normally.
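
Step 4 could be scripted along these lines. This is a sketch under assumptions: `set_reset_method` and its optional sysfs-root argument are hypothetical names, the `reset_method` attribute only exists on reasonably recent kernels, and values written to sysfs do not survive a host reboot, so the command would need to be repeated after each boot.

```shell
#!/bin/sh
# Select a reset method for a PCI function and confirm the kernel accepted it.
# $1: PCI address, e.g. 0000:0a:00.0
# $2: method name, e.g. device_specific
# $3: sysfs PCI root, defaults to /sys/bus/pci (illustrative parameter)
set_reset_method() {
    srm_attr="${3:-/sys/bus/pci}/devices/$1/reset_method"
    if [ ! -e "$srm_attr" ]; then
        echo "no reset_method attribute for $1" >&2
        return 1
    fi
    # The write fails if the kernel does not accept the method for this device.
    echo "$2" > "$srm_attr" || return 1
    # Read the attribute back to confirm the selection took effect.
    [ "$(cat "$srm_attr")" = "$2" ]
}

# Example: set_reset_method 0000:0a:00.0 device_specific
```

Running `cat /sys/bus/pci/devices/0000:0a:00.0/reset_method` beforehand shows which methods are currently enabled for the device.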

I’m not 100% certain it’s fully stable yet, so I’ll monitor it for a few days. If no issues appear, I’ll mark this thread as resolved.

Thank you to everyone who took the time to share their thoughts again!
 
That's great.

Documenting the steps that worked will be helpful for future users of the RX 5700.

The parameters pointed out by leesteken do not work in the current kernel and should be removed. Correcting the procedure post will benefit future users.

IOMMU is enabled by default on kernel 6.8 and later, so intel_iommu=on and amd_iommu=on are not needed.

nofb video=vesafb:off video=efifb:off video=simplefb:off are parameters for 5.15.64-1-pve and earlier. The current equivalent is initcall_blacklist=sysfb_init.

Since it works with the unnecessary parameters present, it should also work without them.
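
To see whether any of the parameters mentioned in this thread are still on the running kernel's command line, something like the following sketch could be used. The function name `obsolete_params` is made up, and the list covers only the parameters named in this thread, not every outdated option.

```shell
#!/bin/sh
# Print each parameter in a kernel command line that this thread
# describes as invalid or no longer effective, one per line.
# $1: the command line as a single string
obsolete_params() (
    set -f  # disable globbing so splitting $1 into words is safe
    for op_param in $1; do
        case "$op_param" in
            amd_iommu=on|nofb|video=vesafb:off|video=efifb:off|video=simplefb:off)
                echo "$op_param" ;;
        esac
    done
)

# Example: obsolete_params "$(cat /proc/cmdline)"
```

Empty output means none of the listed parameters are present.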
 
Is vendor-reset active? Do you see messages in the system logs about reset and NAVI10? Never mind, your last post mentions that it's working.
BTW: amd_iommu=on does nothing and video=efifb:off does nothing on Proxmox.

EDIT: amd_iommu=on does not exist and is actually invalid: https://www.kernel.org/doc/html/v6.17/admin-guide/kernel-parameters.html
I see! Thank you both for the additional advice!!
I'll remove the parameters you pointed out.