[SOLVED] AMD GPU passthrough & reset hookscript that works through guest shutdowns

jce

Member
Jan 31, 2021
I used to have an NVidia GPU which I passed through to my guest VMs, and this worked well. Recently I switched to an AMD RX 9070 and encountered the notorious reset bug for the first time. The reset hookscripts posted here and on other forums were helpful, but only partially. If a guest is shut down from within the VM itself, the "post-stop" phase in the script doesn't trigger, so the GPU remains bound to vfio-pci. I searched the forum and found others mentioning this without any solution posted, so I tweaked the script. My version checks during the "pre-start" phase whether the GPU is already bound to vfio-pci; if it is, the script unbinds it from vfio-pci and binds it to amdgpu. The script then continues as normal from there, unbinding from amdgpu and binding to vfio-pci. I couldn't find the discussion threads where others raised this issue, so I'm posting a new thread with my revision.

I can appreciate that the line grepping for 'vfio-pci' could be finessed to account for variations in output from `lspci` so please reply with suggestions.

Bash:
#!/usr/bin/bash

phase="$2"

echo "Phase is $phase"

if [ "$phase" == "pre-start" ]; then

    # Quote the command substitution so an empty or multi-word result does not break the test
    if [ "$(lspci -nnk | grep -A 2 03:00.0 | tail -1 | sed 's/.*: //')" == "vfio-pci" ]; then
        # Unbind gpu from vfio-pci
        echo "Bound to vfio-pci, unbinding"
        echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
        # Binding gpu back to amdgpu
        sleep 2
        echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
        sleep 2
    fi

    # Unbind gpu from amdgpu
    echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null

    sleep 2

    # Resize the GPU's BAR2 memory region (useful for PCI passthrough)

    echo 8 > /sys/bus/pci/devices/0000:03:00.0/resource2_resize

    sleep 2

elif [ "$phase" == "post-stop" ]; then

    # Unbind gpu from vfio-pci

    sleep 5

    echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

    sleep 2

    # Bind amdgpu

    echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null

    sleep 2

fi
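
On the `lspci` point above: one possible alternative is to read the `driver` symlink that the kernel exposes for each PCI device under sysfs, which avoids depending on the layout of `lspci` output. The following is only a sketch under that assumption and has not been tested on this hardware; the function name `bound_driver` and the `SYSFS_ROOT` override (there only so the function can be tried against a fake directory tree) are hypothetical and not part of the original script.

```shell
#!/usr/bin/bash

# Print the name of the driver currently bound to a PCI device,
# or print nothing if no driver is bound.
# $1: full PCI address, e.g. 0000:03:00.0
# SYSFS_ROOT: hypothetical override of /sys, used only for testing.
bound_driver() {
    local dev="$1"
    local link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"

    # "driver" is a symlink to the driver's directory; it is absent
    # when the device is not bound to any driver.
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Possible use in the pre-start branch (assumption, mirrors the check above):
#   if [ "$(bound_driver 0000:03:00.0)" == "vfio-pci" ]; then
#       ...unbind from vfio-pci and bind to amdgpu...
#   fi
```

Because the result is a plain driver name, the comparison with "vfio-pci" no longer depends on how many lines `lspci -nnk` prints for the device.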
 
In addition to the hookscript on Proxmox, you also need 'reset mod' inside the guest OS; that is the only way the guest can reset the GPU prior to shutdown/reboot.

You've not stated if the guest is Windows or Linux but I found the following guide helpful:
https://github.com/isc30/ryzen-gpu-passthrough-proxmox/issues/131#issue-3266798285
I had the issue with both Windows and Linux. My revision also got it working smoothly with both.

Thanks for the link, I'll follow those steps should my revised hookscript stop working.
 
I just tested it and confirmed it was wrong, so I've corrected it.

Since pre-stop is not executed, the GPU stays bound to vfio-pci, but it seems fair to say that “even if it is bound to vfio-pci, the VM boots normally, so no one cares.”

On my Proxmox VE hosts running Windows 11 VMs, both the Ryzen 7 7700 + RX 9060 XT and the Core Ultra 7 265K + RX 9070 XT setups booted successfully on the next start without needing to execute pre-stop, though they didn't return to the console.

*As long as amdgpu maintains its bind and does not interfere with vfio-pci's bind during VM startup, there should be no issues.
The VM requires vfio-pci, but if amdgpu is detected within the script, it unbinds it. If it is vfio-pci, the unbind command is not executed, and vfio-pci is used as-is; therefore, this behavior is considered correct.
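
To make the behavior described above concrete, here is a small sketch of that decision: release the GPU only when amdgpu holds it, and leave it alone when vfio-pci already does. The function name `prestart_action` and the action strings it prints are hypothetical, chosen for illustration; this is not taken from any posted hookscript.

```shell
#!/usr/bin/bash

# Given the name of the driver currently bound to the GPU, print what
# the pre-start phase would need to do under the logic described above.
# $1: driver name ("amdgpu", "vfio-pci", or empty if nothing is bound)
prestart_action() {
    case "$1" in
        amdgpu)
            # Host driver holds the GPU: it has to be released first.
            echo "unbind-amdgpu"
            ;;
        vfio-pci)
            # Already held by vfio-pci: no unbind is executed.
            echo "leave-as-is"
            ;;
        *)
            # No driver (or an unexpected one) is bound.
            echo "nothing-bound"
            ;;
    esac
}
```

For example, `prestart_action vfio-pci` prints `leave-as-is`, which corresponds to the case where the guest was shut down from inside the VM and the GPU never went back to amdgpu.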

If it won't start next time without running pre-stop, your script is the best solution.

*Shutdown scripts and startup scripts are required for an iGPU, but this configuration is not necessary for the RX 90xx series.
 