[SOLVED] AMD GPU passthrough & reset hookscript that works through guest shutdowns

jce

Member
Jan 31, 2021
I used to have an NVIDIA GPU which I passed through to my guest VMs, and this worked well. Recently I switched to an AMD RX 9070 and encountered the notorious reset bug for the first time. The reset hookscripts posted here and on other forums were helpful, but only partially. If a guest is shut down from within the VM itself, the "post-stop" phase in the script doesn't trigger, so the GPU remains bound to vfio-pci.

I searched the forum and found others mentioning this without any solution posted, so I tweaked the script. During the "pre-start" phase, my version checks whether the GPU is already bound to vfio-pci; if it is, the script unbinds it from vfio-pci and binds it to amdgpu. The script then continues as normal from there, unbinding from amdgpu so that the GPU can be bound to vfio-pci for the guest. I couldn't find the threads where others raised this issue again, so I'm posting a new thread with my revision.

I can appreciate that the line grepping for 'vfio-pci' could be finessed to account for variations in output from `lspci` so please reply with suggestions.
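One possible way to avoid parsing `lspci` output altogether is to read the device's `driver` symlink in sysfs, which exists only while a driver is bound. The following is a minimal sketch under that assumption, not a tested drop-in replacement; `SYSFS_PCI` and `current_driver` are illustrative names that do not appear in the script below.

```shell
# Sketch: report which kernel driver a PCI device is bound to, via sysfs.
# SYSFS_PCI and current_driver are hypothetical names used for illustration.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

current_driver() {
    local dev="$1"
    local link
    # The "driver" symlink is only present while some driver is bound;
    # readlink fails (and we return 1) when the device is unbound.
    link="$(readlink "$SYSFS_PCI/$dev/driver")" || return 1
    # The link target ends in the driver name, e.g. .../drivers/vfio-pci
    basename "$link"
}

# Possible use in the pre-start phase:
# if [ "$(current_driver 0000:03:00.0)" = "vfio-pci" ]; then ...
```

Because the check compares an exact driver name rather than a line of formatted text, it should not depend on how many lines `lspci -nnk` prints for the device.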

Bash:
#!/usr/bin/bash

phase="$2"

echo "Phase is $phase"

if [ "$phase" == "pre-start" ]; then

    # Ask lspci about this one device only; grep -q avoids a test error when the output is empty
    if lspci -nnk -s 03:00.0 | grep -q "Kernel driver in use: vfio-pci"; then
        # Unbind gpu from vfio-pci
        echo "Bound to vfio-pci, unbinding"
        echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
        # Binding gpu back to amdgpu
        sleep 2
        echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
        sleep 2
    fi

    # Unbind gpu from amdgpu
    echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null

    sleep 2

    # Resize the GPU's BAR2 memory region (useful for PCI passthrough)

    echo 8 > /sys/bus/pci/devices/0000:03:00.0/resource2_resize

    sleep 2

elif [ "$phase" == "post-stop" ]; then

    # Unbind gpu from vfio-pci

    sleep 5

    echo "0000:03:00.0" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null

    sleep 2

    # Bind amdgpu

    echo "0000:03:00.0" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null

    sleep 2

fi
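For reference, the `8` written to `resource2_resize` in the script is a bit position rather than a size in bytes. As I understand the kernel's sysfs PCI ABI notes, bit 0 means 1 MB, bit 1 means 2 MB and so on, and reading the file returns a hexadecimal bitmask of the sizes the device supports. The helper names in this sketch are illustrative, and the sample mask in the comment is an example value, not one read from an RX 9070.

```shell
# Sketch: interpret resourceN_resize values. Function names are hypothetical.

# Size in MB selected by writing bit position $1 (so 8 selects 256 MB).
bar_size_mb() {
    echo $((1 << $1))
}

# List the BAR sizes (in MB) advertised by a hex bitmask as read from
# resourceN_resize, e.g. "00000000000001c0".
supported_bar_sizes_mb() {
    local mask
    local bit
    local out
    mask=$((0x$1))
    bit=0
    out=""
    while [ "$mask" -gt 0 ]; do
        # Each set bit N means a size of 2^N MB is supported.
        if [ $((mask & 1)) -eq 1 ]; then
            out="$out$((1 << bit)) "
        fi
        mask=$((mask >> 1))
        bit=$((bit + 1))
    done
    echo "${out% }"
}

# Read-only check on the host before choosing a value:
# supported_bar_sizes_mb "$(cat /sys/bus/pci/devices/0000:03:00.0/resource2_resize)"
```

Checking the supported sizes first makes it easier to tell whether a failed write is due to an unsupported size or to the device still being bound to a driver.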
 
In addition to the hookscript on Proxmox, you also need 'reset mod' inside the guest OS; that is the only way the guest can reset the GPU prior to shutdown/reboot.

You've not stated if the guest is Windows or Linux but I found the following guide helpful:
https://github.com/isc30/ryzen-gpu-passthrough-proxmox/issues/131#issue-3266798285
I had the issue with both Windows and Linux. My revision also got it working smoothly with both.

Thanks for the link, I'll follow those steps should my revised hookscript stop working.
 
In many environments, amdgpu rebinds when the VM shuts down even if post-stop doesn't run.

In my environment, when the guest is shut down from inside the VM, vfio-pci is unbound and amdgpu is bound, even though the logs don't show post-stop being executed.

It's probably due to differences in settings or environment.

*Probably differences in kernel parameters, loaded modules, or modprobe.d configuration.
I haven't tested Linux guests, so if this only happens with Linux, there might be differences in how shutdown is handled.
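To compare two hosts on the points mentioned above, something like the following could be run on each and the output diffed. It is only a sketch: `show_gpu_binding_config` is an illustrative name, and it looks only at the kernel command line and at modprobe.d entries that mention vfio or amdgpu, which may not cover every relevant setting.

```shell
# Sketch: print host settings that commonly influence vfio-pci/amdgpu binding.
# show_gpu_binding_config and its optional arguments are hypothetical.
show_gpu_binding_config() {
    local modprobe_dir="${1:-/etc/modprobe.d}"
    local cmdline_file="${2:-/proc/cmdline}"

    echo "== kernel command line =="
    cat "$cmdline_file"

    echo "== modprobe.d entries mentioning vfio or amdgpu =="
    # -H prefixes each match with its file name so blacklist/options
    # lines can be traced back to the file that sets them.
    grep -rHE 'vfio|amdgpu' "$modprobe_dir" 2>/dev/null || echo "(none found)"
}

# On a Proxmox host, with the default paths:
# show_gpu_binding_config
```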

*On my Proxmox hosts running Windows VMs (Ryzen 7 7700 + RX 9060 XT, and Core Ultra 7 265K + RX 9070 XT), shutdown from within the VM behaves as described above.

If your VM won't start the next time unless post-stop has run, your script is the best solution.

*Shutdown and startup scripts are required for an iGPU, but they are not necessary for the RX 90xx series.
 