I used to have an NVIDIA GPU, which I passed through to my guest VMs, and this worked well. Recently I switched to an AMD RX 9070 and encountered the notorious reset bug for the first time. The reset hookscripts posted here and on other forums helped, but only partially: if a guest is shut down from within the VM itself, the "post-stop" phase of the script doesn't trigger, so the GPU remains bound to vfio-pci. I searched the forum and found others mentioning this without any solution posted, so I tweaked the script. During the "pre-start" phase, my version checks whether the GPU is already bound to vfio-pci; if it is, the script unbinds it from vfio-pci and binds it to amdgpu. The script then continues as normal from there, unbinding from amdgpu and binding to vfio-pci. I couldn't track down the threads where others raised this issue, so I'm posting a new thread with my revision.
I can appreciate that the line grepping for 'vfio-pci' could be finessed to account for variations in output from `lspci` so please reply with suggestions.
Bash:
#!/usr/bin/bash
# Proxmox hookscript: $1 is the VM ID, $2 is the phase.
phase="$2"
gpu="0000:03:00.0"   # PCI address of the passed-through GPU
echo "Phase is $phase"
if [ "$phase" == "pre-start" ]; then
    # If post-stop never ran (guest shut down from inside the VM), the GPU is
    # still bound to vfio-pci, so hand it back to amdgpu first.
    if lspci -k -s "$gpu" | grep -q "Kernel driver in use: vfio-pci"; then
        # Unbind gpu from vfio-pci
        echo "Bound to vfio-pci, unbinding"
        echo "$gpu" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
        # Binding gpu back to amdgpu
        sleep 2
        echo "$gpu" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
        sleep 2
    fi
    # Unbind gpu from amdgpu
    echo "$gpu" > /sys/bus/pci/drivers/amdgpu/unbind 2>/dev/null
    sleep 2
    # Resize the GPU's BAR2 memory region (useful for PCI passthrough).
    # The value is log2 of the size in MB, so 8 means 2^8 MB = 256 MB.
    echo 8 > "/sys/bus/pci/devices/$gpu/resource2_resize"
    sleep 2
elif [ "$phase" == "post-stop" ]; then
    # Unbind gpu from vfio-pci
    sleep 5
    echo "$gpu" > /sys/bus/pci/drivers/vfio-pci/unbind 2>/dev/null
    sleep 2
    # Bind amdgpu
    echo "$gpu" > /sys/bus/pci/drivers/amdgpu/bind 2>/dev/null
    sleep 2
fi
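
On the vfio-pci check: one possible way to avoid depending on the exact layout of `lspci` output is to read the device's `driver` symlink in sysfs instead. The following is only a minimal sketch, assuming the standard sysfs layout where `/sys/bus/pci/devices/<address>/driver` is a symlink to the bound driver's directory. The function name `current_driver` and its optional second argument (an alternative sysfs root, handy for testing against a fake directory tree) are hypothetical names introduced here for illustration, not part of the script above.

```bash
#!/usr/bin/bash
# Hypothetical helper (not part of the original hookscript).
# Prints the name of the driver bound to a PCI device, or nothing if the
# device has no driver bound.
#   $1: full PCI address, e.g. 0000:03:00.0
#   $2: optional directory holding the device entries
#       (assumption: defaults to /sys/bus/pci/devices)
current_driver() {
    local dev="$1"
    local sysfs="${2:-/sys/bus/pci/devices}"
    local link="$sysfs/$dev/driver"
    # The "driver" symlink only exists while a driver is bound.
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Possible use in the pre-start phase, replacing the lspci pipeline:
#   if [ "$(current_driver 0000:03:00.0)" == "vfio-pci" ]; then ... fi
```

Because the command substitution is quoted, the comparison still behaves when no driver is bound and the function prints nothing.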