3 Minute Delay Starting VM with GPU Passthrough (vfio-pci reset issue)

damarges

Member
Jul 25, 2022
Bad Kreuznach, Germany
Hello everyone,

I'm having an issue with my GPU passthrough setup. The passthrough itself works perfectly, and the GPU is available in my Windows 11 VM. However, every time I start the VM, there is a very long delay of about 2.5 to 3 minutes.

After analyzing the system logs, it seems the problem is not the VM startup itself, but the shutdown process (post-stop hook). The GPU is not being cleanly returned to the host, which causes a long series of PCI resets on the next VM start.
System Information:

  • Proxmox VE Version: 9.0.11
  • Linux Kernel: 6.14.11-4-pve
  • Motherboard: Supermicro X10DAi
  • CPU: 2x Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
  • GPU: Nvidia RTX 4060 Ti (PCI IDs: 02:00.0 and 02:00.1)
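For completeness, here is how I sanity-check the IOMMU grouping of the two GPU functions (a small helper I wrote for this post; both functions should land in the same group):

```shell
#!/bin/bash
# Print the IOMMU group of each GPU function; on my host both
# 02:00.0 (GPU) and 02:00.1 (HDMI audio) should share one group.
check_group() {
    local dev="$1" link
    link="/sys/bus/pci/devices/$dev/iommu_group"
    if [ -e "$link" ]; then
        echo "$dev -> group $(basename "$(readlink -f "$link")")"
    else
        echo "$dev not present on this host"
    fi
}

for d in 0000:02:00.0 0000:02:00.1; do
    check_group "$d"
done
```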

VM Configuration (/etc/pve/qemu-server/100.conf):
Code:
affinity: 0-15
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0;ide2;ide0;net0
cores: 16
cpu: x86-64-v3
efidisk0: vm-data:100/vm-100-disk-0.qcow2,efitype=4m,pre-enrolled-keys=1,size=528K
hookscript: local:snippets/gpu-handoff.sh
hostpci0: 0000:02:00.0,pcie=1,x-vga=1
hostpci1: 0000:02:00.1,pcie=1
machine: pc-q35-10.0+pve1
memory: 65536
name: win11
net0: virtio=BC:24:11:F1:4A:55,bridge=vmbr0,firewall=1
numa: 1
ostype: win11
scsi0: vm-data:100/vm-100-disk-1.raw,cache=writeback,discard=on,size=700G
scsihw: virtio-scsi-single
smbios1: uuid=aa6c53ad-5443-497c-9cca-0d671ca8e9b8
sockets: 1
tpmstate0: vm-data:100/vm-100-disk-2.raw,size=4M,version=v2.0
vga: none
vmgenid: c22cf681-d3dd-4fab-826a-efdc0a8afc57

Current Hookscript (/var/lib/vz/snippets/gpu-handoff.sh):

Bash:
#!/bin/bash
set -euo pipefail
VMID="$1"
PHASE="$2"

GPU="0000:02:00.0"
AUDIO="0000:02:00.1"

log(){ logger -t gpu-handoff "VM $VMID: [$PHASE] $*"; }

case "$PHASE" in
  pre-start)
    log "Handing off GPU to VM..."
    fuser -k /dev/nvidia* 2>/dev/null || true
    sleep 1
    modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia 2>/dev/null || true
    echo "$GPU" > /sys/bus/pci/devices/$GPU/driver/unbind 2>/dev/null || true
    echo "$AUDIO" > /sys/bus/pci/devices/$AUDIO/driver/unbind 2>/dev/null || true
    modprobe vfio-pci
    echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/bind
    echo "$AUDIO" > /sys/bus/pci/drivers/vfio-pci/bind
    log "GPU successfully bound to vfio-pci."
    ;;

  post-stop)
    log "Returning GPU to host..."
    echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/unbind
    echo "$AUDIO" > /sys/bus/pci/drivers/vfio-pci/unbind
    sleep 1
    echo 1 > /sys/bus/pci/devices/$GPU/reset 2>/dev/null || true
    echo 1 > /sys/bus/pci/devices/$AUDIO/reset 2>/dev/null || true
    modprobe nvidia_drm
    modprobe nvidia_modeset
    modprobe nvidia_uvm
    modprobe nvidia
    modprobe snd_hda_intel
    sleep 2
    echo "$GPU" > /sys/bus/pci/drivers/nvidia/bind
    echo "$AUDIO" > /sys/bus/pci/drivers/snd_hda_intel/bind
    nvidia-smi >/dev/null 2>&1
    log "GPU successfully returned to host."
    ;;
esac

exit 0

When I shut down the VM, the post-stop script fails with "Device or resource busy":

Code:
# The VM is shut down...
Oct 30 10:55:02 pve gpu-handoff[1682882]: VM 100: [post-stop] Returning GPU to host...
# ...
Oct 30 10:55:07 pve qmeventd[1682871]: /var/lib/vz/snippets/gpu-handoff.sh: line 45: echo: write error: Device or resource busy
Oct 30 10:55:07 pve qmeventd[1682871]: hookscript error for 100 on post-stop: command '/var/lib/vz/snippets/gpu-handoff.sh 100 post-stop' failed: exit code 1
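To narrow down what "Device or resource busy" actually means here, I put together this diagnostic sketch (the helper name is my own; 0000:02:00.0 matches my GPU). It checks whether any process still has the device's VFIO group file open, which would block a clean unbind:

```shell
#!/bin/bash
# Diagnostic sketch: report whether the VFIO group of a PCI device
# is still held open by some process (which blocks unbinding).
holders_of() {
    local dev="$1" link group
    link="/sys/bus/pci/devices/$dev/iommu_group"
    if [ ! -e "$link" ]; then
        echo "device $dev not present"
        return 0
    fi
    group=$(basename "$(readlink -f "$link")")
    # fuser returns non-zero when nothing has the group file open
    if command -v fuser >/dev/null 2>&1 && fuser -s "/dev/vfio/$group" 2>/dev/null; then
        echo "group $group still in use:"
        fuser -v "/dev/vfio/$group" 2>&1
    else
        echo "group $group appears free"
    fi
}

holders_of 0000:02:00.0
```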

I assume that, because of this failure, the next VM start hangs for almost 3 minutes while the kernel repeatedly tries to reset the PCI device:

Code:
# The VM is started again...
Oct 30 10:55:30 pve kernel: # Network setup is quick
# --- HUGE GAP of ~2m 57s ---
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.0: resetting
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.0: reset done
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.0: resetting
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.1: resetting
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.0: reset done
Oct 30 10:58:27 pve kernel: vfio-pci 0000:02:00.1: reset done
Oct 30 10:58:28 pve qm[1683332]: VM 100 started with PID 1683370.

I believe if I can fix the post-stop script so it no longer fails, the startup delay will be gone.
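One alternative I am considering for the post-stop phase is an untested sketch that waits (with a bound) for the VFIO group to be released, then detaches both functions from the PCI tree and rescans the bus, instead of unbinding and manually triggering resets. The function names are my own; the `$2` phase argument matches how Proxmox invokes hookscripts:

```shell
#!/bin/bash
# Untested sketch: bounded wait for the VFIO group to be free, then
# remove + rescan so the kernel re-probes the card with host drivers.
GPU="0000:02:00.0"
AUDIO="0000:02:00.1"

wait_until_free() {
    # Poll up to ~10 s for the device's VFIO group file to be released.
    local dev="$1" link group tries=0
    link="/sys/bus/pci/devices/$dev/iommu_group"
    [ -e "$link" ] || return 0
    group=$(basename "$(readlink -f "$link")")
    while fuser -s "/dev/vfio/$group" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge 10 ] && break
        sleep 1
    done
}

post_stop() {
    wait_until_free "$GPU"
    # Detach both functions from the PCI tree, then rescan; the kernel
    # re-probes them and should bind nvidia / snd_hda_intel if loaded.
    echo 1 > "/sys/bus/pci/devices/$AUDIO/remove" 2>/dev/null || true
    echo 1 > "/sys/bus/pci/devices/$GPU/remove" 2>/dev/null || true
    sleep 1
    echo 1 > /sys/bus/pci/rescan
}

# Hookscripts receive the VMID as $1 and the phase as $2.
if [ "${2:-}" = "post-stop" ]; then
    post_stop
fi
```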

Does anyone have experience with this kind of behavior, perhaps with this specific GPU? Is there a better way to structure the script to ensure a clean return of the GPU to the host?

Thank you in advance for any help!
 
Update: with this script, start times are reduced from about 150 seconds to 60 seconds. Not perfect, but better:


Code:
#!/bin/bash
set -euo pipefail
VMID="$1"
PHASE="$2"

GPU="0000:02:00.0"
AUDIO="0000:02:00.1"

log(){ logger -t gpu-handoff "VM $VMID: [$PHASE] $*"; }

case "$PHASE" in
  pre-start)
    log "Phase: pre-start. Preparing GPU for the VM."

    # Unbind the drivers from the host
    echo "$GPU" > /sys/bus/pci/devices/$GPU/driver/unbind || true
    echo "$AUDIO" > /sys/bus/pci/devices/$AUDIO/driver/unbind || true

    # Unload the Nvidia drivers to release the GPU completely
    modprobe -r nvidia_drm nvidia_modeset nvidia_uvm nvidia || true

    # Bind to vfio-pci
    echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/bind || true
    echo "$AUDIO" > /sys/bus/pci/drivers/vfio-pci/bind || true

    log "GPU successfully bound to vfio-pci."
    ;;

  post-stop)
    log "Phase: post-stop. Returning GPU to the host."

    # Unbind from vfio-pci (exactly as in the manual steps)
    echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/unbind || true
    echo "$AUDIO" > /sys/bus/pci/drivers/vfio-pci/unbind || true

    # Load all required host drivers (exactly as in the manual steps)
    modprobe nvidia nvidia_uvm nvidia_modeset nvidia_drm
    modprobe snd_hda_intel

    # Bind to the host drivers (exactly as in the manual steps)
    echo "$GPU" > /sys/bus/pci/drivers/nvidia/bind || true
    echo "$AUDIO" > /sys/bus/pci/drivers/snd_hda_intel/bind || true

    sleep 1 # short pause

    # Initialize the GPU on the host (exactly as in the manual steps)
    nvidia-smi || true

    log "GPU successfully returned to host."
    ;;
esac

exit 0

There is no vendor-reset module for NVIDIA GPUs like there is for AMD, is there?
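For what it's worth, I also checked which reset mechanisms the kernel reports for the card. The `reset_method` sysfs attribute exists on reasonably recent kernels; the helper below is just a small wrapper I use:

```shell
#!/bin/bash
# Show which reset mechanisms the kernel offers for a PCI device
# (e.g. "flr", "bus"); absent on older kernels or unknown devices.
show_reset_method() {
    local dev="$1" path
    path="/sys/bus/pci/devices/$dev/reset_method"
    if [ -r "$path" ]; then
        cat "$path"
    else
        echo "reset_method not exposed for $dev"
    fi
}

show_reset_method 0000:02:00.0
```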