[SOLVED] Power cycle Win11 gpu passthrough VM reboots 7.2 host

Kodey

Member
Oct 26, 2021
Every time I power off or reboot the Win11 VM, the whole host goes down.
The only workaround I can find is to not pass through the GPU.
The VM currently uses a hookscript to unbind and rescan the PCI device.

This all worked without problems until kernel 5.15, at which point I had to introduce the hookscript.
After that it worked fine until a recent kernel upgrade, when the VM became unstable.
Since the upgrade to Windows 11 with the latest updates, every power cycle kills the host machine.
The host is solid other than this.

Things I've tried BIOS-wise: disabling ASPM, AER, PSS
Things I've tried VM-wise: CPU types qemu64, EPYC, host; virtio-gpu (works, but only when not using GPU passthrough); complete reinstall and clone of Windows
Host configs:
/etc/modprobe.d/vfio.conf: options vfio-pci ids=1002:683f,1002:aab0 disable_vga=1
/etc/modprobe.d/pve-blacklist.conf: blacklist snd_hda_intel
/etc/kernel/cmdline: root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet pcie_aspm=off amd_iommu=on iommu=pt initcall_blacklist=sysfb_init kvm.ignore_msrs=1 vfio-pci.ids=1002:683f,1002:aab0 default_hugepagesz=1G hugepagesz=1G hugepages=64

VM config, minus some identifying data:
Code:
agent: 1
balloon: 0
bios: ovmf
boot: order=scsi0
cores: 4
cpu: host,flags=-spec-ctrl;-ssbd;+pdpe1gb;-hv-evmcs
efidisk0: zfs16Tr10:vm-113-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hookscript: local:snippets/gpu-hookscript.sh
hostpci0: 0000:0f:00,pcie=1,x-vga=1
ide0: isos:iso/virtio-win-0.1.225.iso,media=cdrom,size=519590K
machine: pc-q35-7.0
memory: 32768
meta: creation-qemu=7.0.0,ctime=1667111355
name: Windblows
net0: virtio======,bridge=vmbr0
numa: 1
ostype: win11
rng0: source=/dev/urandom
scsi0: local-zfs:vm-113-disk-0,discard=on,size=256G,ssd=1
scsi1: zfs16Tr10:vm-113-disk-2,cache=writeback,discard=on,size=512G
scsihw: virtio-scsi-pci
smbios1: uuid======
sockets: 1
tpmstate0: zfs16Tr10:vm-113-disk-1,size=4M,version=v2.0
usb0: host=5-1
usb1: host=5-2
vga: none
vmgenid: ======
vmstatestorage: local-zfs

Hardware:
ASRock X570 Taichi Razer motherboard
AMD Ryzen 7 5800X CPU
VisionTek Cape Verde PRO [Radeon HD 7750/8740 / R7 250E] GPU

I need a way to proceed. Is there a way to fix this and are there any viable workarounds?
 
Can you post the journal/syslog from when the host reboots? Maybe there is a hint about what's happening.
The content of the hookscript could also be interesting.
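
For reference, one way to pull the log of the boot that ended in the crash is sketched below. It assumes the journal is persistent (that is, /var/log/journal exists); otherwise the previous boot's messages are lost on reboot. The names previous_boot_log and JOURNAL_DIR are made up for this sketch.

```shell
#!/bin/sh
# Sketch: show the last warnings/errors from the previous boot.
# JOURNAL_DIR is a hypothetical override, only here so the check can be dry-run.
JOURNAL_DIR="${JOURNAL_DIR:-/var/log/journal}"

previous_boot_log() {
    # Without a persistent journal there is no "previous boot" to read.
    if [ ! -d "$JOURNAL_DIR" ]; then
        echo "no persistent journal at $JOURNAL_DIR" >&2
        return 1
    fi
    # -b -1 selects the previous boot, -p warning keeps warnings and worse.
    journalctl -b -1 -p warning --no-pager | tail -n "${1:-100}"
}

# Usage (as root on the host): previous_boot_log 200
```

If the function reports that there is no persistent journal, creating /var/log/journal and restarting systemd-journald should make later boots survive a reset.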
 
Hi, and thanks for looking at this.
The journal was empty: the OS just went away and a new boot log began.
The hookscript only contains the two lines mentioned earlier (remove/rescan of the PCI device), which came from another thread here:
https://forum.proxmox.com/threads/gpu-passthrough-issues-after-upgrade-to-7-2.109051/post-469855
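
A remove/rescan hookscript of the kind described might look roughly like the sketch below. It is not necessarily the exact script from the linked post: the SYSFS_ROOT and GPU_ADDR variables are added here for illustration, and the address assumes function 0 of the hostpci0 entry in the VM config. Proxmox calls a hookscript with the VM ID and the phase (pre-start, post-start, pre-stop, post-stop) as its two arguments.

```shell
#!/bin/bash
# Sketch of a snippets/gpu-hookscript.sh style script (illustrative only).
# Invoked by Proxmox as: <script> <vmid> <phase>

SYSFS_ROOT="${SYSFS_ROOT:-/sys}"        # hypothetical override, handy for dry runs
GPU_ADDR="${GPU_ADDR:-0000:0f:00.0}"    # assumed: function 0 of hostpci0

reset_gpu() {
    local dev="$SYSFS_ROOT/bus/pci/devices/$GPU_ADDR"
    # Detach the device from the PCI bus if it is currently present...
    if [ -e "$dev/remove" ]; then
        echo 1 > "$dev/remove"
    fi
    # ...then ask the kernel to rediscover it.
    echo 1 > "$SYSFS_ROOT/bus/pci/rescan"
}

vmid="${1:-}"
phase="${2:-}"

# Only act right before the VM starts; other phases are ignored.
if [ "$phase" = "pre-start" ]; then
    echo "gpu-hookscript: resetting $GPU_ADDR for VM $vmid"
    reset_gpu
fi
```

The script has to be executable and stored on a storage that allows snippets, matching the hookscript line in the VM config.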

Later in that same thread, when kernel 5.15 was new, some people discovered that blacklisting amdgpu was no longer required, and that worked for me too.
I don't know whether a more recent kernel or a Windows 11 update changed this, but re-blacklisting it has solved the problem.

Solution, in /etc/modprobe.d/pve-blacklist.conf:
blacklist snd_hda_intel
blacklist amdgpu
blacklist radeon

This works for now.
I'll post something in the original thread for anyone else.
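
To confirm that a blacklist like the one above took effect, one can check which kernel driver is bound to each function of the card after a reboot; with passthrough set up as in this thread, vfio-pci would be expected rather than amdgpu, radeon or snd_hda_intel. A minimal sketch follows; bound_driver and SYSFS_ROOT are illustrative names, and the addresses assume the GPU and its audio function sit at 0000:0f:00.0 and 0000:0f:00.1.

```shell
#!/bin/sh
# Sketch: print the driver currently bound to each function of the GPU.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"   # hypothetical override for dry runs

bound_driver() {
    # Each bound PCI device has a "driver" symlink pointing at its driver directory.
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

for fn in 0000:0f:00.0 0000:0f:00.1; do
    echo "$fn: $(bound_driver "$fn")"
done
```

The same information is shown by lspci -nnk -s 0f:00 under "Kernel driver in use". Changes under /etc/modprobe.d generally only apply after refreshing the initramfs (update-initramfs -u -k all) and rebooting.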
 
