GPU passthrough switching between Win10 and macOS will cause host reboot

proale

New Member
Dec 8, 2022
PC config:
- 12700k
- rx560

PVE:
- 7.3-3
- 5.15.74-1-pve, 5.19.17-1-pve

/etc/default/grub
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"

/etc/modprobe.d/kvm.conf
Code:
options kvm ignore_msrs=1
options kvm ignore_msrs=Y report_ignored_msrs=N

/etc/modprobe.d/blacklist.conf
Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist nvidiafb

/etc/modprobe.d/vfio.conf
Code:
options vfio-pci ids=1002:37ef,1002:aab0  disable_vga=1

/etc/kernel/cmdline
Code:
root=ZFS=rpool/ROOT/pve-1 boot=zfs intel_iommu=on iommu=pt
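
Since both /etc/default/grub and /etc/kernel/cmdline are listed above, it may be worth confirming which parameters the running kernel actually received. As far as I know, a PVE host that boots through systemd-boot (typically ZFS root on UEFI) reads /etc/kernel/cmdline, while a GRUB boot reads /etc/default/grub, so one of the two files may have no effect. A minimal sketch of such a check follows; the helper name missing_params is made up for illustration.

```shell
#!/bin/sh
# Print every requested parameter that is absent from a kernel command line.
# Usage: missing_params "<cmdline string>" param1 param2 ...
# (missing_params is a hypothetical helper, not a Proxmox tool.)
missing_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;               # present as a whole word
            *) printf '%s\n' "$param" ;;   # absent: report it
        esac
    done
}
```

On the host this could be called as `missing_params "$(cat /proc/cmdline)" intel_iommu=on iommu=pt`; empty output would mean both parameters are active.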

Steps:
- start Win10
- play game
- shutdown Win10
- start macOS
- shutdown macOS
- start Win10
- Win10 freezes on the loading screen, or it lets me log in but cannot detect the GPU driver correctly
- worst case, the host reboots (up to 90% of the time)

I've tried both the 5.15 and 5.19 kernels and changed the GRUB values, but no luck... If I just log in and watch YouTube or browse the internet, without any heavy GPU work, the host does not reboot immediately.

If I play a game, shut down, and then start the same OS again, there is no issue. In the end I added one more GPU: one for Win10 and one for macOS...
 
I feel you: I had a similar issue which I never resolved - only reboot worked.
Are you switching between Win and macOS or other OS?

Since it's an AMD GPU, and it sounds like the VM finds the GPU in an unworkable state, maybe vendor-reset can help? You'll need to activate it after every host reboot.
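
A rough sketch of what "activating" could look like, assuming the vendor-reset module is already built and loaded (for example with `modprobe vendor-reset`) and that the kernel exposes the `reset_method` sysfs attribute, which I believe is the case from 5.15 onward. The function name, the SYSFS_PCI override, and the example PCI address are placeholders, not part of vendor-reset itself.

```shell
#!/bin/sh
# Select the device-specific reset method for one PCI device, so that a
# loaded vendor-reset module is used when the GPU is reset between VMs.
# SYSFS_PCI can be overridden to dry-run against a scratch directory
# (an assumption of this sketch, not a kernel feature).
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci/devices}

use_device_specific_reset() {
    addr=$1   # full PCI address, e.g. 0000:01:00.0 (placeholder; see lspci -D)
    file="$SYSFS_PCI/$addr/reset_method"
    if [ ! -w "$file" ]; then
        echo "no writable reset_method for $addr" >&2
        return 1
    fi
    echo device_specific > "$file"
}
```

Calling `use_device_specific_reset 0000:01:00.0` as root (with the GPU's real address) would have to be repeated after each host boot, for instance from a systemd unit or a VM hookscript, because the sysfs setting does not persist.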
Thanks for your suggestion, I will try it later.

PVE should be as stable as possible; that's what I expect. But this problem makes the host reboot immediately... I would also like to know: is this kind of setup not recommended?
 
You are "breaking" the virtualization by passing real hardware to a VM, which prevents dynamic memory management and moving VMs between hosts within a cluster. The hardware can, due to its physical nature, interfere with other hardware. PCI(e) passthrough is a niche use case that hardware manufacturers mostly neither test nor design for. The Proxmox developers (and the Debian and Linux kernel developers) do give it a best effort, but they cannot guarantee that it will always work, or work at all, given all possible combinations.
 
I got it, thanks for your detailed explanation.
 
