PCI Passthrough - VM Restart Causes Proxmox to Crash

jimibob

Member
Feb 7, 2022
Hi all,

Attempting to pass through a GPU to a Windows 11 guest. Everything works fine once booted: I can play games, encode, and watch videos for hours and it's solid. However, every time I restart or shut down the VM, it crashes Proxmox and I have to power-cycle the server.

Useful info:
The system is old and consumer grade: an i7-6700K on a Z170 motherboard. The GPU is a new Intel Arc A380.

I have followed the PCIe passthrough guide to the letter, and as stated everything works fine other than restarting/shutting down the VM.

Grub:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
GRUB_CMDLINE_LINUX=""

Blacklist:
Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
blacklist snd_hda_intel

VFIO:
Code:
options vfio-pci ids=8086:56a5,8086:4f92 disable_vga=1

lspci:
Code:
root@gpu:~# lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
00:1b.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 (rev f1)
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 (rev f1)
00:1c.2 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 (rev f1)
00:1f.0 ISA bridge: Intel Corporation Z170 Chipset LPC/eSPI Controller (rev 31)
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V (rev 31)
01:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
02:01.0 PCI bridge: Intel Corporation Device 4fa4
02:04.0 PCI bridge: Intel Corporation Device 4fa4
03:00.0 VGA compatible controller: Intel Corporation Device 56a5 (rev 05)
04:00.0 Audio device: Intel Corporation Device 4f92
05:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
07:00.0 USB controller: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller

IOMMU Groups:
Code:
root@gpu:~# find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/7/devices/0000:00:1b.0
/sys/kernel/iommu_groups/5/devices/0000:00:16.0
/sys/kernel/iommu_groups/13/devices/0000:07:00.0
/sys/kernel/iommu_groups/3/devices/0000:00:08.0
/sys/kernel/iommu_groups/11/devices/0000:00:1f.6
/sys/kernel/iommu_groups/1/devices/0000:03:00.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/1/devices/0000:01:00.0
/sys/kernel/iommu_groups/1/devices/0000:04:00.0
/sys/kernel/iommu_groups/1/devices/0000:02:01.0
/sys/kernel/iommu_groups/1/devices/0000:02:04.0
/sys/kernel/iommu_groups/8/devices/0000:00:1c.0
/sys/kernel/iommu_groups/6/devices/0000:00:17.0
/sys/kernel/iommu_groups/4/devices/0000:00:14.2
/sys/kernel/iommu_groups/4/devices/0000:00:14.0
/sys/kernel/iommu_groups/12/devices/0000:05:00.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/10/devices/0000:00:1f.2
/sys/kernel/iommu_groups/10/devices/0000:00:1f.0
/sys/kernel/iommu_groups/10/devices/0000:00:1f.3
/sys/kernel/iommu_groups/10/devices/0000:00:1f.4
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
/sys/kernel/iommu_groups/9/devices/0000:00:1c.2
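
The unsorted `find` output above can be narrowed to just the devices that share a group with the GPU. This is a minimal sketch, assuming the standard sysfs layout. The function name `iommu_group_peers` and its optional second argument (an alternate groups directory, useful only for testing) are hypothetical and not part of any Proxmox tool.

```shell
#!/bin/sh
# List every PCI address in the same IOMMU group as the given device.
#   $1: full PCI address, e.g. 0000:03:00.0
#   $2: groups directory (assumed default: /sys/kernel/iommu_groups)
iommu_group_peers() {
    dev=$1
    root=${2:-/sys/kernel/iommu_groups}
    for path in "$root"/*/devices/"$dev"; do
        # If the glob matched nothing it stays literal, so skip it.
        [ -e "$path" ] || continue
        group_dir=${path%/devices/*}
        echo "IOMMU group ${group_dir##*/}:"
        ls "$group_dir/devices" | sort
        return 0
    done
    echo "device $dev not found under $root" >&2
    return 1
}
```

Going by the listing above, calling `iommu_group_peers 0000:03:00.0` on this system would report group 1 with 00:01.0, 01:00.0, 02:01.0, 02:04.0, 03:00.0 and 04:00.0, i.e. the GPU, its audio device and the PCI bridges above them.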

lspci details:
Code:
root@gpu:~# lspci -ks 03:00
03:00.0 VGA compatible controller: Intel Corporation Device 56a5 (rev 05)
        Subsystem: ASRock Incorporation Device 6004
        Kernel driver in use: vfio-pci

root@gpu:~# lspci -ks 04:00
04:00.0 Audio device: Intel Corporation Device 4f92
        Subsystem: ASRock Incorporation Device 6004
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

VM Config:
Code:
balloon: 0
bios: ovmf
boot: order=ide0;ide2;net0
cores: 6
cpu: host
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:03:00,pcie=1
hostpci1: 0000:04:00,pcie=1
ide0: local-lvm:vm-100-disk-1,size=100G
ide2: local:iso/Win11_22H2_EnglishInternational_x64v1.iso,media=cdrom,size=5418024K
machine: pc-q35-7.1
memory: 8192
meta: creation-qemu=7.1.0,ctime=1670190231
name: minecraft
net0: e1000=2A:81:E4:67:2B:61,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsihw: virtio-scsi-single
smbios1: uuid=1e7be4dd-1929-4905-90d3-eebba9351fbd
sockets: 1
tpmstate0: local-lvm:vm-100-disk-2,size=4M,version=v2.0
vga: std
vmgenid: 3edce8aa-c954-472e-a559-49d4c89167d6

Any help would be greatly appreciated
 
This typically happens when a passed through device does not reset properly, like older AMD GPUs without vendor-reset. With early binding to vfio-pci and not touching the device, it works once but stopping and restarting the VM fails/locks the system (a warm reboot from within the VM usually works). Does that match your experience? I don't know of any work-arounds for Intel A380, sorry.
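
One way to see what the kernel thinks it can do to reset a device is to read its sysfs reset attributes. The sketch below assumes a kernel that exposes the `reset_method` file (5.15 and later, as far as I know) and falls back to checking for the older `reset` file. The function name `pci_reset_methods` and the optional second argument are hypothetical. A listed method only means the kernel will attempt it, not that the card actually comes back in a usable state afterwards.

```shell
#!/bin/sh
# Print the reset mechanisms the kernel reports for a PCI device.
#   $1: full PCI address, e.g. 0000:03:00.0
#   $2: devices directory (assumed default: /sys/bus/pci/devices)
pci_reset_methods() {
    dev_dir=${2:-/sys/bus/pci/devices}/$1
    if [ ! -d "$dev_dir" ]; then
        echo "no such device: $1" >&2
        return 1
    fi
    if [ -r "$dev_dir/reset_method" ]; then
        # Space-separated list such as "flr bus"; may be empty.
        methods=$(cat "$dev_dir/reset_method")
        echo "$1: ${methods:-none}"
    elif [ -e "$dev_dir/reset" ]; then
        echo "$1: reset supported (method list not exposed by this kernel)"
    else
        echo "$1: none"
    fi
}
```

For the setup in this thread, the call would be `pci_reset_methods 0000:03:00.0` for the GPU and `pci_reset_methods 0000:04:00.0` for its audio device.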
 
That makes sense. I'm unable to reboot/shutdown successfully from either the Proxmox GUI or inside the VM. All of those options crash the Proxmox server.
 
I have the same problem. I run Proxmox with several VMs and installed an Nvidia Quadro GPU, which I added to one specific VM through "Add Hardware". When I restarted that VM, the host went offline, along with all the other VMs. Major bug.
 
Same here. Starting the VM with the passthrough Nvidia crashes the entire Proxmox.

root@pve1:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
IOMMU group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
IOMMU group 1 01:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU group 2 00:02.0 Display controller [0380]: Intel Corporation HD Graphics 530 [8086:1912] (rev 06)
IOMMU group 3 00:14.0 USB controller [0c03]: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller [8086:a2af]
IOMMU group 3 00:14.2 Signal processing controller [1180]: Intel Corporation 200 Series PCH Thermal Subsystem [8086:a2b1]
IOMMU group 4 00:15.0 Signal processing controller [1180]: Intel Corporation 200 Series PCH Serial IO I2C Controller #0 [8086:a2e0]
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation 200 Series PCH CSME HECI #1 [8086:a2ba]
IOMMU group 6 00:17.0 SATA controller [0106]: Intel Corporation 200 Series PCH SATA controller [AHCI mode] [8086:a282]
IOMMU group 7 00:1b.0 PCI bridge [0604]: Intel Corporation 200 Series PCH PCI Express Root Port #17 [8086:a2e7] (rev f0)
IOMMU group 8 00:1f.0 ISA bridge [0601]: Intel Corporation 200 Series PCH LPC Controller (Q270) [8086:a2c6]
IOMMU group 8 00:1f.2 Memory controller [0580]: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller [8086:a2a1]
IOMMU group 8 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
IOMMU group 8 00:1f.4 SMBus [0c05]: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller [8086:a2a3]
IOMMU group 8 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (5) I219-LM [8086:15e3]
IOMMU group 9 02:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. RTS5763DL NVMe SSD Controller [10ec:5762] (rev 01)

The only other thing in the GPU's IOMMU group is the PCI bridge. The root/boot device is an M.2 PCIe SSD on the motherboard. Could that be the issue?
 
Unlikely. Make sure the system boots with the integrated graphics (connect a working display to it to be sure). Make sure to early bind the passthrough GPU to vfio-pci. And/or try this work-around when doing passthrough of the GPU that is used during boot.
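
A quick way to see which adapter the firmware picked as the boot display is the `boot_vga` flag in sysfs. This is a rough sketch, assuming the usual `/sys/bus/pci/devices` layout. The function name `list_boot_vga` and its optional argument are hypothetical. As far as I know, the file only exists for VGA-class devices, so an adapter that enumerates as a plain "Display controller" will not show up here at all.

```shell
#!/bin/sh
# Report which VGA-class adapters are flagged as the boot display.
#   $1: devices directory (assumed default: /sys/bus/pci/devices)
list_boot_vga() {
    dir=${1:-/sys/bus/pci/devices}
    for f in "$dir"/*/boot_vga; do
        # If the glob matched nothing it stays literal, so skip it.
        [ -r "$f" ] || continue
        addr=${f%/boot_vga}
        addr=${addr##*/}
        if [ "$(cat "$f")" = "1" ]; then
            echo "$addr boot GPU"
        else
            echo "$addr secondary"
        fi
    done
}
```

Running `list_boot_vga` with no arguments checks the live system. If the card intended for passthrough is the one marked "boot GPU", the host firmware and console have already touched it before vfio-pci could claim it.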
 
