Proxmox freezes when launching a VM with GPU passthrough

Redover

Feb 12, 2022
Hi, I recently tried GPU passthrough again, but when I launch the VM from the web interface it doesn't start and Proxmox freezes. I have to press the reset switch to get Proxmox running again. My server has a Ryzen 5 3600.



I'm trying to pass through a GTX 1060 and an RTX 3070.

Here's my configuration:

/etc/default/grub

Code:
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
GRUB_CMDLINE_LINUX=""

/etc/modprobe.d/vfio.conf

Code:
options vfio-pci ids=10de:2484,10de:228b,10de:1c03,10de:10f1 disable_vga=1

VM:

Code:
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=ide0;ide2;net0
cores: 12
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-104-disk-1,efitype=4m,pre-enrolled-keys=1,size=4M
hostpci0: 0000:03:00,pcie=1
ide0: local-lvm:vm-104-disk-0,size=128G
ide2: local:iso/Windows_2004.iso,media=cdrom
machine: q35
memory: 8192
meta: creation-qemu=6.1.0,ctime=1644670405
name: windows-mining
net0: e1000=D6:32:23:13:0F:E1,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=6f068804-a177-4f56-992e-f25cbb9c2d90
sockets: 1
vmgenid: 3badb7ea-66d2-4f95-9963-f4976c55082e

Thanks!
 
Can you please show your IOMMU groups using this command?

Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

The GPU is probably in the same group as your SATA, network and/or USB controllers. On Ryzen they often all end up in one big motherboard/chipset group. Devices in the same group cannot be shared safely between VMs and/or the host, so the host loses every device in the group when you pass one of them through.
Can you tell us exactly which motherboard you are using? I might be able to guess which PCIe slots will work and which won't.
 
Thanks for replying!

My motherboard is a Gigabyte A520M.

The output of the IOMMU group command is below:

Code:
root@redover-proxmox:~# for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
IOMMU group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 0 00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 0 01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ec]
IOMMU group 0 01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43eb]
IOMMU group 0 01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43e9]
IOMMU group 0 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU group 0 02:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU group 0 02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU group 0 02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43ea]
IOMMU group 0 03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070] [10de:2484] (rev a1)
IOMMU group 0 03:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)
IOMMU group 0 04:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] [10de:1c03] (rev a1)
IOMMU group 0 04:00.1 Audio device [0403]: NVIDIA Corporation GP106 High Definition Audio Controller [10de:10f1] (rev a1)
IOMMU group 0 06:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 16)
IOMMU group 10 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
IOMMU group 10 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
IOMMU group 10 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
IOMMU group 10 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
IOMMU group 10 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
IOMMU group 10 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
IOMMU group 10 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
IOMMU group 10 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
IOMMU group 11 08:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 12 09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 13 09:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU group 14 09:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU group 15 09:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
IOMMU group 1 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 2 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 2 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 2 07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208B [GeForce GT 710] [10de:128b] (rev a1)
IOMMU group 2 07:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU group 3 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 4 00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 5 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 6 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 7 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 8 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 9 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
IOMMU group 9 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
 
As you can see from your post, IOMMU group 0 contains the RTX 3070, the GTX 1060, the network and SATA controllers, and a USB controller. If you pass through one of those devices, the Proxmox host loses its network connection, its SATA drives and several USB ports. That means you can no longer reach it via a browser or SSH, and because it cannot write log files, it will crash soon after.
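
To check this for any card without reading the whole listing by eye, the group output can be filtered. The following is only a sketch: `shared_with` is a hypothetical helper name (not a Proxmox tool), it expects the listing format produced by the command earlier in this thread on standard input, and it assumes that class codes [0600] and [0604] (host and PCI bridges) can be ignored.

```shell
# shared_with BUS:DEV
# Reads "IOMMU group N <address> <description>" lines on stdin and prints
# every non-bridge device that shares a group with the given device,
# leaving out the functions of that device itself.
shared_with() {
  awk -v target="$1" '
    $1 == "IOMMU" && $2 == "group" {
      group[NR] = $3; addr[NR] = $4; line[NR] = $0
      # remember every group that contains a function of the target
      if (index($4, target ".") == 1) wanted[$3] = 1
    }
    END {
      for (i = 1; i <= NR; i++) {
        if (!(i in group)) continue                      # not a listing line
        if (!(group[i] in wanted)) continue              # unrelated group
        if (index(addr[i], target ".") == 1) continue    # the target itself
        if (line[i] ~ /\[060[04]\]/) continue            # host or PCI bridge
        print line[i]
      }
    }'
}

# Usage (on the host): save the listing to a file, then
#   shared_with 03:00 < iommu_groups.txt
```

Fed the listing posted above, `shared_with 03:00` should print the USB and SATA controllers at 01:00, both functions of the GTX 1060 and the Realtek Ethernet controller, while `shared_with 07:00` should print nothing, because the GT 710 only shares group 2 with bridges.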

According to the specifications, only the x16 PCIe slot and the M.2 slot are connected to the CPU and are possibly in an IOMMU group of their own (bridges don't count). But I assume you use the GT 710 for booting the Proxmox host. It can be passed through in principle, but being the boot GPU makes that more difficult.

This AM4 motherboard, like all others except those based on X570, is not very good for passthrough. If you don't care about security and isolation between VMs and the Proxmox host, you can try the pcie_acs_override=downstream,multifunction kernel parameter to break up group 0 (at your own risk).
Note that those GPUs will be severely starved for PCIe bandwidth anyway. I don't see an NVMe drive, so you could try an M.2 to PCIe x4 converter...
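
For reference, a sketch of how that parameter could be added and checked, assuming the host boots through GRUB as in the /etc/default/grub shown in the first post. `has_kernel_param` is a hypothetical helper written for this example; `update-grub` and /proc/cmdline are standard on Debian-based hosts. The override only changes how the kernel reports the groups; the hardware isolation itself is not improved.

```shell
# 1. In /etc/default/grub, append the parameter to the existing line:
#    GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction"
# 2. Run update-grub and reboot.
# 3. After the reboot, confirm that the running kernel received the parameter.

# has_kernel_param CMDLINE NAME
# Succeeds if NAME appears in CMDLINE either bare or as NAME=value.
has_kernel_param() {
  case " $1 " in
    *" $2 "* | *" $2="*) return 0 ;;
  esac
  return 1
}

if has_kernel_param "$(cat /proc/cmdline 2>/dev/null)" pcie_acs_override; then
  echo "pcie_acs_override is active"
else
  echo "pcie_acs_override is not set"
fi
```

If the parameter is active, rerunning the IOMMU group listing shows whether group 0 was actually split.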
 
Thank you, I didn't know that not every PCIe slot can be passed through. I'll swap the GT 710 and the RTX 3070 so that I can pass the 3070 through to a VM and keep the GT 710 for the console.
 
Unless you can select in the motherboard BIOS which GPU is used to boot the system, you might run into reset issues: the GPU is passed through but does not function inside the VM. If that's the case, you'll need to make sure the host touches the GPU as little as possible, by blacklisting drivers and adding kernel parameters that make the system effectively headless. Search for "single GPU passthrough" for more information. As I said, passthrough of the boot GPU is much more involved.
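
As an illustration of the driver blacklisting mentioned here, a modprobe fragment along these lines is commonly used on hosts with NVIDIA cards. The file name is arbitrary (any .conf file under /etc/modprobe.d/ is read), and which of these modules are actually present depends on what is installed on the host, so treat the list as an assumption.

```
# /etc/modprobe.d/blacklist-gpu.conf (hypothetical file name)
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
```

After changing it, the initramfs usually has to be regenerated with update-initramfs -u, followed by a reboot. Note that this keeps those drivers away from every NVIDIA card in the host, not only the one being passed through.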
Another thing: you probably don't need args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off' if you enable Primary GPU (or ,x-vga=1) for the passthrough.
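
In the VM configuration from the first post, that change would look roughly like the line below. This is a sketch: the PCI address is the one from the original post and has to match wherever the card sits after any slot swap, and the args line is simply deleted from the file.

```
hostpci0: 0000:03:00,pcie=1,x-vga=1
```

Ticking Primary GPU in the web interface's PCI device dialog is the option the post refers to as equivalent to adding ,x-vga=1 by hand.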
 
Thanks, I'll remove args and enable Primary GPU. By the way, it works now! Thank you! I had already blacklisted the drivers on the host. Will I still be able to see the console view in the web interface with a headless Proxmox?
 
Sure, but if you ever lose the network connection, you won't have a text console to log in on the system itself.
 