AMD passthrough to (Linux) VM fails with maxed out memory and no video output

goldsteal

Active Member
Nov 3, 2018
12
2
43
30
Hello world!

I am trying my hand at a slightly unusual set-up of Proxmox.

For my home desktop (Intel X99 plattform) I want to run Proxmox headless and instead give the main graphics card (AMD R9 380x Nitro in Slot 1) to a Linux VM and the second graphics card (NVIDIA GeForce GTX 980Ti) to a Windows 10 VM. Is this setup even possible?

When I try to pass through the AMD card to the Linux distribution the screen stays black and the ram get of the VM maxed out.
if I try to restart I get other errors – presumably because the system was not shutdown correctly and henceforth the graphics card was not released from the binding to the VM.

Is there anyone who can help or has tried something similar before?

Greetings

goldsteal
 
It sounds like the AMD GPU is not resetting properly (after being used by the Proxmox host). You might want to look into vendor-reset.
Is the passthrough of the NVidia GPU working fine? That one might be easier because it is not used by the BIOS and Proxmox host during boot.
Please make sure to reboot before each attempt to passthrough a GPU, because some might not work after a restart of the VM sometimes and then you won't know which problem you are trying to troubleshoot.
I think the X99 platform has enough PCIe lanes from the CPU to allow for multiple devices passthroughs. Since you're not complaining about system freezes/crashed, I assume that your IOMMU groups look good for multiple passthrough?
 
and the ram get of the VM maxed out
Also keep in mind that your VMs will always use the maximum RAM if you use GPU passthrough (RAM needs to be pinned because the GPU can use DMA to directly access any part of the VMs RAM). So make sure you don't overprovision your RAM so OOM won'T kick in.
 
Also keep in mind that your VMs will always use the maximum RAM if you use GPU passthrough (RAM needs to be pinned because the GPU can use DMA to directly access any part of the VMs RAM). So make sure you don't overprovision your RAM so OOM won'T kick in.
That a great point thank you so much. I probably gave the VMs a little bit too much of the host systems 64 GiB. Until everything works I will scale that down in order to avoid any further errors.
 
It sounds like the AMD GPU is not resetting properly (after being used by the Proxmox host). You might want to look into vendor-reset.

Thanks for the hint! I will do that as soon as I get home. Sounds pretty promising.
Update: Damn. I just looked into it and I assume that this doesn't solve my issue cause the card is not supported, right? Any other workarrounds or anyone who got it working with a 380x?

Is the passthrough of the NVidia GPU working fine? That one might be easier because it is not used by the BIOS and Proxmox host during boot.

Well I haven't exactly made it work reliably with the VM start/stops but if I give it to my Linux VM and reboot the node it indeed works as intended. So you are a 100% right on that one...

Please make sure to reboot before each attempt to passthrough a GPU, because some might not work after a restart of the VM sometimes and then you won't know which problem you are trying to troubleshoot.
Yeah. That is something I learned the hard way here :D

I think the X99 platform has enough PCIe lanes from the CPU to allow for multiple devices passthroughs. Since you're not complaining about system freezes/crashed, I assume that your IOMMU groups look good for multiple passthrough?
Well, as far as I can tell they look good. What would the limitations usually be? Since I haven't gotten the Linux one fully working with the AMD card I haven't yet set up the Windows one completely since I assumed that would have way more inherent errors (EFI stuff, performance, VirtIO drivers, performance fixes and the following forays into CPU pinning that I probably plan to do...)

Bash:
#!/bin/bash
shopt -s nullglob
for g in `find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V`; do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;

IOMMU Group 0:
ff:0b.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f81] (rev 02)
ff:0b.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f36] (rev 02)
ff:0b.2 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 R3 QPI Link 0 & 1 Monitoring [8086:2f37] (rev 02)
IOMMU Group 1:
ff:0c.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe0] (rev 02)
ff:0c.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe1] (rev 02)
ff:0c.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe2] (rev 02)
ff:0c.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe3] (rev 02)
ff:0c.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe4] (rev 02)
ff:0c.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Unicast Registers [8086:2fe5] (rev 02)
IOMMU Group 2:
ff:0f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff8] (rev 02)
ff:0f.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Buffered Ring Agent [8086:2ff9] (rev 02)
ff:0f.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffc] (rev 02)
ff:0f.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffd] (rev 02)
ff:0f.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 System Address Decoder & Broadcast Registers [8086:2ffe] (rev 02)
IOMMU Group 3:
ff:10.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f1d] (rev 02)
ff:10.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCIe Ring Interface [8086:2f34] (rev 02)
ff:10.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1e] (rev 02)
ff:10.6 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f7d] (rev 02)
ff:10.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Scratchpad & Semaphore Registers [8086:2f1f] (rev 02)
IOMMU Group 4:
ff:12.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2fa0] (rev 02)
ff:12.1 Performance counters [1101]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Home Agent 0 [8086:2f30] (rev 02)
IOMMU Group 5:
ff:13.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:13.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:13.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2faa] (rev 02)
ff:13.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fab] (rev 02)
ff:13.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fac] (rev 02)
ff:13.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel Target Address Decoder [8086:2fad] (rev 02)
ff:13.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 0/1 Broadcast [8086:2fae] (rev 02)
ff:13.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2faf] (rev 02)
IOMMU Group 6:
ff:14.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 Thermal Control [8086:2fb0] (rev 02)
ff:14.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 Thermal Control [8086:2fb1] (rev 02)
ff:14.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 0 ERROR Registers [8086:2fb2] (rev 02)
ff:14.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 1 ERROR Registers [8086:2fb3] (rev 02)
ff:14.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbc] (rev 02)
ff:14.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbd] (rev 02)
ff:14.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbe] (rev 02)
ff:14.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 0 & 1 [8086:2fbf] (rev 02)
IOMMU Group 7:
ff:15.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 Thermal Control [8086:2fb4] (rev 02)
ff:15.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 Thermal Control [8086:2fb5] (rev 02)
ff:15.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 2 ERROR Registers [8086:2fb6] (rev 02)
ff:15.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 0 Channel 3 ERROR Registers [8086:2fb7] (rev 02)
IOMMU Group 8:
ff:16.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Target Address, Thermal & RAS Registers [8086... (rev 02)
ff:16.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Channel 2/3 Broadcast [8086:2f6e] (rev 02)
ff:16.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO Global Broadcast [8086:2f6f] (rev 02)
IOMMU Group 9:
ff:17.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Integrated Memory Controller 1 Channel 0 Thermal Control [8086:2fd0] (rev 02)
ff:17.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb8] (rev 02)
ff:17.5 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fb9] (rev 02)
ff:17.6 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fba] (rev 02)
ff:17.7 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DDRIO (VMSE) 2 & 3 [8086:2fbb] (rev 02)
IOMMU Group 10:
ff:1e.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f98] (rev 02)
ff:1e.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f99] (rev 02)
ff:1e.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9a] (rev 02)
ff:1e.3 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0] (rev 02)
ff:1e.4 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Power Control Unit [8086:2f9c] (rev 02)
IOMMU Group 11:
ff:1f.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f88] (rev 02)
ff:1f.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 VCU [8086:2f8a] (rev 02)
IOMMU Group 12:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 DMI2 [8086:2f00] (rev 02)
IOMMU Group 13:
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 [8086:2f02] (rev 02)
IOMMU Group 14:
00:01.1 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 1 [8086:2f03] (rev 02)
IOMMU Group 15:
00:02.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [8086:2f04] (rev 02)
IOMMU Group 16:
00:02.3 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 [8086:2f07] (rev 02)
IOMMU Group 17:
00:03.0 PCI bridge [0604]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 [8086:2f08] (rev 02)
IOMMU Group 18:
00:05.0 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Address Map, VTd_Misc, System Management [8086:2f28] (rev 02)
IOMMU Group 19:
00:05.1 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 Hot Plug [8086:2f29] (rev 02)
IOMMU Group 20:
00:05.2 System peripheral [0880]: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 RAS, Control Status and Global Errors [8086:2f2a] (rev 02)
IOMMU Group 21:
00:11.0 Unassigned class [ff00]: Intel Corporation C610/X99 series chipset SPSR [8086:8d7c] (rev 05)
IOMMU Group 22:
00:14.0 USB controller [0c03]: Intel Corporation C610/X99 series chipset USB xHCI Host Controller [8086:8d31] (rev 05)
IOMMU Group 23:
00:16.0 Communication controller [0780]: Intel Corporation C610/X99 series chipset MEI Controller #1 [8086:8d3a] (rev 05)
IOMMU Group 24:
00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I218-V [8086:15a1] (rev 05)
IOMMU Group 25:
00:1b.0 Audio device [0403]: Intel Corporation C610/X99 series chipset HD Audio Controller [8086:8d20] (rev 05)
IOMMU Group 26:
00:1c.0 PCI bridge [0604]: Intel Corporation C610/X99 series chipset PCI Express Root Port #1 [8086:8d10] (rev d5)
IOMMU Group 27:
00:1f.0 ISA bridge [0601]: Intel Corporation C610/X99 series chipset LPC Controller [8086:8d47] (rev 05)
00:1f.2 SATA controller [0106]: Intel Corporation C610/X99 series chipset 6-Port SATA Controller [AHCI mode] [8086:8d02] (rev 05)
00:1f.3 SMBus [0c05]: Intel Corporation C610/X99 series chipset SMBus Controller [8086:8d22] (rev 05)
IOMMU Group 28:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
IOMMU Group 29:
03:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
IOMMU Group 30:
05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga XT / Amethyst XT [Radeon R9 380X / R9 M295X] [1002:6938] (rev f1)
IOMMU Group 31:
05:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga HDMI Audio [Radeon R9 285/380] [1002:aad8]
IOMMU Group 32:
06:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. A2000 NVMe SSD [2646:2263] (rev 03)
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!