GPU passthrough tutorial/reference

May 12, 2018
Hello, everybody,
I've worked my way through the tutorial and almost everything works fine. Thanks a lot for that!
However, I have the following problem:
When I restart the VM with passthrough, my whole system freezes and I have to restart everything.
I hope you can help me with the diagnosis and give me tips on how to fix it.

Thanks a lot!
Greetings hewu

Translated with www.DeepL.com/Translator
 

dcsapak

Proxmox Staff Member
Feb 1, 2016
Vienna
hewu said:
When I restart the VM with passthrough, my whole system freezes and I have to restart everything.
I hope you can help me with the diagnosis and give me tips on how to fix it.

What kind of card is it? Some AMD Vega cards cannot be reset properly.
Any output in dmesg/syslog?
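To answer the dmesg/syslog question on the host, here is a minimal bash sketch. It filters the kernel log down to passthrough-related lines. The `vfio_log_lines` name and its search pattern are our own choices, not a Proxmox tool. After a hard freeze, the previous boot's kernel log is usually the interesting one, and `journalctl -k -b -1` only shows it if the journal is persistent.

```shell
# Hypothetical helper: filter kernel log text (read from stdin) down to
# lines that usually matter for passthrough problems: vfio, IOMMU/DMAR,
# AMD-Vi and device-reset messages. The pattern is an assumption; widen as needed.
vfio_log_lines() {
    grep -i -E 'vfio|iommu|dmar|amd-vi|reset' || true   # no match is not an error
}

# Usage on the host (dmesg usually needs root):
#   dmesg | vfio_log_lines
#   journalctl -k -b -1 | vfio_log_lines   # kernel log of the previous boot
```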
 

dcsapak

Proxmox Staff Member
Feb 1, 2016
Vienna
Is there a workaround?
The only workaround I know of is to unload the card's driver in the guest before rebooting or shutting down
(e.g. with rmmod in a Linux guest).

It seems there is a kernel fix for this in 4.19, but I do not believe we will backport it.
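For a Linux guest, the workaround could look like the following sketch. It assumes an AMD card driven by the `amdgpu` module. It also assumes that whatever uses the card (e.g. the display manager) has already been stopped, because `modprobe -r` refuses to unload a module that is in use. The function names are hypothetical.

```shell
# Hypothetical helpers for a Linux *guest*: unload the GPU driver before
# shutdown/reboot so the card is left in a state the host can reset.

# Return 0 if module $1 is listed in a /proc/modules-style file ($2, default /proc/modules).
module_loaded() {
    local mod="$1" file="${2:-/proc/modules}"
    # /proc/modules has one module per line, with the module name in the first field.
    awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$file"
}

# Unload the GPU driver (default: amdgpu, an assumption) if it is loaded.
unload_gpu_driver() {
    local mod="${1:-amdgpu}"
    if module_loaded "$mod"; then
        modprobe -r "$mod"   # like rmmod, but also removes now-unused dependencies
    fi
}

# Usage inside the guest, as root, after stopping the display manager:
#   unload_gpu_driver amdgpu && poweroff
```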
 

Jeffrey Roberts

New Member
Jan 11, 2019
I followed the instructions, but I do not see

Kernel driver in use: vfio-pci

when I execute

lspci -v

Any ideas on what I might be doing wrong?

...

Also, I see

Kernel modules: nvidiafb, nouveau

I added nouveau to the blacklist, should nouveau still be listed in the kernel modules?
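On the nouveau question: the `Kernel modules:` line in `lspci -v` or `lspci -k` lists every module that *could* drive the device, based on its modalias. Nouveau therefore stays listed even when it is blacklisted. The line that matters is `Kernel driver in use:`. The sketch below extracts that field from `lspci -k` output; the function name is hypothetical.

```shell
# Hypothetical helper: print the value of "Kernel driver in use:" from
# `lspci -k` / `lspci -v` output read on stdin (empty if no driver is bound).
driver_in_use() {
    awk -F': ' '/Kernel driver in use:/ { print $2; exit }'
}

# Usage on the host (04:00.0 is the address used in this thread; use your own):
#   lspci -k -s 04:00.0 | driver_in_use    # expect: vfio-pci
```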

Thank you
 

davu

New Member
Feb 6, 2019
Hello, I am having trouble with my GPU passthrough. I am using an RX 570. I have two VMs to which the GPU is passed through (only one is on at a time).

This method has worked flawlessly with my Windows 10 VM, and it partly works with my Ubuntu VM.

The GPU passes through to the Ubuntu VM once, and only once. Whenever I shut the VM down, I cannot start it again until I restart the whole Proxmox node. The error I get when starting the Ubuntu VM after shutting it down is:

Code:
TASK ERROR: start failed: command ' (a bunch of parameters) ' failed: got timeout
It seems as if the GPU gets locked after I shut down the VM. (I shut it down from inside the guest, by clicking the power button and choosing shut down.)
 

koburr

New Member
Feb 16, 2019
Ah yes, here is my config for GPU passthrough on my PowerEdge R710 with Xeon E55xx CPUs, using two GTX 750 Tis (one in this machine and one in another):

Code:
agent: 1
balloon: 0
bios: ovmf
bootdisk: virtio0
cores: 3
cpu: host,hidden=1
hostpci0: 04:00,x-vga=1,pcie=1
hotplug: 0
ide2: none,media=cdrom
machine: q35
memory: 7168
name: WIN10X64PVEXXXX
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
numa: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
sockets: 2
virtio0: local-lvm:vm-102-disk-0,cache=writeback,size=320G
vmgenid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
With pcie=1 but without numa: 1, the VM would boot and then crash the whole server; without numa, it would only run as regular PCI.

Instructions:

Hardware:
1. Solder power plugs onto the power supply connector on the motherboard.
2. Using a razor knife and a hot-air soldering station, heat up the back of the PCI slot (400 degrees on medium air, or whatever works) and cut out the back of the slot so that the cards fit.

Proxmox:
Edit the GRUB command line, including unsafe interrupts:
Code:
# nano /etc/default/grub
and change the line to:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off"
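After editing /etc/default/grub, the new command line only takes effect once you run `update-grub` and reboot (assuming the host boots via GRUB). The minimal sketch below uses a hypothetical `has_kernel_param` helper to confirm afterwards that the parameters actually reached the running kernel.

```shell
# Hypothetical helper: succeed if kernel command line $2 (default: the running
# kernel's /proc/cmdline) contains parameter $1 as a whole, space-separated word.
has_kernel_param() {
    local param="$1" cmdline="${2:-$(cat /proc/cmdline)}"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after `update-grub` and a reboot:
#   for p in intel_iommu=on video=efifb:off; do
#       has_kernel_param "$p" || echo "missing: $p"
#   done
```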
Add to /etc/modules:
Code:
# echo "vfio" >> /etc/modules
# echo "vfio_iommu_type1" >> /etc/modules
# echo "vfio_pci" >> /etc/modules
# echo "vfio_virqfd" >> /etc/modules
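Because `>` truncates the file on every call, only `>>` (append) keeps all four modules in /etc/modules. The slightly more defensive sketch below also avoids duplicate lines if the steps are run twice. The `add_line_once` name is hypothetical.

```shell
# Hypothetical helper: append line $2 to file $1 unless it is already present
# as an exact, whole line (so re-running the setup does not duplicate entries).
add_line_once() {
    local file="$1" line="$2"
    touch "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on the host, as root:
#   for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
#       add_line_once /etc/modules "$m"
#   done
```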
Allow unsafe interrupts in vfio:
Code:
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
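`allow_unsafe_interrupts=1` is only needed when the platform lacks interrupt remapping, so it is worth checking that first. The sketch below assumes the usual kernel messages: `DMAR-IR: Enabled IRQ remapping in ... mode` on Intel and `AMD-Vi: Interrupt remapping enabled` on AMD. Their exact wording can differ between kernel versions. The function name is hypothetical.

```shell
# Hypothetical helper: read kernel log text on stdin and succeed if it reports
# that interrupt remapping was enabled (message wording is an assumption).
irq_remapping_enabled() {
    grep -q -i -E 'enabled IRQ remapping|interrupt remapping enabled'
}

# Usage on the host:
#   if dmesg | irq_remapping_enabled; then
#       echo "IRQ remapping on: allow_unsafe_interrupts should not be needed"
#   else
#       echo "no IRQ remapping found: vfio may need allow_unsafe_interrupts=1"
#   fi
```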
Add the device IDs (the card sits at either 04:00 or 06:00 on the R710):
Code:
lspci -n -s 04:00
04:00.0 0300: 10de:1380 (rev a2)
04:00.1 0403: 10de:0fbc (rev a1)

echo "options vfio-pci ids=10de:1380,10de:0fbc disable_vga=1" > /etc/modprobe.d/vfio.conf
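The `ids=` list is simply the vendor:device pairs from the third column of `lspci -n`. The sketch below builds it automatically for every function of the card. The helper name is hypothetical; 04:00 is the R710 slot from this post.

```shell
# Hypothetical helper: turn `lspci -n -s <slot>` output (stdin) into the
# comma-separated vendor:device list expected by "options vfio-pci ids=...".
vfio_ids_from_lspci() {
    awk '{ print $3 }' | paste -sd, -
}

# Usage on the host, as root:
#   ids="$(lspci -n -s 04:00 | vfio_ids_from_lspci)"
#   echo "options vfio-pci ids=${ids} disable_vga=1" > /etc/modprobe.d/vfio.conf
```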
blacklist drivers:
Code:
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
Create a new Windows 10 VM as usual: VirtIO disk on LVM, VirtIO networking, cpu: host,hidden=1, machine: q35, bios: ovmf.

Add the VirtIO drivers CD-ROM and install...

Code:
agent: 1
balloon: 0
bios: ovmf
bootdisk: virtio0
cores: 3
cpu: host,hidden=1
hotplug: 0
ide2: none,media=cdrom
machine: q35
memory: 7168
name: WIN10X64PVEXXXX
net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
sockets: 2
virtio0: local-lvm:vm-102-disk-0,cache=writeback,size=320G
vmgenid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
After the installation, enable RDP and add to the config:
Code:
hostpci0: 04:00,x-vga=1,pcie=1
numa: 1
Install the NVIDIA drivers as usual.

Tips on gaming:

Install Steam and set up Steam In-Home Streaming.
Streaming only works on the first boot of the VM, before logging in via RDP.
Add Notepad or another application as a non-Steam game (right-click in Notepad and search with Bing to leave Notepad, then minimize it).


Another note on CPUs: since the R710's sockets hold quad-core CPUs, the VM will crash, and may crash the server, if you use more than four cores without setting sockets: 2.
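The rule of thumb above can be turned into a small calculation: never give a virtual socket more vCPUs than the physical CPU has cores. This is a sketch of that reasoning only, not something Proxmox does itself. `vcpu_layout` is a hypothetical name, and the default of 4 cores per socket matches the quad-core Xeon E55xx in this R710.

```shell
# Hypothetical helper: for $1 total vCPUs and at most $2 cores per physical
# socket (default 4), print "sockets: N" and "cores: M" for the VM config.
vcpu_layout() {
    local total="$1" per_socket="${2:-4}" sockets cores
    sockets=$(( (total + per_socket - 1) / per_socket ))   # round up
    cores=$(( (total + sockets - 1) / sockets ))             # spread evenly; total may round up
    printf 'sockets: %d\ncores: %d\n' "$sockets" "$cores"
}

# Example: 6 vCPUs on quad-core sockets gives the layout used in the config above.
#   vcpu_layout 6 4   ->  sockets: 2 / cores: 3
```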

It's still a very inexpensive buy for doing Autodesk work remotely and gaming, while still having a little CPU power left over for running databases and developing websites and applications at home.

I think the NVIDIA drivers I'm using are version 289.81. I wouldn't recommend installing any newer ones; those seem to be the ones people on Steam recommend for the 750 Ti. That said, I heard you can get an RX 460 for around $50 these days.
 

Saxosus

New Member
Apr 8, 2019
Question about the vid passthrough instructions...
I've only been using Proxmox for about a month now and I'm still learning the multitude of details, but do the step 3 instructions update the correct blacklist?

In step 3, it says add lines to:
/etc/modprobe.d/blacklist.conf
In my system I see an already existing file named:
/etc/modprobe.d/pve-blacklist.conf

After I ran "update-initramfs -u" and checked with "lspci -v | grep -A 8 -i NVIDIA", I still see the drivers loaded.
 

Saxosus

New Member
Apr 8, 2019
koburr said:
Ah yes, here is my config for GPU passthrough on my PowerEdge R710 with Xeon E55xx CPUs, using two GTX 750 Tis (one in this machine and one in another):
...
Hardware:
1. Solder power plugs onto the power supply connector on the motherboard.
2. Using a razor knife and a hot-air soldering station, heat up the back of the PCI slot (400 degrees on medium air, or whatever works) and cut out the back of the slot so that the cards fit.
I'm also running an R710, but I don't like the idea of working so hard to modify the server to cram an x16 card into the slot, on top of having even more wires to deal with. As it's only a PCIe 2.0 bus anyway, I went with a standard NVIDIA GeForce GT 710 x8 card. Your GTX 750 Ti will drop to x8 PCIe 2.0 speed in the x8 slot, so you'll lose significant throughput there, but your card will still work better (MUCH better, I think :) ) than mine.

However, there's one more thing you can do to speed it up. While I was researching how much effort I wanted to put into a video card, I found something interesting. To mount a GTX 750 Ti in riser 2, you can use one of two x8 slots. First, and this is ridiculous but available, there is a very rare x16 riser card made for this machine, stupidly expensive at over $200.

There are also the mining-rig mods out there. You can get one of the small mining slot adapters, which convert two x8 slots into an x16 slot AND supply power to the bus, for only about $15. You'll still need an additional power supply, because the riser is only rated at 25 W per port and no more than 30 W for the whole riser. Combining the two x8 slots into a single x16 slot will allow your video card to run at its full x16 speed.
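To put numbers on the x8 vs. x16 argument: PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding. That works out to about 500 MB/s per lane in each direction. A quick sketch of the arithmetic (the function name is hypothetical):

```shell
# Approximate usable PCIe 2.0 bandwidth per direction, in MB/s, for $1 lanes:
# 5 GT/s * 8/10 (8b/10b encoding) = 4 Gbit/s = 500 MB/s per lane.
pcie2_mbps() {
    local lanes="$1"
    echo $(( lanes * 5000 * 8 / 10 / 8 ))
}

# pcie2_mbps 8    -> 4000  (x8 slot, about 4 GB/s)
# pcie2_mbps 16   -> 8000  (x16 slot, about 8 GB/s)
```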

koburr said:
Another note on CPUs: since the R710's sockets hold quad-core CPUs, the VM will crash, and may crash the server, if you use more than four cores without setting sockets: 2.
Can you elaborate a little more on this? I have hex-core CPUs in my machine; are you saying you have to use at least one core from each CPU? Maybe something to do with balancing the data? I'd be interested in learning more.

koburr said:
I think the NVIDIA drivers I'm using are version 289.81. I wouldn't recommend installing any newer ones; those seem to be the ones people on Steam recommend for the 750 Ti. That said, I heard you can get an RX 460 for around $50 these days.
Have you experimented with other drivers and found any better ones? I'd love to save myself the trouble of driver hunting!

Thanks!
 

Neox

New Member
Dec 12, 2018
Saxosus said:
Question about the vid passthrough instructions...
I've only been using Proxmox for about a month now and I'm still learning the multitude of details, but do the step 3 instructions update the correct blacklist?

In step 3, it says add lines to:
/etc/modprobe.d/blacklist.conf
In my system I see an already existing file named:
/etc/modprobe.d/pve-blacklist.conf

After I ran "update-initramfs -u" and checked with "lspci -v | grep -A 8 -i NVIDIA", I still see the drivers loaded.
The actual name of the file doesn't matter as long as it ends in .conf;
the name is only there to remind you what it does.

As pve-blacklist.conf may be provided by a Proxmox package, I would use a separate
gpu-passthrough-blacklist.conf, so that Proxmox can keep updating its own file while yours stays separate.

And after running update-initramfs -u -k all,
you MUST reboot your server to activate the new settings.
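Following that advice, the sketch below writes the blacklist into its own file instead of touching pve-blacklist.conf. The file name gpu-passthrough-blacklist.conf is the one suggested above. The function name and the target-directory argument are hypothetical; the argument makes it easy to try the function in a scratch directory first.

```shell
# Hypothetical helper: write "blacklist <module>" lines for the given modules
# into <dir>/gpu-passthrough-blacklist.conf (dir defaults to /etc/modprobe.d).
write_gpu_blacklist() {
    local dir="${1:-/etc/modprobe.d}" conf mod
    shift
    conf="$dir/gpu-passthrough-blacklist.conf"
    : > "$conf"   # truncate: the file is fully regenerated on every run
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod" >> "$conf"
    done
}

# Usage on the host, as root, then rebuild the initramfs and reboot:
#   write_gpu_blacklist /etc/modprobe.d nouveau nvidia radeon
#   update-initramfs -u -k all
#   reboot
```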
 

Kavus Kazian

New Member
Aug 17, 2019
I too am having an issue with this. I've got no output from the card at all, not even during the UEFI boot splash.

EDIT: I am using Proxmox 6.0-4, maybe something has changed to make it not work, or the options need to be put in differently?

Here's my VM's .conf
Code:
bios: ovmf
bootdisk: virtio0
cores: 4
cpu: host,hidden=1
efidisk0: windows:vm-100-disk-1,size=128K
hostpci0: 01:00,x-vga=on,pcie=1
ide0: local:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
ide2: local:iso/Windows10.iso,media=cdrom
machine: q35
memory: 8192
name: Gaming
net0: virtio=3E:B7:06:9A:87:E8,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=6d603e08-c6e0-4897-9aa6-795b760e440f
sockets: 1
virtio0: windows:vm-100-disk-0,size=110G
vmgenid: f4941051-eab0-4f5e-9517-8b23f5ba89e7
Both parts of my graphics card are passed through, and "lspci -v" shows "Kernel driver in use: vfio-pci"

Code:
01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. GP107 [GeForce GTX 1050 Ti]
	Flags: fast devsel, IRQ 16
	Memory at de000000 (32-bit, non-prefetchable) [size=16M]
	Memory at c0000000 (64-bit, prefetchable) [size=256M]
	Memory at d0000000 (64-bit, prefetchable) [size=32M]
	I/O ports at e000 [size=128]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Legacy Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [250] Latency Tolerance Reporting
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [420] Advanced Error Reporting
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Capabilities: [900] #19
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau

01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
	Subsystem: eVga.com. Corp. GP107GL High Definition Audio Controller
	Flags: fast devsel, IRQ 17
	Memory at df080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
What worries me is that there might be some reason this specific card will never work for GPU passthrough.
 

koburr

New Member
Feb 16, 2019
Kavus Kazian said:
I too am having an issue with this. I've got no output from the card at all, not even during the UEFI boot splash.
...
What worries me is if there's just some reason this specific card will never work for GPU passthrough.

My config looks like this:

Code:
agent: 1
balloon: 0
bios: ovmf
bootdisk: virtio0
cores: 6
cpu: host,hidden=1
hostpci0: 04:00,x-vga=1,pcie=1
ide0: none,media=cdrom
ide2: none,media=cdrom
machine: q35
memory: 10240
name: WIN10X64KEV
net0: virtio=8A:BC:92:4A:47:AF,bridge=vmbr0
numa: 1
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=x
sockets: 1
virtio0: img:102/vm-102-disk-0.qcow2,cache=unsafe,size=164G
vmgenid: x
You may need to add "balloon: 0" as well as the unsafe interrupts option.
Mine says (/etc/default/grub):
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on video=efifb:off vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=10de:1380,10de:0fbc"
This just tells vfio "I don't care, just make it work". [Beware: it lets vfio assign the device even on hosts without interrupt remapping, which means a malicious guest could use the GPU to inject interrupts (MSIs) into the host. DMA itself is still isolated by the IOMMU.]
 

Kavus Kazian

New Member
Aug 17, 2019
Okay, so apparently I needed to make the Intel integrated GPU the default one; otherwise the host wasn't fully letting go of the NVIDIA card.

But now, with the settings "balloon: 0" and "vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=<ids>" it just bluescreens.


It's the same both with and without "vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=<ids>". I had to enable QXL/SPICE to even see the bluescreen, as the video output from the graphics card cuts out.

I think my system supports interrupt remapping anyway.

It works if I don't specify x-vga=1, but then it shows (code 43) in Windows Device Manager.
 

koburr

New Member
Feb 16, 2019
4
0
1
30
Kavus Kazian said:
...
It works if I don't specify x-vga=1, but then it shows (code 43) in Windows Device Manager.
Yes. You need to install drivers in Windows: first remove the present drivers, then install drivers that will work in a VM.

Edit:

You may turn on x-vga after uninstalling the drivers and rebooting. You may also want to enable RDP, disable any passwords and UAC, and allow network login without passwords if you are using Steam In-Home Streaming; be sure to have your firewall(s) configured correctly for this.
 

Kavus Kazian

New Member
Aug 17, 2019
Are there special drivers for VMs? Or do I just install the GeForce drivers from NVIDIA's website?
 

koburr

New Member
Feb 16, 2019
4
0
1
30
Kavus Kazian said:
Are there special drivers for VMs? Or do I just install the GeForce drivers from NVIDIA's website?
There may be special (patched) drivers; you will need to patch them yourself with PowerShell. They are on GitHub.

You may also find other drivers that will work. I have a different GPU, as you can see, and the Microsoft drivers seem to work correctly for my use case.
 

Kavus Kazian

New Member
Aug 17, 2019
4
0
1
31
Unfortunately, patching the drivers doesn't work; now it just boot-loops.

Do Radeon cards have these issues? I'm thinking of just buying one of those next month instead.
 

IvanAngelplus

New Member
Sep 2, 2019
Hello, first of all, please forgive my English; I do not speak it.

I have installed Proxmox 6 with a GTX 1070, but Device Manager in my Windows 10 VM shows error 43, and I have not managed to install the driver properly.

Thanks
 
