GPU passthrough tutorial/reference

Discussion in 'Proxmox VE: Installation and configuration' started by sshaikh, Apr 23, 2017.

  1. hewu

    hewu New Member
    Proxmox Subscriber

    Joined:
    May 12, 2018
    Messages:
    8
    Likes Received:
    1
    Hello, everybody,
    I've worked my way through the tutorial and almost everything works fine. Thanks a lot for that!
    However, I have the following problem:
    When I restart the VM with passthrough, my whole system freezes and I have to restart everything.
    I hope you can help me with the diagnosis and give me tips on how to fix it.

    Thanks a lot!
    Greetings hewu

    Translated with www.DeepL.com/Translator
     
  2. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,696
    Likes Received:
    338
    What kind of card is it? Some AMD Vega cards cannot be reset properly.
    Any output in dmesg/syslog?
     
  3. hewu

    hewu New Member
    Proxmox Subscriber

    Joined:
    May 12, 2018
    Messages:
    8
    Likes Received:
    1
    Thank you for your reply

    It is an AMD Radeon Vega Frontier Edition. Is there a workaround? What is the reason for this behavior?

    Greetings
    Hewu
     

    BobhWasatch likes this.
  4. dcsapak

    dcsapak Proxmox Staff Member
    Staff Member

    Joined:
    Feb 1, 2016
    Messages:
    3,696
    Likes Received:
    338
    The only workaround I know of is to eject the card's driver in the guest before rebooting/shutting down
    (e.g. rmmod for a Linux guest)
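    A minimal sketch of that workaround for a Linux guest (assuming a Vega card driven by the amdgpu module; the module name depends on your driver):
    Code:
    # inside the guest, just before shutdown/reboot:
    # stop whatever is holding the GPU first (display manager, X, compute jobs)
    systemctl stop display-manager
    # then unload the driver so the card is quiesced before the VM stops
    rmmod amdgpu
    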

    It seems there is a kernel fix for this in 4.19, but I do not believe we will backport it.
     
  5. Jeffrey Roberts

    Jeffrey Roberts New Member

    Joined:
    Jan 11, 2019
    Messages:
    7
    Likes Received:
    0
    I followed the instructions, but I do not see

    Kernel driver in use: vfio-pci

    when I execute

    lspci -v

    Any ideas on what I might be doing wrong?

    ...

    Also, I see

    Kernel modules: nvidiafb, nouveau

    I added nouveau to the blacklist; should nouveau still be listed in the kernel modules?

    Thank you
     
  6. davu

    davu New Member

    Joined:
    Feb 6, 2019
    Messages:
    1
    Likes Received:
    0
    Hello, I am having trouble with my GPU passthrough. I am using an RX 570. I have two VMs which have the GPU passed through (only one is on at a time).

    This method has worked flawlessly with my Windows 10 VM. And it somewhat works with my Ubuntu VM.

    The GPU will pass through to the Ubuntu VM once, and only once. Whenever I shut it down, I cannot start it back up again until I restart the whole Proxmox node. The error I get when starting the Ubuntu VM after shutting it down is:

    Code:
    TASK ERROR: start failed: command ' (a bunch of parameters) ' failed: got timeout
    it seems as if the GPU is getting locked after I shut down the VM (I shut down the VM from inside the VM, by clicking the power button and shutting down)
     
  7. koburr

    koburr New Member

    Joined:
    Feb 16, 2019
    Messages:
    4
    Likes Received:
    0
    Ah yes, here is my config for GPU passthrough on the PowerEdge R710 with Xeon E55xx CPUs, using two GTX 750 Tis (one in this machine and one in another):

    Code:
    agent: 1
    balloon: 0
    bios: ovmf
    bootdisk: virtio0
    cores: 3
    cpu: host,hidden=1
    hostpci0: 04:00,x-vga=1,pcie=1
    hotplug: 0
    ide2: none,media=cdrom
    machine: q35
    memory: 7168
    name: WIN10X64PVEXXXX
    net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
    numa: 1
    ostype: win10
    scsihw: virtio-scsi-pci
    smbios1: uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    sockets: 2
    virtio0: local-lvm:vm-102-disk-0,cache=writeback,size=320G
    vmgenid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    
    Booting the VM with pcie=1 but without numa: 1 would crash the whole server; without the pcie flag it would still run as regular PCI.

    Instructions:

    Hardware:
    1. Solder power plugs onto the power supply connector on the motherboard.
    2. Using a razor knife and a hot air soldering station, heat up the PCI slot in the back (using 400 degree heat on medium air, or whatever works) and cut out the back of the PCI slot to fit full-length cards.

    Proxmox:
    Edit the GRUB command line with unsafe interrupts:
    Code:
    # nano /etc/default/grub
    
    change:
    Code:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on vfio_iommu_type1.allow_unsafe_interrupts=1 video=efifb:off"
    
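    One step not written out above: after editing /etc/default/grub you have to regenerate the GRUB config (and the change only applies after a reboot):
    Code:
    # update-grub
    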
    add to /etc/modules:
    Code:
    # (note: >> appends; a plain > would overwrite /etc/modules each time)
    # echo "vfio" >> /etc/modules
    # echo "vfio_iommu_type1" >> /etc/modules
    # echo "vfio_pci" >> /etc/modules
    # echo "vfio_virqfd" >> /etc/modules
    
    Allow unsafe interrupts in vfio:
    Code:
    echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
    
    Add dev id (either 04:00 or 06:00 on the R710):
    Code:
    lspci -n -s 04:00
    04:00.0 0300: 10de:1380 (rev a2)
    04:00.1 0403: 10de:0fbc (rev a1)
    
    echo "options vfio-pci ids=10de:1380,10de:0fbc disable_vga=1" > /etc/modprobe.d/vfio.conf
    
    blacklist drivers:
    Code:
    echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    
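    The module options and blacklist entries only take effect once the initramfs is rebuilt and the host rebooted, as post #11 below also stresses:
    Code:
    # update-initramfs -u -k all
    # reboot
    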
    Make a new VM for Windows 10 as usual: virtio LVM disk, virtio networking, cpu: host,hidden=1, machine: q35, bios: ovmf

    Add the virtio drivers CD-ROM, install...

    Code:
    agent: 1
    balloon: 0
    bios: ovmf
    bootdisk: virtio0
    cores: 3
    cpu: host,hidden=1
    hotplug: 0
    ide2: none,media=cdrom
    machine: q35
    memory: 7168
    name: WIN10X64PVEXXXX
    net0: virtio=XX:XX:XX:XX:XX:XX,bridge=vmbr0
    ostype: win10
    scsihw: virtio-scsi-pci
    smbios1: uuid=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    sockets: 2
    virtio0: local-lvm:vm-102-disk-0,cache=writeback,size=320G
    vmgenid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
    
    After the install, enable RDP and add to the config:
    Code:
    hostpci0: 04:00,x-vga=1,pcie=1
    numa: 1
    
    Install the NVIDIA drivers as usual.

    Tips on gaming:

    Install Steam and set up Steam In-Home Streaming.
    Streaming will only work on the first boot of the VM, before logging in via RDP.
    Add Notepad or some other application as a non-Steam game (right-click in Notepad and use "Search with Bing" to leave Notepad and minimize).


    Other notes on CPUs: since the R710 has quad-core sockets, the machine will crash (and maybe take the server down with it) if you use more than four cores without sockets: 2.

    Still a very inexpensive buy for doing Autodesk work remotely and gaming, while still having a little CPU power left over for running databases and developing websites and applications at home.

    I think the NVIDIA drivers I'm using are 289.81. I wouldn't recommend installing any newer ones; those seem to be the ones people are recommending on Steam for the 750 Ti. Although I heard you can get an RX 460 for around $50 these days.
     
    #47 koburr, Feb 16, 2019
    Last edited: Feb 16, 2019
  8. Saxosus

    Saxosus New Member

    Joined:
    Apr 8, 2019
    Messages:
    2
    Likes Received:
    0
    Question about the vid passthrough instructions...
    I've only been using Proxmox for about a month now and I'm still learning the multitude of details, but do the step 3 instructions update the correct blacklist?

    In step 3, it says add lines to:
    /etc/modprobe.d/blacklist.conf
    In my system I see an already existing file named:
    /etc/modprobe.d/pve-blacklist.conf

    After I ran "update-initramfs -u" and checked with "lspci -v | grep -A 8 -i NVIDIA", I still see the drivers loaded.
     
    #48 Saxosus, May 11, 2019
    Last edited: May 11, 2019
  9. Saxosus

    Saxosus New Member

    Joined:
    Apr 8, 2019
    Messages:
    2
    Likes Received:
    0
    I'm also running an R710, but I don't like the idea of working so hard to modify the server just to cram an x16 card into the slot, in addition to having even more wires to deal with. As it's only a PCIe v2 bus anyway, I went with a standard NVIDIA GeForce GT 710 x8 card. Your GTX 750 will downgrade itself in the x8 slot to x8 PCIe 2 speeds, so you'll lose significant throughput there, but your card will still work better (MUCH better, I think :) ) than mine.

    However, there's one more thing you can do to speed it up. While I was researching how much effort I wanted to put into a video card, I found an interesting thing. To mount a GTX 750 in riser 2, you have the choice of using one of two x8 slots. First, and this is ridiculous but available, there is a very rare, and stupidly expensive at over $200, x16 riser card made for the machine. There are also the mining-rig mods out there: you can get hold of one of the small bitmining slot adapters, which will convert two x8 slots to an x16 slot AND feed power into the bus, for only about $15. You'll still need an additional power supply, because the riser is only rated at 25W per port and must not exceed 30W for the whole riser. Combining the two x8 slots into a single x16 slot will allow your video card to run at its full x16 speed.

    Can you elaborate a little more on this? I have the hex-core procs in my machine, but are you saying you have to use at least one core from each proc? Maybe something to do with balancing the data? I'd be interested in learning more.

    Have you experimented with and found any better drivers? I'd love to save myself the trouble of driver hunting!

    Thanks!
     
  10. Sub-7

    Sub-7 New Member

    Joined:
    Mar 11, 2018
    Messages:
    7
    Likes Received:
    1
  11. Neox

    Neox New Member

    Joined:
    Dec 12, 2018
    Messages:
    5
    Likes Received:
    1
    The actual name of the file doesn't matter as long as you keep the .conf at the end;
    the name is only there for you to remember what it does.

    As pve-blacklist.conf might be provided by a Proxmox package, I would use a
    gpu-passthrough-blacklist.conf, so Proxmox can keep updating its own file while mine stays separate.

    And after the update-initramfs -u -k all,
    you MUST reboot your server to activate the new setting.
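    A quick way to verify the binding after that reboot (01:00 is a placeholder PCI address; substitute your card's from lspci):
    Code:
    lspci -nnk -s 01:00
    # both the GPU and its audio function should now report
    # "Kernel driver in use: vfio-pci" instead of nouveau/nvidia
    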
     
    Saxosus likes this.
  12. Kavus Kazian

    Kavus Kazian New Member

    Joined:
    Saturday
    Messages:
    4
    Likes Received:
    0
    I too am having an issue with this. I've got no output from the card at all, not even during the UEFI boot splash.

    EDIT: I am using Proxmox 6.0-4, maybe something has changed to make it not work, or the options need to be put in differently?

    Here's my VM's .conf
    Code:
    bios: ovmf
    bootdisk: virtio0
    cores: 4
    cpu: host,hidden=1
    efidisk0: windows:vm-100-disk-1,size=128K
    hostpci0: 01:00,x-vga=on,pcie=1
    ide0: local:iso/virtio-win-0.1.141.iso,media=cdrom,size=309208K
    ide2: local:iso/Windows10.iso,media=cdrom
    machine: q35
    memory: 8192
    name: Gaming
    net0: virtio=3E:B7:06:9A:87:E8,bridge=vmbr0,firewall=1
    numa: 1
    ostype: win10
    scsihw: virtio-scsi-pci
    smbios1: uuid=6d603e08-c6e0-4897-9aa6-795b760e440f
    sockets: 1
    virtio0: windows:vm-100-disk-0,size=110G
    vmgenid: f4941051-eab0-4f5e-9517-8b23f5ba89e7
    Both parts of my graphics card are passed through, and "lspci -v" shows "Kernel driver in use: vfio-pci"

    Code:
    01:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. GP107 [GeForce GTX 1050 Ti]
    Flags: fast devsel, IRQ 16
    Memory at de000000 (32-bit, non-prefetchable) [size=16M]
    Memory at c0000000 (64-bit, prefetchable) [size=256M]
    Memory at d0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    Expansion ROM at 000c0000 [disabled] [size=128K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [250] Latency Tolerance Reporting
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
    
    01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
    Subsystem: eVga.com. Corp. GP107GL High Definition Audio Controller
    Flags: fast devsel, IRQ 17
    Memory at df080000 (32-bit, non-prefetchable) [size=16K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [100] Advanced Error Reporting
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel
    What worries me is that there may be some reason this specific card will never work for GPU passthrough.
     
    #52 Kavus Kazian, Aug 17, 2019 at 04:41
    Last edited: Aug 17, 2019 at 05:24
  13. koburr

    koburr New Member

    Joined:
    Feb 16, 2019
    Messages:
    4
    Likes Received:
    0

    My config looks like this:

    Code:
    agent: 1
    balloon: 0
    bios: ovmf
    bootdisk: virtio0
    cores: 6
    cpu: host,hidden=1
    hostpci0: 04:00,x-vga=1,pcie=1
    ide0: none,media=cdrom
    ide2: none,media=cdrom
    machine: q35
    memory: 10240
    name: WIN10X64KEV
    net0: virtio=8A:BC:92:4A:47:AF,bridge=vmbr0
    numa: 1
    ostype: win10
    scsihw: virtio-scsi-pci
    smbios1: uuid=x
    sockets: 1
    virtio0: img:102/vm-102-disk-0.qcow2,cache=unsafe,size=164G
    vmgenid: x
    
    You may need to add "balloon: 0" as well as unsafe IOMMU interrupts:
    Mine says: [/etc/default/grub]
    Code:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on video=efifb:off vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=10de:1380,10de:0fbc"
    
    This just tells the IOMMU "I don't care, just make it work". [Beware: it allows unsafe memory access from the VM through the PCI bus and might be exploited via the GPU/CUDA with DMA access through the MMU.]
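    If you want to check whether your platform actually needs allow_unsafe_interrupts (it is only needed when the IOMMU cannot do interrupt remapping), a rough check is:
    Code:
    dmesg | grep -i remapping
    # messages like "DMAR-IR: Enabled IRQ remapping" (Intel) or
    # "AMD-Vi: Interrupt remapping enabled" suggest the option can be dropped
    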
     
    #53 koburr, Aug 17, 2019 at 12:44
    Last edited: Aug 17, 2019 at 12:52
  14. Kavus Kazian

    Kavus Kazian New Member

    Joined:
    Saturday
    Messages:
    4
    Likes Received:
    0
    Okay, so apparently I needed to make the default GPU the intel integrated one, otherwise it wasn't fully letting go of the nvidia one.

    But now, with the settings "balloon: 0" and "vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=<ids>" it just bluescreens.

    [Screenshot of the bluescreen attached: xscreenshot-partial-2019-08-17T110346.png]

    It's the same both with and without "vfio_iommu_type1.allow_unsafe_interrupts=1 pci-stub.ids=<ids>". I had to enable qxl/spice to even see the bluescreen, as the video output from the graphics card cuts out.

    I think my system supports interrupt remapping anyway.

    It works if I don't specify x-vga=1, but then it shows (code 43) in Windows Device Manager.
     
  15. koburr

    koburr New Member

    Joined:
    Feb 16, 2019
    Messages:
    4
    Likes Received:
    0
    Yes. You need to install drivers in Windows. First remove the present drivers, then install drivers that will work in a VM.

    Edit:

    You may turn x-vga back on after uninstalling the drivers and rebooting. You may also want to enable RDP, disable any passwords and UAC, and allow network login without passwords if you are using Steam In-Home Streaming; be sure to have your firewall(s) configured correctly for this.
     
    #55 koburr, Aug 17, 2019 at 20:25
    Last edited: Aug 17, 2019 at 20:32
  16. Kavus Kazian

    Kavus Kazian New Member

    Joined:
    Saturday
    Messages:
    4
    Likes Received:
    0
    Are there special drivers for VMs? Or do I just install the GeForce drivers from NVIDIA's website?
     
  17. koburr

    koburr New Member

    Joined:
    Feb 16, 2019
    Messages:
    4
    Likes Received:
    0
    There may be special (patched) drivers; you will need to patch them yourself with PowerShell. They are on GitHub.

    You may also find other drivers that will work. I have a different GPU, as you can see, and the Microsoft drivers seem to work correctly for my use case.
     
  18. Kavus Kazian

    Kavus Kazian New Member

    Joined:
    Saturday
    Messages:
    4
    Likes Received:
    0
    Unfortunately patching the drivers doesn't work; now it just boot-loops.

    Do Radeon cards have these issues? I'm thinking of just buying one of those next month instead.
     
  19. jimnordb

    jimnordb New Member

    Joined:
    May 4, 2016
    Messages:
    26
    Likes Received:
    1
    @Kavus Kazian you need to use kernel_irqchip=on

    qm set ID -args '-machine type=q35,kernel_irqchip=on'
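    To confirm the extra args actually reach QEMU (assuming VM ID 100), you can inspect the generated command line:
    Code:
    qm set 100 -args '-machine type=q35,kernel_irqchip=on'
    qm showcmd 100 | grep -o 'kernel_irqchip=on'
    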
     