PCI/GPU Nvidia PCI(e) Passthrough, VM Problems, and Lag/Performance Issues

robertsw

New Member
Mar 7, 2021
I am fairly new to Proxmox, and I am running into some issues. Some time back, I saw that PCI passthrough had become more popular, so I decided to take my single-OS Arch Linux server and switch to Proxmox. All of my LXC containers work perfectly, but I simply cannot get my VMs to work. I have tried various settings and configurations, all resulting in VERY poor performance. I am going to provide everything I can in hopes that someone can help me out. If you take the time to read this entire post: thank you!

What I had before Proxmox:

Arch Linux PC that was hosting Jellyfin, 4 websites, Openhab, Nextcloud, and Gitea. Most were containerized through Cockpit. The frontend was a modified desktop environment designed to launch Steam big-screen.

What I would like with Proxmox:

Since I have many things that I could containerize or build into a VM, I decided to break my system down in a logical way.

Node - Banshee
  1. -- LXC for my SQL requirements (CentOS LXC, works perfectly) | dedicated 1024 MB memory and a single core
  2. -- LXC for my Pi-hole / DNS blackholer (Ubuntu LXC, works perfectly) | dedicated 512 MB memory and a single core
  3. -- LXC for my Data / Services requirements (Arch Linux LXC, works perfectly) | dedicated 4096 MB memory and 2 cores
    -- has Jellyfin, Openhab, Nextcloud and Gitea running perfectly
  4. -- LXC for my Websites / Web server (Arch Linux LXC, works perfectly) | dedicated 1024 MB memory and a single core
** I chose to use LXCs because of the simplicity of mount points related to the physical RAID. It might not be the most secure, but I am okay with this. Additionally, the LXCs consume significantly fewer system resources, which is a huge bonus for me. **

Most of the LXCs listed above aren't even consuming 1/2 of their allocated memory or 1/4 of their allocated processing power. (my websites suck and I generate no traffic!)

I would like to host a VM where I pass my GPU through for remote (and local) gaming. I would also like the VM to actually work.

I have set the onboard HDMI as the primary boot device, so nothing is displayed on the PCIe video card (Nvidia GTX 980).
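To double-check that the firmware really did pick the onboard graphics (and not the GTX 980) as primary, the kernel exposes a per-device boot_vga flag; a small sketch, assuming the standard sysfs layout:

```shell
# Print each VGA device and whether firmware picked it as primary (1 = boot GPU).
# The GTX 980 at 03:00.0 should show 0 if the onboard HDMI is primary.
for f in /sys/bus/pci/devices/*/boot_vga; do
  [ -e "$f" ] || continue   # no VGA devices visible (e.g. inside a container)
  printf '%s: %s\n' "${f%/boot_vga}" "$(cat "$f")"
done
```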

My references:

https://pve.proxmox.com/wiki/Pci_passthrough
https://superuser.com/questions/103...d-linux-machines-video-mode-is-graphics-outpu
https://manjaro.site/how-to-enable-gpu-passthrough-on-proxmox-6-2/
https://hackmd.io/@edingroot/SkGD3Q7Wv
https://www.youtube.com/watch?v=fgx3NMk6F54
https://www.youtube.com/watch?v=-HCzLhnNf-A (his look creeps me out a little, but he seems very knowledgeable)

My problem:

I created a Windows 10-based VM where I want to pass my video card through for remote gaming. Additionally, I want the video output of the card to display through HDMI. Based on the resources online, I have configured my system every way I could read about or think of, with failed results.

First Issue: When I start the VM, it is VERY slow and sluggish, to the point where I can't even get through the installer before it comes to a complete halt and becomes unresponsive through noVNC. I would love to set up RDP, but I have a hard time even getting to that point. With one of my prior configurations, I did get RDP working, but the results were poor: a little better than noVNC, but not by much. When I try to install the Nvidia drivers, the card isn't detected.

Second Issue: Based on everything I have read through the different sources, the configurations I have provided below should (theoretically) pass the GPU through successfully. Furthermore, setting the display to VirtIO-GPU should allow the card's video output to activate, displaying the VM through HDMI. (I checked HDMI and all of the DisplayPorts; nothing is being displayed.)

Hardware:

Code:
root@banshee:~# screenfetch
         _,met$$$$$gg.           root@banshee
      ,g$$$$$$$$$$$$$$$P.        OS: Debian
    ,g$$P""       """Y$$.".      Kernel: x86_64 Linux 5.3.18-2-pve
   ,$$P'              `$$$.      Uptime: 1h 51m
  ',$$P       ,ggs.     `$$b:    Packages: 677
  `d$$'     ,$P"'   .    $$$     Shell: bash 5.0.3
   $$P      d$'     ,    $$P     CPU: Intel Core i9-9900K @ 16x 5GHz [27.8°C]
   $$:      $$.   -    ,d$$'     GPU: GeForce GTX 980
   $$\;      Y$b._   _,d$P'      RAM: 2385MiB / 31958MiB
   Y$$.    `.`"Y$$$$P"'      
   `$$b      "-.__          
    `Y$$                    
     `Y$$.                  
       `$$b.                
         `Y$$b.              
            `"Y$b._          
                `""""

The primary drive is a 250 GB NVMe drive.
Additionally, I have a 5-drive, 3 TB RAID 6 array.

Configurations:

So that everyone can see my setup, I am providing everything I can on my BIOS configurations and boot configurations (that I can remember):

Here is virt-host-validate:
Code:
root@banshee:~# virt-host-validate
  QEMU: Checking for hardware virtualization                                 : PASS
  QEMU: Checking if device /dev/kvm exists                                   : PASS
  QEMU: Checking if device /dev/kvm is accessible                            : PASS
  QEMU: Checking if device /dev/vhost-net exists                             : PASS
  QEMU: Checking if device /dev/net/tun exists                               : PASS
  QEMU: Checking for cgroup 'cpu' controller support                         : PASS
  QEMU: Checking for cgroup 'cpuacct' controller support                     : PASS
  QEMU: Checking for cgroup 'cpuset' controller support                      : PASS
  QEMU: Checking for cgroup 'memory' controller support                      : PASS
  QEMU: Checking for cgroup 'devices' controller support                     : PASS
  QEMU: Checking for cgroup 'blkio' controller support                       : PASS
  QEMU: Checking for device assignment IOMMU support                         : PASS
  QEMU: Checking if IOMMU is enabled by kernel                               : PASS
   LXC: Checking for Linux >= 2.6.26                                         : PASS
   LXC: Checking for namespace ipc                                           : PASS
   LXC: Checking for namespace mnt                                           : PASS
   LXC: Checking for namespace pid                                           : PASS
   LXC: Checking for namespace uts                                           : PASS
   LXC: Checking for namespace net                                           : PASS
   LXC: Checking for namespace user                                          : PASS
   LXC: Checking for cgroup 'cpu' controller support                         : PASS
   LXC: Checking for cgroup 'cpuacct' controller support                     : PASS
   LXC: Checking for cgroup 'cpuset' controller support                      : PASS
   LXC: Checking for cgroup 'memory' controller support                      : PASS
   LXC: Checking for cgroup 'devices' controller support                     : PASS
   LXC: Checking for cgroup 'freezer' controller support                     : PASS
   LXC: Checking for cgroup 'blkio' controller support                       : PASS
   LXC: Checking if device /sys/fs/fuse/connections exists                   : PASS


Here is lscpu:
Code:
root@banshee:~# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               158
Model name:          Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
Stepping:            13
CPU MHz:             5000.227
CPU max MHz:         5000.0000
CPU min MHz:         800.0000
BogoMIPS:            7200.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities

To the best of my knowledge and ability, virtualization is enabled in my BIOS (I physically enabled it and verified multiple times that it is still on).
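For anyone wanting a quick sanity check from the OS side rather than the BIOS screen, the CPU flags tell the same story (vmx = Intel VT-x, svm = AMD-V):

```shell
# Check whether the CPUs advertise hardware virtualization support.
if grep -qE 'vmx|svm' /proc/cpuinfo; then
  echo "virtualization exposed to the OS"
else
  echo "virtualization NOT exposed - re-check BIOS/UEFI settings"
fi
```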

System configurations (at the time of this writing):

Proxmox:

/etc/default/grub

Code:
...
GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 5.3.18-2-pve"

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt nofb nomodeset video=efifb:off"
GRUB_CMDLINE_LINUX=""
...

I am booting an older kernel, as I have seen and heard of success with 5.3 and lower. Apparently there are some known issues with PCI passthrough and performance on 5.4.x kernels right now. (Again, I read that somewhere, and from what I can tell there is truth to it, because it wasn't working on the latest kernel.)
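The long GRUB_DEFAULT string above has to match a real menu entry title exactly; a sketch for pulling the available titles out of the generated config (path assumes a standard GRUB install):

```shell
# List menuentry/submenu titles; GRUB_DEFAULT uses "submenu>entry" syntax
# to select a kernel inside the "Advanced options" submenu.
cfg=/boot/grub/grub.cfg
if [ -r "$cfg" ]; then
  grep -E "^(menuentry|submenu) " "$cfg" | cut -d"'" -f2
else
  echo "no grub.cfg at $cfg"
fi
# after editing /etc/default/grub, apply with: update-grub
```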

dmesg | grep -e DMAR -e IOMMU
Code:
root@banshee:~# dmesg | grep -e DMAR -e IOMMU
[    0.006879] ACPI: DMAR 0x0000000085CCD308 0000A8 (v01 INTEL  EDK2     00000002      01000013)
[    0.092359] DMAR: IOMMU enabled
[    0.196837] DMAR: Host address width 39
[    0.196837] DMAR: DRHD base: 0x000000fed90000 flags: 0x0
[    0.196841] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e
[    0.196842] DMAR: DRHD base: 0x000000fed91000 flags: 0x1
[    0.196844] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
[    0.196844] DMAR: RMRR base: 0x00000084942000 end: 0x00000084961fff
[    0.196845] DMAR: RMRR base: 0x00000087800000 end: 0x0000008fffffff
[    0.196846] DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
[    0.196847] DMAR-IR: HPET id 0 under DRHD base 0xfed91000
[    0.196847] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[    0.200037] DMAR-IR: Enabled IRQ remapping in x2apic mode
[    0.952672] DMAR: No ATSR found
[    0.952730] DMAR: dmar0: Using Queued invalidation
[    0.952732] DMAR: dmar1: Using Queued invalidation
[    0.963745] DMAR: Intel(R) Virtualization Technology for Directed I/O
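Beyond `DMAR: IOMMU enabled`, it is worth confirming the GTX 980 sits in a clean IOMMU group, since the whole group gets assigned to the VM together. A sketch, assuming the usual sysfs layout:

```shell
# List every PCI device by IOMMU group; ideally 03:00.0 and 03:00.1 share
# a group with nothing else (sibling functions of one card are fine).
for d in /sys/kernel/iommu_groups/*/devices/*; do
  [ -e "$d" ] || continue   # no IOMMU groups visible here
  n=${d#*/iommu_groups/}
  echo "IOMMU group ${n%%/*}: $(basename "$d")"
done
```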

lsmod | grep nvidia returns nothing because I have blacklisted the drivers. Here are the other files that were suggested in the PCI passthrough guides and walkthroughs as well:

Code:
root@banshee:~# cat /etc/modprobe.d/blacklist.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb

root@banshee:~# cat /etc/modprobe.d/iommu_unsafe_interrupts.conf
options vfio_iommu_type1 allow_unsafe_interrupts=1

root@banshee:~# cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:13c0,10de:0fbb disable_vga=1
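One thing worth noting about these files: modprobe.d changes get baked into the initramfs, so they only take effect after `update-initramfs -u -k all` and a reboot. A sketch for checking whether the running vfio-pci module actually picked up the disable_vga option:

```shell
# If vfio-pci is loaded with the options above, sysfs exposes its parameters.
p=/sys/module/vfio_pci/parameters/disable_vga
if [ -r "$p" ]; then
  echo "vfio-pci disable_vga = $(cat "$p")"
else
  echo "vfio-pci not loaded (or options not applied) here"
fi
```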

The device IDs align with the PCI IDs of my video card as well:
Code:
03:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
03:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel

** I think everything is booting properly in accordance with the guides that Proxmox has offered **

dmesg | grep fb
Code:
root@banshee:~# dmesg | grep fb
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.3.18-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt nofb nomodeset video=efifb:off
[    0.092313] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.3.18-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt nofb nomodeset video=efifb:off
[    0.228787] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x33e452fbb2f, max_idle_ns: 440795236593 ns
[    0.388078] pci 0000:00:1f.4: reg 0x20: [io  0xefa0-0xefbf]
[    0.389173] pci 0000:03:00.1: [10de:0fbb] type 00 class 0x040300
[    3.327470] fbcon: Taking over console
[    4.328826] vfio_pci: add [10de:0fbb[ffffffff:ffffffff]] class 0x000000/00000000

And because some people experienced separate issues where defining the ROM was required, I dumped the ROM file from my Nvidia card and saved it in a safe location AND in /usr/share/kvm/dump.rom:
Code:
root@banshee:~# ls /usr/share/kvm/dump.rom
/usr/share/kvm/dump.rom
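For reference, the ROM can be dumped straight from sysfs (run as root on the host, with nothing using the card; the 0000:03:00.0 address matches the lspci output above). A sketch of that method:

```shell
# Enable ROM reads on the device, copy the ROM out, then disable again.
dev=/sys/bus/pci/devices/0000:03:00.0
if [ -w "$dev/rom" ]; then
  echo 1 > "$dev/rom"
  cat "$dev/rom" > /usr/share/kvm/dump.rom
  echo 0 > "$dev/rom"
else
  echo "ROM not writable at $dev (different slot, or not running on the host)"
fi
```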

I built the VM based on many different guides, and honestly, I am not sure why I have selected half the crap I have now... it just became natural for me. I am sure there is something wrong with it and will take ANY guidance or help in properly configuring the VM for Windows 10.

Code:
root@banshee:~# cat /etc/pve/qemu-server/104.conf
agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
bios: ovmf
boot: order=ide2;hostpci0;scsi0
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-104-disk-1,size=4M
hostpci0: 03:00,pcie=1,romfile=dump.rom
ide0: local:iso/virtio-win-0.1.185.iso,media=cdrom,size=402812K
ide2: local:iso/Win10_20H2_v2_English_x64.iso,media=cdrom
machine: pc-q35-5.2
memory: 16384
name: gecko
net0: virtio=DA:BF:E8:A6:5C:A2,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
scsi0: local-lvm:vm-104-disk-0,discard=on,iothread=1,replicate=0,size=60G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=89bbec72-85e4-4601-9a7a-f636be0a99ec
sockets: 1
usb0: host=1-6,usb3=1
vmgenid: 51536712-93f9-4430-9117-c52f1f9fadcb
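(Side note: the same settings can also be applied from the CLI with qm, which is sometimes easier than hand-editing the .conf file. qm only exists on the Proxmox node itself, so this is just a sketch; for example, attaching the GPU with the dumped ROM file:)

```shell
# Attach both functions of slot 03:00 to VM 104 as a PCIe device.
if command -v qm >/dev/null 2>&1; then
  qm set 104 --hostpci0 03:00,pcie=1,romfile=dump.rom
else
  echo "qm not available on this machine"
fi
```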

I am hoping that I am not the only one who has experienced this issue. Maybe someone can shed some light on what I am doing wrong, or whether this is a lost cause and I should return to the old setup for now and try again later when there is more support for PCI/GPU passthrough.

Thanks for any help in advance.
 
hi,

i have a few suggestions. i cannot promise they will solve your problems, but it can't hurt to try:
1. i'd use the latest kernel available, for various reasons (#1 is security/patches; also i have not heard anything about 5.4 and pci passthrough slowdowns? can you provide a link maybe?)
2. remove the following line from your config:

args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
it does nothing that we do not already do

3. change the line
hostpci0: 03:00,pcie=1,romfile=dump.rom
to
Code:
hostpci0: 03:00,pcie=1,x-vga=1,romfile=dump.rom
4. change the ostype to 'win10'
this will enable some cpu hv flags that can vastly improve performance for modern windows guests
and in conjunction with the 'x-vga' part will set the vendor-id
5. i'd probably disable numa; i'm not sure if it makes any difference at all, but it's worth a shot
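putting suggestions 2-5 together, the relevant part of 104.conf would look roughly like this (untested sketch; the rest of the file stays as posted above):

```
# args: line removed entirely
bios: ovmf
cpu: host,hidden=1,flags=+pcid
hostpci0: 03:00,pcie=1,x-vga=1,romfile=dump.rom
machine: pc-q35-5.2
numa: 0
ostype: win10
```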
 
I honestly can't remember where I saw the reference to the new kernel not working properly, but I can confirm that it's an issue with my setup now.

When booting Win10 with your recommendations above (and on the latest kernel), it displays in the console and shows the boot circle only. It never actually gets to Win10.

I followed your recommendations using the 5.3.18 kernel instead, and I got the VM to boot up.

The only problem I have now is that the VM only displays in the console. Even with the primary display option chosen, it will not display Win10 over HDMI. It does show the Proxmox BIOS logo, though.

I am playing with the settings now.
 
Even more weird things happening here.

So, first off; I got the video to output on HDMI.

Code:
agent: 0
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: host,hidden=1
efidisk0: local-lvm:vm-104-disk-1,size=4M
#hostpci0: 01:00,pcie=1,romfile=dump.rom
hostpci0: 01:00,pcie=1
ide0: local:iso/virtio-win-0.1.185.iso,media=cdrom,size=402812K
ide2: local:iso/Win10_20H2_v2_English_x64.iso,media=cdrom
machine: pc-q35-5.2
memory: 16384
name: gecko
net0: virtio=72:CE:EF:BF:91:11,bridge=vmbr0,firewall=1
numa: 0
#ostype: win10
ostype: l26
scsi0: local-lvm:vm-104-disk-0,discard=on,size=60G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=a4a33b4f-487c-46ef-ade6-521d8aa69a68
sockets: 1
vga: virtio
vmgenid: cab091f4-08f0-4a4f-aac5-25e168aea5a7

win10 as the ostype causes more issues, so I kept it at l26. I got rid of the ROM file as well. It seems to be a little sluggish, but I think I can find my way from here.
 
