GPU Passes through but performance is garbage

phil1c

New Member
Feb 23, 2020
Hello!

I'm begging anyone who reads this for some help, as I've been fighting this for well over a month now and I can't crack it. I'm going to keep this intentionally short because 1) you can view all of my endeavours at this link in the Plex support forums, and 2) I don't want to color anyone's ideas of where to investigate based on what I've already done; I think I need fresh eyes.

If you do read my linked Plex forums post, note that my setup is currently an R720 with 2x E5-2670s, 128GB of RAM, a pair of Samsung 860 EVOs in RAID 1 through the PERC H710P, and a GTX 1660 SUPER, and I copy the files to the VM's local storage for testing. Do I think any of that is strictly relevant? I'm not sure. I've swapped literally every piece of hardware except the SSDs in my testing and nothing has worked, but maybe it'll inspire some ideas in you, the reader.

In short, I am running Proxmox 6.1 with a GTX 1660 installed (currently; I started the build with a P2200 and bought this card to rule out a problem with the card itself) and I'm trying to pass it to a Win10 VM to run Plex with HW transcoding. I followed this guide to get GPU passthrough working, and it does work: the GPU is present in the VM, shows up in Device Manager, and appears in Task Manager. However, when HW transcoding, the transcode does not keep up with file playback and playback eventually "catches up" to it. When I try to change the quality level from automatic to something specific, it crashes, either with GPU ENC/DEC usage dropping to 0% or with a file format error (depending on what I'm playing back with). What's interesting is that when I put either card into my PC, HW transcoding has no issues with 2+ 4K HEVC streams. If memory serves, the ENC would run at around 14% and the DEC around 23% (when viewed in Task Manager), BUT in the VM each runs sub-10% during transcodes. That could be something to do with virtualization and how that information is presented, but I don't know; frankly, I basically just made that sentence up given how little I actually know about virtualization.
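For what it's worth, this is the kind of sanity check I've been meaning to run inside the guest to take Plex out of the equation entirely (just a sketch; an ffmpeg build with NVENC support and the sample file name are assumptions, not part of my actual setup):

```
# Rough NVENC sanity check inside the Win10 guest, independent of Plex.
# Assumes an ffmpeg build with NVENC support is installed in the VM and that
# sample.mkv is any local 4K HEVC file (both are placeholders).
ffmpeg -hwaccel cuda -i sample.mkv -c:v h264_nvenc -b:v 8M -f null -
# Watch the reported speed= value: comfortably above 1.0x means the card itself
# transcodes fine and the bottleneck is above the driver; crawling here points
# at the passthrough/driver layer rather than at Plex.
```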

Now, before you go and say just stick with Plex forums because it seems like a software issue, hear my plea:
  1. That forum has, thus far, been unhelpful, so I'm spreading out my requests for information/assistance.
  2. I keep hearing tales of people running Proxmox with GPU passthrough, especially with a P2000/P2200, and it "just works", yet I can't seem to get any of them to respond.
  3. When I drop either the P2200 or the GTX 1660 into my PC, HW transcoding works just fine (multiple 4K HEVC streams, transcoding always way ahead of playback, no weird file format issues). Given that I've swapped from an R620 to an R720 and from a P2200 to a 1660, I can safely rule out the card(s), chassis, memory, and processors, unless it's somehow inherent to Dell chassis themselves. With little beyond my own troubleshooting to back this up, I'm left thinking it's something to do with Proxmox and/or my config. As of this morning I've fresh-installed Proxmox 6.1 and performed only the actions outlined in the "Ultimate Guide" to get GPU passthrough working (sketched below for reference), and I still have the same issues as before.
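For anyone who wants to sanity-check the host side without reading the whole guide, this is roughly what those steps amount to (a sketch of the usual Intel/VFIO setup, not a verbatim copy of my files):

```
# /etc/default/grub - enable the IOMMU on Intel, then run update-grub:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

# /etc/modules - load the VFIO modules at boot:
#   vfio
#   vfio_iommu_type1
#   vfio_pci
#   vfio_virqfd

# Keep the host drivers off the card:
echo -e "blacklist nouveau\nblacklist nvidia" > /etc/modprobe.d/blacklist-gpu.conf
update-initramfs -u

# After a reboot, confirm the IOMMU actually came up:
dmesg | grep -e DMAR -e IOMMU
```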

SO...
Any ideas anyone has are welcome. Any logs or command outputs you think would help will be gleefully provided. Any alternate guides or recommendations on how to set up passthrough will be thoroughly followed. Want me to burn some sage and sacrifice a goat? I'm not sure where to get a goat around here, but I'll figure it out this weekend.


EDIT: I also can't set the resolution of the desktop environment to anything larger than 640x480, even though I am using a dummy DP plug that supports up to 4K and EDID. Again, not sure if it's relevant to my root issue, but maybe it is.
 
can you post your vm config (qm config VMID), your versions (pveversion -v), and maybe the kernel messages during such a transcode (dmesg)?
 
Is the GPU attached to the PCIe lanes of the same CPU socket you're running the virtual machine on?
You're not crossing NUMA nodes, are you?

Sorry, looks like you're much deeper into troubleshooting than that, but I thought I'd put my 2 cents in.
 
can you post your vm config (qm config VMID), your versions (pveversion -v), and maybe the kernel messages during such a transcode (dmesg)?

I can get you the version info and dmesg output when I return home from work, but the VM config is below (as of where I finished testing last night; I have a couple of other new tests and changes to try, as suggested in a Reddit post I made). Are there any particular dmesg arguments you want me to pass?

Note: the config I'm posting has the physical GPU passed through alongside the emulated GPU. I have tested with the emulated GPU both enabled and disabled and it made no difference, but I can always change that around and run some more commands to test. I'm open to anything.

Is the GPU attached to the PCIe lanes of the same CPU socket you're running the virtual machine on?
You're not crossing NUMA nodes, are you?

Sorry, looks like you're much deeper into troubleshooting than that, but I thought I'd put my 2 cents in.

Hey, any suggestions are welcomed with open arms, so thanks for chiming in. I've been wondering about NUMA-related shenanigans but am not sure how to determine how it's being handled, as I'm not familiar with this line of troubleshooting. Thus far I have not used the "Enable NUMA" GUI checkbox. I did confirm on the host that the GPU is getting x8 PCIe lanes (presumably because of how Dell riser cards handle bifurcation). Yesterday I was told about running lstopo, but it seems to only show the physical topology; it doesn't say which cores from which NUMA node are assigned to which VM, which is what I'd need to correlate with the GPU. Do you know of a way to view the logical assignment of NUMA nodes to VMs? Maybe some args to pass to lstopo, or a different command? And how robust is the "Enable NUMA" checkbox? Does it handle more than just memory allocation?
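In case it helps, here's what I've been able to check on the host so far (the 0000:04:00.0 address and VMID 100 are from my config below; I'm assuming these sysfs/pid-file paths are the right ones to poke at):

```
# Which NUMA node the GPU hangs off (-1 means the platform doesn't report it):
cat /sys/bus/pci/devices/0000:04:00.0/numa_node

# Which host CPUs belong to which node:
lscpu | grep -i numa

# Which host CPUs the running VM's vCPU threads are actually allowed on
# (Proxmox keeps the QEMU PID in /var/run/qemu-server/<vmid>.pid):
for tid in $(ls /proc/$(cat /var/run/qemu-server/100.pid)/task); do
    taskset -cp "$tid"
done
```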


/etc/pve/qemu-server/100.conf (windows10vm, currently the only VM)
Code:
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off'
balloon: 0
bios: ovmf
bootdisk: virtio0
cores: 8
cpu: host,hidden=1,flags=+pcid
efidisk0: local-lvm:vm-100-disk-1,size=128K
hostpci0: 04:00,pcie=1
ide0: tower-isos:iso/virtio-win-0.1.173.iso,media=cdrom,size=385296K
machine: q35
memory: 32768
name: win10-new1
net0: virtio=0E:22:91:72:21:50,bridge=vmbr0,firewall=1
numa: 0
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=c52512e1-0664-426b-9da4-4e05757bbe29
sockets: 1
vga: std,memory=64
virtio0: local-lvm:vm-100-disk-0,cache=writeback,iothread=1,replicate=0,size=150G
vmgenid: 0376dbf8-10cf-4ff7-95ca-4aff767dcbb5
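Side note: since this config has both an args: -cpu override and a cpu: line, I've also been dumping the generated QEMU command line to see what actually gets passed (I'm not certain which of the two wins):

```
# Show the full QEMU command line Proxmox generates for this VM; drop --pretty
# if it isn't supported on this version:
qm showcmd 100 --pretty | grep -E 'cpu|vfio'
```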
 
Just had a quick skim of this, and it looks like it has the info to figure out which cores are where
(and some bash scripts and bits to fire up the machine on a set node).

Though for the PCIe slots it might be quicker to check the motherboard's manual, or just fire up the machine on one NUMA node and then the other.
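If it helps, one way to run that test is to bind the VM's vCPUs and memory to a single host node via the numaX option in the VM config. This is how I understand the qemu-server syntax, so double-check it against "man qm" before relying on it:

```
# In /etc/pve/qemu-server/100.conf (sketch):
#   numa: 1
#   numa0: cpus=0-7,hostnodes=0,memory=32768,policy=bind
# Run the transcode test, then switch hostnodes=0 to hostnodes=1 and compare.
```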
 
can you post your vm config (qm config VMID), your versions (pveversion -v), and maybe the kernel messages during such a transcode (dmesg)?

Alright, finally back from work. Here are my current config, pveversion, and dmesg:

/etc/pve/qemu-server/101.conf
```
balloon: 0
bios: ovmf
bootdisk: scsi0
cores: 6
cpu: host
efidisk0: local-lvm:vm-101-disk-1,size=128K
machine: q35
memory: 16384
name: win10-new3
net0: virtio=C6:12:A7:68:4E:CC,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
scsi0: local-lvm:vm-101-disk-0,backup=0,cache=writeback,size=150G
scsihw: virtio-scsi-pci
smbios1: uuid=4a01d547-39b5-42f2-979d-de3b410ca3a7
sockets: 1
hostpci0: 04:00,pcie=1,x-vga=on
```

pveversion -v
```
root@joshua:/etc/pve/qemu-server# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.10-1-pve)
pve-manager: 6.1-3 (running version: 6.1-3/37248ce6)
pve-kernel-5.3: 6.0-12
pve-kernel-helper: 6.0-12
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-2
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-14
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-2
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
```

dmesg during a single 4K HEVC transcode to the web app on "convert automatically" (plays but "catches up" quickly; ENC and DEC stay sub-10%):
file "dmesg-convertauto.txt"

dmesg during the same single 4K HEVC transcode to the web app with 1080p quality selected (does not play; ENC/DEC usage and GPU memory usage drop to 0% and never recover):
file "dmesg-convert1080pWA.txt"
 

Attachments

  • dmesg-convertauto.txt
  • dmesg-convert1080pWA.txt
Nothing really stands out here, but I noticed two small things:

you are not running the most current kernel (that should not really make a difference)
the card has an xhci and a serial controller which load the xhci driver instead of vfio-pci; maybe this has an influence on the guest driver?

Otherwise I would really check the logs from inside the guest...
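To check (and, if needed, force) that on the host, something along these lines should do. The 04:00 address is taken from your config, and the IDs below are placeholders you would fill in from the lspci output:

```
# See which kernel driver each function of the card (VGA, audio, USB, serial) uses:
lspci -nnk -s 04:00

# If the USB/serial functions are claimed by xhci_hcd/serial instead of vfio-pci,
# bind them explicitly - replace the placeholder IDs with the [vendor:device]
# values printed above, then rebuild the initramfs and reboot:
echo "options vfio-pci ids=10de:xxxx,10de:yyyy,10de:zzzz,10de:wwww" > /etc/modprobe.d/vfio.conf
update-initramfs -u
```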
 
I'm facing a very similar situation right now. Proxmox running on a dual-socket HP server, passing through an Nvidia Quadro P2000 and the performance is significantly lower than it should be. Did you ever figure this out?
 
I'm facing a very similar situation right now. Proxmox running on a dual-socket HP server, passing through an Nvidia Quadro P2000 and the performance is significantly lower than it should be. Did you ever figure this out?
Is there any news on that? I'm especially interested in this as I also have HP dual-socket servers (DL360 Gen8) into which I want to put P2000/P2200 GPUs.
 
