Windows 10 VM with GPU passthrough gets poor performance in single game (Fallout 76)

jppowers

Member
Apr 18, 2020
Hello folks!

I built up a Proxmox host a couple of months ago to merge a bunch of PCs that were doing different tasks into one physical machine with a handful of VMs. Hardware-wise it's a bit overkill.

Dell Poweredge R730XD
Intel Xeon E5-2667v4 (dual)
256GB DDR4 2400MHz RAM (128GB per CPU)
8x 6TB HDDs (a RAID-Z2 pool carried over from an old NAS), 4x 1TB SSDs (in ZFS RAID10 for VM image storage), 2x 250GB SSDs for the Proxmox install, and a Supermicro dual-NVMe PCIe card (with slot bifurcation enabled) so I can run a couple of Samsung NVMe SSDs
Started with an nVidia GTX 980 Ti, but upgraded to an RTX 2080 Super now that I've got (most) games running stably
A handful of other miscellaneous, unrelated bits and pieces
OS version is Proxmox 6.1

I've got a Windows 10 VM with 12 vCPUs (using pve-helpers to taskset them to cores on the second CPU, since that's what the GPU is electrically connected to), 32GB RAM, PCIe passthrough of a 500GB Samsung 970 Evo NVMe SSD (I was originally on a VirtIO SCSI disk, but used Samsung's software to clone it to the NVMe drive, changed the boot order during POST from inside the VM, then removed the VirtIO disk), a 4-port USB 3.0 PCIe card, and the RTX 2080 Super.

After a lot of tinkering/tweaking I got games performing consistently well in the VM. Call of Duty: Warzone runs at a very nice 60+ fps with settings turned way up, 3DMark earns a decent enough score all things considered, etc. Probably not as good as bare metal, but more than good enough for me. However, with the release of the Fallout 76 Wastelanders DLC I figured I'd hop back into that game... and it runs like crap. I still have a Shadow.tech "Windows 10 VM in the cloud" account, which is just KVM running on the v3 version of my CPU with 8 threads, 12GB RAM, a smaller VirtIO disk, and a Quadro P5000, and it's getting 2-3x better framerates. On my own machine's VM I'm seeing ~20-30 fps at best, even with everything turned down in game, while my Shadow gets 60+ fps in the same locations. For what it's worth, I added a Fallout 1st subscription to my account so I could have a game server to myself, so I can be certain my frame drops aren't caused by other players making a mess on the server.

I've done some more tweaking today: specifically, loading WSL onto both my machine and Shadow so I could get to a Linux prompt and cat /proc/cpuinfo to compare the CPU flags as best I could. I did see stuff like sse3/sse4 missing, which could definitely explain performance differences. I loaded those in by adding more flags to the args: -cpu line, compared the lists again, and everything now matches, but I'm still seeing really poor performance. I'm now at a point where I'm pretty sure I have a bunch of redundant flags in there.
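In case anyone wants to reproduce the comparison, this is roughly what I ran from WSL on both machines (a sketch; the file names are just mine):

Code:
# dump the guest's CPU feature flags, one per line, sorted
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | sort > myvm-flags.txt
# do the same on the other machine (saved as shadow-flags.txt), then:
diff myvm-flags.txt shadow-flags.txt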

It very well could be a case of "Fallout 76 just runs like crap in a VM," but I'm left scratching my head over how Shadow.tech figured it out so it runs well. For now I'll just play the game there while I continue to throw stuff at the wall and see what sticks. I'm including my VM's .conf below. I'm just not sure where to go next to troubleshoot this. What's confusing me the most is that even though I have my CPU set to host, I still have to pass so many flags that should already be included; I'm guessing I have something configured wrong somewhere along the line? Any assistance would be greatly appreciated!


#cpu_taskset 5,7,9,11,13,15,21,23,25,27,29,31
agent: 1
args: -machine 'type=q35,kernel_irqchip=on' -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hv_time,hv-vpindex,hv_vapic,hv_relaxed,hv_synic,hv_stimer,ss,rdtscp,pclmulqdq,ssse3,fma,sse4_1,sse4_2,movbe,popcnt,xsave,avx,f16c,rdrand,abm,fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid'
balloon: 0
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 12
cpu: host,hidden=1,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+aes
efidisk0: vmstorage:vm-104-disk-1,size=1M
hookscript: local:snippets/exec-cmds
hostpci0: 82:00,pcie=1,x-vga=1
hostpci1: 85:00,pcie=1
hostpci2: 83:00,pcie=1
machine: q35
memory: 32768
name: Gamer
net0: virtio=46:A6:25:E0:4E:79,bridge=vmbr3,firewall=1
numa: 1
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=07f2a4aa-e526-44e2-89b9-502f2a97b59b
sockets: 1
tablet: 0
vga: none
vmgenid: ab8c535a-fbb4-412c-8d58-1a78136200d4
 
Update:

I added a few more MS Hyper-V CPU flags to my CPU args and it's... maybe a hair better, but nothing huge if so. It felt like I was getting a touch better performance in a couple of areas, ~3 fps, but it's still below 20 fps in most exterior areas. The new additions are hv_spinlocks=0x1fff, hv_reset, and hv_frequencies, plus hv-tlbflush from the CPU options in the GUI.
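For reference, the hv_* portion of my args line now looks roughly like this (a sketch; the plain CPU feature flags from before are still appended after it):

Code:
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hv_time,hv-vpindex,hv_vapic,hv_relaxed,hv_synic,hv_stimer,hv_spinlocks=0x1fff,hv_reset,hv_frequencies,...'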
 
hi, first of all, the line
args: -machine 'type=q35,kernel_irqchip=on' -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,hv_time,hv-vpindex,hv_vapic,hv_relaxed,hv_synic,hv_stimer,ss,rdtscp,pclmulqdq,ssse3,fma,sse4_1,sse4_2,movbe,popcnt,xsave,avx,f16c,rdrand,abm,fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid'
should not really be needed; we already add the hv_* flags automatically. also, the 'kernel_irqchip' workaround was only necessary on qemu 4.0.1 (or 4.0.0, i don't remember)

aside from that the config looks ok
things that could impact performance are storage (but yours seems to be nvme) and memory speed (i didn't find any concrete info on shadow.tech's specs aside from "DDR4")
also, the taskset might hurt performance more than you gain (in all my tests, manually pinning the cores reduced performance)
i would benchmark this

you mention that other titles run ok, so i assume it is title-specific, maybe a driver update would help?
 
I'll remove all my excess CPU flags; most of that was just "I dunno, let's throw stuff at it and see what sticks." I've had the thought that maybe I'm throwing too much at it, some of it is redundant, and maybe that's a problem too. It's still strange to me that things like the sse3/sse4 flags aren't getting passed, so the best I can figure is that having host passed redundantly is confusing qemu?

As for memory speed, Task Manager on Shadow.tech reports 2400MHz, which is the same as what I have. It's difficult/impossible to tell what their host RAM layout is like, of course, so I'm thinking I'll try cutting my guest back down to 16GB, or up to 24GB, to see whether the reason they use 12GB has to do with some RAM-layout tuning they figured out on their own...
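For anyone wanting to compare their own host, this is how I've been checking DIMM speed and per-node memory layout on the Proxmox host (a sketch; needs the dmidecode and numactl packages installed):

Code:
# what speed are the DIMMs actually configured to run at?
dmidecode -t memory | grep -i 'speed' | sort | uniq -c
# how much memory does each NUMA node have, and how is it laid out?
numactl --hardware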

As for taskset, I'm kind of torn on it myself. I was having a lot of hitching issues before using it: every game would lock up if I tried to move the mouse to look around while also hitting keyboard inputs to move, and even desktop performance was strange sometimes. Taskset seemed to alleviate those problems for a bit, but they crept back in; since enabling hugepages along with +pdpe1gb, though, that problem has seemingly gone away for good... so maybe switching back to no taskset will help.
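Since I haven't posted that part of my setup, here's a minimal sketch of how I have 1GB hugepages set up, assuming a GRUB boot (the hugepages count is whatever you want to set aside for VMs):

Code:
# /etc/default/grub -- reserve 1GB hugepages at boot, then update-grub && reboot
GRUB_CMDLINE_LINUX_DEFAULT="quiet default_hugepagesz=1G hugepagesz=1G hugepages=32"

# in the VM's .conf -- back the guest's RAM with 1GB pages
hugepages: 1024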

The main reason I wanted it was to "force" the VM onto the CPU that the GPU and the rest of the passthrough hardware are electrically attached to, since I can't really tell for sure whether NUMA is placing the processes there. Am I just being overly cautious about it? Should enabling NUMA be enough to take care of that?
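For what it's worth, this is how I've been double-checking which node the passthrough devices actually hang off of (a sketch using my GPU's address; sysfs reports -1 when there's no NUMA locality):

Code:
# which NUMA node is the GPU electrically attached to?
cat /sys/bus/pci/devices/0000:82:00.0/numa_node
# and which host CPUs are local to it?
cat /sys/bus/pci/devices/0000:82:00.0/local_cpulist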
 
So I tried some changes and these are my results:

I removed the -machine args entirely and most of the -cpu args. I moved the +kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off arguments to the cpu: line (while leaving all the other current cpu: line options alone), since I needed those to ensure the nVidia GPU would still show up properly. I also removed the taskset and hookscript so no CPUs would be pinned. When booting the VM, the CPU showed up as a "KVM processor", and another cat /proc/cpuinfo in WSL showed that all of the sse3/4, avx/avx2, etc. flags were missing too. Shutting down the VM somehow took the whole host offline as well.

I rolled back the settings (I've been cp'ing the vmid.conf for every test config so I can diff them more easily) and removed the taskset functions and the Hyper-V options, so it's as below. Performance was worse than before, and the hitching while moving and looking at the same time was back. I watched htop while running the games and I definitely see CPU core usage on both CPU0 and CPU1, which... makes me think NUMA isn't working unless I taskset? That sounds wrong, right?

agent: 1
args: -cpu 'host,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off,ss,rdtscp,pclmulqdq,ssse3,fma,sse4_1,sse4_2,movbe,popcnt,xsave,avx,f16c,rdrand,abm,fsgsbase,tsc_adjust,bmi1,avx2,smep,bmi2,erms,invpcid'
balloon: 0
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 12
cpu: host,hidden=1,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+hv-tlbflush;+aes
efidisk0: vmstorage:vm-104-disk-1,size=1M
hostpci0: 82:00,pcie=1,x-vga=1
hostpci1: 85:00,pcie=1
hostpci2: 83:00,pcie=1
machine: q35
memory: 16384
name: Gamer
net0: virtio=46:A6:25:E0:4E:79,bridge=vmbr3,firewall=1
numa: 1
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=07f2a4aa-e526-44e2-89b9-502f2a97b59b
sockets: 1
tablet: 0
vga: none
vmgenid: ab8c535a-fbb4-412c-8d58-1a78136200d4


edit: and for clarity, when you say "driver update," do you mean the nVidia driver or the VirtIO drivers? My nVidia driver is up to date and I usually do clean installs (a habit from years of weird issues in other situations), but now that I think about it, I haven't done any VirtIO driver updates... I imagine I shouldn't need to do much there, though?
 
I moved the +kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off arguments to the cpu: line
this will not work and leads to our stack not recognizing the cpu line at all (defaulting to kvm instead). just put
Code:
cpu: host
our stack will add the rest (verify with qm showcmd ID) as long as you set the correct ostype (win10) and set x-vga=on
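for example (104 being the vm id here; the output is the full qemu command line, so the generated hv_* flags are easy to spot):

Code:
qm showcmd 104 | tr ',' '\n' | grep hv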

for setting numa, use our numaX options (cli only)
for example:
Code:
numa0: cpus=0-11,hostnodes=0,memory=16384,policy=bind
just use the correct hostnode

then qemu will use memory and cpus from the correct host numa node
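you can verify that the memory really comes from the right node while the vm runs, e.g.:

Code:
# per-node memory usage of the vm's kvm process
numastat -p $(cat /var/run/qemu-server/104.pid)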
 
OK, it appears I now have everything set up as suggested. I'm still not certain it's working as expected, but I do see some good changes. The host CPU flags are definitely showing up correctly now, so all the extra stuff in args really was unnecessary. I guess passing "host" as a CPU type twice caused one "host" entry to enable some of the flags and the other to turn them off, which is why I then had to set them manually... so that's interesting. I was only trying to get those specific ones (like hv_vendor_id) passed in because most of the GPU passthrough guides I read included them in some manner, but they don't appear necessary since everything still works as it should without them.

After the changes, watching htop while the game was running, I did see cores peaking on both CPU0 and CPU1, but I can't be sure one of the other VMs wasn't just spinning up on nearby cores. Performance on the Windows desktop seemed fine, though; it's just the same surprisingly low performance in game.
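To try to rule that out, I've been checking which host CPUs the VM's vCPU threads actually land on (a sketch; 104 is my VMID, and qemu names its vCPU threads "CPU n/KVM"):

Code:
VMPID=$(cat /var/run/qemu-server/104.pid)
# tid = thread id, psr = host CPU the thread last ran on
ps -L -o tid,psr,comm -p "$VMPID" | grep 'CPU '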

A couple questions for clarity because I'm not sure if I'm understanding a couple steps:

  • With the addition of memory=16384 in numa1 now, should I still have a separate memory line? I'm guessing one is telling kvm how much memory to use total, and the other in numa1= is saying how much memory to use on that numa node?
  • When you say "set x-vga=on", should I change the x-vga=1 entry in hostpci0 to x-vga=on, or set that flag elsewhere? My understanding is that, since I'm passing through a GPU, VGA emulation won't work, so leaving vga set to none is preferred.
Current VMID.conf is as follows:

agent: 1
balloon: 0
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 12
cpu: host,hidden=1,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+hv-tlbflush;+aes
efidisk0: vmstorage:vm-104-disk-1,size=1M
hostpci0: 82:00,pcie=1,x-vga=1
hostpci1: 85:00,pcie=1
hostpci2: 83:00,pcie=1
machine: q35
memory: 16384
name: Gamer
net0: virtio=46:A6:25:E0:4E:79,bridge=vmbr3,firewall=1
numa: 1
numa1: cpus= 5-16,hostnodes=0,memory=16384,policy=bind
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=07f2a4aa-e526-44e2-89b9-502f2a97b59b
sockets: 1
tablet: 0
vga: none
vmgenid: ab8c535a-fbb4-412c-8d58-1a78136200d4


Edit: So I tried this while waiting on some stuff at work today, and tonight I figured I'd jump back in and try again, but it wasn't working. I noticed a stray space between cpus= and 5-16 and removed it, then got an error saying the CPU index couldn't be over 12 (I guess I misinterpreted its purpose), so I changed 5-16 to 0-11 and then got errors about the NUMA node ID missing; rereading what I had, I realized I'd written numa1, not numa0, at the start of the option declaration. The corrected entry is below. It does appear to be properly running all vCPUs on the second socket now, and everything is still functional, but I'm still getting pretty poor performance in game.


agent: 1
balloon: 0
bios: ovmf
boot: dc
bootdisk: scsi0
cores: 12
cpu: host,hidden=1,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+hv-tlbflush;+aes
efidisk0: vmstorage:vm-104-disk-1,size=1M
hostpci0: 82:00,pcie=1,x-vga=1
hostpci1: 85:00,pcie=1
hostpci2: 83:00,pcie=1
machine: q35
memory: 16384
name: Gamer
net0: virtio=46:A6:25:E0:4E:79,bridge=vmbr3,firewall=1
numa: 1
numa0: cpus=0-11,hostnodes=1,memory=16384,policy=bind
ostype: win10
scsihw: virtio-scsi-single
smbios1: uuid=07f2a4aa-e526-44e2-89b9-502f2a97b59b
sockets: 1
tablet: 0
vga: none
vmgenid: ab8c535a-fbb4-412c-8d58-1a78136200d4
 
With the addition of memory=16384 in numa1 now, should I still have a separate memory line? I'm guessing one is telling kvm how much memory to use total, and the other in numa1= is saying how much memory to use on that numa node?
yes, exactly (and yes, you still have to specify how much memory the vm has in total)

when you say "set x-vga=on", should I change the x-vga=1 entry in the hostpci0 to x-vga=on, or set that flag elsewhere? I'm under the understanding that, since I'm passing thru a GPU, VGA emulation won't work so leaving it set to none is preferred.
i meant leave x-vga enabled on the hostpci line (it does not matter if it's 1 or 'on', our tooling interprets them the same). also, vga emulation could actually work even with passthrough, but you don't want it here

everything is still functional, but I'm still getting pretty poor performance in game.
mhmm.. from the vm config side there is not much more you can do.. one question: do you see anything in the host logs that looks suspicious?
i had some weird issues with performance and bluescreens until i set 'kvm_ignore_msrs' in /etc/modprobe.d/ (i believe it was assassins creed origins/odyssey?)
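for reference, that looks something like this (then rebuild the initramfs and reboot so it applies at module load):

Code:
# /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1
# afterwards: update-initramfs -u -k all && reboot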

when that does not help, i am pretty much out of ideas how to improve performance...
 
I don't see anything strange or particularly shouting "I'm a problem!" in any of the logs I've checked so far. I'll maybe return to playing with settings this weekend, and perhaps dive further into the logs with some "just run Fallout 76, get to a low-FPS spot, and kill the VM to get more obvious failure logs" tests. That said, out of morbid curiosity I re-enabled the cpu_taskset settings on top of my most recent configuration, and it's actually a touch better than before I posted this, especially considering I've also finally received my 3440x1440 ultrawide and I'm still holding just below 25-30 fps in Fallout 76 at worst. I ran a quick CoD: Warzone match and it's still holding up really well, too.

I haven't checked against Shadow at the new resolution yet, so I'll try that soon for the sake of comparison. Maybe the total collection of changes has brought me on par with them and I just haven't noticed. Beyond that, I agree: I don't think there's much else to do to improve performance. I dove head-first into a fairly expensive experiment knowing I might have to forgo the plan and return to a physical gaming PC, so while I'm disappointed, I'm not opposed to declaring the experiment not quite ready for prime time.
 
Having the same problem, but with the game Squad. Dell PowerEdge T620 with 64GB DDR3 RAM. 70 fps bare metal, 20 fps in the Windows 10 VM. Really gutted about it; all my gaming in this VM has been such a disappointment. If anyone has any tips I'd love to hear 'em. Please help. If I run a bunch of benchmarks on bare metal and in the VM they look similar, except for single-threaded performance, which takes about a 10% hit (probably a turbo boost difference). But actual gaming is crushed, and I don't think it's the GPU, because the game engine stats point to a CPU bottleneck and GPU usage is only 50% in the VM.

Ta.
Code:
agent: 1
balloon: 0
bios: ovmf
boot: dc
bootdisk: virtio2
cores: 10
cpu: host,flags=+md-clear;+pcid;+spec-ctrl;+ssbd;+pdpe1gb;+aes
cpuunits: 4096
efidisk0: Thin:vm-100-disk-0,size=4M
hostpci0: 02:00,pcie=1,romfile=RX580.rom
hostpci1: 43:00
ide0: none,media=cdrom
machine: q35
memory: 20480
name: Tesseract
net0: virtio=2A:34:7F:06:07:B4,bridge=vmbr0
numa: 1
numa0: cpus=0-9,hostnodes=0,memory=10240,policy=bind
numa1: cpus=10-19,hostnodes=1,memory=10240,policy=bind
onboot: 1
ostype: win10
parent: Driver_update
scsihw: virtio-scsi-pci
smbios1: uuid=****************
sockets: 2
startup: order=2,up=10,down=60
usb0: host=1-1.4
usb1: host=1-1.3
usb2: host=2-1.5
usb3: host=2-1.4
usb4: host=2-1.8
vcpus: 20
vga: none
virtio1: Thin:vm-100-disk-1,cache=writethrough,size=200G
virtio2: Thin:vm-100-disk-2,cache=writethrough,size=150G
virtio3: Thin:vm-100-disk-3,backup=0,cache=writethrough,replicate=0,size=200G
vmgenid: dc22f5f8-53ca-497b-914c-f52fe6214ad6


Code:
Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Total
---------------  ------ ------ -----
2013 (kvm)        10262  10343 20605
2247 (kvm)         4015    362  4378
1790 (kvm)         4121      6  4127
1721 (kvm)         2051     38  2089
2464 (kvm)         1252      5  1257
3330 (kvm)          938     28   966
1947 (kvm)          177     99   276
2770 (kvm)          240     28   268
1913 (kvm)          166     77   243
---------------  ------ ------ -----
Total             23222  10988 34209
 
Sadly, my solution was to build a dedicated gaming PC again. I put my 980 Ti back in the server in hopes of keeping at it, but I haven't had the motivation. Frankly, my real plan is to wait a while; I'm hoping kernel updates and other improvements will make up some, if not all, of the difference.

The best I can figure is that during certain tasks the VM can use the vCPUs to nearly their full potential, while in others the virtualized Windows install is severely limited. My opinion is that it has more to do with whether Windows recognizes itself as virtualized, namely in how it handles its CPU scheduling. Because a lot of games have anti-cheats that block or ban you, and because the nVidia driver won't load properly for GeForce cards in a VM, you are forced to lie to Windows about the state of your VM. I bet that if you were able to let Windows know it was a VM (say, by using a Quadro card instead of a GeForce and avoiding games that might ban you), performance would be better. I think that's the key difference with the Shadow.tech instance I was comparing against: Windows is probably doing its CPU scheduling differently in a virtualized state, in a way that I/we can't allow when we have to tell Windows to act like it's not a VM.
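As a quick sanity check of what the guest actually believes, the hypervisor CPUID bit shows up as a flag in /proc/cpuinfo, so from WSL you can see whether the "lie" is working (empty output means the hypervisor is hidden, as it should be with kvm=off/hidden=1):

Code:
grep -m1 -o 'hypervisor' /proc/cpuinfo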
 
I ended up solving my problem last night: after days of messing with NUMA configurations and CPU pinning, someone mentioned to me that HP servers default to a "power conservation" mode that throttles PCIe devices, and it has to be changed in iLO. And now I feel like an idiot.

Check out the difference it made:

THANK YOU! I searched so much for what could cause this issue... I'm using a Dell R720 now, and the power and thermal configuration in the BIOS was messing with the PCIe lanes enough that I even swapped a few processors, and then the whole server, a couple of times before running into this post. I registered an account just to thank you!
 
Hi there,

I have an HP DL380 Gen10 and have applied the same fix, but I'm still getting 30-40 fps in games.

I am using a vGPU profile capped at 60 fps.

My GPU usage is 45% on an 8GB RTX A5000 (vGPU workstation profile).

My CPU (8 vCPUs, AMD EPYC @ 2.5GHz) is at 60% usage.

Windows 10 Pro
 
Oh my god. (also registered, just so I could thank you)

Had the same problem on my Dell R720. Went INSANE for 3 days before I found this post.

Dear server-using reader: if you are also going mental, go into your server's BIOS, find the performance profile/mode/setting, and enable it.
Reboot, and witness the sweet sound of gaming blazing from your fans running at 100%.
For fan control on the R720 (and probably other Dell PowerEdges), check out https://github.com/nmaggioni/r710-fan-controller; it is AMAZING.

Now back to playing Stray on our server with my wife. :)

PS:

@forresthopkinsa THANK YOU THANK YOU THANK YOU

 
