CPU/core/thread terminology, and how best to allocate resources in a limited setting

Tsirist

New Member
Mar 21, 2019
Hi all, I have two gaming VMs set up and running on a Ryzen 2700X. I'm quite impressed with the performance so far, although it's nowhere near what the same hardware manages natively. I think this may be due in part to how I have the VMs configured.

The 2700X is an 8-core/16-thread CPU. I imagine the best layout would be 1 core for Proxmox and 7 shared between the two VMs, but I'm not sure what that looks like in the configuration. For a while I had each VM set up with 14 vCPUs, since the Proxmox dashboard shows the node as having 16 CPUs; I assumed that by allocating 14 vCPUs I was "speaking the same language" as Proxmox, i.e. counting in threads rather than cores.

However, it seems that we get better performance by allocating 7 vCPUs to each VM, so I'm wondering if I'm missing something in the configuration options here.

Is there a way to pin cores and their hyperthreading pairs to particular machines in the Proxmox configuration files? I looked at the NUMA options but they don't seem applicable here, since we only have 1 node with these 8 cores and 16 threads.
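For reference, which logical CPUs belong to which physical core can be checked with standard Linux tools on the host; something like this (nothing Proxmox-specific):

# list each logical CPU with the physical core and socket it belongs to
lscpu -e=CPU,CORE,SOCKET
# or ask the kernel directly which logical CPU is cpu0's SMT sibling
cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list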

What is the most reasonable setup here? Is allocating 7 cores to each VM equivalent to allocating a core to Proxmox and sharing the remaining 7 between the VMs? Does it dedicate 1 core each to the VMs with the remaining 6 shared, and the hypervisor living somewhere in there? A mix of these?
 
Hey!

You've said at the beginning that you have 2 gaming VMs on top of Proxmox. Keep in mind that for gaming the number of cores and threads matters less than the clock speed of each core. Moreover, gaming needs real GPU capabilities, and Proxmox does not divide a GPU between VMs.

In terms of CPU, you can also over-provision the cores. Having 16 threads on the host does not mean that you cannot allocate 20 vCPUs across your VMs. In your case I'd enable NUMA and go with 2 sockets / 3-4 cores per socket for each VM. This should be enough, given that the VMs probably won't need 100% of the CPU all the time.
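As a rough sketch (the VMID, memory size, and exact core count are placeholders - adjust to your setup), that would map to lines like these in /etc/pve/qemu-server/<vmid>.conf:

# 2 virtual sockets x 4 cores = 8 vCPUs per VM, with a NUMA topology exposed to the guest
sockets: 2
cores: 4
numa: 1
memory: 8192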

As for the GPU - I suggest you look into this: https://pve.proxmox.com/wiki/Pci_passthrough#GPU_PASSTHROUGH
The thing is, you will likely need either one graphics card per VM, or a VERY expensive enterprise one.
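Once passthrough is working, the card shows up as a hostpci line in the VM config, roughly like this (01:00 is only an example address - take the real one from lspci on your host, and pcie=1 assumes the q35 machine type):

# pass all functions of the GPU at 01:00 through as a PCIe device and use it as the guest's primary display
hostpci0: 01:00,pcie=1,x-vga=1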

Hope this helps!
 

Thanks for the response! I should have included more details in the initial post, as we have already addressed some of these issues. In particular, we have two identical GPUs that seem to be passed through just fine. I think we have over-provisioned at least once, with worse results. I also don't think our system lets us control pinning through the NUMA configuration in Proxmox (see the NUMA details of our system at the end of this post). Looking at the docs (https://pve.proxmox.com/pve-docs/qm.1.html), we see the following:
numa[n]: cpus=<id[-id];...> [,hostnodes=<id[-id];...>] [,memory=<number>] [,policy=<preferred|bind|interleave>]

NUMA topology.

cpus=<id[-id];...>
CPUs accessing this NUMA node.

hostnodes=<id[-id];...>
Host NUMA nodes to use.

memory=<number>
Amount of memory this NUMA node provides.

policy=<bind | interleave | preferred>
NUMA allocation policy.
My interpretation of this is that we would need multiple real NUMA nodes in order to pin CPUs that way. As it is, we only have one NUMA node. I could set each VM's CPUs to use that NUMA node, but as far as I can tell there's no way to prevent them from using the same physical cores.
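From what I can tell, any pinning like that would have to happen outside the Proxmox config for now, e.g. by pinning each VM's QEMU process by hand. An untested sketch of what I mean (VMID 101 and the CPU list 0-7 are just examples; check lscpu -e for how cores and SMT siblings are numbered on the host):

# Proxmox writes each running VM's QEMU PID here
pid=$(cat /var/run/qemu-server/101.pid)
# pin every thread of that QEMU process to logical CPUs 0-7
# (i.e. 4 physical cores plus their SMT siblings, if siblings are numbered consecutively)
taskset --all-tasks --cpu-list -p 0-7 "$pid"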

I'll copy some details from a draft I was writing for the VFIO subreddit, in case you'd like more configuration information and more on our experience and efforts so far.
On the question of benchmarking, I've had some interesting results. While testing things and setting everything up, I took to using the Unigine benchmarks to test performance. On this front, things seem rather good. Oddly though, our VMs perform differently even when running solo, despite having virtually identical setups. We'll call them the primary and secondary VMs. The primary on its own matches up with native performance (scoring ~7000 on one of these benchmarks). The secondary scores ~6000. Naturally they use different graphics cards, but they're the same model so I'm curious as to where this difference might be coming from.

Running the Superposition 4K benchmark simultaneously on each VM, the scores actually don't change that much with our current configuration, which seems like a big win. This sort of thing has long led me to believe that we were doing things correctly. Unfortunately, when playing real games performance begins to suffer, and I'm wondering if the hit we're taking is to be expected or if there are options we haven't explored yet.

Since we enjoy playing it, I've done some evaluation of DotA 2's performance. Each time, I went into demo mode as Venomancer and eyeballed the average framerate in the different situations below.

Results: https://pastebin.com/raw/WuycAPtn

Currently we have our VMs set up with direct access to our disks, no memory ballooning, and 7 vCPUs allocated to each. I've tried making all 16 threads available to both VMs, as well as splitting the set of cores between them, but our current setup seems the most consistent in all regards. With 16 allocated to each VM, performance was predictably "chunky" at times. With 14, performance was much better, but 7 seems to be an improvement even over that.
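For reference, those settings correspond to VM config lines roughly like the following (the actual configs are in the pastebin at the end of this post):

# current per-VM CPU settings: one virtual socket with 7 cores, ballooning off
sockets: 1
cores: 7
balloon: 0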

I've tried things with and without MSI support. It doesn't make much of a difference, and the number of rescheduling interrupts per second has never been very high (thousands, if I remember correctly, as opposed to the millions that I understand to be problematic). Right now MSI may be disabled again, since Windows tends to lose track of the setting as we reconfigure things.
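For reference, the host-side counters for these live in /proc/interrupts on the RES line; something like this is enough to keep an eye on them:

# watch the per-CPU rescheduling-interrupt counters, refreshing every second
watch -n1 'grep RES /proc/interrupts'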

Are the numbers seen with DotA above to be expected with our setup? I know we can expect *some* performance hit, but I'm not sure if we're seeing one that's large enough to indicate misconfiguration on my part. Our hardware is listed below; I fear that our poor 8c/16t processor just doesn't cut it. We're not doing CPU overclocking, so it runs at 3.7GHz; memory runs at ~2900MHz.

Details of hardware, `lspci`, `lscpu`, `numactl`, and VM configurations: https://pastebin.com/ZgpxutX9
 
