CPU Sockets / NUMA

mhert

Well-Known Member
Jul 5, 2017
Hello Guys,

Can somebody clear things up a little about the best cores/sockets setting with NUMA enabled?

The manual says you should enable NUMA and set the number of sockets equal to the "real" sockets on the mainboard.

That means to me (a VM with 4 cores, e.g.) that I have to set the number of cores to 2 and the number of sockets too. So instead of
4 cores on one socket it should be set to 2 cores on 2 sockets, right?

Or can this slow down the VM? I'm asking because in the forum you can find posts where people advise the opposite (all cores on one
socket per VM) to solve performance issues...

Thanks in advance.
 
That means to me (a VM with 4 cores, e.g.) that I have to set the number of cores to 2 and the number of sockets too. So instead of
4 cores on one socket it should be set to 2 cores on 2 sockets, right?
Yes, that sounds about right: if your host system has two sockets (and two NUMA nodes) and you want your VM to have a total of four cores, you'll need to enable NUMA for the VM and configure two sockets as well as two cores (as the cores setting is per socket).
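If you're not sure how many NUMA nodes the host actually has, you can check on the PVE host itself, for example (just a quick sketch, the output of course depends on your hardware):

    lscpu | grep -E 'Socket|NUMA'    # number of sockets and NUMA nodes
    numactl --hardware               # cores and memory per NUMA node (needs the numactl package)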
 
Just to clarify:

What is best practice (performance-wise) for a server with two sockets and a VM with four vCPUs (NUMA enabled)?

1 socket / 4 vCPUs OR 2 sockets / 2 vCPUs

And why?
 
1 socket / 4 vCPUs OR 2 sockets / 2 vCPUs
As I said before, if you want your VM to have 4 total cores (not vCPUs; that's again different) and the host has 2 NUMA nodes, you should configure the VM like this:
  • In the advanced CPU settings enable NUMA.
  • Set the number of sockets to match the host system, so: 2
  • Set the number of cores to: (total_cores/sockets) = (4/2) = 2
The VM will then have sockets * cores = 2 * 2 = 4 vCPUs. The vCPU count defines how many threads QEMU spawns and defaults to the number of sockets times the number of cores.

So to answer your question: 2 sockets with each two cores. Also enable NUMA. This will make it so the NUMA layout of the guest matches the real NUMA layout of the host. This should help the guest OS schedule its tasks and processes more efficiently and, thus, improve performance.
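Just as a concrete sketch of the above (assuming VM ID 100 here, adjust to your setup), either via qm or directly in the VM's config file:

    qm set 100 --sockets 2 --cores 2 --numa 1

which ends up in /etc/pve/qemu-server/100.conf as:

    cores: 2
    numa: 1
    sockets: 2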
 
... and it should have better memory allocation, shouldn't it? (allocate memory in each NUMA domain)
Yes, sorry if this was somewhat misleadingly phrased. By "schedule its tasks and processes more efficiently" I basically meant that each process should then be able to use the same NUMA domain as the resources it needs (e.g., memory).
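If you want to double-check, the guest's view of the topology can be inspected from inside the VM (a sketch; numactl may need to be installed in the guest first):

    lscpu | grep NUMA       # NUMA nodes as seen by the guest OS
    numactl --hardware      # per-node memory sizes and distances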
 
Yes, sorry if this was somewhat misleadingly phrased. By "schedule its tasks and processes more efficiently" I basically meant that each process should then be able to use the same NUMA domain as the resources it needs (e.g., memory).
No problem, I just wanted to have everything on record.
 
I'm a bit confused about this subject.

I understand that if one node has 2 physical sockets with 2 cores each, and you set up a VM with 4 vCPUs, then it is recommended to enable NUMA and configure the VM like the node, with 2 sockets and 2 cores. This way, the VM is aware of the physical layout of the node, and it will use the cores and RAM of that node properly.

If the node has 2 x Intel(R) Xeon(R) Gold 5120, with 14 cores and 28 threads each, and you want to maximize performance on the previous VM (4 vCPUs), wouldn't it be better to check the NUMA flag and then configure 1 socket and 4 cores on that VM? That way, all the vCPUs of the VM would be on the same physical CPU, and they would share the processor's cache.

Best regards,

Manuel Martínez
 
If the node has 2 x Intel(R) Xeon(R) Gold 5120, with 14 cores and 28 threads each, and you want to maximize performance on the previous VM (4 vCPUs), wouldn't it be better to check the NUMA flag and then configure 1 socket and 4 cores on that VM? That way, all the vCPUs of the VM would be on the same physical CPU, and they would share the processor's cache.
AFAIK, the NUMA checkbox will result in all corresponding vCPU threads being scheduled on the same NUMA node on the PVE host, effectively pinning them to that CPU - and the same goes for the memory. If you don't check NUMA, the vCPU threads can be scheduled on ALL available CPUs, even on other NUMA nodes, and will therefore be slower (cross-NUMA-node access is slower).
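If you want to see where a VM's memory actually ended up, you can look at the per-node statistics of its QEMU process on the PVE host (a sketch, assuming VM ID 100 and the usual PVE location of the PID file):

    numastat -p $(cat /run/qemu-server/100.pid)    # per-NUMA-node memory usage of the VM's QEMU process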

In your setup: using two sockets with 2 vCPUs each will be faster, because you get (at least theoretically) double the performance due to doubled cache (the sum of both physical CPUs). Yet all these performance considerations are ALWAYS application-specific.

I don't think that the CPU cache will have such a big impact in highly virtualized environments (yet as always ... it depends and your mileage may vary); the main performance boost comes from local RAM vs. remote RAM from the CPU's perspective.

In the end, you may want to benchmark for yourself whether it has an impact or not. Years ago, I looked into hugepage support for KVM, which was faster, but very "undynamic" for VM usage, so I went for "better virtualization experience" instead of 10% more performance.
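A rough way to benchmark the local vs. remote RAM difference discussed above is to pin a memory benchmark to one node (a sketch using numactl and sysbench, both of which need to be installed; the numbers are obviously hardware-dependent):

    numactl --cpunodebind=0 --membind=0 sysbench memory run    # CPU and memory on the same node (local)
    numactl --cpunodebind=0 --membind=1 sysbench memory run    # memory forced to the other node (remote), expect lower throughput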
 
