NUMA in Proxmox

Babetecno

New Member
Apr 24, 2020
Hi everyone,

I have a host with 2 sockets (8 physical cores each, 16 threads each with HT) running Proxmox 5.4.13. Each processor has 32 GB of RAM attached. "NUMA" is enabled on all VMs.

When I run numastat, the results are not what I expected. I get too many numa_miss events:

Code:
root@pve1:~# numastat
                           node0           node1
numa_hit                83133278        66176090
numa_miss                6891218        19498691
numa_foreign            19498691         6891218
interleave_hit             29932           29641
local_node              83132082        66142957
other_node               6892414        19531824
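
To put the counters above in proportion, here is a minimal sketch that computes, per node, the share of allocations served by that node which were originally intended for the other node (numa_miss / (numa_hit + numa_miss)). The function name `numa_miss_ratio` and the `SAMPLE` string are illustrative assumptions, not part of numastat, and the counters are cumulative since boot, so this is only a rough indicator.

```python
def numa_miss_ratio(numastat_text):
    """Return {node: numa_miss / (numa_hit + numa_miss)} from plain `numastat` output."""
    rows = {}
    nodes = []
    for line in numastat_text.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("node"):
            # Header line: node0 node1 ...
            nodes = fields
        else:
            # Counter line: name followed by one value per node
            rows[fields[0]] = [int(v) for v in fields[1:]]
    return {
        node: miss / (hit + miss)
        for node, hit, miss in zip(nodes, rows["numa_hit"], rows["numa_miss"])
    }


# Counters copied from the numastat output above.
SAMPLE = """
                           node0           node1
numa_hit                83133278        66176090
numa_miss                6891218        19498691
"""

for node, ratio in numa_miss_ratio(SAMPLE).items():
    print(f"{node}: {ratio:.1%} of allocations on this node were meant for another node")
```

With these numbers the ratio comes out at roughly 8% for node0 and 23% for node1.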

I have four VMs:

- 1 Windows Server with 8 GiB (8192 MiB) and 2 processors (1 socket, 2 cores) (pid 2235)
- 1 Windows Server with 16 GiB (16384 MiB) and 4 processors (2 sockets, 2 cores) (pid 2562)
- 1 Windows Server with 32 GiB (32768 MiB) and 8 processors (2 sockets, 4 cores) (pid 4819)
- 1 Windows 10 with 8 GiB (8192 MiB) and 2 processors (1 socket, 2 cores) (pid 21910)

When I run numastat to check the memory usage per VM, I get unbalanced results for all the VMs. Single-socket VMs do not take memory from a single node only, and dual-socket VMs do not split memory evenly across nodes. For example, for the VM with 32 GiB (pid 4819), the results should be (approx.) 16000 (node 0) and 16000 (node 1), but they are 12000 (node 0) and 20000 (node 1).

Code:
root@pve1:~# numastat -c kvm

Per-node process memory usage (in MBs)
PID              Node 0 Node 1 Total
---------------  ------ ------ -----
2235 (kvm)         1283   6926  8210
2238 (kvm-nx-lpa      0      0     0
2323 (kvm-pit/22      0      0     0
2562 (kvm)        10984   5374 16357
2564 (kvm-nx-lpa      0      0     0
2611 (kvm-pit/25      0      0     0
4819 (kvm)        12377  20433 32810
4821 (kvm-nx-lpa      0      0     0
4956 (kvm-pit/48      0      0     0
21910 (kvm)        5909   2314  8222
21912 (kvm-nx-lp      0      0     0
21954 (kvm-pit/2      0      0     0
---------------  ------ ------ -----
Total             30553  35046 65600
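
As a rough illustration of how uneven the split is, the sketch below computes the fraction of each VM's memory that sits on node 0. The helper `node_shares` and the `KVM_ROWS` list are hypothetical names; the numbers are the main kvm processes copied from the table above (values in MB).

```python
def node_shares(rows):
    """rows: iterable of (pid, node0_mb, node1_mb); returns {pid: share of memory on node 0}."""
    return {pid: n0 / (n0 + n1) for pid, n0, n1 in rows if n0 + n1 > 0}


# Main kvm processes from the numastat -c kvm table (helper threads with 0 MB omitted).
KVM_ROWS = [
    (2235, 1283, 6926),
    (2562, 10984, 5374),
    (4819, 12377, 20433),
    (21910, 5909, 2314),
]

for pid, share in node_shares(KVM_ROWS).items():
    print(f"pid {pid}: {share:.0%} on node 0, {1 - share:.0%} on node 1")
```

For pid 4819 this gives about 38% on node 0 instead of the 50% a perfectly even split would show.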
Questions:

1) Is there anything else to configure?

a) Maybe the memory in the following VM options?

numa[n]: cpus=<id[-id];...> [,hostnodes=<id[-id];...>] [,memory=<number>] [,policy=<preferred|bind|interleave>]

I guess the policy would be preferred

b) taskset? It seems to handle only CPU affinity, not memory placement

2) If you force Proxmox to use a particular NUMA node for a VM, you lose flexibility in terms of scheduling, right? Are there any negative implications?

3) Can I use numad in Proxmox? http://www.admin-magazine.com/Archive/2014/20/Best-practices-for-KVM-on-NUMA-servers


Thanks in advance.
 

wolfgang

Proxmox Staff Member
Oct 1, 2014
Hi,
"For example, for the VM with 32 GiB (pid 4819), the results should be (approx.) 16000 (node 0) and 16000 (node 1), but they are 12000 (node 0) and 20000 (node 1)."
NUMA placement does not necessarily have to be symmetrical.
NUMA only stands for Non-Uniform Memory Access.

"I guess the policy would be preferred"
Yes, you can use this, but you have to use bind instead of preferred.
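
As a hedged sketch of what such a configuration could look like, the snippet below builds one numa[n] line per virtual socket using the option syntax quoted in the first post. The helper `numa_lines` is hypothetical, and it assumes that vCPU ids are contiguous per socket, that virtual socket n should map to host node n, and that memory is given in MiB and split evenly. Please check the result against the Proxmox documentation before using it.

```python
def numa_lines(sockets, cores_per_socket, memory_mib, policy="bind"):
    """Build one numa[n] config line per virtual socket (assumption: socket n -> host node n)."""
    if memory_mib % sockets:
        raise ValueError("memory must divide evenly across sockets")
    per_node = memory_mib // sockets
    lines = []
    for n in range(sockets):
        # Assumption: vCPU ids are numbered contiguously, socket by socket.
        first = n * cores_per_socket
        last = first + cores_per_socket - 1
        cpus = str(first) if first == last else f"{first}-{last}"
        lines.append(
            f"numa{n}: cpus={cpus},hostnodes={n},memory={per_node},policy={policy}"
        )
    return lines


# The 32 GiB VM from the first post: 2 sockets, 4 cores each.
for line in numa_lines(2, 4, 32768):
    print(line)
```

For that VM the sketch produces two lines, each binding 4 vCPUs and 16384 MiB to one host node.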

"If you force Proxmox to use a particular NUMA node for a VM, you lose flexibility in terms of scheduling, right? Any negative implication?"
You could overcommit one node while the other node stays empty,
so there is no load balancing anymore.
 

Babetecno

Hi,

thanks for your answer.

For testing purposes, I have created a Linux VM with 2 sockets and 3 cores per socket (NUMA enabled). Inside the VM, I checked the vNUMA nodes and their associated memory with "numactl --hardware":

Code:
numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2
node 0 size: 3945 MB
node 0 free: 3742 MB
node 1 cpus: 3 4 5
node 1 size: 4030 MB
node 1 free: 3890 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10

So the memory assigned to the VM is almost perfectly symmetric across the vNUMA nodes. I assume that for Windows VMs the memory is balanced across the vNUMA nodes as well.

When I stress the memory inside the VM, the memory assigned to the VM on the Proxmox host is unbalanced across the two NUMA nodes:

Code:
PID              Node 0 Node 1 Total
---------------  ------ ------ -----
56340 (kvm)        7175   1020  8195
---------------  ------ ------ -----

The VM is NUMA aware and "sees" every vNUMA node with 4096M. The NUMA statistics inside the VM are the following:

Code:
numastat
                           node0           node1
numa_hit                25145694        18821909
numa_miss                  66666           28516
numa_foreign               28516           66666
interleave_hit             16121           16093
local_node              25145654        18805626
other_node                 66706           44799

This confirms the "NUMA awareness" behaviour (very few numa_miss events).

But physically, the VM memory allocated on each Proxmox NUMA node is 7000 (Node0) and 1000 (Node1) respectively. When a vCPU running on a CPU that resides on Proxmox NUMA Node1 asks for more than 1 GB (and less than 4 GB), there is no memory interleaving inside the VM, but on the Proxmox host the memory must be fetched from Node0, leading to a loss of performance. Is that right? Would it explain the high number of numa_miss events in Proxmox?

Is the numa[n] option in the VM configuration the only solution? As far as I understand, a tool like numad would solve this. I would personally prefer not to force the VM onto a specific Proxmox NUMA node.


Thanks again.
 

wolfgang

numad is not part of Proxmox VE, so you can try it, but there is no guarantee that the setup will work properly.
I have never used it, so I can't tell whether numad works.

NUMA is always a compromise, which is why I would try to stay on UMA machines.
It is not possible to have no NUMA misses at all, and it also depends on the software in use.

If a program (a single process) allocates 7 GB in one piece, it is not possible to avoid NUMA misses, or else the scheduler will unbalance the memory.
 
