I want a simple solution, not all of the "enchilada" that @alexskysilk notes.
Install Portainer.
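If Portainer is the route you take, the upstream docs boil the install down to roughly the following; the image tag, ports and volume name below are the defaults from Portainer's own documentation, so treat them as placeholders and adjust to taste:
Code:
# Persistent volume for Portainer's own data
docker volume create portainer_data
# Run Portainer CE, exposing its web UI on 9443
docker run -d --name portainer --restart=always \
  -p 8000:8000 -p 9443:9443 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data:/data \
  portainer/portainer-ce:latest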
I mean, we still don't even have NUMA support, which makes Proxmox mostly useless on dual-socket or newer single-socket AMD platforms like Genoa compared to other hypervisors.
Can you elaborate? I thought QEMU, and therefore PVE, is able to use NUMA.
This is exactly what we want:
"If you enable this feature, your system will try to arrange the resources such that a VM does have all its vCPUs on the same physical socket."
So you assume, according to the wiki, that all vCPUs of one VM should run on the same chiplet (or NUMA node).
The issue is that this doesn't work at all. Not even a bit.
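For readers following along: the feature the wiki text describes is the per-VM NUMA flag. Assuming a VM with ID 100 (a placeholder), it is switched on like this; whether it then behaves as the wiki promises is exactly what is being debated here:
Code:
# Enable the NUMA option for VM 100
qm set 100 --numa 1
# which ends up as this line in /etc/pve/qemu-server/100.conf:
# numa: 1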
No, I assume that they run on the NUMA node from which the memory is used, to reduce the inter-NUMA-node communication.
Where is your proof? You just provide anecdotal evidence, which is useless for drawing any conclusions.
Trying to understand what you want to say, I inspected my dual-socket Intel machines; numastat shows this:
Code:
$ numastat
                       node0           node1
numa_hit        180134743078    129405155945
numa_miss         2028661746       461859704
numa_foreign       461859704      2028661746
Which shows that node0 has a miss ratio (numa_miss / (numa_hit + numa_miss)) of roughly 1.1% and node1 of roughly 0.35%, both of which are far from "not even a bit". This is the worst numastat I found; others have even lower misses.
I don't have any AMD machine at hand right now, so how does it look on your machine? Have you configured NUMA for EACH VM?
Maybe we should split off to a new thread; this is much more interesting than another homelabber chiming in to want a new Docker GUI.
In the context of NUMA (Non-Uniform Memory Access) configurations, it's crucial to understand that the significance extends beyond just memory; the L3 cache plays a pivotal role. On Genoa platforms, L3 caches are distributed across NUMA nodes, with each cache supporting eight threads. This distribution is similar to how memory is handled, but with a key distinction.
There are your proofs and whatever else you want.
Thank you very much for the detailed explanation. I wasn't aware of the cache situation, which is completely plausible.
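For reference, the node/cache layout described above can be inspected directly on the host with the standard tools (lscpu from util-linux and numactl are assumed to be installed):
Code:
# Which NUMA node, socket, core and L1/L2/L3 cache each logical CPU belongs to
lscpu -e=CPU,NODE,SOCKET,CORE,CACHE
# Node overview: CPUs, memory sizes and inter-node distances
numactl --hardware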
You can fix this yourself if you use CPU pinning and do it yourself for each VM.
You will not solve the memory NUMA allocation that way, only the cache allocation. I just tested it with the mbw benchmark, and on the hypervisor the QEMU process got memory from both nodes (I have two). CPU pinning will give better performance, yet as you already stated, not that much on Intel. The difference varies with the ratio of the memory distribution across the NUMA nodes; pinning to all the wrong CPUs can make the problem significantly worse, up to 2.5x slower, but that is worse than the default of cycling around, so it may just be a strong corner case.
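For anyone who wants to reproduce that check: the per-process NUMA breakdown can be read with numastat against the QEMU PID, and a whole process can be pinned manually with taskset. The VMID 100 and the core list 0-7 are only placeholders; the pidfile path is the usual Proxmox location.
Code:
# Show from which NUMA nodes the QEMU process of VM 100 got its memory
numastat -p $(cat /var/run/qemu-server/100.pid)
# Manually pin all threads of that QEMU process to host cores 0-7
taskset -acp 0-7 $(cat /var/run/qemu-server/100.pid)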
But with a lot of VMs, and additionally if you move them between hosts, it's simply impossible to use CPU pinning.
That is already available in the configuration file, just not via the GUI and not automatically. I played around with it in this thread. It seems to work, and I am really interested in seeing whether it would be a solution for you and whether it is faster (and easier to set up than just running taskset).
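To make "available in the configuration file" concrete, here is a sketch of the relevant entries in /etc/pve/qemu-server/<vmid>.conf: the numaX topology options plus the newer affinity setting. The VMID, core ranges and memory size are placeholders; check the qm.conf man page for the exact syntax on your PVE version.
Code:
# /etc/pve/qemu-server/100.conf (excerpt, values are only examples)
numa: 1
# Guest NUMA node 0: vCPUs 0-7, 16 GiB, bound to host NUMA node 0
numa0: cpus=0-7,hostnodes=0,memory=16384,policy=bind
# Pin the vCPU threads to host cores 0-7 (affinity option, newer PVE releases)
affinity: 0-7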