In hyper-threaded systems, overall performance and efficiency can increase with proper core pinning. There's a huge performance difference between a (virtual) 4c/8t and an 8c/8t CPU configuration, but because of turbo boosting and hyper-threading inefficiencies, power consumption is still about the same in both cases.
I use my server's spare time to re-encode video archives to x265. On that 8c/16t system I had a container with 8 cores, running 8 independent single-threaded encoders at the lowest possible priority. A couple of times I noticed that some of those encoders were running ~30% slower than usual, and the reason was that Proxmox had pinned two virtual cores to the same physical core (according to "pct cpusets").
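Here's a rough sketch of the check I do by hand, as a script. The cpuset and the sibling pairs below are sample values only: on a real host the cpuset comes from "pct cpusets" and the sibling pairs from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (on my 8c/16t box, thread N and thread N+8 share a physical core).

```shell
#!/bin/sh
# Sample cpuset as handed to an 8-core CT (hypothetical values).
cpuset="0 3 5 8 9 12 13 14"
# Hyperthread sibling pairs on the host (hypothetical 8c/16t layout).
sibling_pairs="0,8 1,9 2,10 3,11 4,12 5,13 6,14 7,15"

collisions=""
for pair in $sibling_pairs; do
  a=${pair%,*}; b=${pair#*,}
  # If both siblings of a pair appear in the cpuset, two vCPUs share
  # one physical core.
  case " $cpuset " in
    *" $a "*" $b "*)
      collisions="$collisions $pair"
      echo "cpus $a and $b share a physical core" ;;
  esac
done
```

With the sample cpuset above, it flags the pairs 0,8 and 5,13, which is exactly the situation that made my encoders ~30% slower.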
The best workaround I've found is to start the container with unlimited cores, start the encoder processes, and then set the core count to 8 on the fly. This way Proxmox re-pins the cores, and seeing that they are under heavy load, it hands out cores on genuinely different physical cores instead of hyper-threaded siblings.
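As `pct` commands the workaround looks roughly like this (CT ID 101 is just an example; run on the Proxmox host -- and I'm assuming "--delete cores" is the way to return a CT to unlimited cores):

```shell
pct set 101 --delete cores   # unlimited cores, CT sees all host threads
pct start 101
# ...start the 8 encoders inside the CT, wait until they're loaded,
# then re-pin on the fly:
pct set 101 --cores 8
pct cpusets                  # verify no two vCPUs landed on one physical core
```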
There's also a small performance boost from running 8 encoders in a container with 8 cores instead of unlimited cores plus cpulimit=8. When all cores are fully utilized, it seems the kernel does not aggressively balance load between them by constantly moving processes from core to core, which should be good for context switching and cache locality, I'd think. Although that possible gain is masked by cpulimit inaccuracy(?): the encoders are not completely single-threaded, as each uses about 104% CPU while the container summary screen shows about 102% CPU usage, so the container uses a bit more resources than it's been given.
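For clarity, these are the two configurations I'm comparing (hypothetical CT 101 again):

```shell
# Variant 1: 8 vCPUs, Proxmox pins them to 8 host threads.
pct set 101 --cores 8

# Variant 2: unlimited cores, but CPU time capped at 8 cores' worth.
pct set 101 --delete cores
pct set 101 --cpulimit 8
```

In variant 2 the CT sees all 16 host threads and the scheduler is free to migrate the encoders between them; in variant 1 the load stays on the same 8 threads.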
That load balancing was probably also the reason why two encoders in the same container ran slower than the others: the kernel incorrectly assumed that every core in the container was equal, and seeing all of them fully loaded, it preferred to keep each process on its own core.
So there are cases where setting a core count can be wiser than setting cpulimit, but when not using unlimited cores, a container's performance depends on luck: whether or not two of its cores landed on the same physical core. Even if core limiting isn't the most optimal way in most cases, I'd still say core pinning shouldn't be based on luck or on workarounds like mine; it should be easily configurable.
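For what it's worth, there may already be a way to pin manually: as far as I know Proxmox passes raw lxc.* keys in the CT config through to LXC, so something like the fragment below (hypothetical CT 101, cgroup v2, siblings being N and N+8 as on my host) should force one thread per physical core. I haven't verified how this interacts with Proxmox's own dynamic pinning, though, so treat it as a sketch:

```
# /etc/pve/lxc/101.conf
# Pin the CT to the first thread of each physical core (0-7),
# leaving the hyperthread siblings (8-15) unused by this CT.
lxc.cgroup2.cpuset.cpus: 0-7
```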