In hyper-threaded systems, overall performance and efficiency can increase with proper core pinning. There's a huge performance difference between a (virtual) 4c/8t and an 8c/8t CPU configuration, but because of turbo boosting and hyper-threading inefficiencies, power consumption is still about the same in both cases.
I use my server's spare time to re-encode video archives to x265. On that 8c/16t system I had a container with 8 cores, running 8 independent single-threaded encoders at the lowest possible priority. A couple of times I noticed that some of those encoders were running ~30% slower than usual, and the reason was that Proxmox had pinned two virtual cores to the same physical core (according to "pct cpusets").
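Here's a rough sketch of the check I do by hand, as a script. The cpuset and the sibling pairs below are sample values only: on a real host the cpuset comes from "pct cpusets" and the sibling pairs from /sys/devices/system/cpu/cpuN/topology/thread_siblings_list (on my 8c/16t box, thread N and thread N+8 share a physical core).

```shell
#!/bin/sh
# Sample cpuset as handed to an 8-core CT (hypothetical values).
cpuset="0 3 5 8 9 12 13 14"
# Hyperthread sibling pairs on the host (hypothetical 8c/16t layout).
sibling_pairs="0,8 1,9 2,10 3,11 4,12 5,13 6,14 7,15"

collisions=""
for pair in $sibling_pairs; do
  a=${pair%,*}; b=${pair#*,}
  # If both siblings of a pair appear in the cpuset, two vCPUs share
  # one physical core.
  case " $cpuset " in
    *" $a "*" $b "*)
      collisions="$collisions $pair"
      echo "cpus $a and $b share a physical core" ;;
  esac
done
```

With the sample cpuset above, it flags the pairs 0,8 and 5,13, which is exactly the situation that made my encoders ~30% slower.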
The best workaround I've found is to start the container with unlimited cores, start the encoder processes, and then set the core count to 8 on the fly. This way Proxmox re-pins the cores, and seeing that they are under heavy load, it hands out cores on genuinely different physical cores instead of hyper-threaded siblings.
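As `pct` commands the workaround looks roughly like this (CT ID 101 is just an example; run on the Proxmox host -- and I'm assuming "--delete cores" is the way to return a CT to unlimited cores):

```shell
pct set 101 --delete cores   # unlimited cores, CT sees all host threads
pct start 101
# ...start the 8 encoders inside the CT, wait until they're loaded,
# then re-pin on the fly:
pct set 101 --cores 8
pct cpusets                  # verify no two vCPUs landed on one physical core
```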
There's also a small performance boost from running 8 encoders in a container with 8 cores instead of unlimited cores plus cpulimit=8. When all cores are fully utilized, it seems the kernel does not aggressively balance load between them by constantly moving processes from core to core, which should be good for context switching and cache locality, I'd think. Although that possible gain is masked by cpulimit inaccuracy(?): the encoders are not completely single-threaded, as each uses about 104% CPU while the container summary screen shows about 102% CPU usage, so the container uses a bit more resources than it's been given.
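For clarity, these are the two configurations I'm comparing (hypothetical CT 101 again):

```shell
# Variant 1: 8 vCPUs, Proxmox pins them to 8 host threads.
pct set 101 --cores 8

# Variant 2: unlimited cores, but CPU time capped at 8 cores' worth.
pct set 101 --delete cores
pct set 101 --cpulimit 8
```

In variant 2 the CT sees all 16 host threads and the scheduler is free to migrate the encoders between them; in variant 1 the load stays on the same 8 threads.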
That load balancing was probably also the reason why two encoders in the same container ran slower than the others: the kernel incorrectly assumed that every core in the container was equal, and seeing all of them fully loaded, it preferred to keep each process on its own core.
So there are cases where setting a core count can be wiser than setting cpulimit, but when not using unlimited cores, a container's performance depends on luck: whether or not two of its cores landed on the same physical core. Even if core limiting isn't the most optimal way in most cases, I'd still say core pinning shouldn't be based on luck or on workarounds like mine; it should be easily configurable.
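For what it's worth, there may already be a way to pin manually: as far as I know Proxmox passes raw lxc.* keys in the CT config through to LXC, so something like the fragment below (hypothetical CT 101, cgroup v2, siblings being N and N+8 as on my host) should force one thread per physical core. I haven't verified how this interacts with Proxmox's own dynamic pinning, though, so treat it as a sketch:

```
# /etc/pve/lxc/101.conf
# Pin the CT to the first thread of each physical core (0-7),
# leaving the hyperthread siblings (8-15) unused by this CT.
lxc.cgroup2.cpuset.cpus: 0-7
```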