What happens if you always set 1 socket? Shouldn't that take care of the issue without having to pin vCPUs manually?
No, it makes absolutely no difference. The sockets option only makes sense if you enable the NUMA option.
But that just tells the OS inside the VM itself to use NUMA, which, as I said previously, is pointless in 99% of use cases.
(I don't know of any use case myself that benefits from it.)
The whole issue is: whether you enable NUMA and use more than one socket in the VM settings, or don't enable NUMA and use only one socket, makes absolutely no difference to how Proxmox handles the VM's CPU threads (tasks).
Each core you give a VM is a "task" on the Proxmox host itself, and those tasks rotate randomly (details below) between all physical cores without any NUMA logic.
This is because QEMU doesn't tell the Proxmox kernel which tasks belong together. So if you give a VM 4 cores, the kernel sees those 4 "tasks" as separate tasks on the host that have nothing in common.
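You can see this for yourself on the host: every vCPU is just an ordinary KVM thread under the QEMU process. Here is a minimal sketch (assuming the Proxmox-style pidfile under /var/run/qemu-server/<vmid>.pid and QEMU's "CPU n/KVM" thread naming; adjust for your setup) that prints which physical core each vCPU thread is currently sitting on:

```python
#!/usr/bin/env python3
"""Sketch: show which physical core each vCPU thread of a VM currently runs on."""
import sys
from pathlib import Path

def vcpu_threads(qemu_pid: int):
    """Yield (tid, thread_name, current_cpu) for every vCPU thread of the QEMU process."""
    for task in Path(f"/proc/{qemu_pid}/task").iterdir():
        name = (task / "comm").read_text().strip()
        if "KVM" not in name:          # vCPU threads are named "CPU n/KVM"
            continue
        stat = (task / "stat").read_text()
        # Split after the closing ')' so thread names with spaces don't break parsing;
        # fields[36] is overall field 39 of /proc/<tid>/stat = CPU the thread last ran on.
        fields = stat.rsplit(")", 1)[1].split()
        yield int(task.name), name, int(fields[36])

def main(vmid: str):
    pid = int(Path(f"/var/run/qemu-server/{vmid}.pid").read_text())
    for tid, name, cpu in sorted(vcpu_threads(pid)):
        print(f"tid={tid:<8} {name:<12} running on physical core {cpu}")

if __name__ == "__main__":
    main(sys.argv[1])   # e.g. ./vcpu_map.py 101
```

Run it a few times and you will see the vCPU threads jumping around the physical cores.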
Thank god this is not a really dramatic issue (let's say it could be much worse), because the kernel still seems to have some clever logic, so the performance penalty is not as big as it otherwise would be.
Rotate randomly:
The kernel will not move a task to another core while the task is busy, i.e. while something inside the VM is keeping the CPU occupied, like a program that is currently running flat out. But as soon as that program (or whatever it is inside the VM) finishes or enters a wait loop, the task on the host will usually migrate to another CPU core.
That's how I understand it; probably not fully correct in every detail, but at a high level that is definitely how it works.
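If you want to watch this happening, here is a tiny sketch that polls one host task (for example one of the vCPU thread IDs found with the snippet above; the TID is whatever you pass in) and prints a line every time the scheduler moves it to another core:

```python
#!/usr/bin/env python3
"""Sketch: report every time a given task is migrated to a different core."""
import sys
import time

def current_cpu(tid: int) -> int:
    """Return the core the task last ran on (field 39 of /proc/<tid>/stat)."""
    with open(f"/proc/{tid}/stat") as f:
        fields = f.read().rsplit(")", 1)[1].split()
    return int(fields[36])

def watch(tid: int, interval: float = 0.5):
    last = current_cpu(tid)
    print(f"task {tid} starts on core {last}")
    while True:
        time.sleep(interval)
        cpu = current_cpu(tid)
        if cpu != last:
            print(f"task {tid} migrated: core {last} -> {cpu}")
            last = cpu

if __name__ == "__main__":
    watch(int(sys.argv[1]))
```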
As long as QEMU doesn't tell the kernel that those 4 tasks (the VM's 4 vCPUs) belong together somehow, there is no real NUMA support on Proxmox at all, no matter what anyone says.
So CPU pinning is the only solution at the moment.
It's possible to create some really clever hook scripts that at least balance the VMs between sockets or NUMA nodes (on startup at least), which makes things much easier; pinning 20 VMs by hand is very challenging. (There's a minimal sketch of such a hook script below.)
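For illustration, here is a stripped-down sketch of such a hook script. It assumes the Proxmox pidfile path /var/run/qemu-server/<vmid>.pid and uses a hard-coded example core set; a real script would pick the emptiest NUMA node or CCD instead. Proxmox invokes a hook script with the VM ID and a phase argument:

```python
#!/usr/bin/env python3
"""Sketch of a Proxmox hookscript: pin a VM's vCPU threads to a fixed core set
right after the VM starts. CORESET is an example value, adjust it to one of
your CCDs / NUMA nodes."""
import os
import sys
from pathlib import Path

CORESET = {0, 1, 2, 3, 4, 5, 6, 7}   # example: the cores of one CCD / NUMA node

def pin_vm(vmid: str, cores: set) -> None:
    pid = int(Path(f"/var/run/qemu-server/{vmid}.pid").read_text())
    for task in Path(f"/proc/{pid}/task").iterdir():
        name = (task / "comm").read_text().strip()
        if "KVM" in name:                        # only the vCPU threads ("CPU n/KVM")
            os.sched_setaffinity(int(task.name), cores)
            print(f"pinned {name} (tid {task.name}) to {sorted(cores)}")

def main() -> None:
    vmid, phase = sys.argv[1], sys.argv[2]       # Proxmox passes: <vmid> <phase>
    if phase == "post-start":                    # QEMU is up, vCPU threads exist now
        pin_vm(vmid, CORESET)

if __name__ == "__main__":
    main()
```

You would attach it with something like `qm set <vmid> --hookscript local:snippets/pin-vcpus.py` (the storage has to allow the snippets content type). Newer Proxmox versions also have a CPU affinity field in the VM settings that does the pinning part for you, but you still have to choose sensible core sets per VM yourself.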
But real NUMA support also means that the kernel can migrate the VM's tasks between the cores of a NUMA node while the VMs are running.
Moving to another NUMA node (still with near memory) but a different L3 cache should be supported as well.
NUMA is not only about near and far memory, it's about L3 cache as well. The L3 cache is actually the bigger performance factor. The reason is that an application that uses multiple CPUs (a multithreaded app) can use the L3 cache to share data between tasks/cores, which is insanely fast.
If the cores of a VM are spread around (not on the same CCD on AMD server systems), the application cannot share data through the L3 cache and has to go through memory, which is 3x slower.
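If you want to see which host cores actually share an L3 cache (on AMD Rome/Milan/Genoa each group corresponds to one CCD), the kernel exposes that in sysfs. Here is a small sketch that groups the cores accordingly; the groups are exactly the core sets you'd want to pin VMs to:

```python
#!/usr/bin/env python3
"""Sketch: group host cores by the L3 cache they share, using sysfs."""
from pathlib import Path

def l3_groups():
    groups = {}
    for cpu in Path("/sys/devices/system/cpu").glob("cpu[0-9]*"):
        cache_dir = cpu / "cache"
        if not cache_dir.is_dir():
            continue
        for cache in cache_dir.glob("index*"):
            if (cache / "level").read_text().strip() == "3":
                shared = (cache / "shared_cpu_list").read_text().strip()
                groups.setdefault(shared, set()).add(int(cpu.name[3:]))
    return groups

if __name__ == "__main__":
    for shared, cpus in sorted(l3_groups().items(), key=lambda kv: min(kv[1])):
        print(f"L3 shared by CPUs {shared}: {sorted(cpus)}")
```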
I even benchmarked this with iperf3 and multithreaded archiving; with pinning I get 3x more performance. iperf3 without pinning: 14-15 GB/s, with pinning to the same CCD (same L3 cache): over 50 GB/s.
The benchmarks are here on this forum in another thread.
But the L3 cache on Intel server systems is not that big of a deal, because of the monolithic die design.
On AMD servers (Rome/Milan/Genoa) it's an extremely big issue, so big that Proxmox makes absolutely no sense on those systems without CPU pinning.
So the conclusion is: the sockets setting in Proxmox (in the VM settings) is absolutely pointless, and in my opinion the NUMA option in the VM settings is just as pointless.
Cheers