Ubuntu 20 VM Much slower than in ESXi

I spun up a new VM to test rather than using one imported from ESXi, and performed the same test.
Same results. Seems as though Proxmox is significantly slower than ESXi on the same hardware.
I'm going to abandon this path at least for now.
 
(Ubuntu 20.04) I have the same problem. I imported a VM from VMware ESXi 7.0 to Proxmox 8.3.0 and it is slow. If I create a new VM on Proxmox (and likewise a new one on VMware ESXi 7.0), it works fine. I don't have any idea how to solve this either.
 
From some discussion about this in a Discord chat a while ago, the difference is most likely ESXi not applying cpu mitigations to VMs by default, whereas Proxmox does. If the cpu mitigations are applied in ESXi, then apparently it has the same "slowdown" effect.
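
If anyone wants to test that theory, the quick way to switch the mitigations off inside an Ubuntu guest (test systems only, and assuming a standard GRUB setup) is roughly:

Code:
# Edit /etc/default/grub and add mitigations=off to the default kernel args, e.g.
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mitigations=off"
sudo nano /etc/default/grub
sudo update-grub
sudo reboot

# After the reboot, the affected items should report as "Vulnerable"
lscpu | grep -i vulnerab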
 
Turning off mitigations did help, but it still didn't come close (see reply #17 of this thread), so I don't think it's just that.

I recently set up KVM on a server too. A VM there was taking about twice as long to do the same task as on ESXi, but I gather that's a tuning issue. It sounds like there are a ton of things to tune to get the correct performance out of KVM.
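
From what I've read so far, the usual knobs people point at are the CPU type, virtio-scsi with an iothread, and the disk cache mode. On the Proxmox side those are all settable from the CLI via qm; a rough sketch (VM id 100 and the storage/volume names are just placeholders):

Code:
# Expose the host CPU model to the guest instead of the default kvm64
qm set 100 --cpu host

# Use the virtio SCSI single controller so the disk can get its own iothread
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-lvm:vm-100-disk-0,iothread=1,discard=on,ssd=1

# VirtIO NIC rather than an emulated e1000
qm set 100 --net0 virtio,bridge=vmbr0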
 
Any idea if it was cpu bound, io bound, or something else?

Also, did you use the VirtIO drivers for everything?
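
A quick way to check that from inside the guest, if you still have it around, is something like:

Code:
# Which kernel driver each virtual device is actually bound to
lspci -k | grep -iA3 virtio

# Loaded virtio modules (virtio_blk / virtio_scsi / virtio_net etc.)
lsmod | grep virtio
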
I tried many drivers, including VirtIO, but never figured it out. Finally I gave up and concluded that Proxmox is just plain slower than ESXi, and too slow for my uses unfortunately. Proxmox has a lot of great features I'd like to use, and ESXi licensing isn't very friendly for small companies now that it's part of Broadcom.
 
Again, do you remember if it was cpu bound, io bound, or something else?
No idea. My standard test is a task that is regularly performed by my server. It reads a lot of data from a database, processes it, and puts the results back. The task itself is typically CPU limited when using an SSD and a reasonable bus, because it's single threaded. I use it as a benchmark and know how long it should take on different machines.

The fact that it was taking 4 times longer on the same machine says something was wrong, and maybe there was more than one limitation. It seems like it was CPU bound, but I don't know for sure; I should have been watching the CPU load to see. Normally one CPU is just about pegged by the task and another has a bit of load from servicing the DB.
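
For next time, something like this per-process view (from the sysstat package) should make it obvious whether it's CPU or disk wait; the process name here is just a placeholder for our task:

Code:
# Per-process CPU and disk IO once per second while the test runs
pidstat -u -d -p $(pgrep -f our_task) 1

# Coarse system-wide view: high wa (iowait) points at storage, high us/sy at CPU
vmstat 1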
 
Ahhh well. Yeah, it's unfortunate the system isn't still around to look into this in further detail.

It's possible there was something happening in the system to cause io (or other) problems, which would then lead to this "multiple times slower" behaviour. What you're describing sounds (to me) like the kind of thing I'd expect from an io problem (i.e. the wrong storage or network driver).
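
If you do get a chance to re-test, watching the disks on both the PVE host and inside the guest at the same time during a run would probably show it straight away, e.g. (needs the sysstat package):

Code:
# Extended per-device stats once per second, skipping idle devices.
# High await/%util inside the guest but not on the host usually points at the
# virtual disk path (driver or cache mode) rather than the hardware itself.
iostat -xz 1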

If you do ever want to get around to investigating this in depth, please let me know. You'll need a bunch of patience though, as it'd be a case of running the test a bunch of times with various settings to figure out exactly wtf is going wrong and how to fix it. :)

(I'm actually super time limited myself at the moment, but hopefully that'll improve in the new year)
 
I have another server that I've just commissioned but it isn't actually needed until the new year. Maybe next week I'll have time to throw another drive in and load up Proxmox for another test.
 
I took some time this week and did some more testing.

The test I'm doing isn't some random benchmark. This is a real-world task that's performed on our servers frequently. It ends up fetching about 13k records from a database, processing the data, and inserting about 4k records as a result. The database and queries have all been optimized. During this process two threads each vary between 20% and 50% of a CPU, so it exercises both CPU speed and I/O throughput.
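
The numbers below are just averaged wall-clock times over repeated runs, roughly like this (./run_task and the run count are placeholders standing in for our in-house job):

Code:
# Run the task N times and report the average wall-clock time in milliseconds
N=10; total=0
for i in $(seq $N); do
    start=$(date +%s%3N)
    ./run_task
    end=$(date +%s%3N)
    total=$(( total + end - start ))
done
echo "Average: $(( total / N )) ms"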

First I loaded Linux onto the Mac mini where I'd previously been testing Proxmox. The test results there were dismal; something about that Mini will not run Linux very quickly. ESXi (BusyBox) runs great, but Linux and Proxmox are slow. Not sure why, so I just abandoned that test mule.

Next I loaded up a server that is currently idle and performed the same tests in Ubuntu 24.04 running on ESXi, Proxmox, and bare metal. Here are the results:

Ubuntu - bare metal
Average Test completion time 9198 ms (baseline)

ESXi
Average Test completion time 9710 ms (5.6% slower than Ubuntu on bare metal which is reasonable)

Proxmox - (everything virtio, cpu host, see attached image of VM)
Average Test completion time 11641 ms (26.5% slower than Ubuntu on bare metal which seems slow)
 

Attachments

  • Screen Shot 2024-12-11 at 1.35.24 PM.png (128.5 KB)
@GuyInCorner Awesome, thanks for getting around to this. Would you also be ok to screenshot the "Options" for the VM as well, as it's probably easier to do that rather than a bunch of back and forth questions to get the info. :)

With that host box, and the VM running on it, were any kind of cpu mitigations enabled?

If not, what's the approach you used for disabling them, and for verifying they're disabled? Asking because this thread has a decent chance of becoming solid reference information if we can get a fairly complete set of answers and drill into testing things to figure out what's going wrong. :)
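
For reference, the two checks I'd normally use for that are:

Code:
# Confirm what the kernel was booted with (mitigations=off should appear here if set)
cat /proc/cmdline

# Per-vulnerability status as the running kernel reports it
grep . /sys/devices/system/cpu/vulnerabilities/*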
 
afaik, you need to set cache=none in PVE to be closer to bare metal and ESXi
A good point to bring up, but depending upon what the VM is doing it can vary. I've tried all caching options and found 'direct sync' to be fastest.
'none' was actually 21% slower.
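
For anyone else following along, the cache mode is set per disk and can be changed between runs from the CLI, roughly like this (VM id and volume name are placeholders), followed by a stop/start of the VM:

Code:
# Valid modes are none, writethrough, writeback, directsync and unsafe
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=directsync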
 
@GuyInCorner Awesome, thanks for getting around to this. Would you also be ok to screenshot the "Options" for the VM as well, as it's probably easier to do that rather than a bunch of back and forth questions to get the info. :)

With that host box, and the VM running on it, were any kind of cpu mitigations enabled?

If not, what's the approach you used for disabling them, and for verifying they're disabled? Asking because this thread has a decent chance of becoming solid reference information if we can get a fairly complete set of answers and drill into testing things to figure out what's going wrong. :)
This round of testing is being done on a Dell R730 with 2 x E5-2630 v3 @ 2.40GHz.
Drives are SSD RAID 1 mirrors (which, interestingly, allow things to run slightly faster than RAID 3 or RAID 10).

Attached are the VM options

I haven't changed anything regarding mitigations. Here are the vulnerabilities listed by lscpu:
r730 - PVE - Host
Code:
Vulnerabilities:        
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: Split huge pages
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

r730 - PVE - Ubuntu VM
Code:
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Mitigation; PTE Inversion; VMX flush not necessary, SMT disabled
  Mds:                    Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT Host state unknown
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Retpoline
  Srbds:                  Not affected
  Tsx async abort:        Not affected

r730 - Ubuntu - bare metal
Code:
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX disabled
  L1tf:                   Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                    Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT vulnerable
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected

r730 - ESXi - Ubuntu VM
Code:
Vulnerabilities:         
  Gather data sampling:   Not affected
  Itlb multihit:          KVM: Mitigation: VMX unsupported
  L1tf:                   Mitigation; PTE Inversion
  Mds:                    Mitigation; Clear CPU buffers; SMT Host state unknown
  Meltdown:               Mitigation; PTI
  Mmio stale data:        Mitigation; Clear CPU buffers; SMT Host state unknown
  Reg file data sampling: Not affected
  Retbleed:               Mitigation; IBRS
  Spec rstack overflow:   Not affected
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; IBRS; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI SW loop, KVM SW loop
  Srbds:                  Not affected
  Tsx async abort:        Not affected
 

Attachments

  • Screen Shot 2024-12-11 at 3.48.57 PM.png (175.4 KB)
