Windows/MSSQL workloads on PVE

davemcl

Member
Sep 24, 2022
I've been evaluating Proxmox and another hypervisor for MSSQL/Windows workloads, and long story short, I can't replicate the competitor's performance with Proxmox.
Whilst fio, Geekbench, and other benchmarks were fairly similar, when it came to the actual real workloads that will eventually run on these servers, the difference was staggering.
I favour PVE, so I'm hoping to track down the cause.

MSSQL ETL task
Current solution: Xeon E5-2689 v4 on VMware 6.7 (OVH)
40m 18s
***********
New solution:
Dell R740, 2 x Xeon Gold 6152, RAID 10 SAS/SSD on PERC H730
Proxmox 7.3-4 - VM 16 vCPU, 32GB RAM
41m 06s

Competing Hypervisor - VM 16 vCPU, 32GB RAM
17m 20s
************
My Proxmox VM settings
> agent: 1
> bios: ovmf
> boot: order=scsi0;net0;ide0;ide2
> cores: 16
> cpu: host
> efidisk0: SSD-R10:vm-200-disk-1,efitype=4m,pre-enrolled-keys=1,size=528K
> ide0: none,media=cdrom
> machine: pc-q35-7.1
> memory: 32768
> meta: creation-qemu=7.0.0,ctime=1663398507
> name: Win-Srv-22-01
> net0: virtio=AE:9A:D1:19:00:E7,bridge=vmbr0,firewall=1,tag=25
> numa: 0
> ostype: win11
> scsi0: SSD-R10:vm-200-disk-2,cache=writeback,discard=on,iothread=1,size=64G
> scsi1: SSD-R10:vm-200-disk-3,discard=on,iothread=1,size=64G
> scsi2: SSD-R10:vm-200-disk-4,discard=on,iothread=1,size=128G
> scsi3: SSD-R10:vm-200-disk-5,discard=on,iothread=1,size=64G
> scsihw: virtio-scsi-single
> smbios1: uuid=38eb0316-63e2-4dbb-9e1a-e0f45bc1d220
> sockets: 2
> tags: windows
> tpmstate0: SSD-R10:vm-200-disk-0,size=4M,version=v2.0
> vmgenid: 5a31f7fb-22dd-4e1a-adcc-136fcba06c34

> uname -a
> Linux pve02 5.15.83-1-pve #1 SMP PVE 5.15.83-1 (2022-12-15T00:00Z) x86_64 GNU/Linux

I've tried different CPU types in PVE with minimal change in results.
Any recommendations on what to try next?

Dave
 
In Proxmox I'm using raw format with LVM-thin.
Have also now tried (CLI equivalents sketched below):
  • disabling memory ballooning
  • changing the cache mode on the controller to writeback
  • setting SQL Server trace flag T8038
  • disabling the "use tablet for pointer" option
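For reference, the one-off CLI equivalents for two of these look roughly like this (a sketch; VMID 200 assumed from the config above; the cache mode lives on the disk line itself, and the trace flag is set inside the guest, not via qm):

> qm set 200 --balloon 0
> qm set 200 --tablet 0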
 
> numa: 0
> sockets: 2

Try with NUMA enabled.
But to be honest, I doubt that will help (that) much, as you are "missing" over half of the performance (41m vs. 17m) for some reason, if I understand it correctly.
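For reference, NUMA is toggled per VM from the PVE shell roughly like this (a sketch, assuming VMID 200 from the config above; the 2 x 8 socket/core split mirroring the dual-socket host is an assumption):

> qm set 200 --numa 1
> qm set 200 --sockets 2 --cores 8

With numa: 1, QEMU exposes a virtual NUMA topology to the guest instead of one flat memory space, which tends to matter most on dual-socket hosts like this one.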

Did you run the tests for both hypervisors on the exact same hardware (not just identical hardware) and the same configuration/setup? So you can definitely rule out any hardware problems, BIOS/UEFI settings, or configuration differences, yes?

Does the other hypervisor have mitigations disabled? PVE, at least, has them enabled by default.

You could also test with the opt-in 6.1: [1] (and 5.19: [2]) kernel.

But since I don't even know what an "MSSQL ETL task" is, I unfortunately can't help any further, sorry.

[1] https://forum.proxmox.com/threads/opt-in-linux-6-1-kernel-for-proxmox-ve-7-x-available.119483
[2] https://forum.proxmox.com/threads/opt-in-linux-5-19-kernel-for-proxmox-ve-7-x-available.115090
 
The ETL task is "extract, transform & load" - just a bunch of database queries moving source data into a staging database.

Tried the same ETL process on a different node that has 2 x Silver 4114 CPUs, also an R740 with SSD-backed RAID10 storage.
The initial run took 46m, which went down to 23m after disabling all CPU vulnerability mitigations.

I'll try the later kernels, but yes, the initial testing was on the exact same server.
 
You could also try VirtIO-Block as the disk (shouldn't be faster, but you never know) or passing through the storage controller (to see the virtualisation overhead).
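For the VirtIO-Block test, one of the scsiN lines would be reattached as a virtioN entry, roughly like this (a sketch based on the config posted above; the disk has to be detached and re-added, and the Windows guest needs the virtio-blk driver from the virtio-win ISO):

> virtio1: SSD-R10:vm-200-disk-3,discard=on,iothread=1,size=64G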

Mitigations are definitely something that should be considered!
 
> You could also test with the opt-in 6.1: [1] (and 5.19: [2]) kernel.
>
> [1] https://forum.proxmox.com/threads/opt-in-linux-6-1-kernel-for-proxmox-ve-7-x-available.119483
> [2] https://forum.proxmox.com/threads/opt-in-linux-5-19-kernel-for-proxmox-ve-7-x-available.115090
Unfortunately the later kernels made no difference.
 
On the same hardware as the 41m test, disabling mitigations reduces the test to just over 20m.
Not much I can do here.
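For anyone following along, the per-vulnerability mitigation status can be listed on the PVE host before deciding what to disable (a quick check; mitigations=off on the kernel command line is the blunt instrument for turning them all off):

> grep . /sys/devices/system/cpu/vulnerabilities/*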
 
You can also try forcing the CPU to max frequency (it will help with latency; not sure about ETL).

/etc/default/grub

GRUB_CMDLINE_LINUX="idle=poll intel_idle.max_cstate=0 intel_iommu=off intel_pstate=disable processor.max_cstate=1"

#update-grub && reboot
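After the reboot it is worth confirming the parameters took effect and the clocks are pinned (a quick sanity check, assuming the grub edit above):

> cat /proc/cmdline
> grep MHz /proc/cpuinfo | head -n 4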
 
With the extra CPU flags, if for example my CPUs support PCID, am I meant to set "+pcid" in that case?
My understanding was that setting the CPU type to "host" passed the correct flags regardless... is this right?
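For reference, the per-VM flag syntax looks roughly like this (a sketch; note the flag name is pcid, and with cpu: host the host's flags, PCID included, should already be passed through, so explicit flags mainly matter for named CPU types):

> qm set 200 --cpu IvyBridge,flags=+pcid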
 
The retbleed mitigation is by far the most expensive for this server. Will try enabling hugepages and forcing max frequency next.
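Rather than mitigations=off wholesale, the kernel also accepts a targeted parameter for just this one (a sketch; same /etc/default/grub plus update-grub procedure as above, and the usual security trade-off applies):

GRUB_CMDLINE_LINUX="retbleed=off"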
 
Maybe I'm missing something, but without more information it's impossible to say, although I'd guess the other hardware fits your disk IO better.

What is the other hypervisor?
What is the underlying hardware? CPU type and count, memory type, density, speed, storage technology, etc.
How much additional load is on the hypervisor (for both)?
You mentioned RAID10 for storage, but not the make/model/quantity of disks.

On to specifics: why are you making a two-socket VM? There is almost no scenario where this is advantageous, since your hardware has enough cores in a single socket.
 
> On the same hardware as the 41m test, disabling mitigations reduces the test to just over 20m.
> The retbleed mitigation is by far the most expensive for this server.

Did you check the status of the mitigations on the other hypervisor?
It sounds like the other hypervisor has mitigations (at least partly) disabled by default (or does not yet implement all of them).
This would obviously not be a fair comparison...

Or is this all already clear and we are now talking about the 20m vs. 17m difference?

Did you test with NUMA enabled yet?

PS: It would be really nice (if somehow possible) if you could give @bbgeek17 information about your test scenario, so that he can reproduce it. :)
 
> Tried the same ETL process on a different node that has 2 x Silver 4114 CPUs, also an R740 with SSD-backed RAID10 storage.
> The initial run took 46m, which went down to 23m after disabling all CPU vulnerability mitigations.


Hi,

I do not see the exact specifications of your old/VMware solution (RAM, storage/RAID, guest OS, and so on).

So I do not understand what you are comparing...

By default VMware uses a 64k block size, and MSSQL uses the same value. LVM, as far as I can find, seems to use 512 bytes. So you are comparing apples with oranges... Try to set up your LVM to use the same 64k block size (I cannot help here, because I do not use LVM).
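For the LVM-thin case, the relevant knob is the thin pool chunk size; a minimal sketch, assuming PVE's default pve/data pool naming and a made-up size and name for the new pool (the chunk size is fixed at pool creation, so this only helps for newly created pools):

> lvs -o lv_name,chunk_size pve/data
> lvcreate --type thin-pool --chunksize 64k -L 500G -n data64k pve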

I am only using ZFS in Proxmox... because I like to shoot myself in the foot ;)

Anyway, as a side note, I was able to migrate various MSSQL instances from better hardware to lesser-spec VMs (lower CPU frequency, less memory, and so on) on Proxmox/ZFS with better performance, because I like to waste my time... :)

Even more: with the help of an MSSQL expert friend, I was able to migrate an MSSQL instance (free version) onto CentOS 7 with better performance. That was 5 years ago, and several MSSQL version upgrades later it still performs better. Maybe I am a lucky man...

My suggestion is to engage a full-time MSSQL admin and a full-time Proxmox admin (a paid subscription is a good starting point).

Good luck / Bafta!
 
@davemcl It would be interesting to try this in our performance lab. Is the testing you are doing now tied to your data, or is there a setup/task I can run on a fresh SQL installation that would be a good comparison?


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
Unfortunately it's customer data - around a 100GB data warehouse.
If I get time I might try the DVDShop dataset, which is publicly available.
 
> What is the other hypervisor?
> What is the underlying hardware? CPU type and count, memory type, density, speed, storage technology, etc.
> How much additional load is on the hypervisor (for both)?
> You mentioned RAID10 for storage, but not the make/model/quantity of disks.
> Why are you making a two-socket VM?
The other hypervisor is XCP - all mitigations are enabled by default.
2 x Gold 6152 CPUs, only allocating 16 vCPUs to the VM.
No other VMs running.
Storage is LVM-thin, 6 x Toshiba SAS (Enterprise Mixed Use SSDs) connected to a PERC 740P in RAID10.
I've tried single and multi socket just for comparison's sake.

Currently, with retbleed disabled, I'm relatively happy with the performance.
 
I went back to the BIOS and reset to defaults, loaded the "Performance System Profile", then loaded and applied the "Virtualization Optimized Performance Profile".
The task is now running at 17m.
I still have to keep the retbleed mitigation disabled.

Thanks to all who made suggestions - very happy, and this has been a good learning experience.
 
