MSSQL on ZFS

soapee01

Renowned Member
Sep 7, 2016
40
8
73
69
Hi,

I'm looking to install Windows Server 2019 Std with MSSQL Sql Server Express (probably 2017).

Drives:

Raid10 SSD 1TB drives (samsung 850 Pro MZ-76P1T0BW)

Historically I've used raw filesystems for mssql as suggested. I'd really like to have the ability to use snapshots. Can this configuration work well on ZFS, or are there needed performance tweaks to make this work? I've read adjusting block sizes may, or may not be required.

I'd also like to be able to use the same pool for a win2k19 active directory server, and whatever VMs I may need in the future.

Thanks!

James
 
I've decided to go forward and test. In general I've noticed that read speeds are quite good, but some writes (4k especially) can be sub par. I've used passmark to test the drives, and as a baseline, I'm comparing that to my desktop with an NVMe.

Attached is a PDF that shows the results from passmark.

Latency seems to be a real issue here, unless I'm misreading things. Is there anything else I can do to tune up this better?
  • Test Suite – Passmark Performance Test v9.0
  • NVME System:
    • AMD Ryzen 7 2700, 16GB Ram, Samsung Samsung 970 EVO 500GB - NVMe PCIe M.2 2280 SSD, Windows 10 Pro
  • Proxmox 6.0-4
    • SuperMicro SSG-6038R-E1CR16, 64GB Ram, 1 CPU (Intel XEON E5-2623 v4 @ 2.60GHz)
    • LSI SAS3008 Fusion-MPT SAS-3 HBA
    • Only 1 Guest OS for Testing
    • Guest OS: Windows 2019 Server Standard, 16GB Ram (Ballooning was Enabled per Default, but Minimum was 16384), 1 Socket-4Core CPU, Virtio Disks, Virtio Network
    • Zpool : rpool RAID1 – Samsung 860 Pro 256GB SSD (Proxmox Base), ashift default 12
      • This zpool isn’t running any guest OS’s, only Proxmox, ISOs, etc.
    • Zpool: ssd10 RAID10: Samsung 860 Pro 1TB SSD (VM Storage), atime 13
      • Manually created with the following command: zpool create -f -o ashift=13 ssd10 mirror /dev/sdc /dev/sdd mirror /dev/sde /dev/sdf
      • Compression off
      • ashift=13
      • Recordsize 128k (default)
      • ssd10/vm-100-disk-0 (DRIVE C:\ Windows)
        • Compression off
        • Volblocksize: 8k
        • Default NTFS Block Size
      • ssd10/vm-100-disk-1 (DRIVE F:\ for MSSQL)
        • Compression off
        • Volblocksize: 64k
        • Formatted NTFS with 64k block size
 

Attachments

As an added test, I ran passmark advanced disk testing with some custom threads based on their database testing config.

4 Threads, Each is a 500K file size, with block size of 64K. The rest of the settings I left the same. I'm not sure that this is a particularly valid test though.

Average write performance was around 1300MB/sec.
 
Why do you turn off the compression? In practise, compression helps the disk throughput, especially with databases (that do not include compression already).

For benchmarks, the Linux community often uses fio, which is the industry leading benchmark tool for all platforms. Using the settings often posted in the forums, you can create comparable results.

but some writes (4k especially) can be sub par.

Mainly due to con-/prosumer hardware.
 
Why do you turn off the compression? In practise, compression helps the disk throughput, especially with databases (that do not include compression already).
The CPU isn't the fastest. The assumption on my part was that this might bottleneck performance. It's worth testing again with it enabled.

For benchmarks, the Linux community often uses fio, which is the industry leading benchmark tool for all platforms. Using the settings often posted in the forums, you can create comparable results.
Meh, I already have a license for passmark. It's easy to use on windows, gives pretty graphs, etc. Didn't know that FIO was available for windows. Thanks for that.

Mainly due to con-/prosumer hardware.
Doubtful. Could performance be better with Enterprise SSD's at a much higher cost? Sure. Is this the cause of the latency? I'm not buying it. Normally I use an madadm raid10 array. I suppose I'll have to convert it over to that to know for sure.

ASIDE: I'm not going to argue MDADM over a hardware raid controller either with ya'll. MDADM has saved me more times than I can count vs some adaptec/broadcom/perc raid card, and performance on modern processors is very good. I'll take the small performance penalty for the recovery flexibility, configurability, and lower hardware costs.

My experience with ZFS is largely limited to FreeNAS, and the performance has always been less than optimal, but I've never really that cared much on a file server for a small office. Data Integrity, Snapshots, Encryption, and availability have been more important than speed. I use WD NAS Pro drives there anyway, so it's going to be slow. ZIL and L2ARC help this though. I'm not sure it'd do much on an ZFS SSD Raid10 array though, but I may be mistaken there as well. If that might help, I'd love to know. Perhaps a small high performance enterprise drive as a SLOG may improve things dramatically without costs going through the roof.

I'm also not willing to rule out the Broadcom sas-3008 HBA. I had to RMA the first server due to issues with it. I'm not convinced that there are not underlying driver issues with it. Their support usually demands that I run an ancient driver last updated with debian 8.

ZFS Snapshots on a VM though? That's amazing. ZFS just has a lot of knobs to tweak, and I'm certainly no expert. I'm expecting worse performance than what I usually get with some fantastic extra features. I'm happy proxmox has added this feature in. :-)
 
Doubtful. Could performance be better with Enterprise SSD's at a much higher cost?

Write performance for sure, especially random:

http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/

Normally I use an madadm raid10 array.

Yes, please try that too. mdadm should be faster, the question is how much. IIRC, mdadm was about 1.5-2 times faster than ZFS on 6 SAMSUNG MZ7WD960 in RAID10/stripped mirror.

Perhaps a small high performance enterprise drive as a SLOG may improve things dramatically without costs going through the roof.
Depends on the workload. SLOG only receives SYNC writes, anything else is written asynchronously.

Please also keep an eye on your wearout. Non-Enterprise SSDs are running out pretty quick with ZFS. I experienced on a PVE-laptop with an SAMSUNG EVO that the drive has almost wore out in 2 months.
 
It's best to have to same blocksize as your database.

These are consumer grade devices, please go with enterprise grade hardware.

It's not what the vendors categorize the drive to be, it is its properties that matters. Many enterprise SSD's (categorized as such by their vendor) has worse performance and endurance spec than Samsung's Pro SSDs
 
It's not what the vendors categorize the drive to be, it is its properties that matters. Many enterprise SSD's (categorized as such by their vendor) has worse performance and endurance spec than Samsung's Pro SSDs

Yes, you're right. Therefore I stated "enterprise grade", not "enterprise named". It obviously depends on where you buy your "enterprise grade" drives. Normally you would buy Samsung enterprise drives (e.g. PM series) or Intel or whatever your supplier has. If you go with Dell, IBM, HPE, Fujitsu or any other full-tier manufacturer, you're good to go, especially if you buy SAS SSDs
 
Hi, just to notify you, that ZFS now works with MSSQL. Because the lack of O_DIRECT in ZFS which MSSQL needs and before that, we need to run SQL Server over a ZVOL formatted in ext4 which causes potential performance impact.

After all these years, ZFS had implemented O_DIRECT finally, and now it works out of the box with SQL Server, at least with Express.

Proof: I ran my own official Bitwarden server that uses SQL Server 2017 Express.
 
Server configuration: 20 Cores/40 Threads x 2 Sockets = 80 cores.
Memory: 640GB with 150GB used across all VMs.
The server has a dedicated Proxmox OS SSD and multiple 4TB Crucial MX500 SSDs.
It has been running the Windows OS VM.
OS: Windows Server 2016 Standard.
DB: MS-SQL Express Advanced 2008 R2.
Initially set up with LVM, featuring 100GB storage, 8 Cores, and 8GB RAM. Memory utilization ranged from 50% to 60% with all virtual drivers & Qemu-guest.
With the need for Two-Server Replication, the decision was made to transition to ZFS Storage for VMs.
A PVE Community subscription setup is in place for these existing servers.

While everything is functioning as expected, there is an issue with report generation in the SQL Database. The process is taking nearly twice as long on the ZFS file-system compared to LVM storage on the same model of SSD.
For instance, a process that took 10-12 minutes on LVM now takes 20-25 minutes on ZFS. Reverting to LVM improves performance.

Various attempts have been made to address the issue:

Tried both single ZFS disk of 4TB with "Compression Off" and a setup of 2x 4TB SSDs (ZFS RAID0) for the SQL server.
Adjusted ZFS block size from 8k to 64k.
Formatted a new NTFS drive with a 64K block size.
Additionally, suggestions from these sources were tested:

https://pve.proxmox.com/wiki/Performance_Tweaks

Disabled memory ballooning.
Changed the cache on the controller to writeback.
Disabled the use of the tablet for the pointer.
Attempted to set the trace flag for SQL to T8038, although this option couldn't be set up in MS-SQL Express Advanced 2008 R2.
https://forum.proxmox.com/threads/windows-mssql-workloads-on-pve.120718/#post-524518

https://blog.zanshindojo.org/proxmox-zfs-performance/

https://forum.proxmox.com/threads/mssql-on-zfs.56992/

Any suggestions for fine-tuning would be greatly appreciated. If no solutions are found, it might be necessary to revert to LVM due to performance issues in MSSQL.

Thank you in advance.
 
What a pity. Yes the idea behind disabling it would be to check if write speed increases by not writing sync stuff twice. Those non-enterprise SSDs are very slow with respect to sync writes, so it could be the culprit, yet as you mention reading is slow, this cannot be this problem.
 
Yes, it seems so. I have requested the vendor whose application it is to check with their developer about the MSSQL-DB and the application, to determine what help they might need from our side in the VM. It appears that all options have been tried out. let see if I get any update/fix or workaround would update in forum, thanks for help.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!