Disk (SSD) Performance Question

eagle2020

Member
Aug 18, 2020
I know there are tons of questions, but I couldn’t find a satisfying answer.
For performance testing I'm using an old server:
HP DL380 Gen10 server, just one Xeon E5-2609 installed, 16 GB RAM.
Proxmox is installed on a hardware RAID (SCSI HDDs).
Two consumer SSDs (Crucial MX500), each with its own ZFS pool (just for testing!).
Only one Windows Server 2019 VM is running on the 1st zpool. The 2nd virtual disk of the VM is on the 2nd zpool.
I just copied a 20 GB file from the 1st zpool to the 2nd zpool.
I thought that, as I'm using a single disk (no RAIDZ), the transfer rate should be almost as fast as on a physical server (maybe 10% less because of overhead).
But I only got 10 MB/s!
Can the consumer SSDs be the reason? I cannot imagine the overhead being that high.
Why is the transfer rate that slow?

Frank
 
So, I was busy the last few days, but I just checked again:
I already installed 'virt_io_win_driver_installer' and the 'qemu guest client'.
Are there other drivers I have to install?
 
I thought that, as I'm using a single disk (no RAIDZ), the transfer rate should be almost as fast as on a physical server (maybe 10% less because of overhead).
But I only got 10 MB/s!
Those MX500s aren't fast (especially when doing sync writes, as they can't cache them or optimize the data before writing it to NAND), and ZFS causes a lot of overhead. Maybe you should try LVM-Thin if you don't want any RAID. It has way less overhead.
Are there other drivers I have to install?
Those two should be fine.
 
Those MX500s aren't fast (especially when doing sync writes, as they can't cache them or optimize the data before writing it to NAND), and ZFS causes a lot of overhead. Maybe you should try LVM-Thin if you don't want any RAID. It has way less overhead.

Those two should be fine.
In fact, I was hoping that the MX500 is the bottleneck. This setup was just for testing; in my production system I would use server SSDs.
I just didn't think the difference would be that big (because in a desktop client the MX500 performs well).
Thanks for your help :)
 
You shouldn't compare ZFS to your usual ext4 or NTFS filesystem.
You can benchmark it yourself. I did that and have seen a write amplification between factor 3 (big sequential async writes) and factor 82 (4K random sync writes), depending on the workload. If writing 1 GB of data in the worst-case scenario actually writes 82 GB of data to the NAND of the SSD and your SSD can only handle 550 MB/s, you don't have to wonder when it only achieves about 6.7 MB/s (550 MB/s divided by a write amplification factor of 82). That also means that the common TBW rating of a 1 TB consumer TLC SSD, 600 TB, could be exceeded after writing only about 7.3 TB (600 TB TBW divided by the write amplification factor of 82). Let's say the warranty is 5 years OR 600 TB written. With such a high write amplification, that would mean you can't write more than an average of 46.4 kB/s, or you will lose your warranty before the 5 years are over because the TBW would be reached... so yeah, it's not just money making that it is highly recommended to only use the far more durable and performant enterprise SSDs with ZFS.
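To make that arithmetic easy to re-check, here is a quick back-of-the-envelope sketch (the write amplification factor and SSD specs are just the example numbers from above, not measurements of any particular drive):

```python
# Back-of-the-envelope effect of write amplification on a consumer SSD.
# All numbers are the example values from the post above, not measurements.

SEQ_SPEED_MB_S = 550      # advertised sequential write speed of the SSD
WRITE_AMP = 82            # worst-case write amplification (4K random sync writes)
TBW_TB = 600              # endurance rating of a typical 1 TB consumer TLC SSD
WARRANTY_YEARS = 5

# Effective throughput seen by the guest when every guest byte becomes
# WRITE_AMP bytes on the NAND.
effective_mb_s = SEQ_SPEED_MB_S / WRITE_AMP
print(f"effective write speed: {effective_mb_s:.1f} MB/s")       # ~6.7 MB/s

# Guest data you can write before the endurance rating is used up.
usable_tb = TBW_TB / WRITE_AMP
print(f"guest data until TBW is reached: {usable_tb:.1f} TB")    # ~7.3 TB

# Average guest write rate that uses up the TBW exactly at the end of the warranty.
seconds = WARRANTY_YEARS * 365.25 * 24 * 3600
avg_kb_s = usable_tb * 1e9 / seconds  # 1 TB = 1e9 kB
print(f"average rate to last the warranty: {avg_kb_s:.1f} kB/s")  # ~46.4 kB/s
```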
 
Last edited:
In fact, until now I was not aware of 'write amplification'. But anyway, I only copied a single 10 GB file from one disk to an empty one, so I didn't expect such a slow-down.
Just one question: do you think that in the real world (Proxmox cluster + storage cluster with ZFS, of course with enterprise SSDs) ZFS is fast enough to serve multiple VMs with at least 3 SQL servers? Or should I stick with an ext4 cluster (losing the snapshot feature) and a hardware RAID?
I'm not convinced that Proxmox with ZFS is fast enough, especially for SQL databases, even with recommended hardware and a proper configuration.
 
Snapshots are still supported with ext4, because the VMs' vDisks are stored on a datastore with a different format: LVM-Thin.
 
ZFS should be fine as long as you get proper hardware and don't screw up the storage setup. For example, don't think of using a raidz1/2/3 when running DBs as the primary workload, because this would limit IOPS performance to that of a single drive and you would need to increase the volblocksize far too high for DBs in order not to lose too much capacity to padding overhead.
2, 4, or 8 enterprise/datacenter-grade NVMe drives (rated for mixed or write-intensive workloads) in a striped mirror would give plenty of performance... even with the big overhead of ZFS.
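To illustrate why the layout matters, here is a very simplified model of steady-state random-write IOPS (the per-drive figure is an assumption purely for comparison, not a benchmark of any real drive, and the model ignores caching, recordsize/volblocksize, and sync behaviour):

```python
# Toy comparison of write IOPS for a single RAIDZ vdev vs. a striped mirror.
# PER_DRIVE_IOPS is an assumed illustrative figure, not a measured value.

PER_DRIVE_IOPS = 20_000

def pool_write_iops(drives: int, layout: str) -> int:
    """Very rough rule of thumb: for small random writes a RAIDZ vdev behaves
    roughly like one drive, while a striped mirror scales with the number of
    2-way mirror pairs (each pair still writes the same data twice)."""
    if layout == "raidz":            # all drives in one raidz1/2/3 vdev
        return PER_DRIVE_IOPS
    if layout == "striped_mirror":   # drives paired into mirrors, then striped
        return PER_DRIVE_IOPS * (drives // 2)
    raise ValueError(f"unknown layout: {layout}")

for n in (2, 4, 8):
    print(f"{n} drives: raidz ~{pool_write_iops(n, 'raidz')} IOPS, "
          f"striped mirror ~{pool_write_iops(n, 'striped_mirror')} IOPS")
```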
 
ZFS should be fine as long as you get proper hardware and don't screw up the storage setup. For example, don't think of using a raidz1/2/3 when running DBs as the primary workload, because this would limit IOPS performance to that of a single drive and you would need to increase the volblocksize far too high for DBs in order not to lose too much capacity to padding overhead.
2, 4, or 8 enterprise/datacenter-grade NVMe drives (rated for mixed or write-intensive workloads) in a striped mirror would give plenty of performance... even with the big overhead of ZFS.
Sounds promising. I was advised to use a 4-way mirror with NVMe drives, using a separate vdev for the databases!
Does a separate SLOG make sense in connection with a 4-way mirror?
 
I know, but for the Proxmox Backup Server I need ZFS, as far as I know.
PBS doesn't rely on the filesystem.
It has its own mechanism for snapshots/encryption.
PVE (asking the QEMU process) reads all vDisks of the VM and then uploads them to PBS.
On PBS, the data is received and written to the backup datastore in many parts, distributed across the 65k subfolders of the hidden directory .chunks.
More explanations in the Proxmox Backup Server Reference Documentation.
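As a rough illustration of where a number like 65k subfolders can come from: if the backup data is split into chunks addressed by a hash and grouped by the first four hex characters of that hash, you get 16^4 = 65536 possible subdirectories. The sketch below shows the idea only; it is not code from PBS, and the exact naming scheme PBS uses may differ.

```python
# Sketch of a content-addressed chunk store with ~65k subfolders.
# Illustration of the idea only; not taken from Proxmox Backup Server.
import hashlib

def chunk_path(chunk: bytes) -> str:
    digest = hashlib.sha256(chunk).hexdigest()
    prefix = digest[:4]                  # 16^4 = 65536 possible prefixes
    return f".chunks/{prefix}/{digest}"  # hypothetical layout for illustration

print(16 ** 4)                                # 65536
print(chunk_path(b"one piece of a vDisk"))
```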

edit: typo, your DL380 is a Gen8, not a Gen10
 
Last edited:
Does a separate SLOG make sense in connection with a 4-way mirror?
Only if that SLOG is faster than your NVMes. So probably not. Maybe if you get a fast Optane.
PBS doesn't rely on the filesystem.
It has its own mechanism for snapshots/encryption.
PVE (asking the QEMU process) reads all vDisks of the VM and then uploads them to PBS.
On PBS, the data is received and written to the backup datastore in many parts, distributed across the 65k subfolders of the hidden directory .chunks.
More explanations in the Proxmox Backup Server Reference Documentation.

edit: typo, your DL380 is a Gen8, not a Gen10
Yup, ZFS is only required if you want to use HDDs for data + SSDs for metadata.
 
Last edited:
Only if that SLOG is faster than your NVMes. So probably not. Maybe if you get a fast Optane.

Yup, ZFS is only required if you want to use HDDs for data + SSDs for metadata.
Concerning the SLOG:
To my understanding, when using ZFS, each piece of data is written twice to the disks:
it needs to be written to the ZIL and to the pool, and if the ZIL lives within the pool, this reduces my pool performance.
If I'm using ZFS for Proxmox, I definitely want to use sync writes.
So, even if my SLOG is just as fast as my pool, I should see an increase in performance, because the intent log no longer needs to be written to the pool disks but to the separate SLOG (so I'm taking load off the pool). Is this right?
Or do I have a misunderstanding?
 
Concerning the SLOG:
To my understanding, when using ZFS, each piece of data is written twice to the disks:
it needs to be written to the ZIL and to the pool, and if the ZIL lives within the pool, this reduces my pool performance.
If I'm using ZFS for Proxmox, I definitely want to use sync writes.
So, even if my SLOG is just as fast as my pool, I should see an increase in performance, because the intent log no longer needs to be written to the pool disks but to the separate SLOG (so I'm taking load off the pool). Is this right?
Or do I have a misunderstanding?
Yes, that is correct. But don't forget that striping more disks also increases the write IOPS of your pool, so the end result may be the same.
Let's say you just have 4 identical disks to work with. You could use either option A or B.

Option A): 2 disks in a mirrored normal vdev + a mirrored SLOG
This will write the data once to the SLOG and once to the normal vdev. It should indeed be faster than a plain 2-disk mirror without a SLOG, as the normal vdev is hit by only half the IO.

Option B): 4 disks in a striped mirror (normal vdevs) without a SLOG
This has to write the data twice to the normal vdevs, but it's two mirrors striped together rather than a single mirror, so you should get double the IOPS performance. Performance should be comparable to option A, as the normal vdevs need to write double the amount of data, but at double the speed.

A SLOG might even slow your pool down. Let's say you have an 8-disk striped mirror, so you get 4 times the IOPS performance of a single disk, and then you add a SLOG with just 1 times the IOPS performance of a single disk. In that case, keeping the ZIL on the normal vdevs should result in double the speed.
Here you would need a SLOG device with at least 4 times the IOPS performance of the disks used for the normal vdevs.
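Here is a small toy model of that comparison (the per-disk IOPS figure is an assumption purely for illustration, and the model ignores everything except the ZIL-vs-data write paths):

```python
# Toy model of effective sync-write IOPS for the options discussed above.
# PER_DISK_IOPS is an assumed illustrative number, not a measurement.
from typing import Optional

PER_DISK_IOPS = 20_000

def effective_sync_iops(mirror_pairs: int, slog_iops: Optional[int]) -> float:
    """mirror_pairs: number of striped 2-way mirrors holding the data.
    Without a SLOG, every sync write hits the pool twice (ZIL + data),
    so the pool's IOPS budget is effectively halved. With a SLOG, the ZIL
    write goes to the SLOG and the data write goes to the pool; the slower
    of the two paths is the bottleneck."""
    pool_iops = mirror_pairs * PER_DISK_IOPS
    if slog_iops is None:
        return pool_iops / 2                  # ZIL and data both on the pool
    return min(slog_iops, pool_iops)          # limited by the slower path

# Option A: one 2-disk mirror for data + a mirrored SLOG (1x single-disk IOPS)
print("A:", effective_sync_iops(1, PER_DISK_IOPS))            # 20000
# Option B: 4 disks as 2 striped mirrors, no SLOG
print("B:", effective_sync_iops(2, None))                     # 20000.0
# 8-disk striped mirror: no SLOG vs. a SLOG only as fast as a single disk
print("8 disks, no SLOG:", effective_sync_iops(4, None))      # 40000.0
print("8 disks, 1x SLOG:", effective_sync_iops(4, PER_DISK_IOPS))      # 20000
print("8 disks, 4x SLOG:", effective_sync_iops(4, 4 * PER_DISK_IOPS))  # 80000
```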
 
Yes, that is correct. But don't forget that striping more disks also increases the write IOPS of your pool, so the end result may be the same.
Let's say you just have 4 identical disks to work with. You could use either option A or B.

Option A): 2 disks in a mirrored normal vdev + a mirrored SLOG
This will write the data once to the SLOG and once to the normal vdev. It should indeed be faster than a plain 2-disk mirror without a SLOG, as the normal vdev is hit by only half the IO.

Option B): 4 disks in a striped mirror (normal vdevs) without a SLOG
This has to write the data twice to the normal vdevs, but it's two mirrors striped together rather than a single mirror, so you should get double the IOPS performance. Performance should be comparable to option A, as the normal vdevs need to write double the amount of data, but at double the speed.

A SLOG might even slow your pool down. Let's say you have an 8-disk striped mirror, so you get 4 times the IOPS performance of a single disk, and then you add a SLOG with just 1 times the IOPS performance of a single disk. In that case, keeping the ZIL on the normal vdevs should result in double the speed.
Here you would need a SLOG device with at least 4 times the IOPS performance of the disks used for the normal vdevs.
Thank you for your clarification :)
 
A SLOG might still be a good idea in some cases. Have a look, for example, at the sync write IOPS and TBW/DWPD of an "Intel Optane SSD DC P5800X 400GB, U.2". It's 3D XPoint, so you get the fastest IOPS (short of RAM) and the most durable flash-class storage you can buy. It might therefore be more durable and cheaper to get fewer, bigger, but slower SSDs plus an Optane SLOG instead of just buying more SSDs for more stripes to reach the same sync write performance or life expectancy.
 
A SLOG might still be a good idea in some cases. Have a look, for example, at the sync write IOPS and TBW/DWPD of an "Intel Optane SSD DC P5800X 400GB, U.2". It's 3D XPoint, so you get the fastest IOPS (short of RAM) and the most durable flash-class storage you can buy. It might therefore be more durable and cheaper to get fewer, bigger, but slower SSDs plus an Optane SLOG instead of just buying more SSDs for more stripes to reach the same sync write performance or life expectancy.
In fact, my next Proxmox + storage cluster is going to be designed by professionals. I hope they choose the appropriate disks.
But I have to care about costs and, most importantly, performance!
Nobody can guarantee the system performance when running several SQL servers, and once the system is delivered, I can't return it if the performance is not sufficient!
So I had asked them for ZFS-based storage, because I thought ZFS was needed when using a Proxmox Backup Server.
But now I think LVM-based storage is better suited for performance reasons: I get the same performance at a lower price.
 
