ZFS Slow write performance

coKO

Member
Oct 26, 2020
Hello,

Can someone advise how to handle this issue:

I recently added a new node to one of our clusters to improve the IO and write speeds for VMs running apps (obsolete HR and accounting software), and I decided to use SSD drives. The configuration is as follows:
DELL R730, 2 x E5-2660 v4, 128 GB DDR4 RAM, Perc H730p running in HBA mode
Storage:
2 x 600 GB SAS HDD in a ZFS mirror for PVE
6 x 1920 GB SATA Samsung Datacenter SSD PM893 in RaidZ2
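For reference, this is roughly how such a pool is created and verified; the pool name "tank" and the device paths below are placeholders, not the real ones from this node:

zpool create -o ashift=12 tank raidz2 \
  /dev/disk/by-id/ata-SAMSUNG_PM893_1 /dev/disk/by-id/ata-SAMSUNG_PM893_2 \
  /dev/disk/by-id/ata-SAMSUNG_PM893_3 /dev/disk/by-id/ata-SAMSUNG_PM893_4 \
  /dev/disk/by-id/ata-SAMSUNG_PM893_5 /dev/disk/by-id/ata-SAMSUNG_PM893_6
zpool status tank
zpool get ashift tank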

I found that write speeds are extremely low, so I ran some fio tests. The results are pretty bad when I test the pool; for direct device tests there are no issues.
A few days later one of the SSDs degraded. I exchanged it with a new one and decided to run some tests with different configurations. I tried almost all possible pool layouts (Z2, Z10, mirror, even a single disk) and with each one the write speeds are terrible, for example:

Single disk pool:
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=20 --time_based --name seq_read --size=2G --name=Z-SNGL --size=2G

Result: WRITE: bw=3762KiB/s (3853kB/s), 1837KiB/s-1925KiB/s (1881kB/s-1972kB/s), io=73.5MiB (77.1MB), run=20005-20008msec
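For anyone wanting to see where the time goes during such a run, the pool can be watched from a second shell (pool name is a placeholder):

zpool iostat -v tank 1    # per-vdev bandwidth and IOPS, refreshed every second
zpool status tank         # confirm no scrub or resilver is running during the test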

Direct device test:
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=20 --time_based --name seq_read --size=2G --name=/dev/sdc --size=2G

Result: WRITE: bw=119MiB/s (125MB/s), 10.1MiB/s-109MiB/s (10.6MB/s-115MB/s), io=2391MiB (2507MB), run=20001-20012msec

----

The Perc H730p is running the latest firmware.

---
I have no idea what else to check or fix. The only remaining option for me is to test with another HBA.

Regards
NG
 
Did you use ashift=12 to create the pool? The fio params are a bit unusual. Why a 4k blocksize with sequential writes? For IOPS you would want rw=randwrite, and for throughput rw=write but with a big blocksize like bs=1M.
Also keep in mind that enterprise SSDs can cache sync writes, so you will have to write a lot to really fill up the RAM and SLC caches and benchmark the real NAND performance rather than just the caches. I usually write dozens or even hundreds of GBs to make sure I'm not only benchmarking caches.
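A rough sketch of both variants, assuming the test file lives on the pool (the filename, size and runtime below are placeholders; the size should be big enough to get past the caches):

# 4k random writes for IOPS
fio --ioengine=libaio --direct=1 --sync=1 --rw=randwrite --bs=4K --numjobs=1 --iodepth=1 --size=100G --runtime=600 --time_based --name=randwrite-iops --filename=/tank/fio.test

# big sequential blocks for throughput
fio --ioengine=libaio --direct=1 --sync=1 --rw=write --bs=1M --numjobs=1 --iodepth=1 --size=100G --runtime=600 --time_based --name=seq-throughput --filename=/tank/fio.test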

But that's indeed a very steep performance drop, even for ZFS with its massive overhead.
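One thing worth checking (the dataset name tank/test is a placeholder, use a scratch dataset only): rerun the fio test with sync temporarily disabled. If the numbers jump dramatically, the bottleneck is in the sync-write path (cache flushes going through the H730p) rather than in the SSDs themselves.

zfs create tank/test
zfs set sync=disabled tank/test
# rerun the fio test against /tank/test, then restore the default:
zfs set sync=standard tank/test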
 
Did you use ashift=12 to create the pool?
Yes
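For completeness, it can be double-checked on the live pool (pool name is a placeholder):

zpool get ashift tank
zdb -C tank | grep ashift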
The fio params are a bit unusual. Why a 4k blocksize with sequential writes? For IOPS you would want rw=randwrite, and for throughput rw=write but with a big blocksize like bs=1M.
With the same params on a different node with a ZFS Z2 pool (ashift=12, compression enabled, 6 x 1.92TB Intel S4510), the results are even better than the direct drive tests above.
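To rule out configuration drift between the two nodes, these are the properties worth comparing side by side (pool name is a placeholder):

zfs get compression,sync,recordsize,atime,logbias tank
zpool get ashift,autotrim tank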

I will do some RAM tests as well to check the hardware for potential faults.
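If it helps, a userspace RAM check can be run without taking the node down for memtest86+ (the size is a placeholder; leave headroom for the host and the running VMs):

memtester 8G 1    # allocate 8 GiB and run one pass of the test patterns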
 
