ZFS Performance on SATA Disks

sahostking

Renowned Member
Hi guys

Installed a few servers using SATA disks (no SSDs) in RAID10 with the Proxmox installer. It's for internal use, but I want the best performance I can get.

Servers run the following:

Intel Xeon E5-1620 3.5GHz
6 x 1 TB enterprise SATA disks at 7200 RPM
ZFS installed in RAID10, of course
64GB ECC memory, where I limited the ZFS ARC max to 24GB
atime set to off and primarycache changed to metadata
Also set swappiness to 10 as per the Proxmox wiki (exact commands below).
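
For reference, this is more or less how I applied those settings; my pool is the default rpool the installer created, so adjust names if yours differ:

# limit the ARC to 24 GB (24 * 1024^3 bytes); takes effect after a reboot or module reload
echo "options zfs zfs_arc_max=25769803776" > /etc/modprobe.d/zfs.conf

# disable atime and cache only metadata on the pool
zfs set atime=off rpool
zfs set primarycache=metadata rpool

# lower swappiness as per the wiki
echo "vm.swappiness = 10" >> /etc/sysctl.conf
sysctl -p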

Now I've moved some containers from our hardware RAID setup to this server with OpenVZ and am seeing some lag at times. Even the few KVM VPSes seem to "freeze" randomly about once a day for some odd reason.

Will this help:

http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/

I see and quote this part:

"
SATA disks do Native Command Queuing while SAS disks do Tagged Command Queuing, this is an important distinction. Seems like OpenSolaris/Solaris is optimized for the latter with a 32 wide command queue set by default. This completely saturates the SATA disks with IO commands in turn making the system unusable for short periods of time.

Dynamically set the ZFS command queue to 1 to optimize for NCQ.

echo zfs_vdev_max_pending/W0t1 | mdb -kw
And add to /etc/system

set zfs:zfs_vdev_max_pending=1
Enjoy your OpenSolaris server on cheap SATA disks!"

How do I check this?
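
On Linux I'm assuming the old zfs_vdev_max_pending tunable doesn't exist any more and the closest thing is the zfs_vdev_*_active module parameters, so my guess is that this is where to look (please correct me if that's wrong):

# show the current ZoL queue-depth knobs
grep . /sys/module/zfs/parameters/zfs_vdev_*_active

# to change one persistently, e.g. cap async writes per vdev:
echo "options zfs zfs_vdev_async_write_max_active=2" >> /etc/modprobe.d/zfs.conf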
 
It won't help, ZFS is not ready for production VM storage.

Why? There are a lot of people here (including me) with a slightly different opinion.
 
To the OP: I have been running ZFS on SATA disks for years and have never had any performance issues.

In fact, I see SAS as nothing but a waste of money these days.
 
Will this help:

http://letsgetdugg.com/2009/10/21/zfs-slow-performance-fix/

I see and quote this part:

"
SATA disks do Native Command Queuing while SAS disks do Tagged Command Queuing, this is an important distinction. Seems like OpenSolaris/Solaris is optimized for the latter with a 32 wide command queue set by default. This completely saturates the SATA disks with IO commands in turn making the system unusable for short periods of time.
You did remember to read the update?
Update: The following information could be beneficial to some, however my issues actually were with Caviar black drives shipping with TLER disabled. You need to pay Western Digital a premium for their “RAID” drives with TLER enabled. So for anyone reading this, avoid consumer Western Digital drives if you plan on using them for RAID.
 
I'd argue that not only is it ready, but that it also is VASTLY superior to other alternatives, like iSCSI and has been for years.
I'm talking about LOCAL target/storage.

Just compare benchmarks of RAW+ext4 with ZFS on the same hardware. ZFS will be much slower and there's nothing you can do about that; there are tons of complaints on the official ZoL tracker (github.com/zfsonlinux) and the devs have not focused on those problems yet. On top of that there's ZFS fragmentation, which you can't beat after a while; the only fix is to destroy and recreate the pool.

If you host 1-2 VMs with a similar workload it's OK; if you host 40-50 different VMs with NTFS/ext4 and other guest filesystems inside, it's a big problem compared to RAW+ext4.

SSD cache/ZIL doesn't help either; there are just problems "by design" in the Linux implementation of ZFS.
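
If anyone wants to run that comparison themselves, a quick random-write test with fio against both backends gives a rough picture (paths, pool and job parameters here are just examples, not a rigorous benchmark):

# once against a file on the ext4-backed storage...
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --size=4G --numjobs=4 --iodepth=32 --group_reporting \
    --filename=/mnt/ext4test/fio.bin

# ...and once against a zvol on the same hardware
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --size=4G --numjobs=4 --iodepth=32 --group_reporting \
    --filename=/dev/zvol/rpool/fio-test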
 
@melanch0lia: Yes, you're not wrong, but you cannot compare raw+ext4 to ZFS only in terms of speed. ZFS will most certainly lose, as will btrfs or any cluster filesystem.

Yet, if you compare features, ZFS is the real deal. Raw+ext4 does not provide snapshots, COW, incremental send/receive, checksumming, multi-tiering, a volume manager, online shrink and grow (per volume), thin provisioning, compression, deduplication, etc.

These features will never be as fast as raw+ext4, how could they? ZFS is used because of its features, not because of its speed.
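
As a small illustration of what you get for that speed penalty, incremental send/receive for off-site copies is just this (dataset, snapshot and host names below are made up):

# take snapshots, then ship only the delta since the previous one
zfs snapshot rpool/data/vm-100-disk-1@monday
zfs snapshot rpool/data/vm-100-disk-1@tuesday
zfs send -i @monday rpool/data/vm-100-disk-1@tuesday | ssh backuphost zfs recv backup/vm-100-disk-1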
 
@melanch0lia: Yes, you're not wrong, but you cannot compare raw+ext4 to ZFS only in terms of speed. ZFS will most certainly lose, as will btrfs or any cluster filesystem.

Yet, if you compare features, ZFS is the real deal. Raw+ext4 does not provide snapshots, COW, incremental send/receive, checksumming, multi-tiering, a volume manager, online shrink and grow (per volume), thin provisioning, compression, deduplication, etc.

These features will never be as fast as raw+ext4, how could they? ZFS is used because of its features, not because of its speed.
What I'm saying is that native ZFS on Oracle Solaris works much faster than ZFS on Linux. That's why I say that ZFS on Linux is not yet ready for production storage. Also, BTRFS is more likely to be roughly on par with ext4+RAW in terms of speed.

And you can disable a whole bunch of those features - ZoL will be slower anyway, just because it is ZoL.

I highly recommend using LVM/LVM-Thin wherever possible.
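
If it helps anyone, setting that up on a spare volume group and pointing Proxmox at it is roughly this (VG and pool names are examples; double-check the options against your pvesm version):

# create a thin pool in volume group "pve" and register it as Proxmox storage
lvcreate -L 500G -T pve/data-thin
pvesm add lvmthin local-thin --vgname pve --thinpool data-thin --content images,rootdir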
 
You did remember to read the update?

Don't confuse SATA with TLER. There are plenty of SATA drives that have TLER; it is not a SAS-only thing. I currently use WD Red drives, which do have TLER.

In fact, WD has multiple lines of enterprise-class SATA drives with TLER, including their Re, Se and Gold lines. What's more, they also have more drive cache than their SAS counterparts, and may even be a bit faster.

You are referring to using CONSUMER SATA drives in ZFS, which do not have TLER. You used to be able to enable TLER on WD's Caviar Blacks, but these days that is no longer possible.

Let's discuss what TLER actually is for a moment.


Consumer drives are intended to be used as single drives. Because of this, any read error is a real problem. To try to minimize this problem, consumer drives that encounter a read error try, try and try again to re-read the sector, hopefully rescuing the data from it. This process takes a lot of time (in some cases several minutes), during which the drive is not responsive to the operating system. It is completely unnecessary on a drive in a RAID array, because the data in the sector that can't be read can easily be recreated from parity.

TLER (which stands for Time Limited Error Recovery) is simply a time limit on how long the drive spends doing error recovery. A consumer drive will keep trying for several minutes until exhaustion, when it finally gives up. An enterprise drive with TLER tries for a few seconds or less, and if it can't read the sector it sends a read error to the RAID controller/OS and the data is recreated from parity instead.

Keep in mind, this also means that drives with TLER probably SHOULD NOT be used as individual system drives, as they are more prone to data loss than a consumer drive without TLER, so I wouldn't go using those old RAID drives in a desktop as single drives.

The reason extended error recovery is a problem with RAID is because the drive becomes unresponsive for the several minutes it is trying to read the sector. Most hardware RAID cards will assume that the drive is dead, and automatically drop it from the array (which we really don't want). ZFS does not do this, but it does wind up slowing down and causing performance problems while the drive tries to read the sector.

All that being said, I ran WD Green drives without TLER in my first ZFS setup. I wouldn't recommend it for a production system, but for a low-cost home system that only used ZFS for file server type duties it worked fine. When I had a drive failing and it started doing the extended error recovery reads, the result was very low performance (and video stutters, since my media server read from that drive), but it wasn't a huge deal, as I quickly got a replacement drive and resilvered the pool. I wouldn't use these drives to host VM images, though, as that kind of performance drop might cause significant issues for the VMs running on them.
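
For completeness, the swap itself was nothing special, just the usual replace-and-watch (pool and device names below are made up):

# replace the failing disk with the new one and watch the resilver progress
zpool replace tank /dev/disk/by-id/ata-WDC_OLD_DISK /dev/disk/by-id/ata-WDC_NEW_DISK
zpool status -v tank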

So, anyway, to sum things up:

TLER has nothing to do with the SAS vs. SATA debate (though I don't think I've ever seen a SAS drive without TLER); there are plenty of enterprise and home NAS SATA drives with TLER.
TLER is a good idea on ZFS (but not as important as it is with traditional hardware RAID), and definitely needed for pools where there are real consequences to performance problems (like running VM images off of them).
On lower-performance, file server type pools, you can get away with drives without TLER, understanding that it may slow things down significantly when a drive experiences a read error.
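
If you want to check whether a particular drive has TLER (the generic ATA name is SCT Error Recovery Control), smartctl can usually query it, and on drives that allow it you can also set the timeout; the device name below is just an example:

# show the current error recovery timeouts (read/write, in tenths of a second)
smartctl -l scterc /dev/sda

# on drives that support it, set both to 7 seconds
smartctl -l scterc,70,70 /dev/sda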
 
What I'm saying is that native ZFS on Oracle Solaris works much faster than ZFS on Linux. That's why I say that ZFS on Linux is not yet ready for production storage. Also, BTRFS is more likely to be roughly on par with ext4+RAW in terms of speed.

And you can disable a whole bunch of those features - ZoL will be slower anyway, just because it is ZoL.

Have you tested this recently?

I have never used ZFS on Solaris, but I used the BSD port of ZFS for many years, and it is an older, more mature port that is widely considered on par with the Solaris original.

I actually saw INCREASED performance when I exported my BSD based pool and imported it in Proxmox under ZoL.

I'd be curious if you could link anything you are using as a basis for your argument.
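
For anyone curious, the move itself was just an export on the BSD box and an import on the Proxmox host (the pool name here is an example):

# on the old BSD host
zpool export tank

# on the Proxmox host, after moving the disks over
zpool import tank
# or, if the pool wasn't cleanly exported:
zpool import -f tank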
 
