(7.1) ZFS Performance issue

kromberg

Nov 24, 2021
I recently migrated a bare-metal server over to Proxmox. The server is up to date and running the non-production repo. It has an HBA controller with six 4TB SATA3 6Gb/s drives attached, and a ZFS pool was created from the six drives in raidz1 (RAID5). In general I am seeing really poor performance from the pool: reads are only around 150MB/s, and writes start off around 120MB/s and then fall off to around 60-70MB/s. I added an Intel S3700 SSD for cache and log devices, but that makes no difference, and I do not see the SSD being used at all (at least in zpool iostat). When the server was bare metal running Fedora 33 with the same setup (minus the SSD log and cache devices) I was seeing around 500MB/s reads and 300MB/s writes.

I have looked around and researched a bit, but I cannot find anything that would explain the discrepancy between Proxmox and running bare metal. Does anyone have any ideas or things to look at?
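For anyone wanting to compare, a raw sequential test directly against the pool (outside of any VM) can be run with something like the sketch below. The /tank path is just a placeholder for wherever your pool is mounted, and the read numbers will be inflated by the ARC unless the test file is bigger than RAM:

    # sequential write test, 1M blocks, flushed at the end
    fio --name=seqwrite --filename=/tank/fio.test --rw=write --bs=1M \
        --size=16G --ioengine=libaio --iodepth=4 --end_fsync=1
    # sequential read test against the same file
    fio --name=seqread --filename=/tank/fio.test --rw=read --bs=1M \
        --size=16G --ioengine=libaio --iodepth=4
    rm /tank/fio.test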
 
First, HDDs are never good as VM storage, even if you add an SSD for caching. They just don't have enough IOPS.
Using raidz1 instead of a striped mirror makes it even worse, because 6 HDDs in a raidz1 have lower IOPS than a single HDD.
And a SLOG or L2ARC doesn't always make your pool faster; they can also make it slower. You should try removing them and see if the pool gets faster. The SLOG is only used for sync writes. Most writes that don't come from a DB are async writes, and there a SLOG won't help at all.
And with an L2ARC you are sacrificing very fast ARC (RAM) for more, but slower, L2ARC. So you get more read cache, but that cache will be slower.
Also make sure that you are not using SMR HDDs. They don't work well with ZFS and should be avoided anyway, because their poor performance can make a host unresponsive.
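Removing the SLOG and L2ARC is non-destructive and can be done on the running pool. Something like this (the pool and device names are just examples, check zpool status for yours):

    zpool status tank              # note the exact names under "logs" and "cache"
    zpool remove tank sdg1         # remove the SLOG device/partition
    zpool remove tank sdg2         # remove the L2ARC device/partition
    zpool iostat -v tank 5         # per-vdev statistics, refreshed every 5 seconds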
 

Not sure I see the point of that post, as it is meaningless and contains nothing useful...

After digging around a bit more, I found that the default pool ashift of 12 gives 4K sectors, while the default block size for a zvol (VM disk) is 8K. Basically that doubles the amount of work being done in the pool. I rebuilt the ZFS pool with an ashift of 13, and things are performing much better with the block sizes aligned. Hopefully I can find a couple more tweaks.
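For anyone else hitting this, this is roughly how the two values can be checked (the pool and zvol names here are just examples):

    zpool get ashift tank                      # may show 0 if it was auto-detected
    zdb -C tank | grep ashift                  # per-vdev ashift (pool must be in the zpool.cache)
    zfs get volblocksize tank/vm-100-disk-0    # per-zvol block size, 8K by default
    # In Proxmox the block size for newly created VM disks is the "Block Size"
    # field of the ZFS storage (Datacenter -> Storage); existing zvols keep theirs.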
 
You were complaining about poor performance, and I listed a lot of reasons why your pool is slow and how to make it faster...
I could also explain why rebuilding with ashift=13 is a bad idea and why your pool is now wasting 8TB of storage, but it sounds like you don't care about feedback...
 
I am interested in feedback, but I was not asking what new hardware to buy or how to restructure things. I simply asked why there is a performance difference between Proxmox and another OS on the same hardware. Obviously the usage of the hardware will be a bit different, and that is what I am trying to figure out. Things that are useful are what I found: the block size of the VM disks and the pool ashift setting being different.
 
You want your pool's block size (in other words your zvols' volblocksize) to be way bigger than the sector size of your disks (in other words the ashift you have chosen), or you will lose most of your capacity to padding overhead. You can't directly see the padding overhead because it is indirect and only affects zvols, not datasets. So ZFS will report that you have 20TB of usable storage, but everything written to a zvol will consume 66% more space, so after writing 12TB of data to your zvols they will be 20TB in size. And don't forget that a ZFS pool gets slow as soon as it is more than 80% full, because ZFS uses copy-on-write and always needs a lot of unfragmented free space to operate. So of that 12TB, you can actually only use 9.6TB (8.73TiB) if you care about performance. And if you want to use snapshots, these need space too, so you might want to store even less data on that pool (for example 6TiB if you want to reserve a third of the usable capacity for snapshots).

If you don't want that big padding loss, you want your volblocksize to be at least 8 times your sector size. With ashift=12 you get a 4K sector size, so you want to increase your volblocksize to at least 32K. With ashift=13 you want the volblocksize to be at least 64K, and so on.
If your volblocksize is only 1 or 2 times the sector size, you lose 50% of your total raw capacity. If it is 4 times the sector size, you lose 33%. If it is 8, 16 or 32 times the sector size, you lose 20% of your total raw capacity.
So right now you lose the capacity of 1 of your 6 disks to parity and the capacity of 2 more drives to padding overhead, and only 3 of the 6 disks are actually usable. You get the same usable capacity as a striped mirror (raid10), but a striped mirror would be way faster and also provide better reliability.
So raidz1 only makes sense if you at least increase the volblocksize to 32K, and then you get massive performance problems whenever your guests do reads/writes smaller than 32K. Things like MySQL or Postgres DBs will really suck on that pool because they do 8K or 16K sync writes.
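To make the padding maths concrete, here is a simplification of the allocation rule from the Delphix post linked below (assuming no compression), applied to this 6-disk raidz1, where every allocation is padded up to a multiple of parity+1 = 2 sectors:

\[
d = \frac{\text{volblocksize}}{2^{\text{ashift}}}, \qquad
\text{parity sectors} = \left\lceil \frac{d}{6-1} \right\rceil, \qquad
\text{allocated} = (d + \text{parity sectors})\ \text{rounded up to a multiple of}\ 2
\]
\[
\text{ashift}=13,\ \text{volblocksize}=8\text{K}: \quad d=1,\ \text{parity}=1,\ \text{allocated}=2 \;\Rightarrow\; 50\%\ \text{of the raw space lost}
\]
\[
\text{ashift}=12,\ \text{volblocksize}=32\text{K}: \quad d=8,\ \text{parity}=2,\ \text{allocated}=10 \;\Rightarrow\; 20\%\ \text{of the raw space lost}
\]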

If you do the maths behind ZFS for 6 disks it looks like this in theory (might be a bit different in reality because of compression and so on):
configuration | lost raw capacity | random IOPS @ 32K+ | random IOPS @ 4K | sequential write/read @ 32K+ | sequential write/read @ 4K | drives that may fail
striped mirror (ashift=12, volblocksize=8K) | 50% (50% parity loss) | 3x | 1.5x | 3x / 6x | 1.5x / 3x | 1-3
raidz1 (ashift=13, volblocksize=8K) | 50% (17% parity loss + 33% padding loss) | 1x | 0.5x | 5x / 5x | 2.5x / 2.5x | 1
raidz1 (ashift=12, volblocksize=32K) | 20% (17% parity loss + 3% padding loss) | 1x | 0.125x | 5x / 5x | 0.625x / 0.625x | 1

So you could either increase the volblocksize, making performance much worse for small reads/writes, to get the capacity loss down from 50% to only 20%, or you switch to a striped mirror with way better performance at the same 50% capacity loss. But running that raidz1 with ashift=13 and an 8K volblocksize is basically about the worst option you could choose.

If these 4TB drives are HDDs, they are already very bad at handling IOPS. VM storage mostly benefits from high IOPS, so IOPS will be the first thing to bottleneck, and there a striped mirror would perform 3 to 12 times better than a properly configured raidz1, resulting in much lower IO delay and faster VMs. So raidz1 is really only an option if you want to sacrifice a lot of performance for a bit more capacity, if you use the pool as cold storage mostly doing very big async sequential reads/writes, or if you use LXCs, which use datasets instead of zvols. If those drives are SSDs, a raidz1 might not be that bad, as long as you don't run applications like DBs that do small sync writes. You would still get a very bad IOPS drop, but it would not bottleneck as much, because a good SSD can easily do 1000 times the IOPS of a HDD, so even with poor relative IOPS it might be fast enough for real-world workloads.

See this blog post by the ZFS head developer if you want to learn more about how raidz works at the block level, why there is padding overhead, and how to calculate it: https://www.delphix.com/blog/delphi...or-how-i-learned-stop-worrying-and-love-raidz
 
Cool, thanks for the info, Dunuin. This is going to be really helpful. Let me try a couple of the combos and I will report back.
 
Hi guys,

I am reading this and I am terrified.

Some time ago I got a reasonably priced 15-HDD server from Hetzner.
Without much reading I installed Proxmox with RAIDZ2 and all default settings - a single pool, as you might imagine.

Just as mentioned above, LXC performance is acceptable.
VM performance sucks - big time.

There is no way I can get an SSD installed in it for any caching or log.

I assume there could be options to set it up better - like going with a RAIDZ2 pool for cold storage and another pool for VM storage -
but since it is ZFS, it looks like I am stuck with it until I can find a replacement machine to move my live services to - at least temporarily.

Do I get it roughly correct?
What would be the best config I could apply to this machine - if I get the chance?

Are there any other options besides waiting?
 
A 14-disk striped mirror (raid10) plus 1 hot spare would give you 7 times the IOPS performance you have now. And you probably want to increase the volblocksize to something way higher than the default 8K. With a 15-disk raidz2 and the default 8K volblocksize you basically waste 66% of the raw capacity when storing VMs...
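As a rough sketch (the disk names, the pool name and the 16K value are only examples, not a recommendation for your exact hardware), the layout would be created like this, and the block size for new VM disks is set on the Proxmox storage, not on the pool:

    # 7 mirror vdevs striped together, plus one hot spare
    zpool create -o ashift=12 tank \
        mirror sda sdb  mirror sdc sdd  mirror sde sdf  mirror sdg sdh \
        mirror sdi sdj  mirror sdk sdl  mirror sdm sdn \
        spare sdo
    # volblocksize for *new* zvols: add "blocksize 16k" to the zfspool entry in
    # /etc/pve/storage.cfg (or, I believe, "pvesm set <storage-id> --blocksize 16k");
    # existing VM disks keep the block size they were created with.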
 
I should have read up before starting :(
I take it that my only two options now are:
1. back up and restore in a different configuration (e.g. 4+1 HDD as RAID10 + spare and 10 HDD in RAIDZ2)
2. suck it up, use the server for cold storage and LXC, and get another one for VMs
 
Option 1 won't help that much. IOPS performance scales with the number of vdevs, not the number of disks, so you want a lot of striped vdevs, especially when running VMs/LXCs on top of HDDs. A 14-disk raid10 is 7 striped mirrors, so 7 vdevs and 7 times the IOPS performance of your 15-disk single-vdev raidz2. A 4-disk raid10 is only 2 striped mirrors, so 2 vdevs and only 2 times the IOPS performance of the raidz2.
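As a very rough illustration (assuming on the order of 100 random IOPS per 7200 rpm HDD, a generic rule-of-thumb figure, not something measured on your server):

\[
\text{15-disk raidz2 (1 vdev)} \approx 1 \times 100 = 100\ \text{IOPS}
\]
\[
\text{14-disk striped mirror (7 vdevs)} \approx 7 \times 100 = 700\ \text{IOPS}
\]
\[
\text{4-disk striped mirror (2 vdevs)} \approx 2 \times 100 = 200\ \text{IOPS}
\]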
 
Ah, I see.

So basically a smaller machine for VMs, preferably on NVMe :)
 
Preferably on enterprise-grade SSDs. I would highly prefer a ZFS mirror of enterprise SATA SSDs over consumer NVMe SSDs.
 
