3 x 4TB Samsung SSD in ZFS raidz1 => poor performance

vaschthestampede

Active Member
Oct 21, 2020
I have a server with 3 Samsung SSD 860 EVO 4TB drives.
I configured them with ZFS in raidz1.

Intermittently, but more and more frequently, I have performance problems: the IO delay exceeds 30%.

Where am I wrong?

Is ZFS suitable for this?
Is Ceph better?
 
Try the SM or PM line of Samsung SSDs; they have capacitors and will behave much, much better.
The desktop drives you are using are too slow for ZFS.

If you choose (t)LVM and / or hardware raid, IO delay will come down, but not with ZFS.
 
Hardware raid is not an option.

I'm going to upgrade to server-grade SSDs.

I think about ditching ZFS.

Is Ceph better?
 
Well, that depends on what you mean by "better". For me personally it is not: in my experience Ceph is slower and more complex.
While you cannot do true HA with ZFS (replication is scheduled), and it is quite slow, I prefer to use it for production as I really need the features it has.

Also consider using mirrors (RAID 1, or striped mirrors as in RAID 1+0) instead of RAIDZ. That will be much faster.
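To put rough numbers on the RAIDZ versus mirror suggestion, here is a minimal Python sketch. It relies on a common rule of thumb, treated here as an assumption rather than a measurement: random IOPS scale with the number of top-level vdevs, and each vdev behaves roughly like one of its member disks. The `Layout` class, the helper names and the 10,000 IOPS per-disk figure are all made up for illustration, and the capacity is raw, before ZFS metadata and padding overhead.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int       # number of top-level vdevs in the pool
    data_disks: int  # disks' worth of usable space across the pool


def raidz1(disks: int) -> Layout:
    # One raidz1 vdev: one disk's worth of parity, the rest holds data.
    return Layout(f"raidz1 ({disks} disks)", vdevs=1, data_disks=disks - 1)


def striped_mirrors(disks: int) -> Layout:
    # Two-way mirrors striped together: every pair of disks is one vdev.
    if disks % 2:
        raise ValueError("two-way striped mirrors need an even disk count")
    return Layout(f"striped mirrors ({disks} disks)",
                  vdevs=disks // 2, data_disks=disks // 2)


def estimate(layout: Layout, disk_tb: float, disk_iops: int):
    # Rule-of-thumb assumption: pool IOPS ~= vdev count * single-disk IOPS.
    return layout.data_disks * disk_tb, layout.vdevs * disk_iops


if __name__ == "__main__":
    for layout in (raidz1(3), striped_mirrors(4)):
        usable_tb, iops = estimate(layout, disk_tb=4.0, disk_iops=10_000)
        print(f"{layout.name}: ~{usable_tb} TB raw usable, ~{iops} random IOPS")
```

Under these assumptions a 3-disk raidz1 and a 4-disk striped mirror offer the same raw capacity, but the mirror layout has two vdevs to spread random IO across instead of one.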
 
If it's slower then it's not worth it.

The system must manage more than 40 VMs and the weak link is storage.
Sometimes a single VM using the disk is enough to block all the others.
 
Related question: would it be worthwhile to use a combination of ZFS and a "regular" file system? E.g. use ZFS for the Proxmox host (rpool) and data storage (HDD pool), and use a single drive for VMs/LXCs, as they can easily be backed up with snapshots?
 
That is an option if you don't need high availability, but keep in mind that your VMs aren't protected against data corruption like bit rot if you are not using ZFS/Ceph. Your backups won't help you restore corrupted files, because without the ability to scrub you will never know whether a file is already damaged before you back it up. So it's not unlikely that all your backups will only contain corrupted versions of your VMs.
 
To make ZFS run well, the top things:
  • Get enterprise-class drives, SSDs if possible
  • Use striped mirrors (ZFS equivalent of RAID10), the more disks the better
  • Make sure ashift is correct
  • Read and learn how ZFS works. It's complex.
Second-tier things, which won't make as much of a difference but will help:
  • Use a fast PCI SSD as SLOG
  • Use virtio drivers in VMs
  • Use separate store for PVE host OS and VM storage
Third-tier things, possibly helpful but address the above first:
In fact, read everything you can from Jim Salter at Ars Technica and jrs-s.net. He's probably the leading expert in this ZFS + KVM space.
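On the ashift point: ashift is the base-2 logarithm of the sector size ZFS assumes for a vdev, so ashift=12 corresponds to 4096-byte sectors. A small Python sketch of that relationship follows; the function name `ashift_for` is hypothetical, and which sector size is actually right for a given SSD is something to check for the specific drive, since some drives report 512-byte logical sectors regardless of their internal page size.

```python
def ashift_for(sector_bytes: int) -> int:
    """Return the ashift value matching a power-of-two sector size."""
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    # For a power of two, bit_length() - 1 equals log2(n).
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        print(f"{size}-byte sectors -> ashift={ashift_for(size)}")
```

Since ashift is fixed when a vdev is created, it is worth checking before building the pool rather than after.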

If you don't want to spend money or put in major effort:
  1. Format all drives as ext4
  2. Use one drive for the PVE host OS
  3. Use one drive for VM storage. Again, ext4! Not ZFS!
  4. Use the third drive to take regular backups.
  5. Back up somewhere off the machine and also off-site if you care about your data.
Good luck!
 
That is an option if you don't need high availability, but keep in mind that your VMs aren't protected against data corruption like bit rot if you are not using ZFS/Ceph. Your backups won't help you restore corrupted files, because without the ability to scrub you will never know whether a file is already damaged before you back it up. So it's not unlikely that all your backups will only contain corrupted versions of your VMs.
Yes, I guess so, but since the actual data (files, media and whatnot) used by the VMs/LXCs would be stored on the HDD ZFS pool would it matter if you got a corrupted file on a VM or LXC? In that case it would most likely be a "system" related file which should not be that critical anyway as you could always rebuild the VM or LXC container as long as the data is intact on the storage pool.
 
I also use ZFS raidz1 for rpool, and there it is fine, with no performance issues.

For HA it seems that ceph is the best option, am I missing any options?
 
Yes, I guess so, but since the actual data (files, media and whatnot) used by the VMs/LXCs would be stored on the HDD ZFS pool would it matter if you got a corrupted file on a VM or LXC? In that case it would most likely be a "system" related file which should not be that critical anyway as you could always rebuild the VM or LXC container as long as the data is intact on the storage pool.
The question then is, where do you put the DBs? On the SSD, where data can get corrupted, or on the secure HDD ZFS pool, which will be hit hard by all the sync IOPS? Even if you don't run a webserver with MySQL, most programs use some kind of database in the background to store their data.
 
The DBs would then be on the SSDs as part of the VM data, which could get corrupted, yes, but probably nothing that could not be rebuilt quite fast, I think. Given how much complexity ZFS on the VM drive seems to introduce, it feels like the bigger risk to me, as long as one cannot afford very expensive SSD/NVMe drives and does not have extensive knowledge of how to configure Proxmox/ZFS/VMs correctly for that type of setup.
 
New to proxmox here. Thanks to this thread and some others I have disappointingly learned I did not do enough pre-research coming into my build. Came at this with desktop experience and now have to sell my 4 Samsung SSDs. Thanks to OP for bringing up this topic and those who fleshed out the options here.
 
You are not the first! :) Although I must say, since I have no experience with this myself yet, I am getting really confused by the feedback I am getting. There are as many people who state that this is not an issue as there are people preaching doomsday scenarios about using consumer SSDs. I like to be on the cautious side, but those enterprise drives can set you back quite a bit depending on your setup. I've got replies from several people running consumer Samsung SSDs with ZFS on heavily loaded servers for several years who still have plenty of lifetime left on the SSDs, so it is difficult to know what to believe without first-hand experience.
 
The original point was not the lifetime of the disk but the performance.

In my case even just one VM can block all the others.
 
With regard to enterprise SSDs, I actually thought it was the opposite when it comes to speed, i.e. that consumer drives are typically faster but have a shorter lifetime than enterprise SSDs. Maybe I have misunderstood. Can someone elaborate?
 
I don't think that's the problem.

I think it's more of a ZFS configuration problem or ZFS itself.

Is raidz1 that bad?
How and where do I configure ZFS correctly?
 
Perhaps you should provide some more details of your setup. Then it will be easier for people to help.

For instance, how are you using the RAIDZ1 pool? Is it used for the Proxmox host, for the VM images, or as data storage for the VMs? How have you configured your pool(s), and what ZFS options have you set?
 
I have two zpools, one for the host and one for the VM disks.
  • rpool (mirror), 2 x 500GB Samsung 970 EVO, for Proxmox
  • fastSSD (raidz1), 3 x 4TB Samsung 860 EVO, for VM disks.
I created the ZFS pools with the Proxmox GUI.
No other configuration has been made.
 
You are not the first! :) Although I must say, since I have no experience with this myself yet, I am getting really confused by the feedback I am getting. There are as many people who state that this is not an issue as there are people preaching doomsday scenarios about using consumer SSDs. I like to be on the cautious side, but those enterprise drives can set you back quite a bit depending on your setup. I've got replies from several people running consumer Samsung SSDs with ZFS on heavily loaded servers for several years who still have plenty of lifetime left on the SSDs, so it is difficult to know what to believe without first-hand experience.
That all really depends on your workload and setup. My home server is running 20 VMs, and these write 900GB per day while idling, where most of the writes are just logs/metrics created by the VMs themselves. Sum that up and the TBW rating of a 1TB consumer SSD will be exceeded within a year.
If you are not using any DBs doing a lot of small sync writes, and if you skip raid/ZFS and just use a single SSD with LVM, it might survive for many years.
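The endurance arithmetic behind that estimate can be sketched in a few lines of Python. The 900 GB/day figure is from the post above; the 300 TBW rating is an assumed, illustrative value for a 1TB consumer drive, not a spec taken from any particular model, and the calculation ignores write amplification inside the SSD.

```python
def tb_written_per_year(gb_per_day: float) -> float:
    # Decimal units, as used on SSD datasheets: 1 TB = 1000 GB.
    return gb_per_day * 365 / 1000


def days_until_tbw(rated_tbw: float, gb_per_day: float) -> float:
    # How long until the rated terabytes-written figure is reached.
    return rated_tbw * 1000 / gb_per_day


if __name__ == "__main__":
    daily_gb = 900   # from the post: writes per day while idling
    assumed_tbw = 300  # hypothetical rating for a 1TB consumer SSD
    print(f"{tb_written_per_year(daily_gb):.1f} TB written per year")
    print(f"{days_until_tbw(assumed_tbw, daily_gb):.0f} days to reach {assumed_tbw} TBW")
```

Plugging in the rating from your own drive's datasheet and the write rate you actually observe gives a quick sanity check before buying anything.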
With regard to enterprise SSDs, I actually thought it was the opposite when it comes to speed, i.e. that consumer drives are typically faster but have a shorter lifetime than enterprise SSDs. Maybe I have misunderstood. Can someone elaborate?
Again, it's all about the workload. Consumer SSDs are great if you need small bursts of sequential async reads/writes, but the performance will drop massively as soon as the cache fills up under long sustained loads. In such a case an enterprise SSD will be much faster, because its performance won't drop that hard. And most consumer SSDs won't be able to use caching at all for sync writes, so there the performance is always horrible.
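To get a rough feel for the sync versus async difference on your own storage, a small Python sketch like the one below can help. It is a crude illustration, not a proper benchmark (a dedicated tool such as fio is the better choice for real measurements): it writes 4 KiB blocks to a temporary file, once buffered and once with an `os.fsync` after every write. The function names, block size and write count are arbitrary choices for this example.

```python
import os
import tempfile
import time


def writes_per_second(path: str, count: int, block: bytes, sync: bool) -> float:
    # Write `count` blocks; when `sync` is set, force each one to stable storage.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, block)
            if sync:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


def compare(directory: str, count: int = 200) -> dict:
    # Run the buffered and the fsync-per-write variants in `directory`.
    block = b"\0" * 4096
    results = {}
    for label, sync in (("async", False), ("sync", True)):
        with tempfile.NamedTemporaryFile(dir=directory, delete=False) as tmp:
            path = tmp.name
        try:
            results[label] = writes_per_second(path, count, block, sync)
        finally:
            os.remove(path)
    return results


if __name__ == "__main__":
    # Point this at a directory on the pool you want to test.
    for label, rate in compare(tempfile.gettempdir()).items():
        print(f"{label}: {rate:.0f} writes/s")
```

The interesting number is the gap between the two rates on the pool in question; results also depend on the dataset's sync setting and on whether a SLOG is present.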