ZFS Configuration planning for Virtual Machines

Null0N

New Member
Jul 23, 2020
Hello All,

First time post here, please go easy on me if anything is amiss. Also sorry for the long post.

I'm a long time Windows user, with very little experience with ZFS or Linux in general. I'm looking into moving away from my Hyper-V setup and migrating over to Proxmox as my primary Hypervisor, and sort out my mix of storage while I'm at it.

Currently, I run Hyper-V and a Windows file server on a single bare-metal install, then virtualise out the following roles: domain controller, WSUS, AV management, CA, VPN server and network controller. This leads to a load of problems if there's ever a crash or unexpected shutdown, as the Hyper-V server boots without access to the DC, among other things.

I've recently moved all my storage to Windows Storage Spaces with Tiering, in order to take advantage of a recent 10Gb network upgrade. It works great for simple volumes without redundancy, frequently getting 400MB/s+ reads and writes for a 250GB SSD caching an 8TB drive right now. The issue comes when looking at parity setups. Frankly, when it comes to parity in Storage Spaces, it's trash: terrible write performance and needlessly convoluted ways to enable SSD caching to try to alleviate some of the performance issues.

So, to ZFS it is. It seems to just be a better option when looking at parity setups, as well as Proxmox being a lighter weight hypervisor compared to full fat Windows Server.

Here's my planned storage setup. This is still very much in planning and won't be happening very soon:

Boot/root: 2x 120/240GB SSD in RAID1
VM Storage: 2x 500GB SSDs in RAID1
Pool 1: 4x 6/8TB HDD in RAIDZ (might be 8TB, might be 6TB drives)
Pool 2: 4x 6TB HDD in RAIDZ
Cache: 2x 1TB NVMe SSD. Potentially in RAID1 or just a single drive for each pool.

I plan on virtualising a Windows Server install to act as the file server, so pretty much all of the data from the two pools is going to be assigned to that VM, and shares will go out from there. I'll also probably have another SSD array for some shares that don't need that much capacity, like redirected user profiles. I'm set on sticking to Windows-based VMs. I partially use these VMs as a UAT environment for work, as my job is entirely based on Windows Server, save for limited usage of ESXi and vSphere/vCenter.

Storage usage: it's mostly media and images, but also games (not large, latency-sensitive titles, but rather visual novels and less demanding stuff) and various documents. My PCs store and access pretty much everything on the file server.

My goal mostly here is to determine the best way of caching the hard drive pools to take advantage of my 10Gb network. I frequently write large 10GB+ files to the fileserver, so having a large SSD write buffer which then offloads to the HDDs would be ideal.

In terms of redundancy, I can tolerate losing whatever is in cache and hasn't been offloaded yet. Very little of the data is critical and most of it can be re-obtained; the stuff that is critical has multiple backups and doesn't change frequently. Still, losing data is not ideal, so I would like some level of fault tolerance. I've lost 4TB of data in the past when I had some HDDs in RAID0, which was a learning experience. I also have a mix of 2TB and 4TB drives that would be removed from the current file server and re-purposed into a backup server of sorts.

If you could help me with what would be optimal in terms of ZIL and L2ARC, that would be much appreciated, as well as any glaring errors you can spot.

System specs:
Intel Xeon E5-2680 v2 (10C20T)
Asus X79 Deluxe
32GB DDR3 1600MHz (this can be upgraded to 64GB if needed. Also, I'm sorry, but it's not ECC. Unbuffered DDR3 ECC is just too hard to come by at a reasonable price)
LSI SAS9211-8I (Flashed to IT mode. Maybe x2, as drives will be connected via 2 SAS backplanes, 4 drives from each backplane. Might have 2 so if one fails, I can move to the other)
Asus XG-C100C 10Gb NIC
860W Platinum PSU
4U case with 2x SAS backplanes, each hosting 4 hot-swap drives.
 

Q-wulf

Well-Known Member
Mar 3, 2013
my test location
Just some things that come to mind on top my head:

  1. Memory. As per https://pve.proxmox.com/pve-docs/chapter-pve-installation.html#_zfs_performance_tips you should plan for 4 GB plus 1 GB per TB of raw disk space.
    1. A quick calculation shows that you will have at least 8x 6 TB in RaidZ(1) split across two pools. That makes your minimum RAM requirement for ZFS 52 GB (4 + 8x6x1). If your maximum installable RAM is 64 GB, that does not leave a lot of RAM for VMs on a production system.
  2. Non-ECC RAM. If you use ANY software RAID-like functionality that does parity calculations, you will need ECC RAM. For ZFS specifically, that means if you use RaidZ(1-3) you will need ECC RAM. The reason being that ZFS expects the data in RAM to be 100% correct; if it is not, your parity calculations are wrong and your data is borked.
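The rule of thumb in point 1 works out like this; a quick sketch of the same arithmetic, using the pool sizes from the plan above:

```shell
# ZFS RAM rule of thumb from the Proxmox docs: 4 GB base + 1 GB per TB
# of raw disk space. Two pools of 4x 6 TB (minimum) = 48 TB raw.
raw_tb=$((4 * 6 + 4 * 6))
zfs_ram_gb=$((4 + raw_tb))   # 4 GB base + 1 GB per raw TB
echo "$zfs_ram_gb"            # prints 52
```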
 
Last edited:

aaron

Proxmox Staff Member
Staff member
Jun 3, 2019
Boot/root: 2x 120/240GB SSD in RAID1
VM Storage: 2x 500GB SSDs in RAID1
Pool 1: 4x 6/8TB HDD in RAIDZ (might be 8TB, might be 6TB drives)
Pool 2: 4x 6TB HDD in RAIDZ
Cache: 2x 1TB NVMe SSD. Potentially in RAID1 or just a single drive for each pool.

Why do you split up pool 1 and pool 2 like that? You can add multiple vdevs to a single pool and thus have one big pool, avoiding arbitrary size limits.
Secondly, be aware that using any RAIDz (no matter which parity level) does not yield the best performance for VMs, and you might be surprised how much space is used for parity data, depending on the block size used when creating the VMs' disks. See https://forum.proxmox.com/threads/zfs-counts-double-the-space.71536/#post-320919 for a quick explanation.

I would go ahead and create a pool made of mirror vdevs (RAID 10 like). No extra parity data and much better performance.
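For reference, a pool of mirror vdevs like that could be created roughly as follows (a sketch only; `tank` and the device paths are placeholders for your actual pool name and drives):

```shell
# One pool striped across two mirror vdevs (RAID 10 style).
# Device names below are placeholders; /dev/disk/by-id paths are
# preferred so the pool survives device renumbering.
zpool create tank \
  mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2 \
  mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4
```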

If you could help me with what would be optimal in terms of ZIL and L2ARC, that would be much appreciated, as well as any glaring errors you can spot.

The ZIL really does not need to be large! It only stores a few seconds of sync writes before they are written down to the slow spinning HDDs. It is good though if it is consistently fast. That's why an Intel Optane is a good choice for it. In a personal server of mine the ZIL is 4 GB large and usually only a few hundred MB are used. Sometimes when doing disk heavy stuff, around 1 GB is used.
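For reference, adding a dedicated SLOG device to an existing pool is a single command (a sketch; `tank` and the partition path are placeholders):

```shell
# Attach a small, fast partition as a separate log (SLOG) device.
# A few GB is plenty, since it only holds seconds of sync writes.
zpool add tank log /dev/disk/by-id/nvme-FAST_SSD-part1
```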

The 2x 1TB NVMe SSDs that you consider as cache: no need for redundancy; if it fails you will only lose the cache but no data. Also make sure to have plenty of RAM, as the L2ARC also needs its space in RAM.

Honestly though, before investing in these I would first spend the money on more RAM for the ARC itself. Use some monitoring system and keep an eye on the ARC hit rate. If you have enough RAM you should be able to get close to 100%, meaning that most read requests can be satisfied from RAM without accessing the slow disk underneath it.

As already mentioned, you should get quite a bit more RAM in there. 32 GB won't make you happy. Depending on how much you want to give the VMs even 64 GB might be a bit on the short side.

Even an NVME SSD is considerably slower than RAM.

If you cannot get close to a 100% ARC hit rate during normal operations, then an L2ARC might help to increase the performance. But be aware that it needs some time to warm up, as it will be empty after a fresh boot.
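One way to watch the hit rate is `arc_summary` (ships with ZFS on Linux); the underlying calculation is simple enough to sketch. On a live system the counters come from `/proc/spl/kstat/zfs/arcstats`; the sample numbers here are made up:

```shell
# ARC hit rate = hits / (hits + misses), as a percentage.
# On a real system, read the "hits" and "misses" fields from
# /proc/spl/kstat/zfs/arcstats; here they are passed as arguments.
arc_hit_rate() {
  awk -v h="$1" -v m="$2" 'BEGIN { printf "%.2f\n", 100 * h / (h + m) }'
}
arc_hit_rate 980000 20000   # prints 98.00
```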

Another thing: The rumor that ZFS really needs ECC is just that. A rumor. If you like your data, ECC is a good choice no matter which file system you use. This is actually more true for file systems that do not checksum everything as you won't be able to detect bad data written to disk.
 

LnxBil

Famous Member
Feb 21, 2015
Germany
Non-ECC RAM. If you use ANY software RAID-like functionality that does parity calculations, you will need ECC RAM. For ZFS specifically, that means if you use RaidZ(1-3) you will need ECC RAM. The reason being that ZFS expects the data in RAM to be 100% correct; if it is not, your parity calculations are wrong and your data is borked.

That has nothing to do with ZFS. You will lose your data either way with any filesystem if you have corrupt memory. ZFS is no different.
 

LnxBil

Famous Member
Feb 21, 2015
Boot/root: 2x 120/240GB SSD in RAID1
VM Storage: 2x 500GB SSDs in RAID1
Pool 1: 4x 6/8TB HDD in RAIDZ (might be 8TB, might be 6TB drives)
Pool 2: 4x 6TB HDD in RAIDZ
Cache: 2x 1TB NVMe SSD. Potentially in RAID1 or just a single drive for each pool.

Why such a complicated setup with 4 pools??? Create one SSD and one HDD pool (with metadata SSDs) and you will have a fast setup.
 

Null0N

New Member
Jul 23, 2020
Why do you split up pool 1 and pool 2 like that? You can add multiple vdevs to a single pool and thus have one big pool, avoiding arbitrary size limits.
Secondly, be aware that using any RAIDz (no matter which parity level) does not yield the best performance for VMs, and you might be surprised how much space is used for parity data, depending on the block size used when creating the VMs' disks. See https://forum.proxmox.com/threads/zfs-counts-double-the-space.71536/#post-320919 for a quick explanation.

I would go ahead and create a pool made of mirror vdevs (RAID 10 like). No extra parity data and much better performance.



The ZIL really does not need to be large! It only stores a few seconds of sync writes before they are written down to the slow spinning HDDs. It is good though if it is consistently fast. That's why an Intel Optane is a good choice for it. In a personal server of mine the ZIL is 4 GB large and usually only a few hundred MB are used. Sometimes when doing disk heavy stuff, around 1 GB is used.

The 2x 1TB NVMe SSDs that you consider as cache: no need for redundancy; if it fails you will only lose the cache but no data. Also make sure to have plenty of RAM, as the L2ARC also needs its space in RAM.

Honestly though, before investing in these I would first spend the money on more RAM for the ARC itself. Use some monitoring system and keep an eye on the ARC hit rate. If you have enough RAM you should be able to get close to 100%, meaning that most read requests can be satisfied from RAM without accessing the slow disk underneath it.

As already mentioned, you should get quite a bit more RAM in there. 32 GB won't make you happy. Depending on how much you want to give the VMs even 64 GB might be a bit on the short side.

Even an NVME SSD is considerably slower than RAM.

If you cannot get close to a 100% ARC hit rate during normal operations, then an L2ARC might help to increase the performance. But be aware that it needs some time to warm up, as it will be empty after a fresh boot.

Another thing: The rumor that ZFS really needs ECC is just that. A rumor. If you like your data, ECC is a good choice no matter which file system you use. This is actually more true for file systems that do not checksum everything as you won't be able to detect bad data written to disk.
Splitting the pools that way primarily came from a lack of knowledge of vdev setup. I've been researching more since last night and came across vdev striping, so yes, my end setup would be as you suggested for the vdevs. I do wonder though: can you set up one vdev, use it, then later create the second and stripe them, or do you need to destroy the existing one and rebuild to do that?
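For what it's worth, vdevs can indeed be added to an existing pool later without destroying it; ZFS stripes new writes across all vdevs, though existing data is not rebalanced. A sketch with placeholder names:

```shell
# Start with a single mirror vdev...
zpool create tank mirror /dev/disk/by-id/ata-DISK1 /dev/disk/by-id/ata-DISK2
# ...then later stripe in a second mirror vdev; no destroy/rebuild needed.
zpool add tank mirror /dev/disk/by-id/ata-DISK3 /dev/disk/by-id/ata-DISK4
```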

It was my intention to have the main OS drives for each of the VMs on just SSDs. The vdevs would almost all be passed through/allocated to one fileserver, which would then share out the storage. All the other VMs have very little storage requirements and for the most part aren't very intensive. The file server VM is the priority as it's the primary use case for the server.

My current RAM usage for the VMs is around 8GB total, though it will jump up a bit when one of them is actively processing stuff, such as approving/handing out updates through WSUS.

Again, with ZIL and SLOG, this was from a lack of knowledge or a misunderstanding of how they actually work. From what I understand now, in terms of write caching, RAM is really what matters, as even with SLOG and sync set to always, it will still write to RAM and SLOG... I think? My idea was to try to get the SSD(s) to act as a write buffer, mainly for large sequential writes, as most random writes can be absorbed by RAM, but there doesn't seem to be a simple way of getting that set up.

In terms of ARC and L2ARC, I think I'll have to play around with it. I have a VM running Proxmox at the moment to get familiar with the interface and learn some level of Linux CLI, but I can't really do much with it other than that. I can create VMs in it but cannot launch them nested.

In regards to ECC, I've seen it thrown around a lot that it's 100% essential, and it's often talked about like the entire filesystem is GOING to fail right away if you don't use it. But then, like you said, some people I know with extensive experience with both ZFS and enterprise storage systems say there's nothing about ZFS that makes ECC any more or less important than it is for other file systems.

It's a good option, yes. But it's just too cost prohibitive and, for the most part, not really supported 100% on the X79 platform, even with the Xeon. I'll be backing up most things, so if it does fail due to a memory error, I rebuild. Not a massive deal as this is entirely personal use.


Why such a complicated setup with 4 pools??? Create one SSD and one HDD pool (with metadata SSDs) and you will have a fast setup.
After some more research, I'd probably consolidate the SSDs, creating a partition for boot and the rest for VM OS disks. For the HDDs, I'd now be leaning towards two RAIDz vdevs in a single pool. As for cache, I'll have to play around with it to find out how best to use it. It seems to work fundamentally differently to what I'm used to with Windows Storage Spaces.
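A sketch of that consolidated HDD layout, two RAIDz1 vdevs striped in one pool (pool and device names are placeholders):

```shell
# Single pool made of two 4-drive RAIDz1 vdevs; writes stripe across both.
zpool create tank \
  raidz1 /dev/disk/by-id/ata-HDD1 /dev/disk/by-id/ata-HDD2 \
         /dev/disk/by-id/ata-HDD3 /dev/disk/by-id/ata-HDD4 \
  raidz1 /dev/disk/by-id/ata-HDD5 /dev/disk/by-id/ata-HDD6 \
         /dev/disk/by-id/ata-HDD7 /dev/disk/by-id/ata-HDD8
```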


I've still yet to see whether ZFS is right, or even viable, for my use case. I may end up just using Proxmox as a hypervisor and mostly just passing disks through to a Windows server to manage them through Storage Spaces.
 

LnxBil

Famous Member
Feb 21, 2015
From what I understand now, in terms of write caching, RAM is really what matters, as even with SLOG and sync set to always, it will still write to RAM and SLOG... I think?

Yes, and that is not what you want. The async write is normally only done in memory, so setting sync=always will increase your storage IO load.
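The sync behaviour is a per-dataset property; checking and setting it is straightforward (the dataset name is a placeholder):

```shell
# Show the current sync policy: standard | always | disabled
zfs get sync tank/vmstore
# "standard" honours application sync requests; "always" forces every
# write through the ZIL/SLOG path and raises storage IO load.
zfs set sync=standard tank/vmstore
```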

My idea was to try to get the SSD(s) to act as a write buffer, mainly for large sequential writes, as most random writes can be absorbed by RAM, but there doesn't seem to be a simple way of getting that set up.

Large sequential writes are not significantly faster on SSDs and, depending on your HDD backend, can even be slower than on HDDs. In ordinary filesystems, large sequential reads/writes bypass the cache, because it does not make sense to cache these things, especially if they are much larger than the cache and would only evict useful cache data. In ZFS, you have a better caching mechanism, but in the end, if your throughput is much larger than your cache, you will not and cannot cache a lot.

In regards to ECC, I've seen it thrown around a lot that it's 100% essential, and it's often talked about like the entire filesystem is GOING to fail right away if you don't use it. But then, like you said, some people I know with extensive experience with both ZFS and enterprise storage systems say there's nothing about ZFS that makes ECC any more or less important than it is for other file systems.

You can forget what other people say and stick to the link I posted: that is the guy who invented and programmed ZFS, and if he doesn't know, no one does. ECC is always better, but it is not necessary. You can even use ZFS on a non-ECC system and have better error correction than on any other system (for how to do that, see the link and let Matt explain).

In terms of ARC and L2ARC, I think I'll have to play around with it. I have a VM running Proxmox at the moment to get familiar with the interface and learn some level of Linux CLI, but I can't really do much with it other than that. I can create VMs in it but cannot launch them nested.

The crucial part is to monitor the buffer cache hit rate. If it is not high enough, you will not gain anything and will certainly be better off not using an L2ARC, because it also restricts your ARC. Also note that the L2ARC is built on the fly while the OS is running, so a reboot will invalidate everything and it has to start again (automatically, in the background).

It's a good option, yes. But it's just too cost prohibitive and, for the most part, not really supported 100% on the X79 platform, even with the Xeon. I'll be backing up most things, so if it does fail due to a memory error, I rebuild. Not a massive deal as this is entirely personal use.

What is not supported on the Xeon platform? If you're talking about ECC, it has been supported for decades.

I've still yet to see whether ZFS is right, or even viable, for my use case. I may end up just using Proxmox as a hypervisor and mostly just passing disks through to a Windows server to manage them through Storage Spaces.

If you're more comfortable with MSSS, then use it, but it is not as advanced as ZFS, which is the most advanced filesystem on this planet (that we know of); over 100 person-years of engineering went into it, but it has a steep learning curve. If you're interested in reading more about it, I can recommend the ZFS books by Allan Jude and Michael Lucas.
 

Null0N

New Member
Jul 23, 2020
Thank you for the very detailed response.

What is not supported on the Xeon platform? If you're talking about ECC, it has been supported for decades.
ECC is certainly supported on the Xeon platform, but not when using the X79 chipset (as opposed to the C-series chipsets for Sandy Bridge/Ivy Bridge Xeons). Unbuffered ECC kind of works, but it's hit and miss.

I've actually returned the Xeon I have and I'm moving to X99 with a v3 Xeon, mostly because DDR4 is just cheaper and more readily available nowadays, and I can move platforms while keeping the RAM if I want to. Some of the boards support ECC RDIMMs up to a 128GB maximum, so I might pick some up if they come up at a decent price.

If you're more comfortable with MSSS, then use it, but it is not as advanced as ZFS, which is the most advanced filesystem on this planet (that we know of); over 100 person-years of engineering went into it, but it has a steep learning curve. If you're interested in reading more about it, I can recommend the ZFS books by Allan Jude and Michael Lucas.
I'm going to play around with this. I think I'm going to move to Proxmox as a hypervisor to avoid having a multi-purpose server running on bare metal. Right now it's a massive pain to take down the main server for updates; it should be much easier with the relatively small attack surface that Proxmox has, in comparison to Windows Server with a desktop at least.

I'll probably give ZFS a go and see how the system handles it. I'm actually leaning more towards a mirror+stripe/RAID 10 setup for the hard drives for the increased sequential read/write performance. This is pretty basic through ZFS, but MSS doesn't really support it as far as I can tell. I like how MSS seems to handle SSD caching though, and it actually benefits from a large cache, as it will just store more on the SSD as space allows. It does seem to bypass the SSD write-back cache (the default WBC is 1GB in MSS) for large sequential writes, but that's somewhat of a non-issue with RAID 10 on 8 drives, as the throughput should be pretty good.

I'm wondering about the option of having a ZFS striped mirror, passing that through to Windows and using MSS to cache it. Not sure if that would mess with how ZFS works, or if many people have documented trying it before.
 

LnxBil

Famous Member
Feb 21, 2015
ECC is certainly supported on the Xeon platform, but not when using the X79 chipset (as opposed to the C-series chipsets for Sandy Bridge/Ivy Bridge Xeons). Unbuffered ECC kind of works, but it's hit and miss.

I've actually returned the Xeon I have and I'm moving to X99 with a v3 Xeon, mostly because DDR4 is just cheaper and more readily available nowadays, and I can move platforms while keeping the RAM if I want to. Some of the boards support ECC RDIMMs up to a 128GB maximum, so I might pick some up if they come up at a decent price.

Ah, on consumer boards. I understand. I haven't and never will use Xeons on those; enterprise-grade hardware is cheap if you buy it used, and you get a ton of features.

I'm actually leaning more towards a mirror+stripe/RAID 10 setup for the hard drives for the increased seq read/write performance.

Always best performance, especially in comparison to RAIDz.

I'm wondering about the option of having a ZFS striped mirror, passing that through to Windows and using MSS to cache it. Not sure if that would mess with how ZFS works, or if many people have documented trying it before.

The question is what ZFS brings to the table if you also use MSS inside of the VM. You have two technologies on top of each other. If you also pass through the SSD so that you can use the cache, you cannot snapshot the ZFS anymore without corrupting the storage space. I would stick to one technology and use what it offers.
 

Null0N

New Member
Jul 23, 2020
Ah, on consumer boards. I understand. I haven't and never will use Xeons on those; enterprise-grade hardware is cheap if you buy it used, and you get a ton of features.
True, when they come up. I'm watching a used board right now that has manufacturer validated support for ECC RDIMM that might go for a decent price.

I'm just in full on research mode now. Even if I don't use ZFS in the end, it's very interesting how it works. Since reading more, I'm now further leaning towards multiple mirror vdevs, which should make the sequential performance where I want it.

I'm now just researching how ZFS manages its cache. I found a video of a talk one of the OpenZFS devs gave on how ZFS caching works, so I'm watching through that.

Always best performance, especially in comparison to RAIDz.
From what I'm understanding of mirrored vdevs, they're generally a better-performing option than RAIDz, and while they have worse worst-case redundancy compared to RAIDz2, they're less susceptible to further failures when rebuilding, as there are no parity calculations when resilvering; it's just a straight copy from another drive in the vdev, more or less. They also seem easier to expand.

The question is what ZFS brings to the table if you also use MSS inside of the VM. You have two technologies on top of each other. If you also pass through the SSD so that you can use the cache, you cannot snapshot the ZFS anymore without corrupting the storage space. I would stick to one technology and use what it offers.
I thought it might be the case that it removes some features from ZFS.

With the increased threats of ransomware, having frequent snapshots setup would be a big benefit from ZFS.
 

LnxBil

Famous Member
Feb 21, 2015
From what I'm understanding of mirrored vdevs, they're generally a better-performing option than RAIDz, and while they have worse worst-case redundancy compared to RAIDz2, they're less susceptible to further failures when rebuilding, as there are no parity calculations when resilvering; it's just a straight copy from another drive in the vdev, more or less. They also seem easier to expand.

The worst-case redundancy can, if necessary, be worked around with a triple mirror, so you have even more read performance and two drives can fail.
 

LnxBil

Famous Member
Feb 21, 2015
With the increased threats of ransomware, having frequent snapshots setup would be a big benefit from ZFS.

Best is in combination with a file server, e.g. Samba, that can read the snapshots directly from ZFS. You can use them from Windows with the ordinary "Previous Versions" dialog.
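A sketch of that combination: ZFS snapshots named in the GMT format that Samba's `shadow_copy2` VFS module expects by default, so Windows shows them under Previous Versions (dataset, share name, and schedule are placeholders):

```shell
# Snapshot named the way shadow_copy2 expects (GMT-%Y.%m.%d-%H.%M.%S);
# run this from cron or a systemd timer for frequent snapshots.
zfs snapshot tank/share@GMT-$(date -u +%Y.%m.%d-%H.%M.%S)

# Matching share definition in smb.conf:
#   [share]
#       path = /tank/share
#       vfs objects = shadow_copy2
#       shadow:snapdir = .zfs/snapshot
#       shadow:sort = desc
```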
 

Null0N

New Member
Jul 23, 2020
The worst-case redundancy can, if necessary, be worked around with a triple mirror, so you have even more read performance and two drives can fail.
The issue there is dropping below 50% capacity efficiency.

I'm not worried about the data to the point of going that far. Even 50% is painful for the data that's being stored. Having to replace a single drive when one fails isn't an issue, especially if the rebuild is quick.

Best is in combination with a file server, e.g. Samba, that can read the snapshots directly from ZFS. You can use them from Windows with the ordinary "Previous Versions" dialog.
That's neat. I'll look into that.

I mentioned it before, but basically all of the mass storage will be passed through to a file server VM. All other VMs will run from regular SSD drives, possibly in RAID 1 and they require very little storage (Max 100GB for the most space-hungry VM I have, which is my AV management server). The mass capacity storage is used for my desktop PCs and shared media storage.
 

Q-wulf

Well-Known Member
Mar 3, 2013
Secondly, be aware that using any RAIDz (no matter which parity level) does not yield the best performance for VMs

Always best performance, especially in comparison to RAIDz.

Is there any hard data on the severity? If I read this [POST] correctly, the overhead is <=2x for a RaidZ2 (8k data + 2x 4k parity just for the writes),
e.g. an 8-SSD RaidZ2 vs Raid10?
 
Last edited:

LnxBil

Famous Member
Feb 21, 2015
If I read this [POST] correctly, the overhead is <=2x for a RaidZ2 (8k data + 2x 4k parity just for the writes)

It depends on the recordsize/volblocksize, yes. If you're unlucky, every disk only gets e.g. 1k of a big block, but ashift implies 4K, so you're wasting 3k of space for each block. If you're lucky and have big recordsizes, it does not matter so much. It just depends on your data.

e.g. an 8-SSD RaidZ2 vs Raid10?

Performance-wise, there is a huge impact (4x stripe-mirror vs. RAIDz2 with 8 drives); logic taken from Jude & Lucas, FreeBSD Mastery: ZFS, page 42:
  • Read IOPS of 8 drives vs 1 drive
  • Write IOPS of 4 drives vs 1 drive
  • Read MB of 8 drives vs 6 drives
  • Write MB of 4 drives vs. 6 drives (RAIDz2 is actually faster)
but there is imho no "waste calculator" for raidz2 (yet).
 

Q-wulf

Well-Known Member
Mar 3, 2013
Thanks, that book was a good read.
Especially the section about "Six to Twelve Disks".

My question is this: this logic was based on spinners (HDDs) and most likely cannot be translated to SSDs on a 1:1 ratio for streaming reads/writes (due to there being no read/write heads on SSDs). But for IOPS this should be accurate and scale the same?

So if IOPS are your concern and 1 SSD has 500k read and 500k write IOPS, and you'd be using 8 of them, you'd gain the following IO characteristics:
4x 2-disk mirror: 4000k read IOPS and 2000k write IOPS.
1x 8-disk RaidZ1: 500k read IOPS and 500k write IOPS.
1x 8-disk RaidZ2: 500k read IOPS and 500k write IOPS.
1x 8-disk RaidZ3: 500k read IOPS and 500k write IOPS.
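Idealized numbers for that 8-SSD comparison, following the scaling rules quoted from the book (a mirror pool reads from every drive and writes at one drive per vdev, while RAIDz gets roughly a single drive's IOPS); the 500k-per-drive figure is the assumption from above:

```shell
# Idealized IOPS scaling for 8 SSDs at 500k read / 500k write each.
per_drive=500   # thousands of IOPS per drive (assumed)
echo "4x2 mirror: read $((8 * per_drive))k, write $((4 * per_drive))k"
echo "8-disk RaidZ(1-3): read $((1 * per_drive))k, write $((1 * per_drive))k"
```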

The only noticeable difference between RaidZ[1-3] would be that your streaming read/write bandwidth sinks gradually, although your streaming writes would be higher for RaidZ1 and RaidZ2 and equal for RaidZ3 compared to a same-sized mirror.

So if IOs per second are your concern --> mirrors. Example: VMs.
If streaming reads/writes are your concern --> RaidZ1 or RaidZ2; at RaidZ3 you might as well do a mirror, since you have the same space overhead and the same streaming writes, but double the streaming reads and significantly higher read/write IOPS. An example here would be network security cam recording.

That is about the gist of it, right?
 

LnxBil

Famous Member
Feb 21, 2015
So if IOs per second are your concern --> mirrors. Example: VMs.
If streaming reads/writes are your concern --> RaidZ1 or RaidZ2; at RaidZ3 you might as well do a mirror, since you have the same space overhead and the same streaming writes, but double the streaming reads and significantly higher read/write IOPS. An example here would be network security cam recording.

That is about the gist of it, right?

Yes, that is a good gist.
In addition: fault tolerance is also a concern, as more devices can fail and a resilver stresses more disks.
 
