[TUTORIAL] FabU: Can I use ZFS RaidZ for my VMs?

UdoB

Nov 1, 2016
Assumption: you use at least four identical devices for that. Mirrors, RaidZ, RaidZ2 are possible - theoretically.

Technically correct answer: yes, it works. But the right answer is: no, do not do that! The recommendation is very clear: use “striped mirrors”. This results in something similar to a classic Raid10.

(1) RaidZ1 (and Z2 too) gives you the IOPS of a single device, regardless of the actual number of physical devices. With the “four devices, mirrored” approach (two mirror vdevs) this doubles, giving twice as many operations per second. For a large-file fileserver this may not be so important, but for multiple VMs running on it concurrently, IOPS that are as high as possible are crucial!
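
As a rough back-of-the-envelope sketch of that rule of thumb (IOPS scale with the number of vdevs, not with the number of disks): the function name and the 200 IOPS per-device figure are illustrative assumptions, not measurements.

```python
def pool_write_iops(vdev_count: int, device_iops: int) -> int:
    """Rule of thumb from the post: each vdev delivers roughly the IOPS of one device."""
    return vdev_count * device_iops


DEVICE_IOPS = 200  # assumed figure for a single device, purely illustrative

# Four devices as one RaidZ1 vdev versus four devices as two 2-way mirror vdevs.
raidz_4_disks = pool_write_iops(vdev_count=1, device_iops=DEVICE_IOPS)
mirrors_4_disks = pool_write_iops(vdev_count=2, device_iops=DEVICE_IOPS)

print("RaidZ1, 4 disks:         ", raidz_4_disks)
print("Striped mirrors, 4 disks:", mirrors_4_disks)
```

Real numbers depend on the hardware and the workload; the point is only the factor between the two layouts.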

(2) It is a waste of space because of padding blocks. Dunuin has described that problem several times; an extreme example for RaidZ3: https://forum.proxmox.com/threads/zfs-vs-single-disk-configuration-recomendation.138161/post-616199 “A 8 disk raidz3 pool would require that you increase the block size from 8K (75% capacity loss) to 64K (43% capacity loss) or even 256K (38% capacity loss)”
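
The quoted percentages can be reproduced with a small calculation. This is a simplified sketch of how RaidZ allocates space for one block, as I understand it: parity sectors are added per stripe row and the total is padded up to a multiple of (parity + 1) sectors. It assumes 4K sectors (ashift=12) and ignores compression and metadata; the function name is my own.

```python
import math


def raidz_capacity_loss(block_bytes: int, disks: int, parity: int, ashift: int = 12) -> float:
    """Return the fraction of raw space lost to parity and padding for one block."""
    sector = 1 << ashift                        # 4 KiB sectors when ashift=12 (assumption)
    data = math.ceil(block_bytes / sector)      # sectors holding actual data
    rows = math.ceil(data / (disks - parity))   # stripe rows needed for that data
    allocated = data + rows * parity            # each row carries its own parity sectors
    unit = parity + 1
    allocated = math.ceil(allocated / unit) * unit  # padding to a multiple of parity + 1
    return 1 - data / allocated


# The 8-disk RaidZ3 example from the quote above.
for size_kib in (8, 64, 256):
    loss = raidz_capacity_loss(size_kib * 1024, disks=8, parity=3)
    print(f"{size_kib:>3}K blocks: {loss:.0%} of raw capacity lost")
```

With these assumptions the loop prints 75%, 43% and 38%, matching the quoted figures.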


There seem to be some counter arguments against “only mirrors”:

(3) Resiliency: "I will use RaidZ2 with six drives to allow two to fail. Mirrors are less secure, right?"

Yes. In a single RaidZ2-vdev any two devices may fail without data loss. In a normal mirror only one device may fail.

BUT: there are triple mirrors! These are so rarely discussed that I need to mention them here explicitly. Let us compare both layouts with six devices:

(3a) The RaidZ2 gives us the performance of a single drive and the usable capacity of four drives. Any two drives may fail.

(3b) Two vdevs of triple mirrors give us the IOPS of two drives for writing data + sixfold read performance! Any two drives of each vdev may fail! (So up to four drives may die - but only in a specific selection.)

(4) Capacity: the only downside of (3b) is that the usable capacity shrinks down to two drives.
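
To make the “specific selection” remark concrete, here is a small brute-force sketch that enumerates every combination of failed drives for both six-drive layouts and counts how many the pool survives. The disk names and the helper function are hypothetical; the only rule encoded is that a vdev survives as long as no more than two of its members fail.

```python
from itertools import combinations


def surviving_combinations(vdevs, tolerated_per_vdev, failures):
    """Count failure combinations the pool survives; returns (survived, total)."""
    disks = [disk for vdev in vdevs for disk in vdev]
    survived = total = 0
    for failed in combinations(disks, failures):
        total += 1
        # The pool is lost as soon as one vdev loses more members than it tolerates.
        if all(sum(d in failed for d in vdev) <= tolerated_per_vdev for vdev in vdevs):
            survived += 1
    return survived, total


raidz2 = [["a", "b", "c", "d", "e", "f"]]            # one RaidZ2 vdev with six disks
triple_mirrors = [["a", "b", "c"], ["d", "e", "f"]]  # two 3-way mirror vdevs

for failures in (2, 3, 4):
    print(failures, "failed:",
          "RaidZ2", surviving_combinations(raidz2, 2, failures),
          "triple mirrors", surviving_combinations(triple_mirrors, 2, failures))
```

Both layouts survive every combination of two failures. With three or four failures the RaidZ2 is always lost, while the triple mirrors survive 18 of 20 and 9 of 15 combinations respectively.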


Recommendation: for VM storage use a mirrored vdev approach. For important data use RaidZ2 or RaidZ3.

In any case note that “Raid” of any flavor and/or having snapshots does not count as a backup. Never!


See also:
 
Beginners often confuse hardware RAID5/6 with BBU (which can cache sync writes) with ZFS RaidZ1/2 (with unfortunate block size alignment on consumer drives) just because both can deal with one/two missing drive(s). The performance behavior is indeed completely different (as well as the supported feature set) and RaidZ, as you already explained, is mostly unsuitable for VMs.
 
It couldn't hurt to add that a single vdev stripe of multiple disks, whether it's a misconfiguration or a misunderstanding of the striped mirror concept, is the worst choice of all. Even worse than using a single disk because it at least doubles the failure rate.
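
A simplified sketch of why a plain stripe is so risky, assuming each disk fails independently with the same probability over some period. The 2% figure is an arbitrary assumption, both function names are hypothetical, and the model ignores replacing and resilvering a failed mirror member.

```python
def stripe_loss_probability(p_disk: float, disks: int) -> float:
    """A plain stripe is lost if ANY of its disks fails."""
    return 1 - (1 - p_disk) ** disks


def striped_mirror_loss_probability(p_disk: float, vdevs: int, way: int) -> float:
    """Striped mirrors are lost only if ALL members of at least one mirror fail."""
    return 1 - (1 - p_disk ** way) ** vdevs


P_DISK = 0.02  # assumed per-disk failure probability, purely illustrative

print(f"single disk:          {stripe_loss_probability(P_DISK, 1):.4%}")
print(f"stripe of 4 disks:    {stripe_loss_probability(P_DISK, 4):.4%}")
print(f"2 x 2-way mirrors:    {striped_mirror_loss_probability(P_DISK, 2, 2):.4%}")
```

For small per-disk probabilities the stripe's risk grows roughly in proportion to the number of disks, while the mirrored pool stays far below the single-disk figure.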
 
Yes, absolutely correct. For the interested reader, let me show you two basic examples:

This is the bad approach. It has zero redundancy, and if one device fails the whole pool is gone:
Code:
# zpool create dummypool /rpool/dummy/disk-a.img /rpool/dummy/disk-b.img /rpool/dummy/disk-c.img /rpool/dummy/disk-d.img 

# zpool status dummypool
  pool: dummypool
 state: ONLINE
config:

        NAME                       STATE     READ WRITE CKSUM
        dummypool                  ONLINE       0     0     0
          /rpool/dummy/disk-a.img  ONLINE       0     0     0
          /rpool/dummy/disk-b.img  ONLINE       0     0     0
          /rpool/dummy/disk-c.img  ONLINE       0     0     0
          /rpool/dummy/disk-d.img  ONLINE       0     0     0

What we recommend instead is to use mirrors:
Code:
# zpool create dummypool  mirror /rpool/dummy/disk-a.img /rpool/dummy/disk-b.img  mirror /rpool/dummy/disk-c.img /rpool/dummy/disk-d.img 

# zpool status dummypool
  pool: dummypool
 state: ONLINE
config:

        NAME                         STATE     READ WRITE CKSUM
        dummypool                    ONLINE       0     0     0
          mirror-0                   ONLINE       0     0     0
            /rpool/dummy/disk-a.img  ONLINE       0     0     0
            /rpool/dummy/disk-b.img  ONLINE       0     0     0
          mirror-1                   ONLINE       0     0     0
            /rpool/dummy/disk-c.img  ONLINE       0     0     0
            /rpool/dummy/disk-d.img  ONLINE       0     0     0

:)