[SOLVED] default zvol size smaller than pool size

gothbert

Member
Apr 3, 2021
Hi,

I just plugged four 1 TB SSDs in a server and installed latest Proxmox VE with ZFS.

Code:
root@vmserver:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 931.5G  0 disk
├─sda1   8:1    0  1007K  0 part
├─sda2   8:2    0   512M  0 part
└─sda3   8:3    0   931G  0 part
sdb      8:16   0 931.5G  0 disk
├─sdb1   8:17   0  1007K  0 part
├─sdb2   8:18   0   512M  0 part
└─sdb3   8:19   0   931G  0 part
sdc      8:32   0 931.5G  0 disk
├─sdc1   8:33   0  1007K  0 part
├─sdc2   8:34   0   512M  0 part
└─sdc3   8:35   0   931G  0 part
sdd      8:48   0 931.5G  0 disk
├─sdd1   8:49   0  1007K  0 part
├─sdd2   8:50   0   512M  0 part
└─sdd3   8:51   0   931G  0 part

The pool size is 3.62 TB as expected:

Code:
root@vmserver:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool  3.62T  1.41G  3.62T        -         -     0%     0%  1.00x    ONLINE  -

The available space on the ROOT and data zvols is only 2.55 TB, though:

Code:
root@vmserver:~# zfs list
NAME               USED  AVAIL     REFER  MOUNTPOINT
rpool             1.02G  2.55T      151K  /rpool
rpool/ROOT        1.02G  2.55T      140K  /rpool/ROOT
rpool/ROOT/pve-1  1.02G  2.55T     1.02G  /
rpool/data         140K  2.55T      140K  /rpool/data

I expected each zvol to be able to use the full pool size. Is there a reason why the whole pool size is not available to a single zvol? And how can this be changed? Or am I just mistaken?

Thank you for any tips.

Best regards,
Boris
 
What kind of pool is it? zpool status?
 
Thank you, Aaron, for your reply.

It's a RAID-Z1 pool:

Code:
root@vmserver:~# zpool status
  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME                                                STATE     READ WRITE CKSUM
        rpool                                               ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_1000GB_191177452012-part3  ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_1000GB_205066452013-part3  ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_1000GB_205066455305-part3  ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_1000GB_205066442310-part3  ONLINE       0     0     0

errors: No known data errors

While looking for confirmation on the web that roughly one disk is used for parity, I came across this calculator: https://wintelguy.com/zfs-calc.pl

Now I can answer my own question. The total disk size is 4 * 1 TB = 4 TB. The zpool and zfs commands report capacity in TiB, though, which gives 3.62 TiB for the raw pool and 2.55 TiB (= 2.80 TB) of available space. This is somewhat below the expected 3 TB, but understandable given the reservations for parity and padding and the slop space allocation, as listed on the calculator web site.
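
For anyone who wants to redo the unit conversion, here is a minimal Python sketch. The helper names tib_to_tb and tb_to_tib are made up for illustration; the only facts used are that 1 TiB = 2^40 bytes and 1 TB = 10^12 bytes.

```python
# Hypothetical helpers for converting between binary (TiB) and decimal (TB) units.
TIB = 2**40   # bytes in one tebibyte
TB = 10**12   # bytes in one terabyte


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to terabytes."""
    return tib * TIB / TB


def tb_to_tib(tb: float) -> float:
    """Convert terabytes to tebibytes."""
    return tb * TB / TIB


if __name__ == "__main__":
    # Values taken from the lsblk / zpool list / zfs list output above.
    print(f"4 TB of raw disk  = {tb_to_tib(4):.2f} TiB")
    print(f"3.62 TiB pool     = {tib_to_tb(3.62):.2f} TB")
    print(f"2.55 TiB avail    = {tib_to_tb(2.55):.2f} TB")
```

This only covers the TB/TiB difference; the parity, padding and slop space reservations come on top of it.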
 
Okay. If you plan to place VMs on that pool, please be aware that raidz combined with datasets of type volume (used to provide block devices for VMs) has the often unexpected side effect of using up a lot of extra space for parity. Check this section in the documentation that talks about that. ZFS filesystem datasets don't suffer as badly from this problem, as their records can be up to 128k by default, which leads to a much better data / parity ratio.

The TL;DR is that for each block in a ZFS volume dataset, parity blocks of at least the ashift size need to be stored. We recommend using RAID10-like pool layouts made up of several mirror vdevs to avoid this parity overhead; you will also get better IOPS performance, which is what you want with VMs.
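
To put rough numbers on that, here is a small Python sketch of how many sectors a raidz vdev allocates for one block. It is an approximation under stated assumptions, not an official tool: it assumes ashift=12 (4k sectors, not shown in this thread), ignores compression, and the 8k volume block size in the example is an assumption as well. The function names are made up for illustration.

```python
import math


def raidz_sectors(block_bytes: int, ndisks: int, nparity: int, ashift: int = 12):
    """Return (data_sectors, allocated_sectors) for one block on a raidz vdev.

    Assumptions: uncompressed block, sector size of 2**ashift bytes.
    """
    sector = 1 << ashift
    data = math.ceil(block_bytes / sector)
    # Each stripe row of up to (ndisks - nparity) data sectors needs nparity parity sectors.
    parity = math.ceil(data / (ndisks - nparity)) * nparity
    # The allocation is padded up to a multiple of (nparity + 1) sectors.
    pad_to = nparity + 1
    allocated = math.ceil((data + parity) / pad_to) * pad_to
    return data, allocated


def efficiency(block_bytes: int, ndisks: int, nparity: int, ashift: int = 12) -> float:
    """Fraction of the allocated space that holds actual data."""
    data, allocated = raidz_sectors(block_bytes, ndisks, nparity, ashift)
    return data / allocated


if __name__ == "__main__":
    # 4-disk raidz1 as in this thread; block sizes are example values.
    for kib in (8, 16, 128):
        data, alloc = raidz_sectors(kib * 1024, ndisks=4, nparity=1)
        print(f"{kib:>3}k block: {data} data sectors, {alloc} allocated, "
              f"{efficiency(kib * 1024, 4, 1):.0%} efficient")
```

Under these assumptions an 8k block on a 4-disk raidz1 occupies 4 sectors for 2 sectors of data, so the space efficiency is the same 50% a pool of mirrors would give, while a 128k record lands at 32 of 44 sectors.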
 
Thank you, Aaron, for cautioning me about the undesired effects of RAIDZ1 when placing VMs on the pool, which I do indeed plan to do. I chose RAIDZ1 in the first place because it provides redundancy while losing only roughly one fourth of the capacity. Having done the maths again and having read the docs on the merits of RAID10 over RAIDZ1, I will go for RAID10.
 