Hello,
I've posted a few threads about this sort of thing in the past as I've learned the difference between recordsize and volblocksize and what the various settings are best for, and feel like I've got a pretty good understanding of the basics. I wanted to post my understanding of volblocksize and recordsize in one place to see if I've finally got it straight in my head. Hopefully this will be useful for others just starting out as well.
Please let me know if I'm completely confused about something.
ZFS Workload Tuning: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html
I try to follow these recommendations when I can, but I sometimes get a bit confused when what I'm doing isn't an exact match for one of the listed workloads.
Setup Details:
- 4x PM883 1.92 TB enterprise SSDs
- Pool layout: 2 mirror vdevs (2 disks per mirror)
- RAIDZ: not used. I mention this because a lot of the discussion out there assumes RAIDZ1/2/3 backing storage, and the answers focus on mitigating the parity and padding overhead that comes with it, which doesn't apply to mirrors.
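For reference, a pool laid out like this would typically be created with something along these lines (device names, pool name, and ashift=12 are assumptions on my part, not details from my actual setup):

```
# Two striped 2-way mirror vdevs in one pool; ashift=12 assumes 4K-sector SSDs.
# Device names and pool name are placeholders.
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2 \
    mirror /dev/disk/by-id/ata-SSD3 /dev/disk/by-id/ata-SSD4

zpool status tank   # confirm the two mirror vdevs
```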
Recordsize - Dataset (CT storage)
What it is: the maximum logical "block" (record) size for files in the dataset (not entirely accurate, but a useful way to think about it if you're used to thinking about SSD sector sizes). Files smaller than the recordsize are stored as a single, smaller record. The property can be changed after dataset creation, but only newly written data uses the new value.
Sizing: match it to the content of the data being stored (e.g., database files, video files, etc.)
Useful sizes (example commands at the end of this section):
- 8K: Postgres (though 16K can apparently be useful for "pre-faulting" the next page of Postgres data in sequential scans; I don't know enough about Postgres to do anything but stick with the default recommendation. See: https://vadosware.io/post/everything-ive-seen-on-optimizing-postgres-on-zfs-on-linux/ )
- 16K: MariaDB
- 1M:
  - Big files (e.g., measured in hundreds of megabytes, gigabytes, or more)
  - Examples: video files, some types of backups (e.g., non-incremental backups that just grow over time, like Minecraft server backups stored in monolithic .tar files, and other monolithic backups), ISOs (?)
- 1M Recordsize: How big does a single file have to be, of whatever type, before you want to start storing it on a 1M dataset? (That is, what is the minimum individual file size where the 1 MB recordsize starts actively helping you?)
- CT Storage for Linux Containers: What recordsize should be used for a Linux LXC's backing storage?
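To make the sizes above concrete, here's a minimal sketch of setting recordsize per dataset (pool and dataset names are hypothetical; the values just mirror the list above):

```
# Hypothetical pool/dataset names; recordsize is a per-dataset property.
zfs create -o recordsize=8K  tank/postgres   # PostgreSQL data (8K pages)
zfs create -o recordsize=16K tank/mariadb    # MariaDB/InnoDB data (16K pages)
zfs create -o recordsize=1M  tank/media      # big video files, monolithic backups, ISOs

# recordsize can be changed later, but only newly written files use the new value:
zfs set recordsize=1M tank/media
zfs get recordsize tank/media
```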
Volblocksize - zVols (VM storage)
- What it is: the fixed logical "block" size of a zvol; every block of the volume is this size, so it's much closer to a true block size than recordsize is. Fixed once the zvol has been created and cannot be changed afterwards.
- Sizing: match it to the content of the data being stored (i.e., the workload inside the VM)
- The default is now 16K in current OpenZFS (as of March 1, 2024). Is there ever any reason to change this when using pools made of mirror vdevs? (There's a sketch of the relevant commands after this list.)
- How exactly does PVE do this? I'm looking at this part of the manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_storage_types
- It says this, when using local ZFS: "Disk images for VMs are stored in ZFS volume (zvol) datasets, which provide block device functionality."
- I'm not sure what a "zvol dataset" is, as I thought those were two different things. Do they mean a zvol stored inside a dataset? That would make more sense to me, since the dataset stores the volblocksize property.
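As a concrete sketch (names are hypothetical, and on PVE the zvol is normally created for you when you add a VM disk), creating a zvol with an explicit volblocksize looks roughly like this, along with the Proxmox-side storage option that controls which value PVE uses:

```
# Hypothetical names: a 32 GiB zvol with an explicit volblocksize.
# Unlike recordsize, volblocksize cannot be changed once the zvol exists.
zfs create -V 32G -o volblocksize=16K tank/vm-100-disk-0
zfs get volblocksize tank/vm-100-disk-0

# On Proxmox, the "blocksize" option of a zfspool storage sets the volblocksize
# used for newly created VM disks (storage ID is a placeholder):
pvesm set local-zfs --blocksize 16k
```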
Other Questions
- Qcow2 vs. raw format for VM disks: My disk storage has always been local ZFS, thin provisioned, and I only ever see raw as an available option. Is this actually a problem? Am I doing something wrong? Are there implications for volblocksize/recordsize? EDIT: I've read the PVE Admin Guide's NFS storage section, and now see that qcow2 is usable over NFS to enable snapshots and clones, as per https://pve.proxmox.com/pve-docs/pve-admin-guide.html#storage_nfs. So unless there's some really good reason not to, I probably want to use qcow2 on NFS shared storage? (See the sketch after this list.)
- Local ZFS vs. Shared Storage (NFS connection to TrueNAS, using ZFS):
- Obviously, the ZFS settings would be done on TrueNAS here.
- Are there any special considerations when using TrueNAS/ZFS-over-NFS re: recordsize/volblocksize?
- Shared Storage via iSCSI: I've never used iSCSI with Proxmox before. Does this have any impact on the recordsize/volblocksize/anything else related?
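For what it's worth, here's a minimal sketch of adding an NFS storage pointed at a TrueNAS export (server address, export path, and storage ID are all hypothetical); once it's defined, qcow2 becomes selectable as a disk format on that storage:

```
# Hypothetical server/export/storage-ID; adds an NFS-backed storage for VM disk images.
# qcow2 images stored here get snapshot/clone support from the qcow2 format itself.
pvesm add nfs truenas-nfs \
    --server 192.168.1.50 \
    --export /mnt/tank/pve-images \
    --content images
```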