Hello,
I've posted a few threads about this sort of thing in the past as I've learned the difference between recordsize and volblocksize and what the various settings are best for, and feel like I've got a pretty good understanding of the basics. I wanted to post my understanding of volblocksize and recordsize in one place to see if I've finally got it straight in my head. Hopefully this will be useful for others just starting out as well.
Please let me know if I'm completely confused about something.
ZFS Workload Tuning: https://openzfs.github.io/openzfs-docs/Performance%20and%20Tuning/Workload%20Tuning.html
I try to follow these recommendations when I can, but I sometimes get a bit confused when what I'm doing isn't an exact match for one of the listed workloads.
Setup Details:
- 4x PM883 1.92 TB enterprise SSDs
- Pool layout: 2 mirror vdevs (2 disks per mirror)
- RAIDZ: not used. I mention this because a lot of the discussion out there assumes RAIDZ1/2/3 backing storage, and the answers focus on mitigating the parity and padding overhead that comes with it, which doesn't apply to mirrors.
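For reference, a pool laid out like this would typically be created with something along these lines (device names, pool name, and ashift=12 are assumptions on my part, not details from my actual setup):

```
# Two striped 2-way mirror vdevs in one pool; ashift=12 assumes 4K-sector SSDs.
# Device names and pool name are placeholders.
zpool create -o ashift=12 tank \
    mirror /dev/disk/by-id/ata-SSD1 /dev/disk/by-id/ata-SSD2 \
    mirror /dev/disk/by-id/ata-SSD3 /dev/disk/by-id/ata-SSD4

zpool status tank   # confirm the two mirror vdevs
```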
Recordsize - Dataset (CT storage)
What it is: the maximum logical "block" (record) size for files in the dataset (not entirely accurate, but a useful way to think about it if you're used to thinking about SSD sector sizes). Files smaller than the recordsize are stored as a single, smaller record. The property can be changed after dataset creation, but only newly written data uses the new value.
Sizing: match it to the content of the data being stored (e.g., database files, video files, etc.)
Useful sizes (example commands at the end of this section):
- 8K: Postgres (though 16K can apparently be useful for "pre-faulting" the next page of Postgres data in sequential scans; I don't know enough about Postgres to do anything but stick with the default recommendation. See: https://vadosware.io/post/everything-ive-seen-on-optimizing-postgres-on-zfs-on-linux/ )
- 16K: MariaDB
- 1M:
  - Big files (e.g., measured in hundreds of megabytes, gigabytes, or more)
  - Examples: video files, some types of backups (e.g., non-incremental backups that just grow over time, like Minecraft server backups stored in monolithic .tar files, and other monolithic backups), ISOs (?)
- 1M Recordsize: How big does a single file have to be, of whatever type, before you want to start storing it on a 1M dataset? (That is, what is the minimum individual file size where the 1 MB recordsize starts actively helping you?)
- CT Storage for Linux Containers: What recordsize should be used for a Linux LXC's backing storage?
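To make the sizes above concrete, here's a minimal sketch of setting recordsize per dataset (pool and dataset names are hypothetical; the values just mirror the list above):

```
# Hypothetical pool/dataset names; recordsize is a per-dataset property.
zfs create -o recordsize=8K  tank/postgres   # PostgreSQL data (8K pages)
zfs create -o recordsize=16K tank/mariadb    # MariaDB/InnoDB data (16K pages)
zfs create -o recordsize=1M  tank/media      # big video files, monolithic backups, ISOs

# recordsize can be changed later, but only newly written files use the new value:
zfs set recordsize=1M tank/media
zfs get recordsize tank/media
```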
Volblocksize - zVols (VM storage)
- What it is: the fixed logical "block" size of a zvol; every block of the volume is this size, so it's much closer to a true block size than recordsize is. Fixed once the zvol has been created and cannot be changed afterwards.
- Sizing: match it to the content of the data being stored (i.e., the workload inside the VM)
- The default is now 16K in current OpenZFS (as of March 1, 2024). Is there ever any reason to change this when using pools made of mirror vdevs? (There's a sketch of the relevant commands after this list.)
- How exactly does PVE do this? I'm looking at this part of the manual: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_storage_types
- It says this, when using local ZFS: "Disk images for VMs are stored in ZFS volume (zvol) datasets, which provide block device functionality."
- I'm not sure what a "zvol dataset" is, as I thought those were two different things. Do they mean a zvol stored inside a dataset? That would make more sense to me, since the dataset stores the volblocksize property.
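As a concrete sketch (names are hypothetical, and on PVE the zvol is normally created for you when you add a VM disk), creating a zvol with an explicit volblocksize looks roughly like this, along with the Proxmox-side storage option that controls which value PVE uses:

```
# Hypothetical names: a 32 GiB zvol with an explicit volblocksize.
# Unlike recordsize, volblocksize cannot be changed once the zvol exists.
zfs create -V 32G -o volblocksize=16K tank/vm-100-disk-0
zfs get volblocksize tank/vm-100-disk-0

# On Proxmox, the "blocksize" option of a zfspool storage sets the volblocksize
# used for newly created VM disks (storage ID is a placeholder):
pvesm set local-zfs --blocksize 16k
```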
Other Questions
- Qcow2 vs. raw format for VM disks: My disk storage has always been local ZFS, thin provisioned, and I only ever see raw as an available option. Is this actually a problem? Am I doing something wrong? Are there implications for volblocksize/recordsize? EDIT: I've read the PVE Admin Guide's NFS storage section, and now see that qcow2 is usable over NFS to enable snapshots and clones, as per https://pve.proxmox.com/pve-docs/pve-admin-guide.html#storage_nfs. So unless there's some really good reason not to, I probably want to use qcow2 on NFS shared storage? (See the sketch after this list.)
- Local ZFS vs. Shared Storage (NFS connection to TrueNAS, using ZFS):
- Obviously, the ZFS settings would be done on TrueNAS here.
- Are there any special considerations when using TrueNAS/ZFS-over-NFS re: recordsize/volblocksize?
- Shared Storage via iSCSI: I've never used iSCSI with Proxmox before. Does this have any impact on the recordsize/volblocksize/anything else related?
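For what it's worth, here's a minimal sketch of adding an NFS storage pointed at a TrueNAS export (server address, export path, and storage ID are all hypothetical); once it's defined, qcow2 becomes selectable as a disk format on that storage:

```
# Hypothetical server/export/storage-ID; adds an NFS-backed storage for VM disk images.
# qcow2 images stored here get snapshot/clone support from the qcow2 format itself.
pvesm add nfs truenas-nfs \
    --server 192.168.1.50 \
    --export /mnt/tank/pve-images \
    --content images
```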