Feature request: volblocksize per zvol

Dunuin

Distinguished Member
Jun 30, 2020
What I'm really missing is an option to set the volblocksize for individual zvols. Right now it isn't really possible to optimize storage performance, even though ZFS itself would totally allow that.

Let's say I've got this scenario:

ZFS Pool: ashift=12, two SSDs in a mirror
VM: the 1st zvol stores the ext4 root fs, where a volblocksize of 4K would be fine so it matches the 4K blocks of ext4. The 2nd zvol only stores a MySQL DB doing reads/writes as 16K blocks, so I would like to use a 16K volblocksize. The 3rd zvol only stores big CCTV video streams, so I would like the volblocksize to be 1M.

This can all be achieved by manually creating the zvols (for example with zfs create -V 32G -o volblocksize=16k rpool/vm-100-disk-1), then running qm rescan --vmid 100 and attaching the manually created zvols to the VM with VMID 100. It's a bit annoying to do it that way, but it works.
But it only works as long as you don't need to restore that VM from backup.
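
For reference, a rough sketch of that manual workflow (pool name, size and VMID are just the example values from above):
Bash:
# Create the zvol with the desired volblocksize (it can only be set at creation time):
zfs create -V 32G -o volblocksize=16k rpool/vm-100-disk-1
# Let PVE pick up the new volume; it then shows up as an "Unused Disk" on VM 100
# and can be attached via the VM's Hardware tab (or by editing the VM config file):
qm rescan --vmid 100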

Because right now it works like this:
There is only one place where you can set a block size, and that is the ZFS storage itself (GUI: Datacenter -> Storage -> YourZFSPool -> Edit -> "Block size" field). Every time a new zvol gets created, it is created with the volblocksize set in that ZFS storage's "Block size" field. That means you can only set a global volblocksize for the entire ZFS storage, which is especially bad because the volblocksize can only be set at creation and can't be changed later.

Now let's say I've got my custom VM with three zvols with volblocksizes of 4K, 16K and 1M, and I need to restore it from backup. Restoring a backup will first delete the VM with all its zvols and then create a new VM from scratch, using the same VMID, based on the data stored in the backup. But all zvols will be newly created, and PVE creates them with the volblocksize set for the entire ZFS storage. Let's say the ZFS storage's "Block Size" is set to the default 8K: my restored VM will now use an 8K volblocksize for all 3 disks, no matter what volblocksize was used previously.

What I would like to see is a global ZFS storage block size that you can set on the ZFS storage like it is now, but which is only used if not overridden by the VM's config. Then I would like to see a new "Block size" field when creating a new virtual disk for a VM, and this value should be stored in the VM's config file like this: scsi1: MyZFSPool:vm-100-disk-0,cache=none,size=32G,blocksize=16k
Each time PVE needs to create a new zvol (adding another disk to a VM, doing a migration, restoring a backup), it could then first look at the VM's config file to see if a block size is defined there for that virtual disk. If it is, it creates the zvol with that value as the volblocksize. If it is not defined there, it uses the global ZFS storage block size as a fallback.
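
To make the proposed lookup order concrete, here is a shell-style pseudocode sketch (none of this exists in PVE today; the variable names and the per-disk "blocksize" option are made up for illustration):
Bash:
# Pseudocode sketch of the proposed behaviour, not an actual PVE implementation.
storage_default_blocksize=8k    # the storage-wide "Block size" field, used as fallback
# 1.) check the VM config for a per-disk blocksize override (the requested new option):
override=$(grep -oP '^scsi1:.*blocksize=\K[^,]+' /etc/pve/qemu-server/100.conf)
# 2.) use the override if present, otherwise fall back to the storage-wide value:
volblocksize="${override:-$storage_default_blocksize}"
zfs create -V 32G -o volblocksize="$volblocksize" rpool/vm-100-disk-1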

That way nothing would change if you don't care about the volblocksize, as you could continue using the global block size defined by the ZFS storage. But if you do care, you could override it for specific virtual disks.

One workaround for me was to create several datasets on the same pool and add each dataset as its own ZFS storage. Let's say I've got a "ZFS_4k", a "ZFS_16k" and a "ZFS_1M" ZFS storage. I can then give each ZFS storage its own block size and define what volblocksize a zvol should use by storing it on the right ZFS storage.
But this also has two downsides:
1.) you get a lot of ZFS storages if you need a lot of different block sizes
2.) when restoring a VM you can only select a single storage that all virtual disks will be restored to. With the example above, the three zvols of that VM would need to be on three different ZFS storages, but after a restore all three zvols end up on the same ZFS storage, so I would need to move two of them afterwards, causing additional downtime and SSD wear.

Would someone else like to see this feature?
How hard would such a feature be to implement? It looks easy to me for ZFS, but maybe it would be problematic because other storage types would need to be supported too?
Should I create a feature request in the bug tracker, or is there already a similar feature request?
 
Last edited:
+1 from a new PVE/ZFS user.

I'd enjoy this as well, as I'd have to create fewer datasets on my pool. :) I know I'm going to have disks that need different volblocksize values, so I really wanted to figure this out before I went further in the tutorial I'm using.

More seriously, I spent several hours figuring out how to do this, because I assumed that since volblocksize is a per-zvol property, I should be able to set it per zvol.

Luckily, I randomly hit just the right Google result to figure out how to do it, otherwise I'd still be stuck.
 
You kind of "can" do this.
Set up your ZPOOL with whatever block size
Set up VMs
Change blocksize to desired for some VMs
Move the storage of the VMs you want on this blocksize off of and back onto the zpool (or back them up and restore them)
I do not know WHY this works, but I can confirm this allows for granular block sizes for different VMs/zvols within the pool (my understanding was that this would not change without a pool rebuild, or without completely emptying the pool first)
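
If it helps, roughly what those steps look like on the CLI (storage and disk names are made up; depending on the PVE version the command may be spelled "qm disk move" instead of "qm move-disk"):
Bash:
# Change the storage-wide "Block Size", then move the disk away and back so it gets
# recreated as a new zvol with the now-current volblocksize:
pvesm set MyZFSPool --blocksize 16k
qm move-disk 100 scsi1 SomeOtherStorage --delete 1
qm move-disk 100 scsi1 MyZFSPool --delete 1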
 
You kind of "can" do this.
Set up your ZPOOL with whatever block size
Set up VMs
Change blocksize to desired for some VMs
Move the storage of the VMs you want on this blocksize off of and back onto the zpool (or back them up and restore them)
I do not know WHY this works, but I can confirm this allows for granular block sizes for different VMs/zvols within the pool (my understanding was that this would not change without a pool rebuild, or without completely emptying the pool first)
Creating the zvols with different volblocksizes isn't the problem. I can also do that directly by creating a new zvol with the desired block size by running "zfs create -V <size> -o volblocksize=16k <pool>/vm-<VMID>-disk-<N>" and then adding that zvol as a disk to the VM by editing the config file with "nano /etc/pve/qemu-server/VMID.conf".

The problem is that a backup restore or a migration will create the zvol with whatever volblocksize is set as "Block Size" for the ZFS storage at the time of the restore/migration, and won't care what volblocksize was used before backing up/migrating the VM.

So you would have to do your workaround each time you restore or migrate a VM. That's very inconvenient and requires additional documentation of which disk of which VM should use which volblocksize.
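
A quick way to see the effect after a restore (pool and VMID are just examples):
Bash:
# The restored zvols all show the storage-wide "Block Size" that was active during the
# restore, regardless of what the disks used before the backup:
zfs get -r volblocksize rpool | grep vm-100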
 
Last edited:
  • Like
Reactions: VictorSTS
It is unlikely that we will make properties configurable per disk image that are specific to a storage type. I understand the appeal, especially if all you use is one or two storage types that you know very well.

But once you need to move/migrate/restore the disk image to another storage type, how do we handle that property? There might be a property that could be considered "the same", but it will likely have slightly different behavior and/or performance impacts that can be unexpected.
What if the new storage type has no property that comes close? Do we drop it from the config or leave it, letting the user think that it is configured, even though it is actually ignored?

Multiple ZFS datasets with their respective Proxmox VE storage configuration that specifies the volblocksize is a valid approach that avoids ambiguities.
 
  • Like
Reactions: SInisterPisces
But once you need to move/migrate/restore the disk image to another storage type, how do we handle that property? There might be a property that could be considered "the same", but it will likely have slightly different behavior and/or performance impacts that can be unexpected.
What if the new storage type has no property that comes close? Do we drop it from the config or leave it, letting the user think that it is configured, even though it is actually ignored?
It could be named self-explanatorily, something like "scsi1: MyZFSPool:vm-100-disk-0,cache=none,size=32G,zfs_volblocksize=16K" instead of a generic "scsi1: MyZFSPool:vm-100-disk-0,cache=none,size=32G,blocksize=16K", and then default to whatever is used now when moving/restoring/migrating that VM to a non-ZFS storage? But I guess it also gets messy once people want to see the same for Ceph, LVM and whatever.

PS:
Still annoying that you can only restore all disks of a VM to a single target storage. When restoring a VM from a backup you can't specify which virtual disk should be restored to which target storage, so all zvols will be created with the same volblocksize, whether I like that volblocksize or not. Not great when you want to optimize things, for example a single VM that should store a MySQL DB with a 16K volblocksize on one virtual disk and some big video files with a 128K volblocksize on another virtual disk. And yes, I know that I can leave the target storage unset, in which case the disks are restored to the storages they were previously stored on. But that won't help when you use backup+restore to move VMs between different nodes, like I need to do here, because I can't use online/offline/cross-cluster migration due to ZFS native encryption.
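
For illustration, the same restriction on the CLI (archive path and storage ID are made up): qmrestore only takes a single --storage, which is then used for every disk of the VM:
Bash:
# All disks of VM 100 end up on ZFS_16k, even the ones that would be better off with 4K or 1M:
qmrestore /var/lib/vz/dump/vzdump-qemu-100-2023_11_26-18_00_00.vma.zst 100 --storage ZFS_16k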
 
Last edited:
Creating the zvols with different volblocksizes isn't the problem. I can also do that directly by creating a new zvol with the desired block size by running "zfs create -V <size> -o volblocksize=16k <pool>/vm-<VMID>-disk-<N>" and then adding that zvol as a disk to the VM by editing the config file with "nano /etc/pve/qemu-server/VMID.conf".

The problem is that a backup restore or a migration will create the zvol with whatever volblocksize is set as "Block Size" for the ZFS storage at the time of the restore/migration, and won't care what volblocksize was used before backing up/migrating the VM.

So you would have to do your workaround each time you restore or migrate a VM. That's very inconvenient and requires additional documentation of which disk of which VM should use which volblocksize.
Hi,

You could create several ZFS storages (datasets) with different block sizes (4k, 16k, and so on) on all nodes.

Then you create your VMs with their vDisks on the desired storage. That way you solve the problem.

Good luck / Bafta !
 
Hi,

You could create several ZFS storages (datasets) with different block sizes (4k, 16k, and so on) on all nodes.

Then you create your VMs with their vDisks on the desired storage. That way you solve the problem.

Good luck / Bafta !
Yes, that is how I do it. But it's a bit annoying when moving VMs between nodes by backup+restore, as you can't define which vdisk should be restored to which storage.
Let's say I've got a VM with 3 vdisks (8K, 16K and 128K) on 3 different storages on node 1. When restoring that VM to node 2 I can only select a single storage; let's say I choose the 8K one, so all 3 vdisks will be created using an 8K volblocksize. Then I have to move the disks to the other two storages, but that requires space I probably don't have anymore after the restore.
 
  • Like
Reactions: SInisterPisces
May I ask what would be the proper way to add a dataset to Proxmox?
I haven't found any guideline - only for regular zpools.
 
May I ask what would be the proper way to add a dataset to Proxmox?
I haven't found any guideline - only for regular zpools.
zfs create YourExistingPool/NameOfNewDataset
Then you can add that dataset as a new ZFS storage with another block size via the GUI ("Datacenter -> Storage -> Add -> ZFS") and point it to "YourExistingPool/NameOfNewDataset".
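
The same can also be done from the CLI (storage ID, block size and content types here are just examples):
Bash:
# After the "zfs create" above, register the dataset as its own storage with a different block size:
pvesm add zfspool ZFS_16k --pool YourExistingPool/NameOfNewDataset --blocksize 16k --content images,rootdir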
 
Last edited:
zfs create YourExistingPool/NameOfNewDataset
Then you can add that dataset as a new ZFS storage with another block size via the GUI ("Datacenter -> Storage -> Add -> ZFS") and point it to "YourExistingPool/NameOfNewDataset".
Ahh, thanks for that - my mistake was to look at a node directly rather than at the cluster...
 
Yes, that is how I do it. But it's a bit annoying when moving VMs between nodes by backup+restore, as you can't define which vdisk should be restored to which storage.
Let's say I've got a VM with 3 vdisks (8K, 16K and 128K) on 3 different storages on node 1. When restoring that VM to node 2 I can only select a single storage; let's say I choose the 8K one, so all 3 vdisks will be created using an 8K volblocksize. Then I have to move the disks to the other two storages, but that requires space I probably don't have anymore after the restore.
This is where I get hung up as well. Creating the disks is easy enough, but restoring them and retaining the proper blocksize should be simpler.
 
I don't know if something has changed in the meantime, but in the latest version (PVE 8.1) this seems to work just fine.

The trick is to declare multiple ZFS storages in the Datacenter tab that point to the same ZFS pool, but with different block sizes.
This doesn't change anything on the pool itself, but when I create vDisks on these storages, PVE will create the zvols with the volblocksize configured for that "storage device".

For example, in my testing lab I have the ZFS storage "ZFS-Lab2" (named identically on every cluster node):
Screenshot 2023-11-26 at 17.37.53.png

Then I've created the ZFS storages "ZFS-Lab2-8K" and "ZFS-Lab2-64K":
Screenshot 2023-11-26 at 17.38.33.png

with the related block sizes:
Screenshot 2023-11-26 at 17.38.44.png
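
For reference, the corresponding entries in /etc/pve/storage.cfg look roughly like this (reconstructed by hand, so the exact lines may differ a bit):
Code:
zfspool: ZFS-Lab2-8K
        pool ZFS-Lab2
        blocksize 8k
        content images,rootdir

zfspool: ZFS-Lab2-64K
        pool ZFS-Lab2
        blocksize 64k
        content images,rootdir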


Now let's create a test VM with 3 disks. One on the default storage and the other two on the new storage devices:
Screenshot 2023-11-26 at 17.41.40.png

Here it is:
Screenshot 2023-11-26 at 17.56.09.png

Now let's backup the VM on PBS with default options:
Screenshot 2023-11-26 at 17.58.03.png

and restore it over the original VM (leaving the storage setting to the default "From backup configuration"):
Screenshot 2023-11-26 at 17.59.07.png
Screenshot 2023-11-26 at 17.59.17.png

Result: VM restored with original storage configuration
Screenshot 2023-11-26 at 17.59.46.png

Now let's destroy the original VM:
Screenshot 2023-11-26 at 18.00.01.png


(continued on next post for attachment limit...)
 
I don't know if something has changed in the meantime, but in the latest version (PVE 8.1) this seems to work just fine.

The trick is to declare multiple ZFS storages in the Datacenter tab that point to the same ZFS pool, but with different block sizes.
That approach has worked since forever, but thank you for explaining it for others.
 
  • Like
Reactions: EdoFede
Now all the VM parameters are gone from the PVE node.

Browsing the PBS1 backup store for the VM backups...
Screenshot 2023-11-26 at 18.00.58.png

Now let's restore it to a new VMID (leaving the storage setting at the default "From backup configuration"):
Screenshot 2023-11-26 at 18.03.18.png

BANG, VM restored keeping the original storage configuration :)
Screenshot 2023-11-26 at 18.03.49.png

"under the hood" storage Zvols:
Bash:
root@proxlab1:~# zfs get volblocksize |grep vm-400
ZFS-Lab2/vm-400-disk-0     volblocksize  16K       default
ZFS-Lab2/vm-400-disk-1     volblocksize  8K        -
ZFS-Lab2/vm-400-disk-2     volblocksize  64K       -


Code:
INFO: starting new backup job: vzdump 100 --remove 0 --notification-mode auto --node proxlab1 --notes-template '{{guestname}}' --storage PBS1 --mode snapshot
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2023-11-26 17:58:04
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: VM Name: TestVolblocksize
INFO: include disk 'scsi0' 'ZFS-Lab2:vm-100-disk-0' 2G
INFO: include disk 'scsi1' 'ZFS-Lab2-8k:vm-100-disk-1' 2G
INFO: include disk 'scsi2' 'ZFS-Lab2-64k:vm-100-disk-2' 2G
INFO: creating Proxmox Backup Server archive 'vm/100/2023-11-26T16:58:04Z'
INFO: starting kvm to execute backup task
INFO: started backup task '2e4bd3a2-ef62-40a9-910a-63d07792d55b'
INFO: scsi0: dirty-bitmap status: created new
INFO: scsi1: dirty-bitmap status: created new
INFO: scsi2: dirty-bitmap status: created new
INFO: 100% (6.0 GiB of 6.0 GiB) in 3s, read: 2.0 GiB/s, write: 0 B/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 6.00 GiB (100%) total zero data
INFO: backup was done incrementally, reused 6.00 GiB (100%)
INFO: transferred 6.00 GiB in 5 seconds (1.2 GiB/s)
INFO: stopping kvm after backup task
INFO: adding notes to backup
INFO: Finished Backup of VM 100 (00:00:08)
INFO: Backup finished at 2023-11-26 17:58:12
INFO: Backup job finished successfully
ERROR: could not notify via target `mail-to-root`: could not notify via endpoint(s): mail-to-root: At least one recipient has to be specified!
TASK OK

Code:
new volume ID is 'ZFS-Lab2:vm-400-disk-0'
Warning: volblocksize (8192) is less than the default minimum block size (16384).
To reduce wasted space a volblocksize of 16384 is recommended.
new volume ID is 'ZFS-Lab2-8k:vm-400-disk-1'
new volume ID is 'ZFS-Lab2-64k:vm-400-disk-2'
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-backup@pbs@backlab1:ZFSLab1 vm/100/2023-11-26T16:58:04Z drive-scsi0.img.fidx /dev/zvol/ZFS-Lab2/vm-400-disk-0 --verbose --format raw --skip-zero
connecting to repository 'pve-backup@pbs@backlab1:ZFSLab1'
open block backend for target '/dev/zvol/ZFS-Lab2/vm-400-disk-0'
starting to restore snapshot 'vm/100/2023-11-26T16:58:04Z'
download and verify backup index
progress 1% (read 25165824 bytes, zeroes = 100% (25165824 bytes), duration 0 sec)
progress 2% (read 46137344 bytes, zeroes = 100% (46137344 bytes), duration 0 sec)
progress 3% (read 67108864 bytes, zeroes = 100% (67108864 bytes), duration 0 sec)
.....
progress 98% (read 2105540608 bytes, zeroes = 100% (2105540608 bytes), duration 0 sec)
progress 99% (read 2126512128 bytes, zeroes = 100% (2126512128 bytes), duration 0 sec)
progress 100% (read 2147483648 bytes, zeroes = 100% (2147483648 bytes), duration 0 sec)
restore image complete (bytes=2147483648, duration=0.01s, speed=183691.48MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-backup@pbs@backlab1:ZFSLab1 vm/100/2023-11-26T16:58:04Z drive-scsi1.img.fidx /dev/zvol/ZFS-Lab2/vm-400-disk-1 --verbose --format raw --skip-zero
connecting to repository 'pve-backup@pbs@backlab1:ZFSLab1'
open block backend for target '/dev/zvol/ZFS-Lab2/vm-400-disk-1'
starting to restore snapshot 'vm/100/2023-11-26T16:58:04Z'
download and verify backup index
progress 1% (read 25165824 bytes, zeroes = 100% (25165824 bytes), duration 0 sec)
progress 2% (read 46137344 bytes, zeroes = 100% (46137344 bytes), duration 0 sec)
progress 3% (read 67108864 bytes, zeroes = 100% (67108864 bytes), duration 0 sec)
.....
progress 98% (read 2105540608 bytes, zeroes = 100% (2105540608 bytes), duration 0 sec)
progress 99% (read 2126512128 bytes, zeroes = 100% (2126512128 bytes), duration 0 sec)
progress 100% (read 2147483648 bytes, zeroes = 100% (2147483648 bytes), duration 0 sec)
restore image complete (bytes=2147483648, duration=0.01s, speed=176418.28MB/s)
restore proxmox backup image: /usr/bin/pbs-restore --repository pve-backup@pbs@backlab1:ZFSLab1 vm/100/2023-11-26T16:58:04Z drive-scsi2.img.fidx /dev/zvol/ZFS-Lab2/vm-400-disk-2 --verbose --format raw --skip-zero
connecting to repository 'pve-backup@pbs@backlab1:ZFSLab1'
open block backend for target '/dev/zvol/ZFS-Lab2/vm-400-disk-2'
starting to restore snapshot 'vm/100/2023-11-26T16:58:04Z'
download and verify backup index
progress 1% (read 25165824 bytes, zeroes = 100% (25165824 bytes), duration 0 sec)
progress 2% (read 46137344 bytes, zeroes = 100% (46137344 bytes), duration 0 sec)
progress 3% (read 67108864 bytes, zeroes = 100% (67108864 bytes), duration 0 sec)
.....
progress 98% (read 2105540608 bytes, zeroes = 100% (2105540608 bytes), duration 0 sec)
progress 99% (read 2126512128 bytes, zeroes = 100% (2126512128 bytes), duration 0 sec)
progress 100% (read 2147483648 bytes, zeroes = 100% (2147483648 bytes), duration 0 sec)
restore image complete (bytes=2147483648, duration=0.01s, speed=180709.36MB/s)
rescan volumes...
TASK OK
 
Last edited:
  • Like
Reactions: IsThisThingOn
Yes, but I've got multiple PVE nodes here and move VMs between them by backup+restore (migration is not an option because of ZFS native encryption...). Each node has something like 20-30 storages because of all the PBS namespaces and the different recordsizes/volblocksizes, while there are actually only 2 ZFS pools per node and 3 PBS instances. So it's really messy to do it that way. And not setting a target storage isn't really an option when restoring to a different node with different storage IDs.
 
Last edited:
