ZFS and Proxmox

Mirmanium

Hello,

I am going crazy trying to set up ZFS on my server. I am new to the ZFS world, so for now I am playing around a bit to understand how it works before migrating all my data.
I have a 3x4TB raidz1 pool, created with this command in PVE:

Code:
zpool create zfspool -f -o ashift=12 raidz /dev/disk/by-id/ata-ZZZ /dev/disk/by-id/ata-XXX /dev/disk/by-id/ata-YYY
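
To double-check the layout and the ashift value afterwards, something like this should work (pool name as in the command above):

Code:
zpool status zfspool
zpool get ashift zfspool
zpool list -v zfspool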

I have some questions; maybe you can shed some light on them:

- I don't understand why Proxmox allows me to add a hard disk bigger than the maximum available ~7.5TB, for instance 10TB:
I can add several of them, and the different guest machines see them as 10TB.
How can I manage my free space then?
What will happen once I reach ~7.5TB? Will some of the guest machines still show free space but be unable to write to it?

I run this command to see free space:

Code:
zfs list -o space

NAME                               AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
seagatedisks/media                 6.98T  26.1G        0B    128K             0B      26.1G
seagatedisks/media/vm-105-disk-0   6.98T  4.78G        0B   4.78G             0B         0B
seagatedisks/media/vm-113-disk-0   6.98T  21.3G        0B   21.3G             0B         0B
seagatedisks/media/vm-113-disk-1   6.98T  74.6K        0B   74.6K             0B         0B
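
As a side note, comparing the raw pool size (which includes parity) with the usable space reported by zfs list can help; something like this, assuming the pool is called seagatedisks as in the output above:

Code:
zpool list -v seagatedisks
zfs list -o space seagatedisks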

I was writing some files to
Code:
seagatedisks/media/vm-113-disk-0   6.98T  21.3G        0B   21.3G             0B         0B

but I removed the files and even formatted the disk in the guest machine, yet it still shows 21.3G used.
Then again, how can I manage my total free space? I am really confused.

Sorry if this is not the right place to raise these questions, but I am not sure where else to ask.

Thank you,
 
- I don't understand why Proxmox allows me to add a hard disk bigger than the maximum available ~7.5TB, for instance 10TB:
How did you configure the storage under Datacenter -> Storage? Is it configured as "thin provisioned"?

In that case no quota is set, and you have to make sure to keep enough free space for the pool to work well (stay below ~80% usage for best performance).
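
If you want a guard rail, one option would be to set a quota on the pool's root dataset so the guests can never fill it up completely. A rough sketch, assuming your pool is called seagatedisks and ~6T is roughly 80% of it:

Code:
# example values only: cap usage at roughly 80% of the pool
zfs set quota=6T seagatedisks
zfs get quota seagatedisks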

but I removed the files and even formatted the disk in the guest machine, yet it still shows 21.3G used.
The zfs list output looks like the storage is thin provisioned. This means the volume's space usage will grow with every write. To free the space again you will need to use trim/discard inside the guest. This tells the storage, all the way down the stack, which areas are no longer needed and can be freed.
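
For that to work the virtual disk needs the discard option enabled, and then you trim from inside the guest. Roughly like this; the VM ID, bus (scsi0) and storage name here are just examples, adjust them to your setup:

Code:
# on the PVE host: enable discard on the virtual disk
qm set 113 --scsi0 seagatedisks:vm-113-disk-0,discard=on
# inside a Linux guest: release the blocks that are no longer used
fstrim -av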



One last thing regarding this:
I have a 3x4TB raidz1 pool, created with this command in PVE:
For a VM workload, consider using only mirrored vdevs. That could be a single mirror or several of them (raid10 like). You will get better IOPS performance and will not be surprised by how much of the available space parity data consumes with zvols (which are used for the VM disks). For more details check the following chapter in the PVE docs: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_zfs_raid_considerations
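
If you ever do switch to mirrors, the creation would look roughly like this (pool name and disk IDs are placeholders), and you can later add a second mirror vdev to stripe across, raid10 style:

Code:
zpool create -o ashift=12 tank mirror /dev/disk/by-id/ata-AAA /dev/disk/by-id/ata-BBB
# later: add another mirror vdev to grow the pool (raid10 like)
zpool add tank mirror /dev/disk/by-id/ata-CCC /dev/disk/by-id/ata-DDD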
 
Thanks @aaron for your reply.

How did you configure the storage under Datacenter -> Storage? Is it configured as "thin provisioned"?

I did it on the command line, but checking it now under Datacenter -> Storage I can see it was created as "thin provisioned".



Yeah, enabling trim updated the free space on ZFS :)
I read about mirrored vdevs, but there are a couple of things. I don't have extra money for an additional HDD to enable raid10. This is my homelab server, so I was looking for a balance between having one disk of parity and the maximum amount of usable space.

Regarding:

In that case no quota is set, and you have to make sure to keep enough free space for the pool to work well (stay below ~80% usage for best performance).

Does it apply to raid10 too? If not, for my raidz1 configuration the total usable space will be 7.5TB * 80% = ~6TB :S

Thank you,
 
Yeah, enabling trim updated the free space on ZFS :)
I read about mirrored vdevs, but there are a couple of things. I don't have extra money for an additional HDD to enable raid10. This is my homelab server, so I was looking for a balance between having one disk of parity and the maximum amount of usable space.

Regarding:
Does it apply to raid10 too? If not, for my raidz1 configuration the total usable space will be 7.5TB * 80% = ~6TB :S

Thank you,
Right now you only have about 4.8TB of usable capacity for your data + snapshots.
In theory you should be able to use 8TB, and ZFS will tell you that you have 8TB free, but in reality you only have something like 6TB. That is because you didn't increase your volblocksize, so you have a lot of padding overhead, and every virtual HDD you have already created uses something like 150% of its nominal space; that can't be changed after creation, because the volblocksize is fixed when a zvol is created. Look at your virtual HDDs with the "zfs list" command. If you write 1GB inside a VM, around 1.4 or 1.5GB will be written to the ZFS pool, so you can't write 8TB inside the VMs.
And because 20% of the space needs to stay free for ZFS to work properly, you end up with something like 4.8TB of usable capacity.
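
You can see that overhead yourself by comparing the logical data written to a zvol with what it actually occupies on the pool, for example (dataset name taken from your earlier output; compression will skew the numbers a bit if it is enabled):

Code:
zfs get volsize,logicalused,used seagatedisks/media/vm-113-disk-0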

If you want to fix that padding overhead you need to increase the volblocksize to at least 16k and recreate every virtual HDD.
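
On PVE the volblocksize for new disks comes from the "Block Size" setting of the ZFS storage, so a rough workflow would be: raise that value first, then recreate every disk so it picks up the new volblocksize (the storage ID here is just a guess, use yours):

Code:
# set the block size used for newly created zvols on this storage
pvesm set seagatedisks --blocksize 16k
# then recreate the virtual disks, e.g. move them to another storage and back,
# or back up and restore the VM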

And ZFS doesn't work like your normal filesystems; it is a copy-on-write filesystem. It is like an empty book where you aren't allowed to change what is already written. Every change you want to make needs to be written as a new entry on a free page. You can keep writing as long as there are enough free pages left. To get new pages back you need to use "discard", which tells ZFS which pages aren't needed anymore. ZFS will rip out those pages, clean them and glue them in at the end of the book so you don't run out of free pages. Because it can't edit pages that are already written, you need to make sure there are always free pages left or it gets stuck.
 
Thanks @Dunuin,
Indeed, I am playing around with this right now, so I can remove the disk and create it again with a volblocksize of at least 16K. Since I don't really know the benefits/drawbacks of increasing the volblocksize, I could go higher than 16k if that gives me more usable space.
Any suggestion?

I did some tests with a 12GB file:
  • disk-1: 16k
  • disk-2: 128k
  • disk-3: 8k
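
For anyone who wants to reproduce something like this without a full VM, a rough way is to create test zvols with different volblocksize values, write the same amount of data to each and compare the used space (names are just examples):

Code:
zfs create -V 20G -o volblocksize=8k seagatedisks/test8k
zfs create -V 20G -o volblocksize=16k seagatedisks/test16k
dd if=/dev/urandom of=/dev/zvol/seagatedisks/test8k bs=1M count=12288
dd if=/dev/urandom of=/dev/zvol/seagatedisks/test16k bs=1M count=12288
zfs list -o name,volsize,used seagatedisks/test8k seagatedisks/test16k
# clean up the test volumes afterwards
zfs destroy seagatedisks/test8k
zfs destroy seagatedisks/test16k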

There is almost no difference between 16k and 128k, but a big one, as you mentioned, compared to 8k.
I haven't noticed any performance difference among them while transferring the data.
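
I guess a plain file transfer is mostly sequential, so a random write benchmark from inside the guest would probably show the difference better, for example with fio (parameters are just a starting point):

Code:
fio --name=randwrite --rw=randwrite --bs=4k --size=2G --ioengine=libaio --direct=1 --runtime=60 --time_based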


And ZFS doesn't work like your normal filesystems; it is a copy-on-write filesystem. It is like an empty book where you aren't allowed to change what is already written. Every change you want to make needs to be written as a new entry on a free page. You can keep writing as long as there are enough free pages left. To get new pages back you need to use "discard", which tells ZFS which pages aren't needed anymore. ZFS will rip out those pages, clean them and glue them in at the end of the book so you don't run out of free pages. Because it can't edit pages that are already written, you need to make sure there are always free pages left or it gets stuck.

Got it. Thanks :)
 

Your guest OS's filesystem is most likely writing in 4K blocks to a virtual HDD that works with 512B blocks. Ideally your guest's block size should be at least as large as the host's block size, otherwise you get overhead because blocks need to be merged. If you have a 16K volblocksize on the host and the guest writes in chunks of 4K (8x 512B virtual blocks), your host needs to merge four of those 4K blocks (or 32x 512B blocks) so it can store them as one 16K block. Every time your guest does small random writes and wants, for example, to change a single 4K block, your host needs to read a full 16K block, change 4K of it and write that 16K block again. So if you have a lot of small writes, you want your volblocksize to be as small as possible.
So I would go with 16K unless you are planning to move your VMs to a bigger pool in the future. With 16K you can't just move your VMs to a raidz1 pool of 5 HDDs, because you would run into the same problem as now: a raidz1 with 5 drives needs a volblocksize of at least 32K.
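
If you want to sanity-check the different layers, you can compare the guest filesystem's block size with the zvol's volblocksize and the pool's ashift, roughly like this (device and dataset names are examples; tune2fs is for ext4 guests):

Code:
# inside the guest (ext4 example)
tune2fs -l /dev/sda1 | grep 'Block size'
# on the PVE host
zfs get volblocksize seagatedisks/media/vm-113-disk-0
zpool get ashift seagatedisks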