zvol using much more space than attributable to parity + padding

Hello everybody,

I have a zvol with a volsize of 3.91T. This zvol is attached as a data disk to a VM that works as a file server (Samba/NFS). Here is the output of df -hT from inside the VM:
[screenshot: df -hT inside the VM, showing 2.4T used of 3.9T]

2.4T used out of 3.9T. The zvol has volblocksize = 8K and sits on a pool with a single raidz1 vdev of four disks and ashift=12 (4K sectors). I have been reading about parity and padding and understand that for each 8K block of data I want to write I will need 2 sectors for the data, 1 for the parity and 1 for padding, because the total number of written sectors must be a multiple of (p + 1) = 2. This gives an efficiency of 50%, vs the 75% efficiency of an ideal raidz1, so the zvol should be using 1.5x (0.75/0.5) as much space as the ideal. That should be 3.6T (2.4T * 1.5). But here is what zfs get reports:
[screenshot: zfs get output for the zvol]

Here are just the space-related fields:
[screenshot: space-related properties, showing used 5.46T and logicalused 3.75T]

Why 5.46T? That's 2.3x greater. And why is logicalused 3.75T? I can't understand it... Any help?
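
For reference, a quick back-of-the-envelope check of the arithmetic above, as a shell sketch (it assumes ashift=12, raidz1 with one parity sector per stripe row, and volblocksize=8K, as described; the numbers are purely illustrative):

SECTOR=4096                      # 2^ashift bytes per sector
DATA=$((8192 / SECTOR))          # 2 data sectors per 8K volblocksize block
PARITY=1                         # 1 parity sector for those 2 data sectors
TOTAL=$((DATA + PARITY))         # 3 sectors so far
PAD=$(( (2 - TOTAL % 2) % 2 ))   # pad up to a multiple of (p + 1) = 2
echo "allocated per 8K block: $(( (TOTAL + PAD) * SECTOR )) bytes"   # 16384, i.e. 50% efficiency vs the ideal 75%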

Thanks for your time,


Héctor
 
Sorry, fabian. I (believe I) understand what the doc means here, but I don't see how it answers my question. Probably my fault, but I have already done the maths for parity and padding waste in my original post and they don't match the data. Did I do it wrong?
 
Space used as seen by ext4 is not necessarily space used as seen by ZFS. E.g., ext4 might see some blocks as unused, but hasn't told the upper layers via TRIM/DISCARD that it does not need them anymore. Or it might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. The 3.75T is what the VM has written (volblocksize-aligned, of course) and hasn't yet discarded.
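
A quick way to see the two views side by side (a sketch; the mount point and the zvol name rpool/data/vm-100-disk-1 are just examples, substitute your own):

# Inside the VM: what ext4 thinks is used
df -hT /srv
# On the PVE host: what ZFS has actually recorded for the zvol
zfs get -o property,value volsize,logicalused,referenced,used rpool/data/vm-100-disk-1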
 
Look at that table. With ashift=12 and raidz1 with 4 drives you might want to use 8 sectors (32K, to lose 33% of the space) or 16 sectors (64K, to lose 27%). Right now you are using 2 sectors, so 8K with 50% of the space lost.
You need to destroy and recreate every virtual hard disk to change the volblocksize.
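
For reference, volblocksize can only be set when a zvol is created, so the usual approach is to create a new virtual disk with the bigger block size and move the data over. A rough sketch with example names (the zvol path and size are assumptions, adjust to your setup):

# Create a replacement zvol with a bigger volblocksize (example name and size; the
# Proxmox ZFS storage also has a 'blocksize' option that applies to newly created disks)
zfs create -V 4T -o volblocksize=32K rpool/data/vm-100-disk-2
# Then copy the data from the old virtual disk to the new one and destroy the old zvol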
 
I have a hypothesis.

The logical used space is 3.75T; that's roughly 1.5x the space shown from inside the VM (2.4T), and 1.5x is the expected waste of my setup vs an ideal raidz1, if I did the maths correctly in my first post.

But if you look at the definition of logicalused, it says that it excludes metadata. Is it possible that all the difference between 3.75T and 5.4T is because of metadata? xattr is on.

Is there a way to find the space used by metadata? I haven't found one.
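
One thing to look at (a sketch; rpool is an example pool name): zdb can dump pool-wide block statistics with a breakdown per block type, which includes the metadata blocks. It traverses the whole pool, so it can take a long time:

# Pool-wide block statistics, including metadata block types (slow on big pools)
zdb -bb rpool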
 
Look at that table. With ashift=12 and raidz1 with 4 drives you might want to use 8 sectors (32K, to lose 33% of the space) or 16 sectors (64K, to lose 27%). Right now you are using 2 sectors, so 8K with 50% of the space lost.
You need to destroy and recreate every virtual hard disk to change the volblocksize.
Yes, I understand the concept. But what I'm trying to say is that this does not explain the whole gap between 2.4T and 5.46T. Or at least I think it doesn't.


Space used as seen by ext4 is not necessarily space used as seen by ZFS. E.g., ext4 might see some blocks as unused, but hasn't told the upper layers via TRIM/DISCARD that it does not need them anymore. Or it might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. The 3.75T is what the VM has written (volblocksize-aligned, of course) and hasn't yet discarded.
Sorry, didn't see your answer before posting mine. I'll think about it.
 
Did you run "fstrim -a" inside the VM and enable discard on the disk in the Proxmox VM hardware settings, like fabian mentioned?
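
For reference, a minimal sketch of that procedure (the VM id 100, the scsi0 bus and the volume name are examples; adjust them to your setup):

# On the PVE host: re-attach the existing disk with discard enabled (example id/volume)
qm set 100 --scsi0 local-zfs:vm-100-disk-1,discard=on
# Inside the VM, once the change is active: trim all mounted filesystems that support it
fstrim -av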
 
Thanks guys, I didn't know that this was a thing. I was still in chapter 3, trying to understand the storage. I have done both and now the results are different:

  • logicalused & logicalreferenced now match the measurement from inside the VM (2.4T).
  • referenced & usedbydataset show 3.54T, which matches exactly the expected waste due to parity and padding (1.5 times).
  • used is at 4.03T: 1.7 times the 2.4T from inside the VM and 1.14 times the referenced value. Is that because
it (ext4) might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. @fabian

But in that case, why isn't it reflected in the referenced value too?
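
For what it's worth, the usedby* properties break used down into its components (used = usedbydataset + usedbysnapshots + usedbyrefreservation + usedbychildren), while referenced only counts the data the zvol itself currently points to. A sketch with an example zvol name:

zfs get -o property,value used,referenced,usedbydataset,usedbysnapshots,usedbyrefreservation,usedbychildren rpool/data/vm-100-disk-1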

Anyway, thanks for your help guys. Best regards,


Héctor

EDIT: Also a second question. About trim/discard, the doc says that it is useful for thin-provisioned disks, but mine isn't thin: refreservation is equal to volsize. Why was this procedure necessary?
 
EDIT: Also a second question. About trim/discard, the doc says that it is useful for thin-provisioned disks, but mine isn't thin: refreservation is equal to volsize. Why was this procedure necessary?
I'm not sure, but ZFS is a copy-on-write filesystem. If you have a 1GB file and change half of it, it won't just edit the file in place. It will keep the old blocks and add the changes, so you have 1.5GB of data even if ext4 tells you the file is still 1GB. That's why snapshotting works: it adds changes and doesn't overwrite stuff, so all changes are always revertible. Without snapshotting enabled, it should merge those changes and delete the old stuff over time. Without discard run regularly inside your VMs, the VMs won't tell ZFS which data is deleted and which is not. I could imagine that ZFS can't clean up if it doesn't know what's deleted.

I think that is also what fabian said.
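
A small way to observe that behaviour (a sketch; the zvol name is an example): take a snapshot on the host, overwrite some data inside the VM, and watch usedbysnapshots grow while the guest filesystem still reports the same size:

# On the PVE host: freeze the current state of the zvol (example name)
zfs snapshot rpool/data/vm-100-disk-1@before
# ... overwrite some data inside the VM ...
# The overwritten blocks are kept for the snapshot instead of being freed:
zfs get usedbysnapshots,written rpool/data/vm-100-disk-1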
 
