zvol using much more space than attributable to parity + padding

Hello everybody,

I have a zvol with a volsize of 3.91T. This zvol is attached as a data disk to a VM that works as a file server (Samba/NFS). Here is the output of df -hT from inside the VM:
[screenshot: df -hT inside the VM, showing 2.4T used of 3.9T]

2.4T used out of 3.9T. The zvol has volblocksize = 8K and sits on a pool with a single raidz1 vdev of four disks and ashift=12 (4K sectors). I have been reading about parity and padding and understand that for each 8K block of data I want to write I will need 2 sectors for the data, 1 for the parity and 1 for padding, because the total number of written sectors must be a multiple of (p + 1) = 2. This gives an efficiency of 50%, vs the 75% efficiency of an ideal raidz1, so the zvol should be using 1.5x (0.75/0.5) as much space as the ideal. That should be 3.6T (2.4T * 1.5). But here is what zfs get reports:
[screenshot: zfs get output for the zvol]

Here are just the space-related fields:
[screenshot: space-related properties, showing used 5.46T and logicalused 3.75T]

Why 5.46T? That's 2.3x greater. And why is logicalused 3.75T? I can't understand it... Any help?
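
For reference, a quick back-of-the-envelope check of the arithmetic above, as a shell sketch (it assumes ashift=12, raidz1 with one parity sector per stripe row, and volblocksize=8K, as described; the numbers are purely illustrative):

SECTOR=4096                      # 2^ashift bytes per sector
DATA=$((8192 / SECTOR))          # 2 data sectors per 8K volblocksize block
PARITY=1                         # 1 parity sector for those 2 data sectors
TOTAL=$((DATA + PARITY))         # 3 sectors so far
PAD=$(( (2 - TOTAL % 2) % 2 ))   # pad up to a multiple of (p + 1) = 2
echo "allocated per 8K block: $(( (TOTAL + PAD) * SECTOR )) bytes"   # 16384, i.e. 50% efficiency vs the ideal 75%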

Thanks for your time,


Héctor
 
Sorry, fabian. I (believe I) understand what the doc means here, but I don't see how it answers my question. Probably my fault, but I have already done the maths for parity and padding waste in my original post and they don't match the data. Did I do it wrong?
 
Space used as seen by ext4 is not necessarily space used as seen by ZFS. E.g., ext4 might see some blocks as unused, but hasn't told the upper layers via TRIM/DISCARD that it does not need them anymore. Or it might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. The 3.75T is what the VM has written (volblocksize-aligned, of course) and hasn't yet discarded.
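
A quick way to see the two views side by side (a sketch; the mount point and the zvol name rpool/data/vm-100-disk-1 are just examples, substitute your own):

# Inside the VM: what ext4 thinks is used
df -hT /srv
# On the PVE host: what ZFS has actually recorded for the zvol
zfs get -o property,value volsize,logicalused,referenced,used rpool/data/vm-100-disk-1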
 
Look at that table. With ashift=12 and raidz1 with 4 drives you might want to use 8 sectors (32K, to lose 33% of the space) or 16 sectors (64K, to lose 27%). Right now you are using 2 sectors, so 8K with 50% of the space lost.
You need to destroy and recreate every virtual hard disk to change the volblocksize.
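
For reference, volblocksize can only be set when a zvol is created, so the usual approach is to create a new virtual disk with the bigger block size and move the data over. A rough sketch with example names (the zvol path and size are assumptions, adjust to your setup):

# Create a replacement zvol with a bigger volblocksize (example name and size; the
# Proxmox ZFS storage also has a 'blocksize' option that applies to newly created disks)
zfs create -V 4T -o volblocksize=32K rpool/data/vm-100-disk-2
# Then copy the data from the old virtual disk to the new one and destroy the old zvol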
 
I have a hypothesis.

The logical used space is 3.75T; that's roughly 1.5x the space shown from inside the VM (2.4T), and 1.5x is the expected waste of my setup vs an ideal raidz1, if I did the maths correctly in my first post.

But if you look at the definition of logicalused, it says that it excludes metadata. Is it possible that all the difference between 3.75T and 5.4T is because of metadata? xattr is on.

Is there a way to find the space used by metadata? I haven't found one.
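
One thing to look at (a sketch; rpool is an example pool name): zdb can dump pool-wide block statistics with a breakdown per block type, which includes the metadata blocks. It traverses the whole pool, so it can take a long time:

# Pool-wide block statistics, including metadata block types (slow on big pools)
zdb -bb rpool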
 
Look at that table. With ashift=12 and raidz1 with 4 drives you might want to use 8 sectors (32K, to lose 33% of the space) or 16 sectors (64K, to lose 27%). Right now you are using 2 sectors, so 8K with 50% of the space lost.
You need to destroy and recreate every virtual hard disk to change the volblocksize.
Yes, I understand the concept. But what I'm trying to say is that this does not explain the whole gap between 2.4T and 5.46T. Or at least I think it doesn't.


Space used as seen by ext4 is not necessarily space used as seen by ZFS. E.g., ext4 might see some blocks as unused, but hasn't told the upper layers via TRIM/DISCARD that it does not need them anymore. Or it might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. The 3.75T is what the VM has written (volblocksize-aligned, of course) and hasn't yet discarded.
Sorry, didn't see your answer before posting mine. I'll think about it.
 
Did you run "fstrim -a" inside the VM and enable discard on the disk in the Proxmox VM hardware settings, like fabian mentioned?
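
For reference, a minimal sketch of that procedure (the VM id 100, the scsi0 bus and the volume name are examples; adjust them to your setup):

# On the PVE host: re-attach the existing disk with discard enabled (example id/volume)
qm set 100 --scsi0 local-zfs:vm-100-disk-1,discard=on
# Inside the VM, once the change is active: trim all mounted filesystems that support it
fstrim -av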
 
Thanks guys, I didn't know that this was a thing. I was still in chapter 3, trying to understand the storage. I have done both and now the results are different:

  • logicalused & logicalreferenced now match the measurement from inside the VM (2.4T).
  • referenced & usedbydataset show 3.54T, which matches exactly the expected waste due to parity and padding (1.5 times).
  • used is at 4.03T: 1.7 times the 2.4T from inside the VM and 1.14 times the referenced value. Is that because
it (ext4) might account some blocks as not fully used, but ZFS does not know about that and has to treat them as used. @fabian

But in that case, why isn't it reflected in the referenced value too?
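
For what it's worth, the usedby* properties break used down into its components (used = usedbydataset + usedbysnapshots + usedbyrefreservation + usedbychildren), while referenced only counts the data the zvol itself currently points to. A sketch with an example zvol name:

zfs get -o property,value used,referenced,usedbydataset,usedbysnapshots,usedbyrefreservation,usedbychildren rpool/data/vm-100-disk-1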

Anyway, thanks for your help guys. Best regards,


Héctor

EDIT: Also a second question. About trim/discard, the doc says that it is useful for thin-provisioned disks, but mine isn't thin: refreservation is equal to volsize. Why was this procedure necessary?
 
EDIT: Also a second question. About trim/discard, the doc says that it is useful for thin-provisioned disks, but mine isn't thin: refreservation is equal to volsize. Why was this procedure necessary?
I'm not sure, but ZFS is a copy-on-write filesystem. If you have a 1GB file and change half of it, it won't just edit the file in place. It will keep the old blocks and add the changes, so you have 1.5GB of data even if ext4 tells you the file is still 1GB. That's why snapshotting works: it adds changes and doesn't overwrite stuff, so all changes are always revertible. Without snapshotting enabled, it should merge those changes and delete the old stuff over time. Without discard run regularly inside your VMs, the VMs won't tell ZFS which data is deleted and which is not. I could imagine that ZFS can't clean up if it doesn't know what's deleted.

I think that is also what fabian said.
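
A small way to observe that behaviour (a sketch; the zvol name is an example): take a snapshot on the host, overwrite some data inside the VM, and watch usedbysnapshots grow while the guest filesystem still reports the same size:

# On the PVE host: freeze the current state of the zvol (example name)
zfs snapshot rpool/data/vm-100-disk-1@before
# ... overwrite some data inside the VM ...
# The overwritten blocks are kept for the snapshot instead of being freed:
zfs get usedbysnapshots,written rpool/data/vm-100-disk-1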
 
