Turning on ZFS compression on pool

DynFi User

Active Member
Apr 18, 2016
dynfi.com
Just a quick question: my PBS is configured with ZFS, and compression has been left at the default, which is "on" (source "local"), meaning "lz4".

  1. Should this be left at the default "on" value?
    1. Is there any benefit to using compression with PBS? (Doesn't PBS use its own compression? In that case ZFS compression would simply waste CPU cycles for nothing.)
  2. If we have different datastores for different types of backup:
    1. Should it be turned on for VM backups?
    2. Should it be turned on for file backups?
    3. If compression is worthwhile, which algorithm should we prefer (lz4 or the more efficient zstd)?

This could have a significant impact on CPU usage, and a newer algorithm such as zstd might also have a positive effect.
Hence my questions.
 

Dunuin

Famous Member
Jun 30, 2020
Germany
I would also like to know. Right now I have lz4 enabled, atime disabled, deduplication disabled and sync=standard.

I just installed a PBS VM on my TrueNAS server, mounted a dataset from that TrueNAS server into the VM via NFS, and added the PBS as storage in my PVE.

The ZFS pool on TrueNAS is also encrypted, so I don't need to activate the PBS encryption. But is the communication between PVE and PBS encrypted if PBS encryption is disabled?
 

t.lamprecht

Proxmox Staff Member
Jul 28, 2015
South Tyrol/Italy
shop.proxmox.com
Shall this be left to the default "on" value?
  1. Is there any interest in using compression with PBS (= isn't PBS using its own compression - in which case ZFS compression will simply be lost processor cycles for nothing).

Yes, Proxmox Backup Server already compresses its data chunks with zstd, but ZFS compression has heuristics that detect early on whether a stream is already compressed, or otherwise cannot really benefit from compression, and then avoid recompressing it. As this check is relatively cheap, the performance penalty is small, so in practice this won't matter much.
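To make the "cannot benefit from compression" case concrete, here is a minimal Python sketch of the keep-or-discard rule ZFS applies per block: the compressed copy is only kept if it is at least 1/8 (12.5%) smaller than the original, otherwise the block is written uncompressed. It does not model the early-abort heuristics themselves. Assumptions: zlib stands in for lz4/zstd (neither ships with the Python standard library), and random bytes stand in for an already zstd-compressed PBS chunk.

```python
import os
import zlib


def zfs_keeps_compressed(block: bytes, level: int = 6) -> bool:
    """Return True if a ZFS-like rule would store this block compressed.

    ZFS only keeps the compressed result when it saves at least 1/8 of the
    block (compressed size <= len - len/8); otherwise the block is written
    uncompressed. zlib is used here as a stand-in for lz4/zstd.
    """
    limit = len(block) - (len(block) >> 3)
    return len(zlib.compress(block, level)) <= limit


# Already-compressed data (like a zstd-compressed PBS chunk) looks random.
chunk_like = os.urandom(128 * 1024)
# 128 KiB of highly repetitive text compresses very well.
text_like = b"backup log line\n" * 8192

print(zfs_keeps_compressed(chunk_like))  # False: stored uncompressed
print(zfs_keeps_compressed(text_like))   # True: stored compressed
```

The wasted work is one failed compression attempt per block, which is why the overhead stays small even when nothing is gained.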

If we have different datastores for different types of backup:
  1. Does it have to be turned on for VM backups?
  2. Does it have to be turned on for file backups?
IIRC, ZFS compression works on records (though, to be honest, I'm not 100% sure off the top of my head), which are between 4 KiB and 128 KiB on a default system (ashift 12 and the default maximum record size). Dynamic chunks (file backups) are 64 KiB to 4 MiB by default and fixed chunks (VM backups) are a static 4 MiB by default, so they are similar enough that one may not want to invest a lot of time coming up with different tuning parameters for each.
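The size comparison above can be checked with a little arithmetic. This Python sketch uses a simplified model (assumption): a file no larger than recordsize becomes one block rounded up to whole 4 KiB sectors (ashift=12), while a larger file is split into recordsize blocks, each compressed independently. Real chunk files are stored zstd-compressed (and possibly encrypted), so their on-disk sizes vary; the nominal chunk sizes are used only for illustration, and the function name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def zfs_block_layout(file_size: int, recordsize: int = 128 * KIB,
                     sector: int = 4 * KIB) -> tuple[int, int]:
    """Return (block_count, block_size) for one file under a simplified model.

    Small files: a single block rounded up to whole sectors.
    Large files: split into blocks of exactly `recordsize`.
    """
    if file_size <= recordsize:
        return 1, -(-file_size // sector) * sector  # ceiling division
    return -(-file_size // recordsize), recordsize


for label, size in [("smallest dynamic chunk", 64 * KIB),
                    ("fixed chunk", 4 * MIB)]:
    count, block = zfs_block_layout(size)
    print(f"{label}: {count} x {block // KIB} KiB")
```

Either way, ZFS ends up compressing blocks of at most 128 KiB, which is why both backup types behave much the same.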

The answer, even if it won't make you all too happy, is: it just won't matter much. You can assume that ZFS-level compression won't save much space either way, so it's not really worth enabling; but if you do enable it, the impact is pretty much negligible, so I wouldn't sweat it.
 

t.lamprecht

Proxmox Staff Member
The ZFS pool on TrueNAS is also encrypted, so I don't need to activate the PBS encryption. But is the communication between PVE and PBS encrypted if PBS encryption is disabled?

Transport encryption and encryption at rest are two different things. Communication between the client and the server always goes through TLS; there is never an unencrypted channel for any backup or API data, otherwise one could snoop on the authentication, etc.

Also note that while TrueNAS encryption can be a valid replacement for your use case, it is not equivalent to the Proxmox Backup Server encryption. PBS uses client-side encryption, which means the server (and anyone else with access to it) can never snoop on the actual backup data sent by the client, so the server, or at least physical access to it, does not have to be 100% trusted.

As you probably enter the decryption password for the whole pool when TrueNAS boots, it also provides good protection while the server is powered off; with PBS encryption, however, the server neither has nor needs access to the unencrypted data, even while it is online.
 

leesteken

Famous Member
May 31, 2020
I thought the PBS Datastore needed atime=on, or at least relatime=on (which also requires atime=on). Is this true or can it be turned off completely on ZFS?
 

t.lamprecht

Proxmox Staff Member
Support for atime (access time) is required for Proxmox Backup Server functionality; one should use relatime to get some performance improvement.

Edit: I had previously confused the importance of mtime and atime.
 

Dunuin

Famous Member
And what about the recordsize? Is it fine to use the default 128K, or would it be beneficial to use, for example, 1M or 32K if the underlying pool allows that without too much padding overhead?
 

t.lamprecht

Proxmox Staff Member
I thought the PBS Datastore needed atime=on, or at least relatime=on (which also requires atime=on). Is this true or can it be turned off completely on ZFS?
A colleague asked me to recheck, and you're actually right.

It's the atime, and it needs to be at least relatime (on naturally works too). Sorry for the confusion; one should not post off the top of one's head after a certain hour. I'm going to edit the comment above so that people don't come to bad conclusions from its wrong info.

We use atime so that we can actually benefit from the relatime performance optimization. We could use mtime too, which would allow making the "trailing window" for GC smaller, but performance was deemed more important.
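The general reason a relatime-based scheme needs a trailing window of at least 24 hours can be illustrated with the Linux relatime rule. This is a simplified Python sketch under stated assumptions, not Proxmox Backup Server's GC code: it assumes a collector that relies on ordinary reads refreshing atime and then sweeps files whose atime is older than a cutoff. The function names and the 5-minute slack are hypothetical.

```python
from datetime import datetime, timedelta

RELATIME_WINDOW = timedelta(hours=24)


def relatime_updates_atime(atime: datetime, mtime: datetime,
                           ctime: datetime, now: datetime) -> bool:
    """Linux relatime rule: should a plain read update atime?

    atime is refreshed if it is not newer than mtime or ctime, or if the
    stored atime is at least 24 hours old; otherwise the update is skipped.
    """
    if mtime >= atime or ctime >= atime:
        return True
    return now - atime >= RELATIME_WINDOW


def sweep_cutoff(gc_start: datetime,
                 slack: timedelta = timedelta(minutes=5)) -> datetime:
    """Hypothetical GC sweep cutoff: only files whose atime is older than
    this are treated as unused. Any file read after gc_start has an atime
    no older than gc_start - 24h, so it can never fall below this cutoff."""
    return gc_start - RELATIME_WINDOW - slack


written = datetime(2024, 1, 1, 0, 0)
first_read = written + timedelta(hours=1)
# First read after the write: atime == mtime, so atime is refreshed.
print(relatime_updates_atime(written, written, written, first_read))  # True
# Second read 10 hours later: atime is newer than mtime and < 24h old,
# so atime stays at first_read even though the file was just accessed.
print(relatime_updates_atime(first_read, written, written,
                             first_read + timedelta(hours=10)))  # False
print(sweep_cutoff(datetime(2024, 1, 2, 12, 0)))  # 2024-01-01 11:55:00
```

Using mtime instead would avoid the lagging timestamp and allow a shorter window, at the cost of a metadata write on every mark, which matches the trade-off described above.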

The drawbacks mentioned in my other post may be true but should not be a real issue in practice: verify only goes over existing chunks anyway, and an admin-triggered find or grep may want to exclude the backup directory, which drastically improves that command's performance too.

Anyhow, sorry for any potential confusion caused.
 

Tmanok

Active Member
We use atime so that we can actually benefit from the relatime performance optimization. We could use mtime too, which would allow making the "trailing window" for GC smaller, but performance was deemed more important.
@Dunuin I found it: the reason why GC has a minimum window of 24 hours. Thanks Thomas!

On another note, more related to this thread and to clear my head after reading conflicting information about ZFS atime from another thread:
  • PVE VM Storage can safely be configured with relatime instead of atime using ZFS.
  • PBS Datastores need atime? Or can they also use relatime?
Thanks again, Thomas,


Tmanok
 

leesteken

Famous Member
Don't forget that a ZFS pool needs atime=on for relatime=on to work, in contrast to filesystems like ext4.

EDIT for clarification: I don't mean that atime turns on when you enable relatime on a ZFS pool. I mean that you need to explicitly turn on both atime and relatime for relatime to work. In other words: if atime is off, relatime does nothing (even when relatime is set to on).
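For diagnostics, the combination described above can be checked from the output of `zfs get -H -o property,value atime,relatime <dataset>` (scripted mode: tab-separated fields, no header). This Python sketch classifies the effective behaviour; the sample output and the dataset name `tank/pbs` are hypothetical.

```python
def effective_atime_mode(zfs_get_output: str) -> str:
    """Classify atime behaviour from
    `zfs get -H -o property,value atime,relatime <dataset>` output.

    Returns "off" (no atime updates, relatime ignored), "relatime", or
    "strict" (every access updates atime).
    """
    props = {}
    for line in zfs_get_output.strip().splitlines():
        prop, value = line.split("\t", 1)
        props[prop] = value.strip()
    if props.get("atime") != "on":
        return "off"  # relatime=on has no effect while atime=off
    return "relatime" if props.get("relatime") == "on" else "strict"


# Hypothetical output for a dataset such as tank/pbs, e.g. captured with
# subprocess.run(["zfs", "get", "-H", "-o", "property,value",
#                 "atime,relatime", "tank/pbs"],
#                capture_output=True, text=True, check=True).stdout
sample = "atime\toff\nrelatime\ton\n"
print(effective_atime_mode(sample))  # off
# Fix (as root): zfs set atime=on tank/pbs
print(effective_atime_mode("atime\ton\nrelatime\ton\n"))  # relatime
```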
 

Tmanok

Active Member
Don't forget that a ZFS pool needs atime=on for relatime=on to work, in contrast to filesystems like ext4.
Thank you, that's good to know for diagnostic purposes; I had overlooked that. Of course, that doesn't mean relatime behaves differently than on ext4 (it does roughly the same thing) just because atime shows as "on" while relatime is in use.

relatime is fine.
Relatime it is, thank you Thomas.

Tmanok
 

gneto

New Member
Jun 21, 2022
Good morning! I couldn't work out which suggestion applies best. I have 12 VMs that occupy 4TB of disk on 4 SSDs in ZFS without compression.
I want to replicate to another, identical server. What is the best solution? The pool is raidz1.
Should I add another SSD to each server or use compression?
The VMs run PHP/MySQL and hold about 2TB of images and photos.
The Backup Server is separate.
Thanks
 

Dunuin

Famous Member
So you are talking about ZFS as storage for VMs/LXCs using replication between two PVE nodes, and not as storage for a PBS datastore synced between two PBS servers?

A 4-disk raidz1 is bad for MySQL, as you would need a volblocksize of at least 32K (with ashift=12) to avoid losing a lot of capacity to padding overhead. And MySQL does sync writes in 16K blocks, and writing with smaller block sizes onto a larger volblocksize is always problematic.
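To illustrate why small writes onto a larger volblocksize hurt, here is a simplified Python model (assumption: it ignores ARC caching, compression, parity and padding) of what one write to a zvol costs: every volblock the write only partially covers must be read first, and every touched volblock is rewritten in full because ZFS is copy-on-write. The function name is hypothetical.

```python
def zvol_write_cost(offset: int, length: int,
                    volblocksize: int) -> tuple[int, int]:
    """Return (bytes_read, bytes_written) for one write in a simple model."""
    end = offset + length
    first = offset // volblocksize
    last = (end - 1) // volblocksize
    bytes_read = 0
    for block in range(first, last + 1):
        start = block * volblocksize
        covered = min(start + volblocksize, end) - max(start, offset)
        if covered < volblocksize:
            bytes_read += volblocksize  # partial block: read-modify-write
    bytes_written = (last - first + 1) * volblocksize
    return bytes_read, bytes_written


KIB = 1024
page = 16 * KIB  # InnoDB's default page size
for vbs in (16 * KIB, 32 * KIB):
    r, w = zvol_write_cost(0, page, vbs)
    print(f"volblocksize {vbs // KIB}K: read {r // KIB}K, write {w // KIB}K")
```

With a 32K volblocksize every 16K page write turns into a 32K read plus a 32K write, i.e. double the writes and thus extra SSD wear.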
 

gneto

New Member
Do you think I should use raid0 without compression and replicate?
Or should I create this storage across the two servers? I already ruled out Ceph because it is very complex.
Would a separate storage server be the solution?

Sorry for the beginner question.
And thanks
 

Dunuin

Famous Member
ZFS with replication is not real shared storage; it's two local storages that get synced by replication. So you need similar ZFS pools with the identical name on both nodes of your cluster, meaning both nodes need 4 SSDs each. And I wouldn't use raid0: you would lose all the reliability and bit rot protection. And because of replication, if a file gets corrupted on one node, it will also be corrupted on the other node within a few minutes. If you care about your data and performance, a striped mirror (raid10) would be a better choice, but then you might need to buy even more disks.
 

gneto

New Member
In this case, the best option is to keep the ZFS raidz1 pool to maintain data integrity and set up the backups properly, is that right?
Thank you for your help

Do you do consulting to build a good setup for me, or can you recommend someone?
 

Dunuin

Famous Member
Raidz1 is still not a great option, because you either:
1.) use the default 8K volblocksize, where you lose 50% of your raw capacity, even if you don't see it. Everywhere it will show you 75% of your raw capacity as usable space, but that is wrong, as everything you write to a virtual disk will consume 150% of its size in that displayed space. Let's say you have 4x 2TB SSDs in a raidz1 with ashift=12. When writing 4TB to zvols (your VMs' virtual disks), it will also write 2TB of parity data and 2TB of padding overhead. So only 50% of your disks are usable, which is the same as with a striped mirror. And a striped mirror of 4 disks would be the better option, as you would get the same capacity but double the IOPS performance.
2.) use a 32K volblocksize. Then you could fit 6TB of data on that raidz1 pool, because the padding overhead would be negligible, so you could store 6TB of data + 2TB of parity. But the downside is that performance and SSD wear would be terrible for everything using a block size below 32K, so it's bad for running a MySQL DB.
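The padding numbers above follow from how raidz allocates space. The Python sketch below is modeled on the allocation rule in OpenZFS's `vdev_raidz_asize()` as I understand it (assumption; compression and metadata are ignored): a block needs its data sectors plus parity sectors for each row of (disks - parity) data sectors, and the total is padded up to a multiple of parity + 1 sectors. The function name is hypothetical.

```python
import math


def raidz_alloc(block_bytes: int, disks: int, parity: int = 1,
                ashift: int = 12) -> tuple[int, int]:
    """Return (data_sectors, allocated_sectors) for one block on raidz."""
    sector = 1 << ashift
    data = math.ceil(block_bytes / sector)
    # one set of parity sectors per row of (disks - parity) data sectors
    total = data + parity * math.ceil(data / (disks - parity))
    # allocations are padded to a multiple of (parity + 1) sectors
    total = math.ceil(total / (parity + 1)) * (parity + 1)
    return data, total


for kib in (8, 16, 32, 64):
    data, total = raidz_alloc(kib * 1024, disks=4)
    print(f"{kib}K volblocksize: {data} data of {total} sectors "
          f"-> {data / total:.0%} usable")
```

For 8K on four disks this reproduces the 50% figure: 2 data sectors, 1 parity sector and 1 padding sector. Larger blocks move toward the 75% ideal of a 4-disk raidz1, though under this model they do not fully reach it; plug in your own disk count and volblocksize to compare.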

And by the way: a ZFS pool should always keep 20% free space, or it will fragment faster and become slow. So even if you have 4TB of usable capacity, you shouldn't use more than 3.2TB. It would also be best to set a 90% quota, so you can't accidentally wreck your pool by filling it up completely. A full pool can become unrecoverable, because ZFS is a copy-on-write filesystem where free space is required to delete or edit data. That's really bad: you would need to delete stuff to recover from the read-only state, but nothing can be deleted, because that would require free space to write new data first.
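The 80% fill guideline and the 90% quota are simple arithmetic; this Python sketch reproduces the 4TB example and prints a matching `zfs set quota=` command. The pool name `tank` is hypothetical, and the percentages are the rules of thumb from this post, not hard ZFS limits.

```python
def fill_limits(usable_tb: float, keep_free: float = 0.20,
                quota_fraction: float = 0.90) -> tuple[float, float]:
    """Return (recommended_max_tb, quota_tb) for a pool's usable capacity."""
    return usable_tb * (1 - keep_free), usable_tb * quota_fraction


recommended, quota = fill_limits(4.0)
print(f"keep usage below {recommended:.1f}T")
# "tank" is a hypothetical pool name; a quota on the root dataset also
# covers its child datasets, zvols and snapshots.
print(f"zfs set quota={quota:.1f}T tank")
```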

Now let's say you want to store 4TB of VMs, these should be part of an HA cluster that uses ZFS replication, and you already have 4x 2TB SSDs. Then I would put 6x 2TB SSDs in the first PVE node and 6x 2TB SSDs in the second PVE node, and set up both nodes with a 6-disk striped mirror (raid10) with a 16K volblocksize, an ashift of 12 and the same pool name on both nodes. That way you would get 4.8TB of HA storage for VMs and still get good performance, even when running MySQL DBs. So, 12x 2TB SSDs in total to be able to store 4.8TB of virtual disks.
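The 12-disk figure works out as follows. This is a Python sketch under the assumptions stated in the post (2-way mirrors, 2TB per SSD, 20% kept free), ignoring metadata overhead and TB/TiB differences; the function name is hypothetical.

```python
def striped_mirror_capacity(disks: int, disk_tb: float, mirror_width: int = 2,
                            keep_free: float = 0.20) -> tuple[float, float]:
    """Return (usable_tb, recommended_max_tb) for a striped mirror (raid10)."""
    vdevs = disks // mirror_width  # each mirror vdev stores one disk's worth
    usable = vdevs * disk_tb
    return usable, usable * (1 - keep_free)


per_node = 6  # SSDs per PVE node; replication needs the same pool on both
usable, recommended = striped_mirror_capacity(per_node, 2.0)
print(f"per node: {usable:.1f}TB usable, about {recommended:.1f}TB for VMs")
print(f"disks across both nodes: {2 * per_node}")
```

Because replication keeps a full copy on each node, the disk count doubles while the usable VM capacity stays at the per-node value.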
 

gneto

New Member
Friend, you helped me a lot. On one server I was seeing that only 50% was left, as you said, and I didn't understand why.
On the other I set up a raid0, and after I set up replication it congests the network quite a bit.
I appreciate the help, and congratulations on your knowledge!
I will try to apply what you said here.
 

gneto

New Member
One last question, just to settle a beginner's doubt.
For my use case, wouldn't lz4 compression be a better alternative for these 4 SSDs in raidz1? Or do you have any other suggestions?
I also thought setting up a TrueNAS and centralizing everything on it might be a better alternative?
Thanks
 
