advanced tips and tricks for extreme storage saving?

zenowl77

Active Member
Feb 22, 2024
just wondering what advanced tricks others use to squeeze every gigabyte out of their systems.

i am currently using zfs compression and fast dedup on some datasets (like backups), and i tossed in a spare 1TB drive just to offload extra VMs i do not use frequently. i also use SSHFS / WinFSP / SSHFS-Win Manager to mount drives in VMs and on my laptop to share files, offloading as much as possible from the VMs and sharing things between VMs from the host to eliminate files in the VMs where they are unnecessary.
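the sshfs side of that is nothing fancy, on the linux guests it is basically just something like this (the user, host and paths here are only placeholders, not my real setup):
Code:
# install sshfs and mount a directory from the host inside the VM
sudo apt install sshfs
mkdir -p ~/hostshare
sshfs user@192.168.1.10:/tank/shared ~/hostshare -o reconnect,ServerAliveInterval=15
# unmount again when done
fusermount -u ~/hostshare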

considering using more ZFS datasets like this as well to share between OS's, but it is harder with installed files. i have also found zfs dedup really does not work that well versus the memory impact, and it does not work at all for many file types like media, ai models, etc that would be the most useful to dedup, despite the files being identical. i tried it with games and some program files but it just didn't work out so well, since windows doesn't really like everything not being on C: even if symlinked, and dedup barely does anything for these types of files.

currently i am shrinking some media files on my main jellyfin drive (the originals are on a backup, so slight quality loss on the copies is fine by me) to save some extra space, but i am interested in hearing if anyone has found some really creative or interesting ways to reduce storage usage.
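for the shrinking i basically just batch re-encode to HEVC, roughly like this (file names and the crf value are only an example of the kind of thing i run, tune to taste):
Code:
# re-encode video to HEVC; higher -crf = smaller file / lower quality, audio and subtitles are copied as-is
ffmpeg -i input.mkv -c:v libx265 -crf 26 -preset slow -c:a copy -c:s copy output.mkv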
 
we have several storage tiers.

on the lower end there are simple, cheap NFS servers where we pack everything that is not important: archived stuff, "trashed" stuff (before we delete it permanently), ISOs, testing, temporary data, ...

everywhere else we do not "squeeze" (because running low on storage is a pain and a never-ending nightmare for sysops)
 
  • Like
Reactions: Kingneutron
i am currently using zfs compression
ZFS can utilize several algorithms. I do not play with them, I just use the default. But to squeeze out some percent:
Code:
~# zfs set compression=thisdoesnotexist  rpool/dummy 
cannot set property for 'rpool/dummy': 'compression' must be one of 'on | off | lzjb | gzip | gzip-[1-9] | zle | lz4 | zstd | zstd-[1-19] | zstd-fast | zstd-fast-[1-10,20,30,40,50,60,70,80,90,100,500,1000]'
Details are here: man zfsprops, search for "compression="
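If you do want to experiment, it is just a per-dataset property and only affects newly written data. Something like this (reusing the dummy dataset from above, the level is only an example):
Code:
~# zfs set compression=zstd-9 rpool/dummy
~# zfs get compression,compressratio rpool/dummy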

and dedup fast on some datasets (like backups)
Build a PBS. There I have ratios of 20 to 40. I doubt that you can (with great success) "ZFS-dedup" compressed backups - as soon as a single byte changes, the complete backup may look "different".

((( And for me it is important to have some redundancy, if possible. I will never run ZFS without it, be it mirrors or RAIDZ2 for specific use cases. So... striped single disks are a no-no. Of course ymmv. )))
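(For reference, a basic two-way mirror is nothing more than the following; the pool name and device IDs are placeholders, of course:)
Code:
~# zpool create tank mirror /dev/disk/by-id/ata-DISK-A /dev/disk/by-id/ata-DISK-B
~# zpool status tank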
 
We use pbzip2 (which can use all cores for compressing and decompressing) and duperemove, or a combination of both depending on the use case, all on RAID6'ed XFS on a per-directory basis; both tools run as background jobs. Deduped files on XFS are no slower than undeduped ones, which makes it a very fast solution afterwards too.
But as you can guess, yes, it's all bash with cron and not in the PVE UI ...
:)
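Roughly it is only a small cron script in this direction (paths, schedule and the exact options here are just an example, not our real job):
Code:
#!/bin/bash
# nightly: pack yesterday's dump directory with all cores, then dedupe the backup tree
BACKUPDIR=/srv/backup            # example path
DAY=$(date -d yesterday +%F)

# pbzip2 spreads the bzip2 work over all cores; -1 = smallest block size (fastest, least compression)
tar -cf - "$BACKUPDIR/$DAY" | pbzip2 -1 -c > "$BACKUPDIR/$DAY.tar.bz2" && rm -rf "$BACKUPDIR/$DAY"

# block-level dedupe of the whole tree (xfs needs reflink support for the dedupe ioctl)
duperemove -dhr "$BACKUPDIR" > /var/log/duperemove.log 2>&1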
 
  • Like
Reactions: Johannes S
We use pbzip2 (which can use all cores for compressing and decompressing) and duperemove, or a combination of both depending on the use case, all on RAID6'ed XFS on a per-directory basis; both tools run as background jobs. Deduped files on XFS are no slower than undeduped ones, which makes it a very fast solution afterwards too.
But as you can guess, yes, it's all bash with cron and not in the PVE UI ...
:)
Pbzip2 uses lots of CPU but it's still a bit slow. If you get to the point where you need faster backups, look into pigz
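For example something along these lines (level and paths just for illustration, decompress again with "pigz -d" / unpigz):
Code:
# compress a dump with all cores at the fastest level
tar -cf - /srv/dump/vm-100 | pigz -1 -p "$(nproc)" > vm-100.tar.gz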
 
pigz is new to me, i will take a look at it along with lbzip2, but in any case we run with "-1" for fastest. We back up 2.5TB/day (12225 files), covering 89 machines (73 VMs and 16 LXCs), the 5 PVE nodes and the fileserver itself, in under 2h. That is fine because it runs in the background, and for all the VMs/LXCs the combined suspend time is only 11 min (going to sleep is the biggest part, then save and wakeup), even though we also still have 1 broken 10Gb card in the cluster.
 
  • Like
Reactions: Kingneutron
ZFS dedup for QEMU/KVM VMs only reaches a very high dedup rate if you go down to a 4K volblocksize (or the same blocksize the guest OS uses, which most of the time is 4K). That is impractical from a performance standpoint and only makes sense if you also have an ashift of 9, otherwise you will not get any compression at all. It also requires more RAM for the dedup table.
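For illustration, that combination would look roughly like this (pool and zvol names made up), which is exactly what I would not recommend running:
Code:
~# zpool create -o ashift=9 tank /dev/sdX
~# zfs create -V 32G -o volblocksize=4k -o dedup=on tank/vm-100-disk-0
~# zpool get dedupratio tank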
 
  • Like
Reactions: Johannes S
Pbzip2 uses lots of CPU but it's still a bit slow. If you get to the point where you need faster backups, look into pigz

Or zstd with a fitting compression level (-19; --ultra -20 and higher might not be worth it though). xz is also an option.


Since zstd has up to 19 regular compression levels, it's great for scaling to a certain use case.
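To get a feel for the tradeoff on your own data, something like this is enough (the file name is just an example):
Code:
# compare size and speed of a fast and a high level on the same file, using all cores
zstd -3 -T0 -k dump.raw -o dump.raw.3.zst
zstd -19 -T0 -k dump.raw -o dump.raw.19.zst
ls -lh dump.raw*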
 
I put all my ISOs on a Samba R/O share (backed by ZFS) that all PVE / hypervisor instances have access to, using soft symlinks

https://github.com/kneutron/ansitest/blob/master/proxmox/symlink-samba-isos.sh
i usually also do something similar to this. i keep a dual-bay external enclosure with a fan and 2x 8TB drives connected and use them kind of like live backups, and i keep all my ISOs on that since they aren't accessed a lot, and it's usb 3.1 so it's basically equal to a connected internal drive, for throughput at least.
we have several storage tiers.

on the lower end there are simple, cheap NFS servers where we pack everything that is not important: archived stuff, "trashed" stuff (before we delete it permanently), ISOs, testing, temporary data, ...

everywhere else we do not "squeeze" (because running low on storage is a pain and a never-ending nightmare for sysops)
that is true for sure, i'm currently pretty low on space and i have to keep moving everything around to fit anything. i need to clear several terabytes out but i'm not sure what to get rid of.
ZFS can utilize several algorithms. I do not play with them, I just use the default. But to squeeze out some percent:
Code:
~# zfs set compression=thisdoesnotexist  rpool/dummy
cannot set property for 'rpool/dummy': 'compression' must be one of 'on | off | lzjb | gzip | gzip-[1-9] | zle | lz4 | zstd | zstd-[1-19] | zstd-fast | zstd-fast-[1-10,20,30,40,50,60,70,80,90,100,500,1000]'
Details are here: man zfsprops, search for "compression="


Build a PBS. There I have ratios of 20 to 40. I doubt that you can (with great success) "ZFS-dedup" compressed backups - as soon as a single byte changes, the complete backup may look "different".

((( And for me it is important to have some redundancy, if possible. I will never run ZFS without it, be it mirrors or RAIDZ2 for specific use cases. So... striped single disks are a no-no. Of course ymmv. )))
i use a mix of zstd compression levels, 19 on my backups but usually 3-7 on everything else, with only metadata cached in ARC since it's mostly random, rarely accessed data.
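in practice that is just a couple of properties per dataset, roughly like this (the pool/dataset names are placeholders):
Code:
# heavy compression on backups, lighter everywhere else
zfs set compression=zstd-19 tank/backups
zfs set compression=zstd-3 tank/data
# only cache metadata in ARC for the rarely-read datasets
zfs set primarycache=metadata tank/backups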

i have been considering PBS, but i haven't got around to it and i only have a spare HP SFF pc with 2x 500GB drives i could toss in. i've been debating if it's even worth it at all to run a whole PC just for 500GB of mirrored zfs as a backup, and it doesn't really seem like it will be.
We use pbzip2 (which can use all cores for compressing and decompressing) and duperemove, or a combination of both depending on the use case, all on RAID6'ed XFS on a per-directory basis; both tools run as background jobs. Deduped files on XFS are no slower than undeduped ones, which makes it a very fast solution afterwards too.
But as you can guess, yes, it's all bash with cron and not in the PVE UI ...
:)
with duperemove do you use FDUPES or something to scan for duplicate files to remove?

i've also been looking into x9dedupe and the usage of FICLONE / reflink within zfs that was introduced in v2.2.2, as detailed on the x9dedupe page. i just haven't got around to it yet, but it seems like it would be far better than block-level dedup for many files
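from what i understand, testing it by hand is basically just this (assuming the pool has block cloning enabled and coreutils is new enough, file and pool names are placeholders):
Code:
# reflink copy: the new file shares blocks with the original instead of duplicating them
cp --reflink=always bigfile.iso bigfile-copy.iso
# allocated space on the pool should barely change
zfs list -o name,used,avail tank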


ZFS dedup for QEMU/KVM VMs only reaches a very high dedup rate if you go down to a 4K volblocksize (or the same blocksize the guest OS uses, which most of the time is 4K). That is impractical from a performance standpoint and only makes sense if you also have an ashift of 9, otherwise you will not get any compression at all. It also requires more RAM for the dedup table.
i definitely noticed that one. i got the most out of it at 4k, but the slower file transfers, the ram usage and everything else that came with it, plus nullifying compression, kind of nuked any value.
 
i have been considering PBS, but i haven't got around to it and i only have a spare HP SFF pc with 2x 500GB drives i could toss in. i've been debating if it's even worth it at all to run a whole PC just for 500GB of mirrored zfs as a backup, and it doesn't really seem like it will be.
4TB NAS drives are fairly cheap these days, I wouldn't bother going any lower than that unless SSD

https://www.amazon.com/s?k=4tb+nas+...1:blPtJEizW8qMbPFAH9I4bUCL2pU/AG4wSMSagflxQyw

YMMV with tariffs, taxes, etc., and there's a "sweet spot" where you get the most bang for your buck per TB
 
  • Like
Reactions: Johannes S
with duperemove do you use FDUPES or something to scan for duplicate files to remove?
Yes, we use fdupes, and no explicit hash files, because every day a lot of new backup data comes in that has to be compared against the previous (25) days, and one old day gets deleted as well.
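In the end it is only something like this per backup directory (the path again just as an example):
Code:
# let fdupes find the duplicate files and hand the list to duperemove for block sharing
fdupes -r /srv/backup | duperemove --fdupes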