Untar vzdump file for better deduplication

Newlife
Sep 27, 2017
Hello everyone,

I'm testing borgbackup to copy our backups to some off-site storage. With the plain tar file and the following ones, deduplication isn't as effective as it could be. When I extract the vzdump and insert every file one by one, it is much more space efficient. My question is: could I run into problems when tarring them back into an archive (with tar cvpf) to restore them in the Proxmox UI?
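As a sketch of that round trip (using a stand-in directory instead of a real vzdump archive, with the borg step only indicated in a comment): extracting with -p and --numeric-owner and re-tarring with the same flags should preserve permissions and ownership.

```shell
set -e
# Stand-in for a real container dump; a vzdump tar works the same way
mkdir -p demo/etc
echo "hostname=ct100" > demo/etc/hostname
tar cpf vzdump-demo.tar --numeric-owner -C demo .

# Extract so a file-level deduplicator sees individual files
mkdir -p extracted
tar xpf vzdump-demo.tar --numeric-owner -C extracted
# (here you would run e.g.: borg create repo::ct100-{now} extracted)

# Tar it back for restore through the Proxmox UI
tar cpf vzdump-restored.tar --numeric-owner -C extracted .
```

The important part is running tar as root with --numeric-owner in both directions, so the UIDs/GIDs inside the container are stored and restored verbatim rather than remapped through the host's user names.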

Thanks in advance

Greetings
 
I'll stick to just uploading the tar file; the other method is probably too complicated.
 
Hello there,

I wonder if you could do deduplication at the block level instead. Is that possible?

Cheers
 
Hello lhorace,

unfortunately my off-site backup is a cloud solution, so I'm not able to deduplicate at the block level.

Cheers
 
You mean other off-site options? Not really, because that is as cheap as it can get. I still have the backups on a RAID1 server, so this is not the only backup. I was just curious how other people use borgbackup (or other methods) to save on (cloud) storage space.

Cheers
 
Because you say tar, it suggests you're talking about containers. The equivalent for QEMU-based VMs is vma, which can also be "uncompressed" and then used for deduplication purposes. It works fine if you have a solution that is capable of doing so, but you always have to use the uncompressed backup.
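For reference, on a PVE host that workflow might look like the following sketch (VMID, paths, and file names are examples; --compress 0 tells vzdump to produce the uncompressed .vma):

```shell
# Uncompressed qemu backup; an existing .vma.gz can instead be gunzip'ed first
vzdump 100 --mode snapshot --compress 0 --dumpdir /mnt/backup

# The .vma itself can be fed to a deduplicating backup tool, or unpacked
# into the raw disk images plus config with the vma tool shipped by PVE:
vma extract /mnt/backup/vzdump-qemu-100-2017_09_27-12_00_00.vma /mnt/backup/extracted
```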

I talked a lot about this already in other threads so here is a quick summary on what we do:
* ZFS on backup server
* Backup is written to backup server via NFS (PVE-way)
* Backup is then uncompressed and synced via rsync (bad); better is a dedicated block-sync program that writes only changed blocks and thereby uses the CoW features of ZFS.
* ZFS snapshot
* delete old backup

This results in a CoW / VM-deduplication-on-used-block-storage setup that really only stores what has changed between backups, yet presents each one as a full backup. You can further increase the efficiency with deduplication, but for VMs you need deduplication that matches the block size of your VMs, and all data has to be aligned properly. That often means 4K blocks on your storage, which is extremely inefficient for deduplication and compression.
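The steps above could be sketched roughly as follows (assuming a dataset named tank/pve-backup and a 30-snapshot retention; all names are placeholders, and rsync --inplace is only a stock stand-in for the block-sync program mentioned, since it still reads the whole file):

```shell
set -e
DATASET=tank/pve-backup
DUMP=vzdump-lxc-100.tar            # already uncompressed on the PVE side

# 1. Sync the new dump over the previous one; --inplace rewrites only the
#    changed regions, so ZFS CoW ends up storing just the changed blocks
rsync --inplace --partial "$DUMP" "/$DATASET/current.tar"

# 2. Snapshot the dataset; the snapshot pins the old backup state at
#    near-zero extra cost
zfs snapshot "$DATASET@$(date +%Y-%m-%d)"

# 3. Prune snapshots beyond the retention window (keep the newest 30)
zfs list -H -t snapshot -o name -s creation -d 1 "$DATASET" \
  | head -n -30 \
  | xargs -r -n1 zfs destroy
```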
 
Thank you for your very informative post.

My bad for not saying what I use. I did mean container backups, yes. I mainly use containers and just one or two QEMU machines on my Proxmox hosts. On the hosts that run ZFS, do you explicitly use ECC RAM? My hosts don't have ECC RAM, so I'm kind of afraid to run ZFS on them.
 
On the hosts that run ZFS, do you explicitly use ECC RAM? My hosts don't have ECC RAM, so I'm kind of afraid to run ZFS on them.

The remark about ECC RAM applies to every filesystem, not just ZFS, and you should always have a good backup strategy anyway.
Every server I know comes with ECC RAM, so yes. I also run PVE with ZFS on desktop hardware with ordinary RAM, and have for years without any problems, but those systems are not on 24/7.
 
And I have been running ZFS, mostly on systems without ECC RAM, in 24/7 environments for many years without any problems at all. I know this is not recommended, but I cannot do otherwise ;)
 
