How to reduce/optimize backup sizes

thimplicity

Member
Feb 4, 2022
Hi everyone,
I am doing local backups via PBS, but I also create vzdump backups that I upload to OneDrive via restic. I have tried to optimize backup sizes by cleaning up Docker, running fstrim, and activating "ssd" and "discard" for the respective VMs.
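
For reference, I enabled those options along these lines (VM ID and disk reference are placeholders for my actual setup):

Code:
# enable discard and the SSD flag on an existing virtual disk
qm set 100 --scsi0 local-lvm:vm-100-disk-0,discard=on,ssd=1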

I have multiple VMs that confuse me:
  1. Server: 50GB disk with Ubuntu and multiple Docker images. According to "df -h": 23GB free out of 48GB. vzdump backup size: 12 GB
  2. Server: 100GB disk with Ubuntu and multiple Docker images. According to "df -h": 41GB free out of 98GB. vzdump backup size: 76 GB
  3. Server: 200GB disk with Ubuntu and multiple Docker images. According to "df -h": 189GB free out of 197GB. vzdump backup size: 51 GB
Two questions:
  • Why is there such a big difference between free/used space and the backup size?
  • What else can I try to reduce backup sizes?
Thanks!
 
You need to wipe the formerly used disk space, as it is not zeroed out yet. That means an image backup will still see data there even though the files have been deleted. Added: fstrim should be able to do this, but I know the dd method works for me, YMMV.

See https://knowledge.broadcom.com/external/article/340005/reclaiming-disk-space-from-thin-provisio.html

TL;DR:
Windows (live.sysinternals.com):
Code:
sdelete64 c: -z
Linux (replace /tmp with another folder if you have multiple partitions/volumes; be very careful with dd):
Code:
# dd exits non-zero once the disk fills up, so use ";" instead of "&&" or the file never gets removed
dd if=/dev/zero of=/tmp/zeroes bs=1M; sync; rm -f /tmp/zeroes
 
Can you check the log of vzdump?
It should tell you how much data was found and report the compression ratio.

Code:
101: 2024-11-30 01:41:32 INFO: backup is sparse: 281.66 GiB (55%) total zero data
101: 2024-11-30 01:41:32 INFO: transferred 512.00 GiB in 2486 seconds (210.9 MiB/s)
101: 2024-11-30 01:41:33 INFO: archive file size: 203.33GB


Code:
INFO: backup is sparse: 290.06 GiB (75%) total zero data
INFO: transferred 384.00 GiB in 965 seconds (407.5 MiB/s)
INFO: archive file size: 59.31GB

Code:
100: 2024-11-30 01:14:53 INFO: backup is sparse: 636.55 GiB (84%) total zero data
100: 2024-11-30 01:14:53 INFO: transferred 750.00 GiB in 886 seconds (866.8 MiB/s)
100: 2024-11-30 01:14:54 INFO: archive file size: 89.61GB

Try running 'fstrim -a' inside your guest.
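
On systemd-based guests you can also make the trim periodic by enabling the stock timer (a sketch; it runs weekly by default):

Code:
# trims all mounted filesystems that support discard
systemctl enable --now fstrim.timer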

On Windows guests, the defrag tool has memory issues in combination with VirtIO.
You have to manually override the discard granularity for your guest machine.* See the article here:
https://www.alldiscoveries.com/how-...frag-on-windows-server-virtualized-with-qemu/



* in case the article gets removed:

Code:
cd /etc/pve/qemu-server/
vi <id>.conf
Add the following line:
Code:
args: -global scsi-hd.discard_granularity=2097152
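
The same line can presumably also be set without editing the file by hand (a sketch; replace <id> with your VM ID):

Code:
qm set <id> --args '-global scsi-hd.discard_granularity=2097152'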
 
You mentioned you use restic to upload vzdumps to OneDrive. Did you realize that de-duplication does not work there? If you back up a 5GB VM today, add 100MB of data, and do another backup tomorrow, you will upload another 5.1GB instead of just 100MB.

I am asking because I did some testing today and ran into this problem with my own strategy of using restic for off-site backups, and I want to see if you have the same issue. A quick search led me to https://forum.restic.net/t/backup-vma-proxmox-with-restic/1864, which confirmed my findings.
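
A rough way to reproduce the test (repository path and dump file names are placeholders): back up two consecutive dumps and compare how much the repository grows.

Code:
# day 1: back up the first dump and note the raw data size
restic -r /srv/restic-repo backup /var/lib/vz/dump/vzdump-qemu-100-day1.vma.zst
restic -r /srv/restic-repo stats --mode raw-data

# day 2: if dedup worked, the raw data size would grow by ~100MB, not ~5GB
restic -r /srv/restic-repo backup /var/lib/vz/dump/vzdump-qemu-100-day2.vma.zst
restic -r /srv/restic-repo stats --mode raw-data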
 
Looks like I am having the same issues - did you find a resolution?
 
Yes, I have that locally, but I still would like to have a cloud backup (obviously in another location)
You could use a vserver or a PBS cloud provider like tuxis.nl or cloud-pbs.com. I use a cheap vserver at netcup with PBS for VMs, and restic with a Hetzner Storage Box for data.
 
I am just using rclone to copy the vzdump from PBS to the cloud. And PBS encrypts the data, so I don't need to worry about that; otherwise I would have used restic.
Do you actually use rclone on the PBS datastore, or on the vzdump vma files made with Proxmox VE's native backup function? rclone on vma files should be fine. Using rclone to sync the PBS datastore is not something I would recommend, due to reports about broken datastores after an rclone sync.
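
If it is the vma files, a sync along these lines should be safe (remote name and paths are placeholders):

Code:
# copy only the dump files, never the PBS .chunks directory
rclone sync /var/lib/vz/dump onedrive:pve-dumps --include "vzdump-*"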
 
I was uploading the datastore, did a quick restore of one of my VMs, and it worked. In the example you sent it seems they were encrypting the data with rclone, which I wasn't doing.

But I am not taking chances: I keep a copy of my PBS server on a local NAS. So now I will try to upload that to the cloud, and if I ever need to restore, I would restore PBS first and then restore the VMs from PBS. Thinking about it, it's actually much simpler this way.
 
After more testing I did run into some issues with the rclone backup of the datastore. The verify and GC tasks kept failing, and I was not able to perform a backup of one of the VMs because one of the temp files or folders was missing in the .chunks folder. I couldn't figure out a solution, so I restored PBS and decided to give restic a try for backing up the datastore.

I did 5 or so different tests with restic: backing up and restoring the backup, destroying PBS and recreating the datastore, cutting the connection midway through the backup process and continuing afterwards. Upon restoring, verify and garbage collection worked, and I was able to restore and back up VMs as if nothing had happened. The plus side of restic is that it keeps permissions, so there is no need to do chown -R as in the case of rclone. Plus I have been using restic for some time now and I feel comfortable using it.
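
Roughly what those test runs looked like (repository and datastore paths are placeholders):

Code:
# back up the PBS datastore; restic preserves ownership and permissions
restic -r /mnt/nas/restic-pbs backup /mnt/datastore/pbs-store

# disaster recovery test: restore the datastore back to its original path
restic -r /mnt/nas/restic-pbs restore latest --target /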

I'll keep testing over the coming months with more data and report back. But if it works, I will be very happy, because restic is amazing and it was my initial plan to use it to back up Proxmox; but since it cannot dedupe vzdumps, I must use PBS as an intermediate.

I am glad I gave PBS a go, because it makes backing up and restoring such an easy process. I am still not taking any chances and will keep PBS backed up locally, because realistically that should be enough, but it is still good to have a cloud backup in case everything else fails.
 
Quick question related to using restic for deduplication (implementing this was at the top of my backup strategy list, as it would allow for a kind of incremental backup without the need to run a Backup Server - plain data storage would be sufficient). @ArionWolf and @thimplicity, you found out dedup does not work on PVE backups. Did you only try with compressed full backup files, or also with non-compressed ones? I'm asking because the compression algorithm might completely change the bit patterns in a backup file, making it impossible for restic to detect unchanged chunks. I'm therefore wondering if restic dedup could work on uncompressed full backups. If you have tested this already, there's no need for me to repeat it to find out the same :-)
 
What do you mean by uncompressed backup? I do not believe you can do uncompressed backups with PBS. Are you referring to vzdump?

The problem with vzdump is that it does not do incremental backups; it is always a full backup, so it takes a long time. I usually have maybe 100MB added/deleted every day, so it does not make sense to take a completely new full backup every day.
 
On PVE you can choose whether you want the backups to be compressed or not.

[Screenshot: PVE backup dialog with the Compression option]

My thought is: if compression = none leads to a more or less "dd"-like result, restic could have a chance to dedup. And deduplicating full backups would amount to a kind of incremental backup. Restic would then compress, encrypt, and transfer only the chunks that have changed.
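
A minimal sketch of that idea (VM ID, paths, and repository are placeholders):

Code:
# uncompressed full dump, so unchanged blocks keep stable bit patterns
vzdump 100 --compress 0 --mode stop --dumpdir /mnt/local-backup

# restic chunks, dedups, compresses and encrypts before uploading
restic -r rclone:onedrive:pve-backup backup /mnt/local-backup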
 
What is your target storage in this screenshot? Is it set to local? If you choose local, then yes, you can have an uncompressed backup. But that is different from backing up to a Proxmox Backup Server; the '.chunks' created by PBS cannot be deduped by restic.

When you back up to local you are effectively using vzdump, which does not do incremental backups as PBS does.

If you are willing to stick with vzdump, I believe restic would dedupe; I'm not sure if I have tested it. But again, vzdump takes longer than PBS to do the actual backup and uses more space on the disk. So I am using PBS to keep local backups of my VMs, and restic backs up my most important data from the VMs to the cloud daily. Then periodically I do a vzdump and use restic to sync that to the cloud as well.
 
Yes, that's what I had in mind: back up to a local drive, uncompressed, from PVE. Only keep the last generation locally, and use restic to dedup, encrypt, and transfer to cloud storage as snapshots. That way I get daily incremental snapshots on cloud storage. Even with the stop method the downtime of my VMs is short, a couple of seconds plus the reboot time. For me that's perfectly fine.
I have to add, I'm not using PVE to back up files; that job is done by restic directly on the filesystem. So PVE only has to deal with the system disks of the VMs, which are rather small and do not change too much.
 