Backup size

Kirani · Oct 11, 2022

Hi, I am currently running a backup job from two PVE 7.2-11 nodes to a PBS 2.2-6 server.

As an example, one VM is running Windows server with one 35GB disk, and one 300GB disk. The 300GB disk is formatted empty.

Storage on the PVE node is LVM-thin and shows minimal usage, in line with the actual data on the disks. However, a backup of the above VM is 360GB in size.

The 300GB disk has SSD emulation and Discard enabled, and is running Windows 10.

From what I have read I shouldn't need to run any manual trim commands; why is the backup still so large? Or is that size not the true backup size?

This is the backup configuration:

Code:

agent: 1
boot: order=scsi0;net0
cores: 2
memory: 4092
meta: creation-qemu=6.2.0,ctime=1657416664
name: DELTA
net0: virtio=CA:15:1C:4C:B5:CD,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: win10
scsi0: local-lvm:vm-105-disk-0,discard=on,format=raw,size=35G,ssd=1
scsi1: local-lvm:vm-105-disk-2,discard=on,size=300G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=f4047ad8-2e28-4e6b-88ff-55c4f1161caa
sockets: 2
vmgenid: 2a0548d4-edac-447f-8a47-6087eb2d704e
#qmdump#map:scsi0:drive-scsi0:local-lvm:raw:
#qmdump#map:scsi1:drive-scsi1:local-lvm:raw:

Thanks for any help!

Dunuin · Oct 11, 2022

Where do you get those numbers? PBS doesn't know how big an individual backup is; it only shows the size of the raw disks you backed up. It only knows the usage of the complete datastore, and that usage can't be cleanly attributed to individual guests, because everything is chopped into 4MB chunk files that are deduplicated across guests. So multiple guests can share the same data.

For a rough estimate, have a look at the log of the backup job after backing up a VM. It should tell you how much of the data didn't have to be sent to the PBS because it's already stored there (deduplication), and how much of the data was empty data/zero blocks, which won't use up any space.

Kirani · Oct 11, 2022

I was looking at the backup in PVE Datacenter > Node > VM > Backup > Size

My datastore usage in PBS seems high for the data I have on the VMs, so I am trying to narrow down how it is being used.

Dunuin · Oct 11, 2022

You can ignore all those size numbers in the webUI except for the total datastore size.

You should check whether discard is working and whether prune and GC jobs are run regularly. Keep in mind that PBS's GC task will only delete data that was pruned at least 24 hours and 5 minutes ago, so a GC right after a prune won't free up any space.
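
To make the timing concrete, here is a minimal sketch in Python. It only does the date arithmetic for the 24 hours and 5 minutes mentioned above; the function name and the example timestamp are made up for illustration and are not part of PBS.

```python
from datetime import datetime, timedelta

# Grace period as described above: data pruned less than
# 24 hours and 5 minutes ago is not yet removed by GC.
GC_GRACE = timedelta(hours=24, minutes=5)

def earliest_effective_gc(pruned_at: datetime) -> datetime:
    """Earliest time at which a GC run can free data pruned at `pruned_at`.

    Hypothetical helper for illustration only.
    """
    return pruned_at + GC_GRACE

# Example with a made-up prune time.
pruned = datetime(2022, 10, 11, 20, 0)
print("prune finished:     ", pruned)
print("GC frees space from:", earliest_effective_gc(pruned))
```

So a GC started the same evening as the prune would leave the datastore usage unchanged; one started after the printed time can actually reclaim the space.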

Kirani · Oct 11, 2022

What would be the best way to ensure discard is working? I can't see where I can get an accurate measurement of how much disk space is being used for the VMs either in PVE or PBS.

If I total all those webUI numbers for the VMs being backed up it shows around 1.2TB, and my PBS datastore shows 1.07TB used, so not far off. I know you mentioned that they should be ignored, I was just looking to compare.

If I total all the space used inside the VMs - either Windows or Linux - it is only 525GB. I have set retention to only a single backup for the time being, too, to try and help analyze.

From the numbers it looks like the entire raw disk is being backed up for each VM instead of just the data used. I did try running optimize in Windows on the server with the largest disks to try and trim, but it didn't make a difference.

I did remove unneeded disks from one backup last night, so I wonder if the 24 hour wait for GC deletion is playing a part, and I need to wait for that.

Kirani · Oct 11, 2022

I assume this shows discard is working correctly:

Code:

#  lvs -a | egrep 'LV|vm-105-disk-2'
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vm-105-disk-2   pve Vwi-aotz-- 300.00g data        0.03

Kirani · Oct 13, 2022

Doing some more digging on this, I checked the LV information for each of the disks being backed up:

Code:

  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  vm-104-disk-0   pve Vwi-aotz--  32.00g data        69.14
  vm-105-disk-0   pve Vwi-aotz--  35.00g data        79.43
  vm-105-disk-2   pve Vwi-aotz-- 300.00g data        0.39
  vm-110-disk-0   pve Vwi-aotz--  64.00g data        64.47
  vm-102-disk-0   pve Vwi-aotz-- 155.00g data        70.02
  vm-102-disk-1   pve Vwi-aotz-- 305.00g data        57.61
  vm-102-disk-2   pve Vwi-aotz-- 305.00g data        50.90

Calculating those data percentages, this totals to around 530GB, which is the same total I get from looking inside each guest VM as the space used. This suggests to me that discard is working correctly.
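
For anyone who wants to reproduce that calculation, this is roughly how it can be done in Python. It is only a sketch: the rows are pasted from the lvs output in this post, it assumes the Origin column is empty (as it is here), and lvs reports sizes in GiB, so the result is approximate when compared against GB figures.

```python
# Rows copied from the lvs output quoted in this post.
LVS_ROWS = """\
vm-104-disk-0   pve Vwi-aotz--  32.00g data        69.14
vm-105-disk-0   pve Vwi-aotz--  35.00g data        79.43
vm-105-disk-2   pve Vwi-aotz-- 300.00g data        0.39
vm-110-disk-0   pve Vwi-aotz--  64.00g data        64.47
vm-102-disk-0   pve Vwi-aotz-- 155.00g data        70.02
vm-102-disk-1   pve Vwi-aotz-- 305.00g data        57.61
vm-102-disk-2   pve Vwi-aotz-- 305.00g data        50.90
"""

def used_gib(row: str) -> float:
    """Allocated space for one thin LV row: LSize * Data% / 100."""
    fields = row.split()
    # With an empty Origin column the fields are:
    # LV, VG, Attr, LSize, Pool, Data%
    lsize = float(fields[3].rstrip("g"))
    data_pct = float(fields[5])
    return lsize * data_pct / 100.0

total_used = sum(used_gib(r) for r in LVS_ROWS.splitlines() if r.strip())
print(f"allocated across all thin LVs: {total_used:.1f} GiB")
# prints: allocated across all thin LVs: 531.8 GiB
```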

I have one backup job that has retention set to keep-last=1, but my repository is showing as 1.12TB used.

If I look at the backup job configuration for each VM and combine the disk information:

Code:

scsi0: local-lvm:vm-104-disk-0,discard=on,format=raw,size=32G
scsi0: local-lvm:vm-105-disk-0,discard=on,format=raw,size=35G,ssd=1
scsi1: local-lvm:vm-105-disk-2,discard=on,size=300G,ssd=1
scsi0: local-lvm:vm-110-disk-0,discard=on,size=64G,ssd=1
scsi0: local-lvm:vm-102-disk-1,discard=on,size=305G,ssd=1
scsi1: local-lvm:vm-102-disk-2,discard=on,size=305G,ssd=1
scsi5: local-lvm:vm-102-disk-0,discard=on,size=155G,ssd=1

This totals 1.2TB.
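
The provisioned total can be checked the same way. A small Python sketch, assuming every disk line carries a `size=<N>G` option as in the config lines above:

```python
import re

# Disk lines copied from the VM configurations quoted in this post.
DISK_LINES = """\
scsi0: local-lvm:vm-104-disk-0,discard=on,format=raw,size=32G
scsi0: local-lvm:vm-105-disk-0,discard=on,format=raw,size=35G,ssd=1
scsi1: local-lvm:vm-105-disk-2,discard=on,size=300G,ssd=1
scsi0: local-lvm:vm-110-disk-0,discard=on,size=64G,ssd=1
scsi0: local-lvm:vm-102-disk-1,discard=on,size=305G,ssd=1
scsi1: local-lvm:vm-102-disk-2,discard=on,size=305G,ssd=1
scsi5: local-lvm:vm-102-disk-0,discard=on,size=155G,ssd=1
"""

# Pull the number out of each "size=<N>G" option and add them up.
sizes_g = [int(m) for m in re.findall(r"size=(\d+)G", DISK_LINES)]
provisioned_g = sum(sizes_g)
print(f"{len(sizes_g)} disks, {provisioned_g}G provisioned in total")
# prints: 7 disks, 1196G provisioned in total
```

That is the 1.2TB figure: the sum of the full virtual disk sizes, independent of how much of each disk is actually allocated.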

It looks to me as if the full disk is being backed up, regardless of discard being configured and working. Is this possibly the issue?
