[SOLVED] Ran my first sync job from pbs01 to pbs02 - but the target pool does not show the same usage

fireon

Distinguished Member
Oct 25, 2010
4,149
392
153
42
Austria/Graz
iteas.at
Hello all,

I'm using PBS version 1.0.6-1 and have configured my first sync of a datastore: I created a sync job and started it. On the source datastore it looks like this:

Code:
CT 8 Groups, 20 Snapshots
Host 1 Groups, 0 Snapshots
VM 19 Groups, 47 Snapshots
Deduplication Factor 10.24

13.27% (1.23 TiB of 9.26 TiB)
On the second backup server the sync job has finished, and at first glance everything seems to have been copied fine - on the web interface I see the same versions. But the values say something else. Look:


Code:
CT 5 Groups, 20 Snapshots
Host 0 Groups, 0 Snapshots
VM 12 Groups, 47 Snapshots
Deduplication Factor 1.00

(192.09 GiB of 14.08 TiB)

Both filesystems are ZFS.

Why is this so different? Can that really be true?

Many thanks :)
 
The dedup factor will only be updated once the next GC has completed. The snapshot count looks okay - the source seems to have a few 'empty' groups which are not synced, since there is nothing to sync there.
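If you don't want to wait for the scheduled run, GC can also be started and watched by hand - roughly like this, with 'store2' just a placeholder for the target datastore name:

Code:
# start garbage collection on the target datastore manually
proxmox-backup-manager garbage-collection start store2
# check progress / result of the last run
proxmox-backup-manager garbage-collection status store2

Once that run has completed, the dedup factor shown for the datastore should be updated.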
 
And the usage is just what is returned by statfs, so the difference in usage might be caused by having more than just the datastore on that dataset on the source side? Not sure without knowing the specifics of your setup ;)
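To rule that out, you could compare the actual space usage of the dataset backing the datastore on both hosts - a quick sketch, the dataset name and mountpoint are just examples:

Code:
# used/available/snapshot space of the dataset holding the datastore
zfs list -o space rpool/pbs-datastore
# what statfs (and thus the PBS GUI) sees for that mountpoint
df -h /mnt/pbs-datastore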
 
And the usage is just what is returned by statfs, so the difference in usage might be caused by having more than just the datastore on that dataset on the source side? Not sure without knowing the specifics of your setup ;)
Not really. Only PBS has this dataset to itself.
 
The dedup factor will only be updated once the next GC has completed. The snapshot count looks okay - the source seems to have a few 'empty' groups which are not synced, since there is nothing to sync there.
OK, but GC and prune were running... :confused: I'll check this on the next round. At the moment, 8 TB of VMs are still verifying, and that takes as much time as the backup itself. I already have 8x 2 TB in the server, RaidZ10. I know that's not the fastest.
 
@fabian you were right. "After GC" it looks much better (it needed 1 day and 6 h). GC had failed before. Over 700 GB have now been deleted. So I have 2 questions:
  1. GC purges data? I always thought prune did that? Did I get something wrong? Or does prune only delete whole VM backups?
  2. The backup of a 7 TB VM took 25 hours, the verify process over 2 days. Is that normal, or does it have something to do with the fact that GC had not been running for a long time?
These are important points for me: if this process always takes so long, do I have to build a separate backup pool with its own HDDs for this one big VM?

Many thanks :)
 
Removing no longer needed backups is a two-step process:
  1. forget/prune the snapshot(s) (either manually, or with a scheduled prune job) - this will just remove the snapshot directory containing the metadata/indices, and is a very lightweight operation
  2. garbage collection - this will remove the actual data chunks, but only if they are not referenced by any still existing backup snapshot/index. this can cause quite some (random) I/O, first to scan all the indices and mark all the referenced chunks, then to sweep and actually delete all the unreferenced chunks
For safety reasons, GC will also only delete after 24+ hours have passed since the last snapshot referencing a chunk was removed/pruned (this has technical reasons).
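For reference, the two steps roughly map to these commands - a sketch only, where the group, keep option, repository and datastore names are just examples:

Code:
# step 1: prune old snapshots of a single backup group (lightweight)
proxmox-backup-client prune vm/100 --keep-last 3 --repository backup@pbs@pbs01:store1
# step 2: actually free the unreferenced chunks on the PBS host (I/O heavy)
proxmox-backup-manager garbage-collection start store1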

Sync will not transfer chunks that are no longer referenced (as you could see ;)). A backup should not be affected by how often you run GC - it will only look at the last snapshot to generate the delta anyway. For VMs, especially big ones, the dirty bitmap can help a lot with cutting down subsequent backups (but beware, it is cleared if the VM is shut off!).

In general, you will see the following:
  1. client has to download and parse last snapshot's index files
  2. client has to read and hash all data (or just those parts marked as dirty if bitmap is available)
  3. client has to upload chunks that are not contained in last snapshot (and tell server about those it intends to re-use from the last snapshot)
  4. server has to write uploaded chunks
  5. server has to write new index files/metadata for current backup
Steps 2-4 happen in parallel until all data is processed; 1 and 5 are fairly cheap and probably not where your bottleneck is. If your backup takes a long time, it's either because
  • step 2 takes a long time (slow VM storage, lots of data or no bitmap, underpowered PVE CPU?)
  • step 3 takes a long time (slow network, big delta?)
  • step 4 takes a long time (slow backup storage, big delta, underpowered PBS CPU?)
The backup log usually gives some clue (e.g., it displays the amount of actually transferred delta, and whether a bitmap was available to cut down the amount of reading and hashing in step 2).
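If the log doesn't make the bottleneck obvious, a quick fio run on both ends can help narrow it down - a rough sketch, where the paths and sizes are just examples (without --direct the numbers will include cache effects):

Code:
# sequential read on the PVE side, roughly what step 2 does without a bitmap
fio --name=seqread --rw=read --bs=4M --size=10G --numjobs=1 --directory=/rpool/data
# mixed random I/O on the PBS datastore, closer to what chunk writes look like
fio --name=randrw --rw=randrw --bs=64k --size=10G --numjobs=1 --directory=/mnt/datastore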

GC does not affect any of the steps above, except that it can of course cause some I/O and CPU load on the server side ;)
 
Many thanks for the really good explanation :D now I finally understand :eek:

I have now made some changes:
  • Destroyed the ZFS RaidZ10 and created a new ZFS RAID10 with the 8x 2 TB disks. FIO results for that pool are OK. The normal backup pool now lives there.
  • Pool also configured with automatic verification
  • 8 TB HDDs (Seagate Archive HDD) prepared for this one big VM (single disk or mirror, haven't decided yet)
Of course I want to get the best out of this too, so I've tested a single disk with FIO, with ZFS, Ext4 and XFS.

Code:
ZFS: write bs=1M with a 30 GB file = 45 MB/s --> read 164 MB/s (without cache)
Ext4 and XFS similar: write = 15 MB/s --> read 177 MB/s

Number of jobs: only one, because there is only one VM.

What would you recommend as a single-disk file system for PBS storage?

Addendum: Ext4 or XFS has the advantage that I can fill the filesystem completely; ZFS only to 80-85% on HDDs.
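The fio runs were roughly along these lines - one job, 1M blocks, a 30 GB file; the mountpoint is just an example:

Code:
# sequential write test with 1M blocks and a 30 GB file
fio --name=seqwrite --rw=write --bs=1M --size=30G --numjobs=1 --directory=/mnt/backup-8tb
# sequential read of the same size
fio --name=seqread --rw=read --bs=1M --size=30G --numjobs=1 --directory=/mnt/backup-8tb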
 
I have now decided on a RAID1 with ZFS: 2x 8 TB Archive HDDs from Seagate, 7200 rpm and 256 MB cache. The difference is huge. The VM was backed up on the earlier RaidZ10 (Raid5) in 25 hours; the same process required 66 h on the RAID1. Well, the number of HDDs makes the difference. Fortunately, only the differences are backed up from now on :) I will not verify the VM, it would probably take a week.

The third HDD is installed in the data center, a single disk with Ext4. Then I sync the differences over. Well, let's see. :D

And go on :cool:

Btw: Would an SSD speed up the verification process significantly?
 
Verify needs to read and checksum all the data, so it's only as fast as your actual data storage and CPU allow. Pruning and GC benefit a lot more from fast metadata access (or lots of RAM to cache things).
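If a small SSD mirror is available, one option that mainly helps those metadata-heavy operations (not verify itself) is a ZFS special vdev on the backup pool - a rough sketch, pool and device names are just placeholders, and it should be treated as permanent since removing it again is not always possible:

Code:
# add a mirrored special vdev that will hold metadata written from now on
zpool add backup special mirror /dev/disk/by-id/ata-SSD_ONE /dev/disk/by-id/ata-SSD_TWO

Only metadata written after adding it ends up there, so existing chunks won't benefit until they are rewritten.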
 
Yes, thanks. Memory should be enough, 32 GB. I'm still in the middle of the tests; there will be a report at the end of next week ;)
 
OK, here are some numbers now :) Test with one VM, 7 TB, 5.7 TB allocated.

Two Proxmox Servers:
1. This one hosts the VM (UCS 4.4.x). Root disk on a mirror of Intel SSD DC S4500 with ZFS (32 GB); the second virtual HDD is on a RAID10 with 10 WD Red Pro 2 TB. All ZFS. Gigabit network.
2. This is the backup server. One RaidZ10 pool with 8 mixed 2 TB HDDs (WD Red, Green, Blue...) and a second pool with two 8 TB Seagate Archive disks, 5900 rpm/128 MB.

Backing up the big VM to the RaidZ10 pool for the first time took 26 h, verification 53 h, with I/O wait at 90%. On the second try I backed the VM up to the two 8 TB disks; this took 66 h. Verification was not feasible - it would take too long.

The second backup of this big VM (only the changes) takes only 9 minutes for about 50 GB. Nice. After these operations I start a sync of all these VMs to another Proxmox Backup Server out in a datacenter. This works perfectly.
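For completeness, the sync to the external PBS is just a regular sync job pulling from a configured remote - a sketch with placeholder names for the remote, user, fingerprint and datastores:

Code:
# define the source PBS as a remote (done once)
proxmox-backup-manager remote create pbs01 --host pbs01.example.com --userid sync@pbs --password 'xxx' --fingerprint '<cert fingerprint>'
# create a sync job that pulls the datastore daily
proxmox-backup-manager sync-job create pull-pbs01 --remote pbs01 --remote-store store1 --store store1 --schedule daily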
 
