Proxmox Backup Server Garbage Collection Runtime Increases on Every Run

el5network

New Member
May 18, 2023
I’ve started testing Proxmox Backup Server more seriously over the past few months and have a question about garbage collection. It’s mainly about confirming, with more experienced PBS users, my setup and my suspicions about what is increasing garbage collection runtimes. Details of my setup are further down.

From my understanding, garbage collection runs in two phases:
- phase 1 walks all index files and marks every chunk that is still referenced (including newly added chunks)
- phase 2 sweeps the chunk store and reclaims chunks that are no longer referenced.
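As far as I understand it, phase 1 "marks" referenced chunks simply by updating their atime, and phase 2 then removes chunks whose atime is older than the cutoff. A harmless, read-only way to watch phase 1 at work on a few chunk files (the datastore path below is just an example, not my real mount point):
Code:
# show access times of a handful of chunks; they get bumped while phase 1 runs
stat --format '%x  %n' /mnt/pbs-datastore/.chunks/0000/* | head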

I noticed that phase 1 takes by far the longest to complete, no matter how much data is stored. If the disks are fairly empty, GC takes minutes to a few hours, but as the PBS datastore fills up with backups and the number of chunks grows, GC takes longer and longer. In all cases, phase 1 takes substantially more time than phase 2, roughly 30x-100x longer, regardless of the total GC runtime. The exact ratio depends on how much data was added and how much can be freed, and it varies widely.
For example, the most recent garbage collection took just over 19 hours:
phase 1: 18h33min, phase 2: 0h37min (~30x difference)
A fairly early configuration with few backups and far fewer used chunks produced a GC run of 1h41min:
phase 1: 1h38min, phase 2: 0h03min (~33x difference)
There are other examples where the ratio is closer to 100x.

I have a setup that is far from ideal and not recommended for production use:
- PBS is a VM running on a bare-metal PVE host
- TrueNAS is also a VM running on the same PVE host
- the datastore is actually a dataset on TrueNAS, shared with PBS via NFS
- the TrueNAS pool is ZFS with the default lz4 compression enabled
- the TrueNAS pool uses hard drives in RAIDZ1 instead of enterprise SSDs, so there is a major IOPS disadvantage
- my budget was constrained for this build, and it was mainly for testing.

I’m wondering if filling up the other TrueNAS datasets affects PBS garbage collection times, since the PBS datastore resides in the same pool as those other datasets (and therefore shares the pool’s global free space). If PBS were the only data in the pool/dataset, would that change anything? Is it better to have dedicated disks/SSDs as a separate pool reserved for PBS, or even to pass some drives through directly to PBS?

I suspect dedicating some disks/SSDs exclusively to PBS will probably help somewhat, and I will try this as soon as time permits. I know this won’t resolve the IOPS issue with spinning drives unless I get large enterprise SSDs. I may go that route if I’m going to make the effort of moving the datastore to dedicated disks.

Because my configuration is non-standard, I couldn’t find recommendations in the Proxmox documentation for my specific setup, only that PBS should use SSDs because of the high IOPS load and the amount of processing required by PBS garbage collection and pruning.





Extra details of my system/setup:

- the PVE host is installed on a single SSD formatted with ext4 (I later realized a ZFS mirror would have been better)
- PBS runs as a virtual machine on the PVE host (6 cores, CPU type [host], 8-32 GB memory with ballooning)
- a TrueNAS SCALE VM also runs on the PVE host (4 cores, CPU type [host], 32 GB memory, balloon=0)
- I’ve passed through 5x 4TB drives to TrueNAS as a RAIDZ1 pool and created a few datasets, one of which holds the PBS datastore
- my PBS datastore used for the backups is shared over NFS from the TrueNAS VM (roughly as sketched below)
- the other TrueNAS datasets store unrelated data: some other backups and some media files
- PVE, PBS and TrueNAS are updated to the latest versions
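For completeness, the rough shape of how the datastore is wired up (the IP, paths and datastore name below are placeholders, not my real values):
Code:
# /etc/fstab on the PBS VM: mount the TrueNAS NFS export
192.168.1.50:/mnt/tank/pbs  /mnt/pbs-datastore  nfs  defaults  0  0

# then register the mount point as a datastore in PBS
proxmox-backup-manager datastore create store1 /mnt/pbs-datastore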

Hardware:
- PVE is running on an HP z620 workstation
- processors: 2x E5-2620 v2 (12 cores / 24 threads total)
- RAM: 96 GB DDR3 ECC buffered/registered
 
First you need to identify your bottleneck. Use tools like top and, more importantly, take a look at iostat while the GC is running. Run iostat -x 1 to see how heavily your disks are loaded. My bet is that iowait is high while %util on the spinning disks sits at 100%.
 
Thanks for your idea.

I ran
Code:
iostat -x 1
and I see that disk %util can reach 85-90%, but never 100%. Most of the time it’s in the 70% range. This actually surprised me since I was expecting closer to 100% as well.
The %iowait (from the cpu line) is mostly at 0.00, and every 4-5 seconds there is a reading in the range of 7-13%.

I also ran it as
Code:
iostat -x 1 -s
just to make it fit on the screen; some output for the spinning drives is below. What is an acceptable range for iowait? Maybe there is nothing that can really be done once disk usage increases.

My PBS datastore is currently processing 345 index files in phase 1, with 530795 on-disk chunks. I run the GC every two days. I noticed that phase 1 slows down as it progresses: it starts at around 3 minutes per 3-4 index files, then increases to 10-20 minutes per 3-4 index files once it passes the 30% completion mark.

My PBS summary page states the datastore size is 1.88TB, with 1.35TB used and 529GB available. The deduplication factor is 20x. I’m guessing PBS restricts itself to the chunk directories it created when I added the datastore and doesn’t look at the rest of the disk, except maybe for the free space, but that isn’t the bottleneck.
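Just as a back-of-the-envelope sanity check on those numbers (rounded, decimal units):
Code:
# average on-disk chunk size: used space / chunk count
echo "$(( 1350000 / 530795 )) MB per chunk (roughly)"    # ~2 MB
# logical data referenced by all snapshots: used space x dedup factor
echo "$(( 135 * 20 / 100 )) TB referenced (roughly)"     # 1.35 TB x 20 ~ 27 TB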


Code:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.76    0.00    0.00   98.24


Device             tps      kB/s    rqm/s   await  areq-sz  aqu-sz  %util
sda               0.00      0.00     0.00    0.00     0.00    0.00   0.00
sdb             160.00    180.00     0.00    2.29     1.12    0.66  76.40
sdc             159.00    184.00     0.00    1.81     1.16    0.54  68.80
sdd             159.00    184.00     0.00    1.89     1.16    0.56  70.40
sde             160.00    184.00     0.00    2.07     1.15    0.62  72.00
sdf             159.00    180.00     0.00    1.57     1.13    0.46  69.60




avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    1.52   10.86    0.00   87.63


Device             tps      kB/s    rqm/s   await  areq-sz  aqu-sz  %util
sda               0.00      0.00     0.00    0.00     0.00    0.00   0.00
sdb             134.00    388.00     0.00    6.02     2.90    1.30  70.80
sdc             135.00    372.00     0.00    4.83     2.76    1.12  74.00
sdd             134.00    380.00     0.00    5.96     2.84    1.33  73.20
sde             144.00    728.00     0.00    5.85     5.06    1.30  71.20
sdf             146.00    468.00     0.00    8.75     3.21    1.81  76.40
 
After trying various things with PBS to test the effect of adding and removing chunks, e.g. creating more backups (adding chunks) and pruning older backups (releasing chunks), I believe garbage collection takes the time it takes because I’m using spinning hard drives instead of SSDs, even if they are not at 100% utilization. There is no other apparent bottleneck, even though my machine is based on 10-year-old technology. The low IOPS of spinning drives are what creates the performance hit during garbage collection.
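A very rough back-of-the-envelope calculation made me comfortable with this conclusion. It assumes phase 1 ends up doing roughly one small metadata operation per chunk reference (on-disk chunks times the dedup factor) and that the pool sustains about the ~160 ops/s iostat was showing; both numbers are just my guesses:
Code:
# spinning raidz pool at ~160 metadata ops/s:
echo "$(( 530795 * 20 / 160 / 3600 )) hours"      # prints 18 -- close to my 18h33min phase 1
# hypothetical SSD mirror at ~20000 ops/s:
echo "$(( 530795 * 20 / 20000 / 60 )) minutes"    # prints 8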

Using the dedicated TrueNAS dataset shared over NFS for the PBS datastore does not seem to be the culprit either, even with PBS and TrueNAS running as VMs, although there is definitely a small performance hit that is difficult to quantify because of the multiple virtualization layers. Also, other kinds of data and network operations saturate the network bandwidth to this machine and max out the drives’ rated transfer speeds, at least for sustained reads/writes.

In closing, it looks like if I want to radically improve GC speeds, I will need to move to SSDs (enterprise grade for longevity/reliability). I might use 2x 2TB NVMe drives in a mirror and dedicate that storage exclusively to the PBS datastore. I don’t know if this will improve GC by three orders of magnitude, even though SSDs have at least that many more IOPS than spinning drives, but I do expect the improvement to be massive either way.

I guess I can consider my issue closed.
 
TL;DR: Try restarting the PBS.

You've done a good deal of testing and tried several solutions already. I happened to come across a similar situation: my PBS is in an LXC container with the datastores set up as ZFS datasets (the regular way of adding a virtual drive on a ZFS pool).

Regardless, I noticed in the task log that the scheduled garbage collection job on the same datastore took just a couple of hours, while the one I ran manually had been going for 18 hours and was still only at around 6%.
Clearly there's heavy write activity to the HDDs during the GC job.

A restart of the LXC seems to have fixed something, and now a manual GC run finishes as fast as it should.
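If you want to try the same, bouncing the container from the PVE host shell is enough (the CT ID below is just an example):
Code:
pct shutdown 101 && pct start 101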
 
Thanks for this other suggestion, Sam.

My PBS is a VM. I did restart it and let a GC run automatically, and the entire PVE node was restarted later as well, but neither helped in my situation.

It really does look like my limiting factor is the very low IOPS of spinning hard drives, even though they are enterprise drives. Also, the RAIDZ1 pool where the PBS chunks reside is quite full and over 20% fragmented, which probably isn't helping either. Right now I'm looking to free up space on the pool so ZFS has more room to work with, but eventually the only solution that will likely work in my case is SSDs, which is also the general recommendation. When I do that, I will separate the PBS storage from all other storage.
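For anyone following along, I'm keeping an eye on the pool fill level and fragmentation on the TrueNAS side with (pool name is a placeholder):
Code:
zpool list tank    # CAP and FRAG are the columns to watch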
 
raidz is also a lot worse than mirrors when many IOPS are needed, like with PBS' chunk store: a raidz vdev delivers roughly the random IOPS of a single disk, while a pool of mirrors scales with the number of vdevs.
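E.g. a striped-mirror layout along these lines (device names are placeholders) scales random IOPS with the number of mirror vdevs:
Code:
zpool create tank mirror sda sdb mirror sdc sdd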
 
