PBS storage suddenly (like over night) full!?!

proxwolfe

Renowned Member
Jun 20, 2020
Hi,

I have had a PBS running for a while now (two years?) with approx. 25TB worth of storage.

Regularly, I check how much space is left / when the datastore is going to be full. A couple of days ago, it said that it would be full in a bit over 50 days. So I decided to make room and combed through all the backups and deleted a bunch. The GC logs show that 2.4TB worth of data was removed two days ago.

Today, it says that storage is full. I did not add any new machines to be backed up and (see above) I actively removed data. And yet, suddenly, it is full. How can that be?

And, more importantly, how do I free up space now? Because while I can still "delete" stuff, GC won't run anymore to actually free up space.

Thanks!
 
Hi!
note that the "Estimated Full" indication is not very accurate (it's just a linear regression), especially if you don't back up regularly.
Are you sure the GC log said "Removed Garbage" and not "Pending removals"? Running GC once won't remove any data, you have to run it again after 24 hours.
Also, which filesystem do you use, and why can't you run GC anymore?
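If you want to double-check, the last run's result can be looked up on the CLI, roughly like this (assuming a datastore named "store1" - adjust to yours):

Code:
# show the result of the last garbage collection run for the datastore
proxmox-backup-manager garbage-collection status store1

# kick off a new run manually; chunks are only deleted once they are
# older than the ~24h grace period, so a later second run may remove more
proxmox-backup-manager garbage-collection start store1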
 
And yet, suddenly, it is full. How can that be?

assuming the datastore is not shared with other, non-PBS usage - the backup delta was bigger than before, and the datastore filled up as a result? the estimation can only ever be an estimation, there is no crystal ball that tells PBS how much space future backups will need..

you need to free up enough space by pruning (or adding additional space, if that is an option) to allow GC to complete. I would highly recommend ensuring no new backups are made while you are trying to free up space, else you will have to start over.
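if you prefer the CLI over the GUI, pruning looks roughly like this - just a sketch, the repository, group and keep options below are examples, adjust them to your setup:

Code:
# list the backup groups in the datastore
proxmox-backup-client list --repository root@pam@localhost:store1

# dry-run first to see what would be pruned, then drop --dry-run
proxmox-backup-client prune vm/100 --keep-last 2 --dry-run \
  --repository root@pam@localhost:store1

# afterwards run GC again to actually release the chunks
proxmox-backup-manager garbage-collection start store1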
 
Are you sure the GC log said "Removed Garbage" and not "Pending removals"? Running GC once won't remove any data, you have to run it again after 24 hours.
Hmm, it wasn't the log but rather the status line. Looking at it again, maybe this is the sum total of all collected garbage ever?

In any case, GC ran and did something.

Also which filesystem do you use and why can't you run GC anymore?
The datastore is on ZFS.

Why GC can't run anymore, I don't know. It complained about the disk being full / no space being left. This would suggest that GC needs some free space to run, but I don't know how it works internally.
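In case it helps anyone else hitting this: checking where the space went on the ZFS side looks roughly like this (pool/dataset names are placeholders):

Code:
# overall pool usage
zpool list tank

# per-dataset breakdown, including space held by snapshots and reservations
zfs list -o space tank/pbs-datastore

# how much is actually still writable on the datastore dataset
zfs get available tank/pbs-datastore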
 
assuming the datastore is not shared with other, non-PBS usage - the backup delta was bigger than before, and the datastore filled up as a result? the estimation can only ever be an estimation, there is no crystal ball that tells PBS how much space future backups will need..
Theoretically, I agree. But in reality there was nothing that should have caused a big backup delta: no new VMs, no new drives in VMs, no large data changes on drives in VMs...

you need to free up enough space by pruning
That's what didn't work. My understanding is that pruning doesn't remove data itself but only marks it as removable, and GC is what actually removes it. But that did not work (seemingly because GC needs some free space to operate).

(or adding additional space, if that is an option)
Yeah, that's what I ended up doing. I replaced two drives (one vdev) with larger drives, roughly as sketched at the end of this post. It took a couple of days to complete, but now I've got a couple of TBs of free space again.
I would highly recommend ensuring no new backups are made while you are trying to free up space, else you will have to start over.
Yes, I suspended all backup jobs to this PBS and set up an interim PBS inside my PVE cluster with a spare drive I had lying around.
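For anyone curious, the drive swap was roughly along these lines (pool and device names are placeholders):

Code:
# let the pool grow automatically once all drives in the vdev are larger
zpool set autoexpand=on tank

# replace the old drives one at a time and wait for each resilver to finish
zpool replace tank /dev/disk/by-id/old-disk-1 /dev/disk/by-id/new-disk-1
zpool status tank   # wait until the resilver is done before the next one
zpool replace tank /dev/disk/by-id/old-disk-2 /dev/disk/by-id/new-disk-2

# if the pool did not grow on its own, expand the new devices explicitly
zpool online -e tank /dev/disk/by-id/new-disk-1 /dev/disk/by-id/new-disk-2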
 
Theoretically, I agree. But in reality there was nothing that should have caused a big backup delta: no new VMs, no new drives in VMs, no large data changes on drives in VMs...

no large data changes doesn't mean that the backup delta hasn't changed - e.g., if trim is not set up / not working.. especially Windows VMs have acted up in the past in that regard..
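a quick sanity check looks roughly like this (the VM ID and drive letter are just examples):

Code:
# on the PVE host: check whether the VM disk has discard enabled
qm config 100 | grep -i discard

# inside a Linux guest: trim all mounted filesystems
fstrim -av

# inside a Windows guest (PowerShell):
# Optimize-Volume -DriveLetter C -ReTrim -Verbose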


That's what didn't work. My understanding is that pruning doesn't remove data itself but only marks it as removable, and GC is what actually removes it. But that did not work (seemingly because GC needs some free space to operate).

pruning does free up some space because it will delete metadata.. whether that is enough to matter depends on how big your backups are, and how aggressively you are willing to prune ;)
 
@proxwolfe remember to set a ZFS quota on the datastore, so you won't run into the same problem again!
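e.g. something like this, assuming the datastore lives on a dedicated dataset (names are placeholders, pick a quota a bit below the pool size so GC always has room to work):

Code:
# cap the datastore dataset below the pool capacity
zfs set quota=23T tank/pbs-datastore

# verify
zfs get quota,used,available tank/pbs-datastore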
 
pruning does free up some space because it will delete metadata.. whether that is enough to matter depends on how big your backups are, and how aggressively you are willing to prune
Right. Well, I have tried twice and what little space pruning released was not enough to let GC run.

And I didn't want to prune everything just to let GC then remove it all, because then what would have been the point of trying to save the datastore? But I get what you're saying.