I can't see backups in the webapp, but files exist...

cypherfox

New Member
Aug 26, 2025
Greetings,
Just to start it off, I'm running it in an unsupported configuration, and I know that. I don't really have the hardware to dedicate to a standalone PBS server, or to afford the enterprise license.

My backup jobs regularly fail with ERROR: VM 105 qmp command 'backup' failed - backup connect failed: command error: http upgrade request timed out in their status in Proxmox, although which VM gets it varies, so I get a backup of each VM every few hours, plus a bunch of failed ones.

I mention that because it might be related to the real concern I have. If I go to a VM's page in Proxmox VE, select Backup in the sidebar, and choose my Proxmox Backup Server as the Storage to list backups for, it spins for a while and then fails with communication failure (0). That means I can't really restore backups, which makes me nervous.

BUT...if I go to the server where it's installed, and look in the directory where the backups are being saved, I can see (for example):
Bash:
me@hostname:~$ ls -l "/hdd1/backups/vm/105/2025-09-08T10:30:27Z/"
total 16400
-rw-r--r-- 1 backup backup      583 Sep  8 03:30 client.log.blob
-rw-r--r-- 1 backup backup 16781312 Sep  8 03:30 drive-virtio0.img.fidx
-rw-r--r-- 1 backup backup      406 Sep  8 03:30 index.json.blob
-rw-r--r-- 1 backup backup      390 Sep  8 03:30 qemu-server.conf.blob

And there are a LOT of those. So I'm fairly sure it's actually backing things up, but I'm also pretty sure the webapp can't load the backup list.
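For anyone wanting to gauge the same thing, a quick way to count snapshots on disk is to count the timestamp-named directories in a backup group. This is a minimal sketch, not PBS tooling: the path is from my setup, and it assumes snapshot directories are named with RFC 3339 timestamps like the one above.

```python
from pathlib import Path

def count_snapshots(group_dir: str) -> int:
    """Count snapshot directories (named like 2025-09-08T10:30:27Z) in a backup group."""
    return sum(
        1
        for p in Path(group_dir).iterdir()
        if p.is_dir() and p.name.endswith("Z")
    )

# Example (path from my setup; adjust to your datastore):
# print(count_snapshots("/hdd1/backups/vm/105"))
```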

My current hypothesis is that I never set up pruning and garbage collection, so I have a really, really large number of backups for all my VMs and containers: roughly one per hour for all of this year. This is supported by a recent (three-day-long) garbage collection run that found nothing to remove and finished like this:
Code:
2025-09-05T06:27:10+00:00: processed 99% (9511554 chunks)
2025-09-05T06:27:32+00:00: Removed garbage: 0 B
2025-09-05T06:27:32+00:00: Removed chunks: 0
2025-09-05T06:27:32+00:00: Original data usage: 41.305 PiB
2025-09-05T06:27:32+00:00: On-Disk usage: 8.581 TiB (0.02%)
2025-09-05T06:27:32+00:00: On-Disk chunks: 9608028
2025-09-05T06:27:32+00:00: Deduplication factor: 4928.96
2025-09-05T06:27:32+00:00: Average chunk size: 958.992 KiB
2025-09-05T06:27:32+00:00: TASK OK

Which, if I'm reading it right, indicates about 41 pebibytes of logical data, but thanks to deduplication only about 8.6 tebibytes on disk. That probably means it's taking a really, really long time to walk all the backups for that VM before it can display them. I'd really like to suggest that the backup server paginate the result set: /hdd1/backups/vm/105 has 3,421 entries currently, so iterating over just the first 100, sending those, and then paginating as the user scrolls might be more performant.
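For what it's worth, the GC numbers above are internally consistent; here's a quick sanity check of the ratios, with the constants copied straight from the log:

```python
PIB = 1024 ** 5  # pebibyte in bytes
TIB = 1024 ** 4  # tebibyte in bytes
KIB = 1024       # kibibyte in bytes

original = 41.305 * PIB   # "Original data usage": logical data referenced by all indexes
on_disk = 8.581 * TIB     # "On-Disk usage": deduplicated chunk store size
chunks = 9_608_028        # "On-Disk chunks"

dedup_factor = original / on_disk        # ~4929, matching the log's 4928.96
avg_chunk_kib = on_disk / chunks / KIB   # ~959 KiB, matching the log's 958.992 KiB
```

The tiny differences from the logged values come from the log rounding its own inputs to three decimals.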

I have set up a prune job that should keep only around 50 backups per VM/CT (across all the different time scales), which should help...eventually. (Running the prune job stopped the service from responding to anything and spiked the load average, so that might take a while.)

But what I really want to know is...is my intuition correct, that this is a 'too much data' problem, or is there something else that might be responsible for my issues?

Thanks muchly!
 
Circling back around on this, the answer is...yes, it was a 'too much data' problem. Setting a fairly wide prune policy (keeping around 50 snapshots over time per VM, about 2,000 snapshots total currently) cleared ~95,000 snapshots, and the subsequent GC freed 5.6 TiB of old data. It's running a verification pass now, which still takes a long time, but everything responds much more snappily.

I'm keeping { last: 12, hourly: 12, daily: 7, weekly: 12, monthly: 9, yearly: 3 }, which is probably still overkill, but it brought things down to a sane level. It might even be low enough that I can kopia the backups off-site to Backblaze or something for a solid 3-2-1.
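For anyone copying this schedule: as I understand the prune logic, the keep options are an upper bound on snapshots retained per group, since one snapshot can satisfy several buckets at once (the newest snapshot can count toward last, hourly, and daily simultaneously). A trivial worked sum:

```python
# Keep options from my prune schedule above.
keep = {"last": 12, "hourly": 12, "daily": 7, "weekly": 12, "monthly": 9, "yearly": 3}

# Worst case: every bucket is satisfied by a distinct snapshot.
max_per_group = sum(keep.values())
print(max_per_group)  # 55 at most per VM/CT; overlap usually keeps fewer
```

That's where my "around 50" figure comes from.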

Anyway, this is an example of the phrase, 'When all else fails, at least you can be a bad example.' Always set up pruning and GC.
 