Q: PBS and LXC container backup speeds

JAN.26 BACKUP VM101 - SUCCESS-FAST
While double-checking your logs, I noted that this backup did not use the change detection mode. This leads me to the question: did you ever let a full backup run to completion after changing the backup jobs to change detection mode metadata? The speedup can only occur after an initial full backup run, which will not reuse the chunks of the snapshots created using the default mode, as the archive file format is different (split data and metadata).

I think that is your issue here after all.
 
Hi, here is the view in PBS for VM101; not sure if this gives insight into anything. The Jan-26-2025 backup is my last good one, shown in the snip below.

view.png

As I generated this snip, I also wonder

- is there maybe a timeout between the PBS client on the client side and the PBS server, when the client checks whether the previous snapshot has a metadata archive or not? I ask because my PBS server is very 'pokey' when I interact with the WebUI. When I first open 'Content' for the datastore, nothing is visible. If I ignore it and come back in 2 minutes, the content has populated, but it takes more than 1 minute for anything to become visible.

Similarly, when I go to a Proxmox node > a VM, say 101 > 'Backups' and select the 'PBS_STORAGE_NAME' storage,
it generates a list of the backups that exist on PBS,
but it is darn slow: more than a few minutes. Often it just returns a 'timeout error' instead of showing me the list,
which makes me wonder if my PBS host is a bit weak in CPU/RAM and so is sluggish to parse and generate this view?
Is this view regenerated dynamically every time, and is that more intensive if I have lots of objects in the datastore?
And might this be a factor?
Or maybe it is 100% not related; I just wanted to mention it.

thank you,

Tim



one thing I wonder.
 
Ah sorry, I overlooked your previous message. This is what I meant above: you will have to place at least each standalone PVE host and each cluster (in the sense of a PVE cluster) into dedicated namespaces, otherwise you will run into naming conflicts as you described here, see https://pbs.proxmox.com/docs/storage.html#backup-namespaces

Hi Chris, I am still confused, for clarity.

In this situation I have one proxmox cluster, one backup config panel for this cluster and three jobs exist therein.
I have 7 Proxmox nodes inside this cluster
they all talk to the one PBS host
there is no 'standalone proxmox host' which is outside this cluster which communicates with this PBS
so I think? I have a single coherent namespace in this configuration, and that I should not have problems with collisions.

please can you confirm if I am understanding correctly (or not?)

thanks!

Tim
 
Apologies to those reading this / for my use of sloppy syntax in this thread, ie, I'm calling things here in this thread a "VM" when that means precisely a KVM VM and that is definitely not remotely what I am trying to talk about here. I am speaking only of "Container LXC based guests" and I was being lazy and calling these things "VM" and indeed, that is confusing if you believe "VM == KVM". Sorry for conflating the terms. --Tim
 
So let me try to wrap things up a bit:
  1. You have a PVE cluster, with multiple nodes all backing up to the same PBS. That is fine and should not give you issues. You should, however, consider using namespaces to avoid naming conflicts if you decide to back up from other hosts/clusters as well in the future. But these can be added later on as well.
  2. Your issue is that you never let a backup with change detection mode set to metadata run to completion. Note that the first, initial backup will be slow and back up all the data of your LXC again, as it cannot share chunks with the backup snapshots created without the change detection mode set. Only the subsequent runs can truly take advantage of the metadata comparison speedup. What you observed as fast so far is the incremental backup, meaning that only data chunks not already known from the previous backup snapshot are uploaded again. The chunks are, however, regenerated by re-reading the full data on the source side. The latter is the issue the change detection mode metadata was introduced for: to not have to re-read and re-chunk all the data on the source side. So there are 2 optimizations in place: only upload new data chunks (incremental mode) and only read changed file contents (change detection mode metadata). Hope this clarifies your concerns.
  3. The slow listing of the contents is unrelated, see https://bugzilla.proxmox.com/show_bug.cgi?id=3752 for details
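
To make the difference between the two optimizations in point 2 concrete, here is a toy model in Python. It is only a sketch under loud assumptions, not how PBS is implemented: each file is treated as exactly one chunk, "metadata" is just (mtime, size), and the names `backup`, `known_chunks` and `prev_meta` are made up for illustration. It also does not model the archive format change that forces the first metadata-mode run to start from scratch.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    """Content hash used as the chunk identifier (toy stand-in for a chunk digest)."""
    return hashlib.sha256(data).hexdigest()


def backup(files, known_chunks, prev_meta=None):
    """Simulate one backup run.

    files:        dict path -> (mtime, content bytes); one file == one chunk (assumption)
    known_chunks: set of chunk ids already present on the server (mutated in place)
    prev_meta:    dict path -> (mtime, size) from the previous snapshot, or None
                  when change detection by metadata is not used
    Returns (bytes_read, bytes_uploaded, new_meta).
    """
    bytes_read = 0
    bytes_uploaded = 0
    new_meta = {}
    for path, (mtime, content) in files.items():
        meta = (mtime, len(content))
        new_meta[path] = meta
        # Optimization 2: metadata unchanged, so the file content is not read at all.
        if prev_meta is not None and prev_meta.get(path) == meta:
            continue
        bytes_read += len(content)
        cid = chunk_id(content)
        # Optimization 1: only chunks the server does not know yet are uploaded.
        if cid not in known_chunks:
            known_chunks.add(cid)
            bytes_uploaded += len(content)
    return bytes_read, bytes_uploaded, new_meta


files = {"a": (1, b"x" * 1000), "b": (1, b"y" * 2000)}
store = set()

# Initial full run: everything is read and uploaded.
read1, up1, meta1 = backup(files, store)

# One file changes before the next run.
files["b"] = (2, b"z" * 2000)

# Incremental only: all data is re-read, only the new chunk is uploaded.
read2, up2, _ = backup(files, set(store))

# Incremental plus metadata comparison: only the changed file is read.
read3, up3, _ = backup(files, set(store), prev_meta=meta1)

print(read1, up1)  # 3000 3000
print(read2, up2)  # 3000 2000
print(read3, up3)  # 2000 2000
```

In this toy the upload volume is the same in both later runs; what the metadata comparison saves is the reading (and re-chunking) of unchanged files on the source side, which is the slow part for a large container.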
 
Hi Chris, thank you. So just to check and to clarify.

1) Right now my deployment model is to set up a new PBS host any time I set up a new Proxmox cluster. This gives me clean separation of resources (i.e., disk, bandwidth, CPU/RAM) between any given Proxmox cluster and the PBS that services its backups. Hence, for my deployment model, I prefer not to create multiple namespaces on a single PBS, since by design I am using one PBS instance per cluster workload only. But I do appreciate that if I had one PBS host shared by multiple Proxmox clusters, then multiple namespaces, one per cluster, would be important. (!!)

2) My confusion, I guess, is this: my backup job on Jan.26 took ~2 hours. The 2 attempts since then have both taken >8 hours and still not finished. I don't understand why things were faster on Jan.26 yet >4x slower (and not completing) in later attempts.

To that end, I think what you are saying is that if I wait patiently, tolerate a downtime of maybe ~24 hours on my LXC Container101, and allow it to successfully complete a metadata backup mode task (say this upcoming weekend), then I may hope for all subsequent backups to complete "more in the timeline of 2 hours, not 24 hours" (?)

3) re slow listing - thank you for confirming this is definitely unrelated. Sorry for muddying the water.

Tim
 
Arrgh, sorry Chris. I am re-reading the thread now and I see I missed one of your points a few posts back.

You are suggesting the job on Jan.26 ran successfully and fast, without metadata mode. And it took ~2 hours approx.
And then I am trying to do metadata backup jobs since then.
I was under the impression my job from jan.26 was a metadata-based backup job, but your look at logs - you say "no that is not the case, it is a regular-mode backup".

arrgh. So, I am both confused, and happy we have progress.
-- I don't understand how the 'normal' backup went so quickly on Jan.26
-- I do appreciate, that you are saying, "first time of doing metadata mode, is starting from scratch with no data re-use from prior, due to different archive format"

So, long and short of all this> I think

-- I need to patiently schedule a 24 hour outage for this big LXC_CONTAINER to run a metadata-based backup
-- it will take a long time, maybe between 12-24 hours, I am not certain
-- and then in future, subsequent backup jobs will be faster. (ie, less than 2 hours maybe)
-- and I can update the thread in a week or two more likely once I have update to confirm endgame.

sorry for all the drama in sorting out this mess.

I do appreciate your help!

Tim
 
Yes, you can calculate a rough estimate: according to the task log you posted, the backup job uploaded about 175 GiB in ca. 8 h, so 602 GiB should take about 27.5 h? But you do not strictly need to perform the first backup run in stop mode; you could use suspend mode instead, if you have enough storage space for a tmpdir of the required size, see https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_backup_modes
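
For reference, the rough estimate is just a linear extrapolation of the observed rate. A minimal sketch in Python, assuming the throughput stays constant over the whole run (the function name `estimate_hours` is made up for illustration):

```python
def estimate_hours(total_gib: float, observed_gib: float, observed_hours: float) -> float:
    """Extrapolate the total duration from the rate observed so far."""
    rate_gib_per_hour = observed_gib / observed_hours  # 175 / 8 = 21.875 GiB/h
    return total_gib / rate_gib_per_hour


hours = estimate_hours(total_gib=602, observed_gib=175, observed_hours=8)
print(f"{hours:.1f} h")  # 27.5 h
```

The real duration can differ if the upload rate changes during the run, so treat the number as an order-of-magnitude guide only.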

On a side note: You should also consider if this is acceptable for your recovery times, as a restore will also have a similar order of magnitude in duration I would guess.
 
Hi, thank you for this added info. I agree, it is not fast. (ie, ~>24 hours to do a full restore). I believe it is acceptable for the recovery time for this client.
The drama for the backup: I don't have enough free local space for the tmpdir that suspend mode needs. So I am kind of stuck here with my choices for this backup,
hence my attempt to optimize things with the metadata mode backups
 