Q: PBS and LXC container backup speeds

fortechitsolutions

Hi, I am just posting to make sure my understanding of this topic is correct.

Core issue: backup speeds for LXC containers using Proxmox and the latest PBS.
Most of the LXC containers I run are relatively small (50 GB or less; often smaller, e.g. 20 GB).
I have one Proxmox host I manage with a single LXC container that is a lot bigger than this (500 GB),
and its backups to PBS are really slow (approx. 12 hours).

This is arguably ~meh-spec equipment, but KVM VMs on the same setup/location back up much more smoothly. So I think in part I'm just seeing the simple thing that:

KVM backups with PBS have "change block tracking" style behaviour, so they are more efficient: only the deltas are captured and backed up.
LXC backups don't have any such mechanism, so (in suspend mode) you always end up doing a multi-step process: initial rsync > top-up rsync > then compare this latest backup data blob vs. the blobs in PBS and send over the deltas. You basically need to read all the data, stage the backup (local cache dir), and then do the diffs (effectively) to see what needs to be backed up > and then it happens. So for a box with ~meh-performance local cache disk space (i.e. a SATA RAID mirror) you basically get ~meh performance for PBS<>LXC backups of a big guest like this, even if the nightly deltas turn out to be trivial once the dust settles (looking at the logs, I think I average about 650 MB of new data for each daily backup of this 500 GB container).
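(Side note for context: the staging location for that intermediate copy is configurable. A minimal sketch of the node-wide setting, assuming the standard /etc/vzdump.conf and a hypothetical faster scratch disk mounted at /mnt/fast-scratch; per-job settings can override this:)

Code:
# /etc/vzdump.conf (node-wide vzdump defaults, sketch only)
# Put the temporary container staging copy on a faster disk than the
# one the container itself lives on, to avoid the read+write double hit.
tmpdir: /mnt/fast-scratch/vzdump-tmp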

So. Kind of just sanity checking this.
1) Is there any near-term feature inbound, for example an upstream LXC "change block tracking" equivalent on the horizon, which would presumably make LXC backups in this scenario a lot faster if it were an option?
2) Or just give up on large LXC containers, i.e. basically assume that once I am above a 100 GB disk footprint I go with a KVM-based VM, and that way I get change block tracking, backups are faster, and life is just better?
3) Any other suggestion? I did find a few things via Mr. Google; I think at least one person has made a sneaky wrapper script that does snazzy things with under-the-hood snapshots (i.e. an LVM snapshot on the underlying storage where the LXC container storage resides) to make the backup performance not suck. But it feels like I'm opening a new can of worms if I start trying such methods. My preference is generally to keep it simple, to minimize the risk of human 'oops, broke it' situations.

For clarity, in case it is helpful, this is the specific setup in question:
3 physical hosts, a small Proxmox cluster: 2 Proxmox nodes and one PBS node.
The first, older Proxmox host is a Dell rackmount with a blend of PERC hardware RAID plus a bcache SSD to make the SATA data pool less slow (in theory).
The newer-ish Proxmox host is a Supermicro rackmount with pure SSD (2.5" drives) on an LSI hardware RAID controller, basically, but with a smaller storage pool for VMs.
The PBS host has a big honking RAID5 SATA software RAID storage config.
They are all connected (i.e. cluster comms / PBS<>backup comms) with ~commodity cheap 10 Gbit gear (i.e. 'best you can get on Amazon for a modest price point'),
so the 10 Gbit performance is better than 1 Gbit but not 'best of class' 10 Gbit by any means.

KVM backup performance is fine on this hardware (i.e. most VM backups take minutes, not hours).
Proxmox performance is generally fine. Clearly the Proxmox host with the big honking SATA RAID / bcache is not as fast as pure SSD RAID, but it is sufficient for suitable workloads (i.e. a general-purpose file server with a modest pool of client computers talking to it at 100-1000 Mbit via the vanilla gigabit ethernet public interface).
LXC backups for small containers are ~fine.
The LXC backup for the big container is really quite painful. So this is the thing I am guessing I must change.

Thank you if you have read this far!

Tim
 
Oh my, that looks very interesting. Thank you! Can you clarify where/how I might make the config tweak to use/try this new metadata flag? I can't quite tell whether I can enable it for the backup of just one guest, vs. all backups on a given Proxmox node, vs. all backups on a given PBS host? Sorry if I'm missing a super obvious thing! Thank you. -Tim
 
I can't quite tell whether I can enable it for the backup of just one guest, vs. all backups on a given Proxmox node, vs. all backups on a given PBS host? Sorry if I'm missing a super obvious thing! Thank you. -Tim
If I recall correctly, it's in the options of the backup job in PVE. It should work for the LXC as well via the backup job settings.
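For reference, the same thing can be done from the node's shell. A minimal sketch, assuming a PBS-backed storage named 'backup' (the flag name matches what shows up later in the task logs in this thread):

Code:
# One-off stop-mode backup of container 101 with metadata change detection
vzdump 101 --mode stop --storage backup --pbs-change-detection-mode metadata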
 
Hi, just to follow up on the thread after poking at settings and config for a few days. Endgame summary from this story, in case it helps anyone else.

  • The settings are made in Proxmox, in the backup job config for the job in question. There are tabs along the 'backup job' web UI settings I had not noticed before; they must be 'new' (i.e. from the last few years?), so I didn't see them despite them being in plain sight. Turning on the 'metadata' backup mode is easy in there, via the advanced backup settings tab (right-most tab, I believe; see the job config sketch after this list).
  • Once that feature was turned on for the backup job, it does change how the LXC backup task behaves, in a good way.
  • For my particular guest in question on this thread, the first tweak I made (just turning on the metadata-based backup method) helped only modestly, because my backups are hampered by another factor. Basic scenario: a 500 GB container lives on ~slow SATA-backed storage, without snapshot support. Doing the backup in anything but 'stop' mode results in a two-step rsync: (a) an initial rsync while the container is online, then (b) pause the container for a top-up rsync, and then (c) resume the container and push the final data from the local cache location, where this intermediate rsync backup blob was staged, up to the PBS server. This process works fine, but it is SLOW on my setup, since I had the local temp dump dir configured on (drum roll) the same slow SATA-backed storage pool where the container lives. The combined IO stress means this rsync-rsync two-step takes >10 hours for a 500 GB container, presumably with a boatload of tiny files inside, which is why the horrid little thing is so (!!!!) slow.
  • So, the next step, which is giving me a better outcome: give up on 'zero or minimal downtime', because the end users in this case are not working inside the container in the middle of the night. Based on the workload of this guest, I have the luxury of trying, and having better overall success with, an adjusted backup config: I finally changed the container to back up in STOP mode. This allows the PBS backup task to proceed more simply, with less juggling of data. We no longer fuss with an intermediate rsync to a local temp dump dir. Instead, the PBS backup job with metadata mode in stop mode just stops the container; once stopped, it looks at the prior PBS backup data to get a metadata hint on the previous state, then chugs along happily over the current state of the container, does its new metadata magic for the delta backup, and pushes only the new/changed data up to the PBS server. Once finished it starts my container back up; total downtime was in the ballpark of 45 minutes.
  • Contrary to the KVM / CBT backup behaviour, where turning off the VM means you lose the CBT delta cache, the LXC metadata backup method is not hampered by a 'stop', because it is not actually using a CBT delta cache file like KVM does. More or less, I think. So doing a 'stop' simply means less drama and juggling work for the PBS job, and we still get the icing on the cake of the metadata compare (prior vs. current) to speed up the LXC backup task. More or less, I think.
  • So, endgame: my before vs. after.
  • BEFORE = suspend mode, minimal downtime for the guest (1-ish minute?), but the price paid is a 12+ hour backup job even with 'metadata' mode enabled, because my disk storage pool is sluggish and it gets doubly slow because SRC and TARGET are the same disks for the (active container copy) and the (PBS temp backup dir). So: modest downtime at the price of a really, really long backup job runtime.
  • AFTER = STOP mode with the metadata LXC PBS backup mode enabled, and now we get the full backup job done in a 45-minute window (i.e. the container is stopped for 45 minutes). The trade-off is that the wall clock time for the backup job is much, much shorter (under 1 hour vs. 12+ hours the other way), but the flip-side payment is that this is a stop-mode backup, in this case.
  • So, for my situation, based on the user workload and my preference for a shorter rather than a really long backup runtime, it is a no-brainer, and the new metadata-based backup option, in stop mode, is a good config for this particular setup for me.
  • Note for other people: your situation will vary. Other factors you may want to consider are (A) if your underlying container storage supports SNAPSHOTS, the picture is different; (B) if your primary container storage pool and the PBS temp backup dump dir are on different disks/filesystems, then creating the temporary intermediate rsync blob is probably less punishingly slow than in my situation here; (C) possibly a nicer 10 Gbit NIC would be a good thing, but I still have the feeling that is really not a big factor. The main problem here, I think, is the combination of a pretty large (500 GB) container filesystem with a lot of small files, the container living on a ~meh-performing SATA RAID set, and then the double penalty on the backup job runtime from using the same slow disks for both the primary read and the intermediate write buffer for the backup.
  • So, lots of fun. Huzzah to the team at Proxmox for getting the metadata feature into Proxmox/PBS for LXC backups; it definitely seems like a win.
  • Clearly I need to play again with running LXC containers on snapshot-supporting storage to see if that will be a better baseline. I still just like my boring, plain, easy VM storage filesystem :-) as a baseline. Old school, I guess.
  • In a perfect world for me, I wonder/wish that LXC containers could use QCOW2 disk images, not just raw. That would mean I could keep using my boring filesystem for VM storage and get snapshot-based backups via QCOW2's snapshot feature. I think. But that's not happening, I'm guessing, or at least there are probably reasons discussed elsewhere, above my pay grade, why this is not the case presently. :-)
  • Greetz to all who read this far; happy days for things that work. And thank you again to Johannes S for poking me along in the right direction on this topic (!)
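For anyone who prefers to look at this outside the GUI: here is a rough sketch of what the resulting job definition looks like in /etc/pve/jobs.cfg, on my understanding (job ID and schedule are made up for illustration; the option name matches the --pbs-change-detection-mode flag visible in the task logs further down this thread, but double-check against your own file):

Code:
# /etc/pve/jobs.cfg (excerpt, sketch only)
vzdump: backup-big-ct
        schedule sat 01:00
        storage backup
        mode stop
        vmid 101
        pbs-change-detection-mode metadata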
 
Great writeup, thanks :)

Just some remarks: as far as I know, the metadata mode only works for LXC backups, not VMs. Regarding snapshots: for KVM/QEMU, PBS always works with QEMU's internal snapshot mechanism. As far as I know, the consequence is that the snapshot support of the storage doesn't matter at all.
To be fair: I don't know the snapshot situation with regard to LXC.
 
Hi, thank you! :-) Yes, metadata mode is definitely only for LXC containers, not KVM guests. I believe KVM will do the 'block deltas / change block tracking' automatically, as long as you do not power off your VM between backups and you use the snapshot-style backup method for the KVM VM. In these cases I think it is best/simplest to use the QCOW2 VM storage image format, IFF your underlying VM storage is NOT snap-capable (ie, LVM or ZFS). In my case I often like to use a boring EXT4 filesystem for VM storage, hence I am happy that KVM VMs with QCOW2 have 'snap support' via QCOW2. Basically. I think. (!)
 
Hi, thank you! :-) Yes, metadata mode is definitely only for LXC containers, not KVM guests. I believe KVM will do the 'block deltas / change block tracking' automatically, as long as you do not power off your VM between backups and you use the snapshot-style backup method for the KVM VM.

Yes, exactly; in KVM/QEMU this is called a "dirty bitmap" and works quite well.
In these cases I think it is best/simplest to use the QCOW2 VM storage image format, IFF your underlying VM storage is NOT snap-capable (ie, LVM or ZFS).

ZFS can take snapshots; maybe you are confusing it with another file system? For LVM I agree. However, as said before: for VMs the snapshot support of the filesystem/storage backend doesn't matter, since QEMU's/KVM's internal snapshots (which are the ones used by PVE and PBS) work even without it, and Proxmox VE's backup function only uses those.

In my case I often like to use a boring EXT4 filesystem for VM storage, hence I am happy that KVM VMs with QCOW2 have 'snap support' via QCOW2.

Yes, exactly, and the same is true for something like NFS or CIFS network storage. However, this snapshot function serves a different purpose than the one used for the backups.
 
Hi, thanks for the clarifying footnotes. I agree, I mis-typed and said the opposite of what I meant, more or less, I think, i.e.:

LVM and ZFS have built-in snapshot support as a "filesystem feature"
EXT3/EXT4 do not
CIFS- and NFS-mounted filesystems do not

and there are uses of snapshots other than backups
I think I am / was not 100% clear about snapshots when using the RAW VM disk image format on a filesystem which does not have an inherent snapshot feature

ie, scenarios (see the storage.cfg sketch after this list):

1) RAW VM disk on ext3/ext4 or on a CIFS/NFS-mounted filesystem for a KVM VM > I think in this case maybe we have no snapshot feature, even with a KVM VM (?)
2) QCOW2 VM disk on ext3/4/etc. > we do have a snapshot feature with KVM
3) QCOW2 VM disks cannot be used with LXC
4) RAW disk on ext3/4 for an LXC container > we have no snapshot feature. We do have 'metadata backups' now for PBS backups, though.
5) RAW disk for an LXC container on LVM or ZFS > has the underlying snapshot feature > allows PBS backups to be more graceful with snapshot backup mode (I think?), compared to an EXT4 filesystem holding the raw disk image > which cannot snapshot, so it's either 'suspend' mode with the two-step rsync via an intermediate dump dir, OR a stop-mode backup which bypasses the need for the intermediate copy-before-backup.
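To make scenarios 4 and 5 a bit more concrete, a minimal /etc/pve/storage.cfg sketch (storage IDs and paths are hypothetical; the point is just that a 'dir' storage on EXT4 gives raw container volumes with no snapshot support, while a 'zfspool' storage gives snapshot-capable container volumes):

Code:
# /etc/pve/storage.cfg (sketch, hypothetical storage names)

# Scenario 4: directory storage on an EXT4 mount - no snapshots for CT volumes
dir: local-ext4
        path /var/lib/vz
        content rootdir,images

# Scenario 5: ZFS pool storage - CT volumes are ZFS datasets, snapshots available
zfspool: tank-zfs
        pool tank/pve
        content rootdir,images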

all sorts of fun.


Tim
 
Hi Everyone, I just wanted to loop back to this thread for a little more input if possible.

Context piece: I am trying to get a better understanding of how LXC backup duration/speed is impacted by the 'metadata' mode for LXC container backups to a PBS backup server. In my specific case here, the containers are stored on primary storage which is vanilla EXT4, so I do not have underlying filesystem support for snapshots.

I thought I had a pretty good idea of how this was all going as of Jan. 31, and now I am a bit confused. Hence this post back to the thread.

So, one situation I am debugging presently: I've got a client with a ~stupid-large LXC container. The thing has ~1000 GB / 1 TB of disk space allocated to the single LXC container. The physical host is a classic OVH rental with 2x 2 TB SATA disks, a stock OVH-style Proxmox install, and a software RAID1 mirror for disk redundancy. VM storage is the classic /var/lib/vz; the filesystem is ext4, and that is where all primary VM storage lives.

I have a separate PBS server elsewhere, in a different data centre, and both hosts (the Proxmox node and the PBS server to which backups are sent) have approx. 250 Mbit of public ISP bandwidth.

I had one good, successful PBS backup job in early January for the big 1 TB LXC container.
I had another good, successful PBS backup job, for sure using the 'metadata' based method, on Jan. 26.
I made my happy thread update post on Jan. 31.
Then the next backup job ran on Feb. 2 and again on Feb. 9 (i.e. this thing does only a weekly backup, not nightly).
I am doing these LXC backups in STOP-BACKUP-START mode since we have no underlying snapshot support on primary storage.

So, the backup jobs on Feb. 2 and Feb. 9 took way, way longer than the job on Jan. 26.
i.e., on Jan. 26 it took about two and a quarter hours total; it seems to have successfully identified 14 GB of changed data which needed backup, on a filesystem with ~686 GB of actual data out of the 1 TB total container disk size. Based on what my discussions with my client tell me, this is about right: the container has a large disk but the bulk of the data is static, with a modest/tiny-ish amount uploaded here and there, so approx. 14 GB of new data in a backup is good and reasonable.

On the Feb. 2 backup job, the thing was still running after 12 hours; it was about 200 GB into pushing data over to the PBS server, and we needed the container back online.

On the Feb. 9 backup job we had basically the same pattern. I had gotten it to start a bit earlier in the night, but we hit approximately the 14-hour mark, it was again above 200 GB of data pushed, we ran out of time, and we had to abort the backup.

So, I'm puzzled about what has changed and why the behaviour is different.

We did succeed on Jan. 26 with a backup in just over 2 hours, and it seems to have worked precisely the way I hoped it would: metadata mode appears to have made things go faster; it didn't try to back up all the data, it just found the changed files and sent them over to PBS.

Now my two more recent attempts both have the same outcome: it acts as if it is grinding through the entire container as a full backup push, and we give up after 12-14 hours with manual human intervention to kill the backup job, after it has pushed in excess of 200 GB over to the PBS server.

I'm curious whether anyone familiar with how metadata backup mode works under the hood could comment along the lines of "Ah yes, this makes sense, because ... some clever reason".

OR, should I expect this to work more smoothly if I migrate to a setup with container storage on snapshot-supporting storage under the hood?

For example: Deploy a new proxmox node. Use different setup for storage.

Reading the docs (gasp, Tim reads docs? never!) > https://pve.proxmox.com/wiki/Linux_Container

I get the feeling I have a few choices:
1) what I am doing now: a RAW LXC disk image on an EXT4 filesystem, not so great for snapshots (i.e. no snapshots)
2) a ZFS subvolume, which allows under-the-hood snapshots via ZFS, I believe (rough command sketch below, after this list)
3) the size=0 special scenario, which forces the LXC container to use not a raw image but a directory instead, which holds the container's files,
and the example in the https://pve.proxmox.com/wiki/Linux_Container docs hints, I think, that this could be a filesystem that supports snapshots. But then, as I write this, I think that means I would need to use either ZFS or BTRFS as the underlying filesystem. So possibly that means if I want something more similar to EXT4 and less similar to ZFS, then "BTRFS is your only pick, Tim".
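For what it's worth, here is a rough sketch of what option 2 could look like on a node with a spare disk (pool name, device, and storage ID are hypothetical; worth double-checking the exact pct subcommand name and options against the docs for your PVE version before relying on it):

Code:
# Create a ZFS pool on a spare disk and register it as container/VM storage
zpool create tank /dev/sdb
pvesm add zfspool tank-zfs --pool tank --content rootdir,images

# With the container stopped, move its root filesystem onto the ZFS-backed
# storage so that future backups can use ZFS snapshots
pct move-volume 101 rootfs tank-zfs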

Anyhow, I'm running on too long here. I guess, at the end of the day, I am wondering:

- has anyone else seen cases where metadata-mode LXC container backups, with the raw container disk on underlying EXT4 storage, exhibit a wide range of backup speed/efficiency/behaviour for the backup job to PBS?
- any comments/thoughts/knowledge on why this is the case?
- and possibly any advice on how I can/should change my config (deploy a new Proxmox host, migrate this container over there) so that I'll have better backups of this thing going forward?

If you made it this far, thank you for reading!

Tim
 
Hi,
Now my two more recent attempts both have the same outcome: it acts as if it is grinding through the entire container as a full backup push, and we give up after 12-14 hours with manual human intervention to kill the backup job, after it has pushed in excess of 200 GB over to the PBS server.
do you maybe back up to the same backup group, but with different archives or containers? Another user reported a similar case, mentioning that the previous backup snapshot was not used as reference. In that particular case it turned out that there were 2 interleaved backup jobs, but each of them containing different archives. Therefore, each archive was completely backed up again, as the previous snapshot never contained a valid metadata reference archive. Can you exclude that? Can you please share a backup task log for the backup runs taking longer than expected?
 
Hi, thank you for the reply, greatly appreciated. I am pretty certain I've got just one PBS archive target. One detail I neglected to mention:

The PBS target is being used by a group of Proxmox nodes in a shared-nothing Proxmox cluster (7 Proxmox hosts in total, and a single PBS host). The PBS host has approx. 8 TB of storage pool space; most of the Proxmox nodes have ~400 GB of VM storage, and one of the seven has the bigger ~1.6 TB SATA-backed VM storage pool.

I do know for sure:
-- contention on a single Proxmox host will stall backup jobs (i.e., if there are 3 guests on prox-host-one, they are backed up in sequence to the PBS target, not concurrently, to avoid resource contention; this is good)
-- I believe we do have multiple parallel PBS backup jobs active at once, i.e.
every Proxmox node 1-7 will run a single backup task at once in parallel,
so the PBS target will have 7 active inbound PBS backup tasks,
and in theory ~all of these backups are mostly "quite light amounts of data deltas", and most of the guests being backed up are quite small (i.e. 20 GB disks which are under 50% full).

I will grab the relevant log for the long-running backup task, and also try to get a relevant comparison from a quick-running task,
in case there is an obvious smoking gun hiding in plain sight which will become apparent.
Thank you!

Will post more info in a few moments.

Tim
 
OK, here is the info from the logs. Good/fast at the top, then bad/slow at the bottom. Thank you!
(* For clarity, I have obscured the IP address of the PBS server as 1XX.1XX.2XX.171. It is a real, valid public IP address which I just prefer not to post to the forum in non-obscured form, but it is the same PBS server in all jobs shown here.)

Code:
JAN.26 BACKUP VM101 - SUCCESS-FAST
===============================================

INFO: starting new backup job: vzdump 101 --mode stop --fleecing 0 --quiet 1 --mailnotification always --storage backup --notes-template '{{guestname}}' --mailto systems@ADMINHINTTIMHEREFAKEADDRESS.ca
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2025-01-26 01:00:05
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: webapp
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating Proxmox Backup Server archive 'ct/101/2025-01-26T05:00:05Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/lib/vz/tmp_backup/vzdumptmp2244217_101//etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 101 --backup-time 1737867605 --entries-max 1048576 --repository root@pam@1XX.1XX.2XX.171:backup
INFO: Starting backup: ct/101/2025-01-26T05:00:05Z
INFO: Client name: prox1
INFO: Starting backup protocol: Sun Jan 26 01:00:25 2025
INFO: Downloading previous manifest (Sun Feb 25 01:00:06 2024)
INFO: Upload config file '/var/lib/vz/tmp_backup/vzdumptmp2244217_101//etc/vzdump/pct.conf' to 'root@pam@1XX.1XX.2XX.171:8007:backup' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'root@pam@1XX.1XX.2XX.171:8007:backup' as root.pxar.didx
INFO: root.pxar: had to backup 14.127 GiB of 616.82 GiB (compressed 11.407 GiB) in 7621.45 s (average 1.898 MiB/s)
INFO: root.pxar: backup was done incrementally, reused 602.693 GiB (97.7%)
INFO: Uploaded backup catalog (23.898 MiB)
INFO: Duration: 7682.55s
INFO: End Time: Sun Jan 26 03:08:28 2025
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=30, keep-monthly=12, keep-yearly=2
INFO: running 'proxmox-backup-client prune' for 'ct/101'
INFO: pruned 0 backup(s)
INFO: restarting vm
INFO: guest is online again after 7710 seconds
INFO: Finished Backup of VM 101 (02:08:30)
INFO: Backup finished at 2025-01-26 03:08:35
INFO: Backup job finished successfully
TASK OK




BAD SLOW EXAMPLE RECENT
=========================



INFO: trying to get global lock - waiting...
INFO: got global lock
INFO: starting new backup job: vzdump 101 --mailto systems@ADMINHINTTIMHEREFAKEADDRESS.ca --storage backup --quiet 1 --notes-template '{{guestname}}' --mode stop --fleecing 0 --mailnotification always --pbs-change-detection-mode metadata
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2025-02-08 03:10:45
INFO: status = running
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: webapp
INFO: including mount point rootfs ('/') in backup
INFO: stopping virtual guest
INFO: creating Proxmox Backup Server archive 'ct/101/2025-02-08T07:10:45Z'
INFO: set max number of entries in memory for file-based backups to 1048576
INFO: run: /usr/bin/proxmox-backup-client backup --crypt-mode=none pct.conf:/var/lib/vz/tmp_backup/vzdumptmp2647810_101//etc/vzdump/pct.conf root.pxar:/mnt/vzsnap0 --include-dev /mnt/vzsnap0/./ --skip-lost-and-found --exclude=/tmp/?* --exclude=/var/tmp/?* --exclude=/var/run/?*.pid --backup-type ct --backup-id 101 --backup-time 1738998645 --change-detection-mode metadata --entries-max 1048576 --repository root@pam@1XX.1XX.2XX.171:backup
INFO: Starting backup: ct/101/2025-02-08T07:10:45Z
INFO: Client name: prox1
INFO: Starting backup protocol: Sat Feb  8 03:10:58 2025
INFO: Downloading previous manifest (Sun Jan 26 01:00:05 2025)
INFO: Upload config file '/var/lib/vz/tmp_backup/vzdumptmp2647810_101//etc/vzdump/pct.conf' to 'root@pam@1XX.1XX.2XX.171:8007:backup' as pct.conf.blob
INFO: Upload directory '/mnt/vzsnap0' to 'root@pam@1XX.1XX.2XX.171:8007:backup' as root.mpxar.didx
INFO: Previous manifest does not contain an archive called 'root.mpxar.didx', skipping download..
INFO: Previous manifest does not contain an archive called 'root.ppxar.didx', skipping download..
INFO: processed 144.759 MiB in 1m, uploaded 0 B
INFO: processed 158.731 MiB in 2m, uploaded 4.964 MiB
INFO: processed 274.188 MiB in 3m, uploaded 147.96 MiB
INFO: processed 301.084 MiB in 4m, uploaded 158.965 MiB
INFO: processed 468.129 MiB in 5m, uploaded 269.093 MiB
..skip-ahead....
INFO: processed 32.451 GiB in 34m, uploaded 29.192 GiB
....SKIP....
INFO: processed 43.426 GiB in 1h 3m, uploaded 39.271 GiB
INFO: processed 64.614 GiB in 1h 30m 0s, uploaded 58.068 GiB
....SKIP AHEAD...
INFO: processed 201.855 GiB in 8h 7m 1s, uploaded 175.568 GiB
INFO: processed 202.147 GiB in 8h 8m 1s, uploaded 175.875 GiB
KILL JOB HERE AND BACKUP STOP AND VM RESTART
 
INFO: Previous manifest does not contain an archive called 'root.mpxar.didx', skipping download..
So, as expected, the previous backup snapshot does not contain the metadata archive, and therefore it cannot be used as a metadata reference. Are you performing multiple backups to the same backup group?
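One quick way to check what the previous snapshot in the ct/101 group actually contains (a sketch, reusing the obscured repository string from the task logs above; the same information is also visible in the PBS web UI in the datastore content view):

Code:
# List the snapshots in the group, then the archives inside one snapshot
proxmox-backup-client snapshot list --repository root@pam@1XX.1XX.2XX.171:backup
proxmox-backup-client snapshot files "ct/101/2025-01-26T05:00:05Z" --repository root@pam@1XX.1XX.2XX.171:backup
# A metadata-mode snapshot contains root.mpxar.didx + root.ppxar.didx,
# while a default-mode snapshot contains a single root.pxar.didx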
 
I agree, I see that also. Can you please tell me more verbosely what you mean by "multiple backups to the same backup group"?
(I then also don't understand how it worked successfully even once for metadata mode?)

And for more detail/clarity: my Proxmox cluster has 3 jobs defined under "Backup"; concisely, those are:

(1) every night at 1:00am, a backup of 2 particular guests (102, 202)
(2) Saturday night at 1:00am, a backup of this big container 101
(3) Saturday night at 1:30am, backups for all the rest of the guests in the cluster (excluding 101, 102, 202)

All of these jobs use the same single PBS target, which has a single PBS datastore.

Possibly then it means I want:
-- a dedicated datastore for VM 101 backups?
-- only one backup job in the Proxmox UI for my cluster?
-- something else (which I am clearly missing, I think)?
(* edit - I think, big picture, I don't understand what is going on under the hood with root.mpxar.didx: how it is created, where it persists, and how it can be available vs. not available in different scenarios; it seems to be related to the 'backup group'. I maybe foolishly assumed the 'metadata' involved in this process was stored on the PBS server, associated with the PBS backup blob for the guest, and that if it existed properly then the NEXT backup would take advantage of it. Basically. Which I think is maybe an oversimplified perception of how this really works.)

thank you!

Tim
 
I agree, I see that also. Can you please tell me more verbosely what you mean by "multiple backups to the same backup group"?
What I meant here was a backup job on PVE host/cluster A backing up a container ct/101 to the same namespace as a PVE host/cluster B backing up a container to the same ct/101 group.

But now that you mention multiple backup jobs: what about the previous backup snapshot, does it contain a split root.mpxar/.ppxar or a self-contained root.pxar archive?

Maybe you have multiple backup jobs, but not all of them are using change-detection-mode metadata; some may still use the default mode? When you say you run the job manually, do you mean running the backup job, or a manual backup by clicking the Backup button for the container? Note that the latter will not use the metadata change detection mode. A patch to allow selection of the change detection mode for one-shot backups has already been sent to the mailing list but is still under development, see https://lore.proxmox.com/pve-devel/20241129150013.323432-1-c.ebner@proxmox.com/T/
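(A quick way to check which of the defined jobs actually carry the flag, assuming the jobs are stored in /etc/pve/jobs.cfg as on current PVE releases:)

Code:
# Show each defined backup job and whether it sets the metadata mode
grep -E '^vzdump:|pbs-change-detection-mode' /etc/pve/jobs.cfg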

Possibly then it means I want:
-- a dedicated datastore for VM 101 backups?
-- only one backup job in the Proxmox UI for my cluster?
-- something else (which I am clearly missing, I think)?
I cannot conclude that just yet; let's first find out why the previous snapshot seemingly does not contain the metadata archive to be used as reference.
 
The PBS target is being used by a group of Proxmox nodes in a shared-nothing Proxmox cluster (7 Proxmox hosts in total, and a single PBS host). The PBS host has approx. 8 TB of storage pool space; most of the Proxmox nodes have ~400 GB of VM storage, and one of the seven has the bigger ~1.6 TB SATA-backed VM storage pool.
Ah sorry, I did overlook your previous message. Well, this is what I meant above: you will have to place at least each standalone PVE host and each cluster (in the sense of a PVE cluster) into dedicated namespaces, otherwise you will run into naming conflicts as you described here, see https://pbs.proxmox.com/docs/storage.html#backup-namespaces
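(For illustration only, a minimal sketch with hypothetical namespace and storage names; namespaces can also be created in the PBS web UI, and the PVE-side PBS storage entry has a corresponding Namespace option:)

Code:
# Create a namespace per PVE host/cluster on the PBS datastore
proxmox-backup-client namespace create host1 --repository root@pam@1XX.1XX.2XX.171:backup

# On the matching PVE host, point its PBS storage entry at that namespace
pvesm set backup --namespace host1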
 
Hi, thank you for all the very detailed clarification and information.

- In this situation I have just the one Proxmox cluster (7 Proxmox nodes) talking to this one PBS host. There is no other Proxmox cluster connected to this PBS host.
- I realize, as I read your most recent notes, that when I did my metadata testing I am quite sure I had only enabled it for the one backup job related to VM 101. But then yesterday, when I was reviewing things, I enabled metadata mode on all (3) backup jobs that exist in the Proxmox datacentre view.
- I am not entirely clear on whether the metadata backup mode status (enabled vs. no/default mode) for OTHER guests can impact this one.
- I will now try to dig into what I see in my PBS content view for VM 101, in case I can see hints about what is/isn't there related to the root pxar archive, i.e. whether it is 'split' vs. 'self-contained'.
- I am pretty sure the job on Jan. 26 which succeeded was kicked off by the scheduled task, and the job which "failed" (i.e. ran slowly, with no metadata success) most recently was also kicked off in an identical manner, via the scheduled backup task.

I will post back more info here shortly, after I review my PBS ct/101 storage content.

Thank you!
Tim
 
I might be wrong, but I'm quite sure that the metadata mode only works for LXC containers and maybe host backups, NOT VMs:
https://pbs.proxmox.com/docs/backup-client.html#change-detection-mode
https://forum.proxmox.com/threads/pbs-client-change-detection-mode.150538/
https://pbs.proxmox.com/wiki/index.php/Roadmap#Proxmox_Backup_Server_3.3

VM backups work differently from "file-based backups": changes are detected with dirty bitmaps. For that reason the fast change detection only works with running VMs, but even with a stopped VM only the actual changes are sent to PBS; the detection will just take longer on a stopped VM.

As said: I might be wrong, so take this with a grain of salt.
 
I might be wrong, but I'm quite sure that the metadata mode only works for LXC containers and maybe host backups, NOT VMs:
https://pbs.proxmox.com/docs/backup-client.html#change-detection-mode
https://forum.proxmox.com/threads/pbs-client-change-detection-mode.150538/
https://pbs.proxmox.com/wiki/index.php/Roadmap#Proxmox_Backup_Server_3.3

VM backups work differently from "file-based backups": changes are detected with dirty bitmaps. For that reason the fast change detection only works with running VMs, but even with a stopped VM only the actual changes are sent to PBS; the detection will just take longer on a stopped VM.

As said: I might be wrong, so take this with a grain of salt.
You are fully right that the change detection mode setting has no effect on VMs, but that is not the issue here.
As you can clearly see from the task log outputs above, the backups in question are LXC backups. So while @fortechitsolutions does not always use the correct terminology here, this thread does talk about an LXC backup, at least in the last part.
 