Sync based on prune settings

Qlii256

May 29, 2023
I have a secondary PBS at an offsite location. It has less disk space than is currently used on the main PBS. However, I only want to keep the last 3 months of backups there, which should reduce the overall backup size enough to make it fit.

When a sync job executes, it looks at the last available backup for the current VM it's syncing and then starts syncing the next in line, even if that backup should not be there based on the prune settings. Therefore, it just keeps syncing one backup (the next one) of the same VM every day (I garbage collect and prune once a day). It never gets to any other VM or any other backup because the disk is full.

Is it not possible to have it sync only the backups that the prune settings would keep? Or do I need to do an hourly sync, prune, GC, repeat? Garbage collection has to wait 24h 5m before it even removes data.
 
Hi,
When a sync job executes, it looks at the last available backup for the current VM it's syncing and then starts syncing the next in line, even if that backup should not be there based on the prune settings.
Which prune settings are you referring to here? The one on the sync source or the sync target? Why not simply prune more aggressively on the target if you are already rather low on storage?
Sync jobs will only ever consider backup snapshots newer than the last one present on the target. Also, with the transfer-last setting, you can decide how many of the available snapshots will be synced. Further, with the remove-vanished flag, you can instruct the sync job to prune snapshots that have vanished on the source side from the target during sync.
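For illustration, a pull sync job using both of these options could be created on the CLI like this (the job ID, remote name, and datastore names are placeholders; option names assume a recent PBS version):

```shell
# Sync job on the offsite PBS, pulling from the main PBS.
# Only the last 30 snapshots per group are transferred, and
# snapshots pruned away on the source are removed from the target.
proxmox-backup-manager sync-job create offsite-pull \
    --store offsite-store \
    --remote main-pbs \
    --remote-store tank \
    --transfer-last 30 \
    --remove-vanished true \
    --schedule daily
```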

Is it not possible to have it sync only the backups that the prune settings would keep? Or do I need to do an hourly sync, prune, GC, repeat? Garbage collection has to wait 24h 5m before it even removes data.
That would probably get very confusing, as the retention depends on which snapshots are already there to begin with. E.g. if you have a prune job with different retention settings than what would be set in the sync job, the sync job will see a different set of snapshots before and after the prune job runs, leading to a different set of snapshots being synced.

What could work, if I understood your request correctly, is:
  1. Create a local sync job which syncs the available snapshots from your current source into a dedicated namespace.
  2. Create a more aggressive prune job, run after the local sync job set up in step one, to limit the available snapshots.
  3. Set up a remote sync job with the namespace containing the aggressively pruned snapshots as source, so only these are ever considered for syncing.
Since de-duplication is performed at the datastore level, the sync job set up in step 1 will not take up significant additional space on your source datastore: only the index and metadata files, which are rather small, are duplicated, not the chunks.
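As a sketch of step 2, a prune job that keeps roughly the last three months in the dedicated namespace might look like this (the job ID, datastore, and namespace names are made up; option names assume a recent PBS version):

```shell
# Aggressively prune the staging namespace on the source,
# keeping roughly 3 months of daily backups.
# "tank" and "offsite-staging" are placeholder names.
proxmox-backup-manager prune-job create prune-staging \
    --store tank \
    --ns offsite-staging \
    --keep-daily 90 \
    --schedule daily
```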
 
When I create a namespace on the datastore, I cannot create a sync job that will pull the contents of the datastore to the namespace on said datastore. The source datastore option is blank.
 
When I create a namespace on the datastore, I cannot create a sync job that will pull the contents of the datastore to the namespace on said datastore. The source datastore option is blank.
Ah, yeah, you are right. Local syncs to the same datastore are a bit dangerous because of possible recursion. You can, however, set up a remote called e.g. local, using localhost as the remote address. Then set up a remote pull sync job using the new sub-namespace as target and the local remote as source.

ATTENTION: Please make sure to set the max depth to 0, otherwise you will sync recursively...
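A rough CLI sketch of this localhost-remote workaround (the remote name local, auth-id, password, fingerprint, datastore, and namespace are all placeholders; option names assume a recent PBS version):

```shell
# Create a "remote" that actually points back at this host.
# Replace auth-id, password and fingerprint with real values.
proxmox-backup-manager remote create local \
    --host localhost \
    --auth-id sync@pbs \
    --password 'xxxxx' \
    --fingerprint 'aa:bb:cc:...'

# Pull the root namespace of the datastore into the dedicated
# sub-namespace. max-depth 0 prevents recursing into namespaces,
# i.e. syncing the sub-namespace into itself.
proxmox-backup-manager sync-job create local-staging \
    --store tank \
    --remote local \
    --remote-store tank \
    --ns offsite-staging \
    --max-depth 0 \
    --schedule hourly
```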

In general, this is more of a workaround for the time being, not a suggestion to run such a sync setup in production. I would recommend adapting/expanding the available storage on the sync target to have enough headroom for your backup snapshots.
 
Thank you for the response. I think you are right and I'll be upgrading my storage.
One additional question: I've been having some problems trying to create and join two nodes into a cluster. However, if both systems are in separate locations, would I benefit from having them both in a cluster? I know Corosync does not cope well with "high" latency.

I don't plan on migrating VMs from one node to another, because I don't have shared storage, and across different locations it would be hard to keep the storage in sync.
 
One additional question: I've been having some problems trying to create and join two nodes into a cluster. However, if both systems are in separate locations, would I benefit from having them both in a cluster? I know Corosync does not cope well with "high" latency.
No, running a cluster on a high-latency network will not work; this will cause you issues. Further, a cluster requires at least 3 nodes, or 2 nodes + a QDevice, to work reliably.

You might want to have a look at the new Proxmox Datacenter Manager for managing multiple Proxmox VE hosts.