Prune and Retention policy

snagles

Active Member
Aug 22, 2019
Hi,
I am trying to configure the retention policy of my PBS correctly. I want the following config:
- keep the last backup ("keep last 1")
- keep 72 hourly backups for the few LXCs that are backed up hourly
- keep 7 daily backups of all LXCs, including the ones backed up hourly; the rest are backed up daily
With this configuration I expected that, after 72 hours, only the last archive of each day would be kept. For example, after 72 hours there should be only one backup per day for the next 7 days, and on day 14 only the single backup covered by "keep last 1" should remain. That last one would then be removed manually.
The prune simulator actually shows exactly this behaviour.
The reality, based on what I see, is different.
All archives taken on a daily basis were counted as hourly archives. This means the first cleanup will happen only after 72 + 7 archives, i.e. the data will be kept for about 2.5 months.
Another thing I cannot work out is whether it is possible to configure pruning per container or VM. Do I need to configure the prune options from PVE, or must everything happen on the PBS side?

Here is an example from the log
Code:
2023-06-15T07:18:21+02:00: retention options: --max-depth 0 --keep-last 1 --keep-hourly 72 --keep-daily 7
2023-06-15T07:18:21+02:00: Pruning group :"vm/74109"
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-05-27T02:00:01Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-05-28T02:00:04Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-05-29T02:00:02Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-05-30T02:01:37Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-05-31T02:00:33Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-01T02:00:28Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-02T02:00:59Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-03T02:53:59Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-04T02:51:41Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-05T02:52:32Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-06T02:51:24Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-07T02:53:09Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-08T02:48:42Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-09T02:49:15Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-10T02:52:29Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-11T02:50:05Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-12T02:52:15Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-13T02:49:20Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-14T02:50:27Z
2023-06-15T07:18:21+02:00: would keep vm/74109/2023-06-15T02:50:40Z
2023-06-15T07:18:21+02:00: TASK OK

This log was generated by a "Prune All" simulation from the PBS datastore section.

The attached pictures show a per-container prune: what happens when keep-hourly 72 is set, and what happens with no hourly retention value.
I would appreciate some advice on how to make this configuration work in my case.
 

Attachments

  • image_2023-06-15_08-43-29-466.jpg (218.5 KB)
  • image_2023-06-15_08-43-29-539.jpg (226.2 KB)
Hi,
yes, you need different prune schedules to be able to implement your solution. This can be done by using different datastores or configuring the retention on the Proxmox VE side as part of the backup job. See also the prune simulator for trying it out and for a description of the algorithm: https://pbs.proxmox.com/docs/prune-simulator/index.html

EDIT: as @VictorSTS suggested below, it's actually better to use the same datastore but different namespaces, because of deduplication.
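
For reference, a minimal sketch of what the PVE-side retention could look like with the single-datastore/two-namespaces layout (datastore, namespace, server and user names are placeholders; on recent PVE versions the prune-backups option on the storage carries the keep-* settings):

Code:
# /etc/pve/storage.cfg -- hypothetical entries, one PBS storage per schedule
pbs: pbs-hourly
        datastore mystore
        server 192.0.2.10
        username backup@pbs
        namespace hourly
        prune-backups keep-hourly=72

pbs: pbs-daily
        datastore mystore
        server 192.0.2.10
        username backup@pbs
        namespace daily
        prune-backups keep-last=1,keep-daily=7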
 
Hey Fiona,

Yes, I already used the simulator, but it is only valid for a single LXC. So I would actually have to separate all the backup types into different datastores, which is hard work and a serious change.
Am I able to move already taken backups to a different datastore without re-backing up everything from scratch?
Thanks for your answer of course :)
 
A side note: "keep last" does not mean "keep the oldest backup". What "keep last" does is "keep this many of the last taken backups", that is, the newest ones. This is useful to keep some amount of backups for some time regardless of when they were taken.

Also, if you don't add new backups (i.e. you removed the LXC), the existing backups won't be purged unless they really should be. That is, "keep 10 daily" means "keep the last backup taken on each of the last 10 days we have backups for", not "keep the last backup taken each day of the last 10 calendar days". I mention this because I don't get what you mean by "The last one should be removed manually."

In your case I would just:
- Create one PBS datastore with two namespaces: hourly, daily
- Add both as PBS storages to Proxmox.
- Set up two backup tasks: an hourly one saving backups to the hourly PBS storage and a daily one saving to the daily PBS storage. Include all LXCs in the daily backup task, but place only the few LXCs that need hourly backups in the hourly backup task.
- In PBS, configure Prune like:
· hourly namespace: keep 72 hourly
· daily namespace: keep last 1, keep 14 daily

LXCs with hourly backups will have a "duplicated" daily + hourly backup for 3 days, but as data is deduplicated within all namespaces of the same datastore it should not be an issue from a backup storage usage perspective.
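
If you prefer to do the pruning with PBS-side prune jobs rather than from PVE, a rough sketch of the above (datastore and namespace names are examples, and the exact options may differ by PBS version; check proxmox-backup-manager prune-job --help):

Code:
# hypothetical datastore "mystore" with namespaces "hourly" and "daily"
proxmox-backup-manager prune-job create prune-hourly \
    --store mystore --ns hourly --schedule daily --keep-hourly 72
proxmox-backup-manager prune-job create prune-daily \
    --store mystore --ns daily --schedule daily --keep-last 1 --keep-daily 14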
 
Hi Victor,

Thanks for your explanation and suggestion. I will implement precisely the same as you advised me.
What I meant by "The last one should be removed manually" relates to "keep last". My understanding was: if you have 10 daily backups and remove the LXC, after 10 days prune would purge all backups related to that LXC, but with a "keep last" rule the very last backup taken would be preserved even after the retention period, and you would have to validate and remove it manually from PBS.

I am still wondering whether I can transfer backups from one datastore to another, or whether I have to start backing up from scratch after creating the new datastore.
One more question, related to deduplication. If I have two datastores and take one daily backup and, later, one hourly backup, will the consumed physical space be the full size of the daily LXC backup plus only the differences captured by the hourly backup? For example, if the container is 100 MB, the daily backup would consume 100 MB in the daily datastore, and if 5 MB changed an hour later, the hourly backup would store only that 5 MB difference in the hourly datastore.
And when restoring the hourly backup, would PBS collect the data from hourly + daily and restore the LXC with the full data gathered from both datastores?
 
No, if you remove an LXC, purge will *not* remove any backup. Also, "last" does not work the way you think. I explained that before:
A side note: "keep last" does not mean "keep the oldest backup". What "keep last" does is "keep this many of the last taken backups", that is, the newest ones. This is useful to keep some amount of backups for some time regardless of when they were taken.

Also, if you don't add new backups (i.e. you removed the LXC), the existing backups won't be purged unless they really should be. That is, "keep 10 daily" means "keep the last backup taken on each of the last 10 days we have backups for", not "keep the last backup taken each day of the last 10 calendar days".


Regarding deduplication, PBS does *not* deduplicate backups in different datastores. It does, however, deduplicate backups on different namespaces of the same datastore. This is what I suggested before:

- Create one PBS datastore with two namespaces: hourly, daily


Regarding backup transfers, fiona already gave you instructions on using local sync jobs for that:
There is currently no local pull support, but you should be able to add the local PBS itself as a remote and create a sync job between your local datastores: https://pbs.proxmox.com/docs/managing-remotes.html
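
Roughly, that could look like the following (host, auth-id, password and fingerprint are placeholders; see the linked docs for the exact options on your version):

Code:
# register the local PBS itself as a "remote" (placeholder credentials)
proxmox-backup-manager remote create local-pbs \
    --host 127.0.0.1 --auth-id sync@pbs --password 'xxxxx' \
    --fingerprint '<server certificate fingerprint>'

# pull the old datastore's contents into the new one
proxmox-backup-manager sync-job create migrate-old \
    --store new-datastore --remote local-pbs --remote-store old-datastore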


Regarding the restores: all backups in PBS are "full" in the sense that they hold all data you backed up at that time regardless of how much of it is deduplicated in the datastore. So yes, if you restore any hourly backup it will restore all data as it was at that time.

I suggest you install a test PBS, play around with a few small LXCs, and practice all the concepts we've covered here.
 
Hi all,
Finally, all jobs passed and everything looks as described above.
Big thanks for your support.
 
Hi. I know this thread is slightly old, but since my question is related I thought I would rather ask here than open a new one.

I am trying to reorganize my prune policy, but first I need to understand syncs and, in particular, namespaces.

Since we have a few testing CTs on a local server, I am considering a namespace layout that lets me retain only a few copies of "everything" in the top-level namespace and more copies in some sub-namespaces. All this, ideally, without having to use different datastores.

So the idea is to create storages in PVE and prune policies like:
  • Storage "generic", dest. namespace "local" ---> only a weekly copy of "all" CTs is made to this top-level namespace. The prune policy would only retain, say, the last 3, so unimportant containers would only have those.
  • Storage "generic-priority", same destination but namespace "local/priority" ---> daily (or hourly) copies are made to this storage. The prune policy would maintain hourly, weekly and monthly copies.
  • Storage "generic-archive", again the same destination but namespace "local/archive" ---> only manual copies would be made here. I could either not define a policy (the generic one would apply, keeping 3) or create a policy to keep only 1.
Is this correct? As I understand it, would sub-namespace policies "override" the generic one, retaining either more or fewer copies? Or the opposite: would the top-level one override the sub-namespaces with additional copies?

Also, regarding syncs between servers: can namespaces be used to optimize this too? I notice I can "filter" syncs by type or by CT IDs, but as far as I understand not by namespace. If that is right, it means that if I sync a server with "more" copies to another and then apply a more restrictive prune policy on the destination, in some cases I am copying data only to prune and remove it afterwards.

Thanks.