Full Backup of VM

lago

Dec 9, 2021
Hello everyone,

My PBS makes an incremental backup of all VMs every night. The first backup of each machine was full; the following ones are incremental. I want to do a full backup once a week, and incremental backups the rest of the time.
Can anyone explain how to set this up in my cluster?
Thank you.

I use mode "Snapshot" in the backup job.
 
All those backups are already full backups!

Backups are stored in a deduplicated way. That means that backups are split into so-called chunks, and each backup by itself just references the chunks it uses. Backups that share a large part of their data with other backups will reference the same chunks. That is how a lot of space is saved: these chunks are stored only once, but referenced many times.
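To illustrate the idea, here is a minimal sketch of a content-addressed chunk store (this is not PBS code; the tiny 4-byte chunk size and the class names are purely for illustration, PBS uses 4 MiB chunks for VM disks):

```python
import hashlib

# Minimal sketch (not PBS code): a content-addressed chunk store.
# Chunks are keyed by the digest of their content, so identical data
# across backups is stored once but referenced many times.
class ChunkStore:
    def __init__(self):
        self.chunks = {}  # digest -> chunk data, stored only once

    def add(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.chunks.setdefault(digest, data)  # dedup: keep the first copy
        return digest

def backup(store: ChunkStore, disk: bytes, chunk_size: int = 4) -> list:
    """A 'backup' is just the ordered list of chunk digests it references."""
    return [store.add(disk[i:i + chunk_size])
            for i in range(0, len(disk), chunk_size)]

store = ChunkStore()
b1 = backup(store, b"AAAABBBBCCCC")
b2 = backup(store, b"AAAABBBBDDDD")  # only the last chunk differs
# Both b1 and b2 are complete backups (each lists all of its chunks),
# but the store holds only 4 unique chunks instead of 6.
```

Both snapshots are "full" in the sense that each one references every chunk it needs; the savings come entirely from the shared chunks.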

An incremental backup of a VM makes use of these chunks and additionally tracks which parts of the disk have changed since the last backup. Parts of the disk that have not changed are skipped, and the chunks from the previous backup are referenced.
 
Hi aaron,

thank you for the explanation. It is explained clearly and very precisely, but I need to know something else. Is it possible to specify what kind of backup PBS should make: full or incremental? You wrote that it is a full backup. I want to set up PBS to do a full backup on Sunday, for example, and incremental backups the rest of the time. Is this possible?
Do you know what the difference is between snapshot mode, stop mode, and suspend mode?

Thank you and regards
 
Is it possible to specify what kind of backup PBS should make: full or incremental? You wrote that it is a full backup. I want to set up PBS to do a full backup on Sunday, for example, and incremental backups the rest of the time. Is this possible?

Try to stop thinking in terms of full or incremental backups when you back up to a PBS :)
Each backup that you see on the PBS is a full backup. Period.
Each backup references the chunks it needs to restore successfully. There are no different backup types.

When creating backups to a PBS, there are two modes which you cannot really control. If the VM is backed up for the first time, or after it has been stopped, the whole disk will be read. Data that has not changed will not take up additional space on the PBS, as there are already chunks with that data.

On each backup, a so-called "dirty bitmap" is created in the virtualization layer (Qemu) which tracks the parts of the disk that the VM writes to. So on the next backup job, we only need to read those parts and can omit all the parts that have not changed. For the unchanged parts, we can reference the chunks used in the previous backup. This is the fast incremental mode. The "incremental" refers to how the backup is created, by only reading the potentially changed parts of the disk. But the resulting backup is still a backup that is complete by itself.
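As a rough sketch of that mechanism (not Qemu or PBS code; the region size and names are placeholders), the bitmap tells the client which regions to read, while clean regions simply reuse the digests recorded in the previous snapshot:

```python
import hashlib

# Minimal sketch (not Qemu/PBS code): a dirty bitmap marks which
# fixed-size regions of the disk were written since the last backup.
# On the next backup only dirty regions are read; clean regions reuse
# the chunk digests recorded in the previous snapshot.
CHUNK = 4  # illustration only; PBS uses 4 MiB chunks for VM disks

def incremental_backup(disk: bytes, dirty: set, previous: list) -> list:
    snapshot = []
    for idx in range(len(disk) // CHUNK):
        if idx in dirty:  # changed: read and hash (and upload) the data
            data = disk[idx * CHUNK:(idx + 1) * CHUNK]
            snapshot.append(hashlib.sha256(data).hexdigest())
        else:             # unchanged: reference the previous chunk, no read
            snapshot.append(previous[idx])
    return snapshot

disk = b"AAAABBBBCCCC"
prev = [hashlib.sha256(disk[i:i + CHUNK]).hexdigest()
        for i in range(0, len(disk), CHUNK)]
disk = b"AAAAXXXXCCCC"  # the guest wrote to the middle region
snap = incremental_backup(disk, dirty={1}, previous=prev)
# snap is a complete backup by itself: it lists every chunk, even
# though only one region had to be read from the disk.
```

Note that the resulting snapshot has exactly the same shape as a first, full backup: a complete list of chunk digests.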


I hope this explains it better. It does take some time to wrap your head around the different concepts, especially if you come from a background without deduplication, where incremental backups depend on the previous backups all the way back to the most recent full one.

Do you know what the difference is between snapshot mode, stop mode, and suspend mode?
This is explained in the docs: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_backup_modes
The modes mean something different for VMs and containers! I also posted about this recently: https://forum.proxmox.com/threads/contradictory-backup-documentation.105918/#post-455911
 
What if some chunks get lost, will the others still be restorable?
That is the question that bothers me too. As far as I understand, if some chunks or the initial backup are corrupted for some reason, the whole backup chain is corrupted.
So you need to provide extra redundancy for your backup storage.
It would be nice to have some of these chunks stored twice for redundancy (storage space is pretty cheap these days) ;)
 
Sorry for not replying last summer. If you lose a chunk, then the backup will be broken. That is why we recommend using a data store that can handle the loss of a disk or corruption on a disk -> ZFS ;)

Ideally, you have your backups in more than just one place (3-2-1 rule). Another PBS for example with which you use the remote sync feature.

(storage space is pretty cheap these days)
Cheaper, but since HDDs are quite unsuited for the random IO caused by the many chunks, you should opt for datacenter SSDs, which are still quite a bit more expensive. Though, they have come down quite a bit over the past year.
 
@aaron

Try to stop thinking in full or incremental backups when you back up to a PBS :)
Each backup that you see on the PBS is a full backup. Period.

from the point of view of the backup server and "what it looks like" this is true.

but please be careful insisting on a "period" here.

from the point of view of the pve server and from the point of view of a backup administrator who has data safety and integrity in mind, i think this is NOT true.

the backup (which appears like a full one) on the pbs is cobbled together from an initial full backup and a number of incremental changes of unknown count, determined on the source via the "changed block tracking" (dirty bitmap) feature and then applied to the backup target, layer by layer.

if there is a bug in changed block tracking - you're lost.
if there is corruption in a saved block on pbs - you're lost.
if there is a bug of tracking/applying the changes - you're lost.

in veeam backup, you can force a full backup at specific intervals.

having "real" full backups from time to time is a good strategy, also 3-2-1 rule is a good strategy for backup.

that's why i don't use pbs as a single solution but always combine this with weekly vzdump full backups.


anyhow, i like the idea of having some sort of "artificial full backup" (i.e. drop dirty bitmap) in pve, i.e. to be able to force a full backup (read of every block at source without dirty bitmap) at specific intervals.

this could not only be useful for data consistency but also for making sure you won't have silent bitrot in your vm infrastructure. it's good to "patrol read" all data blocks on your storage from time to time, and if you don't have such a patrol read at the storage/raid level, you could have it via backup.
 
anyhow, i like the idea of having some sort of "artificial full backup" (i.e. drop dirty bitmap) in pve, i.e. to be able to force a full backup (read of every block at source without dirty bitmap) at specific intervals.
You could backup to two different datastores or do a sync to localhost, as deduplication is per datastore. That way you get two copies of every chunk.

silent bitrot in your vm infrastructure
ZFS will detect and fix these. And PBS verify jobs will make sure no chunk is missing.
 
from the point of view of the pve server and from the point of view of a backup administrator who has data safety and integrity in mind, i think this is NOT true.
Good point. For follow-up backups, only the parts of the disk that are marked dirty are read; for all the other parts, the backup references the chunks from the previous backup.

A cold boot of the VM will drop the dirty bitmap (since we don't know what might have happened to the disk image in the meantime) and therefore the full disk is read.

While restore tests should be done on a regular basis to make sure that the backups are good, the real-life situation is often not that ideal. And I speak from my personal experience and motivation ^^

anyhow, i like the idea of having some sort of "artificial full backup" (i.e. drop dirty bitmap) in pve, i.e. to be able to force a full backup (read of every block at source without dirty bitmap) at specific intervals.
While a cold (re)boot of a VM does that, it could be something useful to do every X backups or in some similar fashion. Feel free to open a feature request in our bug tracker. I could only find feature requests to make the dirty-bitmap persistent between cold boots of a VM ;)
 
will do that, thank you!

veeam has such a "force a full backup every now and then" feature and also introduced the term "synthetic full backup", which imho comes somewhat closer to what pbs "virtual full backups via incremental forever" really are.

can we perhaps introduce some official wording/term for what pbs full backups really are ?

virtual full backups?
dbb-(dirty bitmap based)-virtual-backup?
i think it would even be good/helpful to differentiate/mark snapshots as being created bitmap-based or as a real full backup, as it can help to recognize where a dirty bitmap backup went wrong or where virtual machines silently cold booted, for example because of the oom-killer. i guess it could be valuable to mark a pbs snapshot as being created "full" or "incremental via dirty bitmap" in the pbs gui. will add that idea to the RFE

it could empower the backup admin to recognize inefficiencies in the backup or problems in the infrastructure, without even needing to analyze that on the client/pve side.
 
While a cold (re)boot of a VM does that, it could be something useful to do every X backups or in some similar fashion. Feel free to open a feature request in our bug tracker. I could only find feature requests to make the dirty-bitmap persistent between cold boots of a VM ;)
I do snapshot-mode backups from Monday to Saturday and stop-mode backups on Sunday. So once a week it will drop the dirty-bitmap and hash every chunk again. But yes, such an option sounds useful, especially in case you can't tolerate the short downtime of the stop-mode backup and want a no-downtime snapshot-mode backup, just without the dirty-bitmapping.
 
if there is a bug in changed block tracking - you're lost.
true, but then live migration and moving disks would also be broken, so that would be a rather dramatic qemu bug.
if there is corruption in a saved block on pbs - you're lost.
yes and no.

if you verify your backups (you probably should, at least from time to time ;)), the snapshot referencing the bad chunk will be marked as "failed verification", and the bitmap will be cleared.

there are two possible cases that follow:
- the current data still produces that chunk, the corruption will be corrected by the next backup
- the current data doesn't produce that chunk anymore (source data has changed), the corruption cannot be corrected, but is also no longer relevant for this or any future backup snapshots

as you can see, you are not lost in this case; only snapshots that actually reference the corrupt chunk, created in the absence of verification between snapshot attempts, are affected.
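The verification idea itself is simple, because chunks are named after the digest of their content. A minimal sketch (not PBS internals, just the principle):

```python
import hashlib

# Sketch of the verification principle (not PBS internals): each chunk
# is keyed by the digest of its content, so verifying just means
# re-hashing each chunk and comparing the result to its key.
def verify(chunks: dict) -> set:
    """Return the digests whose stored data no longer matches."""
    return {digest for digest, data in chunks.items()
            if hashlib.sha256(data).hexdigest() != digest}

good = b"AAAA"
digest = hashlib.sha256(good).hexdigest()
chunks = {digest: b"AAAB"}           # simulated on-disk corruption
assert verify(chunks) == {digest}    # verification flags the chunk
chunks[digest] = good                # next backup re-produces the data
assert verify(chunks) == set()       # the corruption is corrected
```

This mirrors the first case above: as long as the source still produces the same chunk, a backup after a failed verification repairs the store.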

if you don't verify, PBS doesn't know about the corrupt chunk (yet). no matter whether the bitmap is used or not, the new snapshot will also be corrupt if the chunk that got corrupted is still part of the active set:
- with bitmaps, the data is not even read if it wasn't changed, so the corruption is propagated
- without a bitmap (e.g., because it's a stop mode backup) the client would still get the same digest for the chunk, and will not upload it, but only "register" it with the server, since it already exists

in this case you are lost, because neither the client nor the server knows about the corruption, and thus cannot handle it either.

even if you made a (from the client's point of view) non-incremental backup (e.g., to a different namespace or otherwise into a new group, so that no previous snapshot exists), the server might discard the uploaded chunk if the existing, corrupt chunk has the same size (e.g., the corruption is a bit flip). only if the corruption is a truncation or otherwise causes the size to no longer match will the uploaded chunk overwrite the existing one, and the corruption will be corrected.
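An illustrative model of the server-side behavior described above (a sketch of the stated size-check logic, not actual PBS code), showing why a same-size corruption survives a re-upload while a truncation gets fixed:

```python
import hashlib

# Illustrative model of the behavior described above (not PBS code):
# when a chunk with an already-known digest is uploaded, the server
# keeps the existing file if its size matches, so a same-size
# corruption (e.g. a bit flip) survives, while a truncation does not.
def server_receive(store: dict, data: bytes) -> None:
    digest = hashlib.sha256(data).hexdigest()
    existing = store.get(digest)
    if existing is not None and len(existing) == len(data):
        return  # size matches: the uploaded copy is discarded
    store[digest] = data  # missing or size mismatch: (over)write

good = b"AAAA"
d = hashlib.sha256(good).hexdigest()

store = {d: b"AAAB"}        # bit-flip corruption, same size
server_receive(store, good)
# store[d] is still b"AAAB": the same-size corruption survives

store = {d: b"AAA"}         # truncated corruption, smaller size
server_receive(store, good)
# store[d] is now b"AAAA": the size mismatch got corrected
```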

so, to avoid all these issues we'd need to implement
- a way to force a completely full backup including overwriting existing chunks on the server side
- the end result is exactly equivalent to verifying, then doing a non-bitmap incremental backup, except that you have a lot more network traffic (PVE->PBS) and write ops (PBS)

so IMHO, the only thing that might make sense is to somehow have a check box that says "clear bitmap" (note that you can already kinda have that, since the bitmaps are manageable over QMP, so you could write a hookscript that clears the bitmap(s) based on some criteria of your choice, e.g., if the current day of the week is Sunday ;)). this checkbox has the single purpose of allowing you to downgrade a "fast incremental" backup to a "regular incremental" backup in case you don't (want to) trust the changed block tracking.
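A rough sketch of what the QMP side of such a hookscript could look like. `block-dirty-bitmap-clear` is a real Qemu QMP command, but the node name, bitmap name, and socket path below are placeholder assumptions that you would have to check against your own setup, not what PVE actually uses:

```python
import json

# Hypothetical hookscript helper: build the QMP command sequence that
# would clear a dirty bitmap before a backup. 'block-dirty-bitmap-clear'
# exists in Qemu's QMP; the node and bitmap names here are placeholders.
def qmp_clear_bitmap_cmds(node: str, bitmap: str) -> list:
    return [
        {"execute": "qmp_capabilities"},  # mandatory QMP handshake step
        {"execute": "block-dirty-bitmap-clear",
         "arguments": {"node": node, "name": bitmap}},
    ]

# Serialized for sending over the VM's QMP socket, e.g. something like
# /var/run/qemu-server/<vmid>.qmp (path is an assumption):
wire = "\n".join(json.dumps(c)
                 for c in qmp_clear_bitmap_cmds("drive-scsi0", "pbs-bitmap"))
```

A real hookscript would additionally check the backup phase and the day of the week before sending anything.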

if there is a bug of tracking/applying the changes - you're lost.

this is the part where @aaron said "all backups are full backups", which might be better phrased as "all snapshots are equal". there is no tracking (other than qemu's bitmap) or applying of changes, since there are no "changes" on the PBS side. the only difference between the first "full" and subsequent incremental backups is the following happening on the client side:

Code:
// client_thinks_chunk_is_on_server is either true because the bitmap says it hasn't changed
// or because we read, chunked and hashed the data and got a chunk digest that is already referenced by the last snapshot of the same group
if (client_thinks_chunk_is_on_server) {
    register_chunk_with_server();
} else {
    upload_chunk_to_server(); // this includes the same chunk registration on the server side
}

there is literally no difference in the resulting snapshot metadata or chunks in any fashion, PBS doesn't do "differential" backups like other backup solutions (where incremental snapshots need to be applied, possibly in a chain, to get the actual backup data). the "chaining" (or rather, possible propagation of a corrupt chunk) happens entirely transparently because of the underlying, deduplicated chunk store and the way incremental backups work. the only solution to this is verification, and the result of the verification influencing subsequent backups - which PBS already does.
 
wow, thank you for this explanation. that's why i LOVE this product and this company !!!
Jup and great that you guys spend so much time writing long detailed answers. Always nice to get some in-depth looks how things work in detail.
Everyone can easily have a quick look at the source code, but that alone won't help much, as you are missing the bigger picture unless you spend a lot of work and time trying to understand everything. So I really like those summaries from you developers to better understand how stuff is done under the hood, which the more general documentation won't tell us.
 
it also saves me from writing the same explanation again and again (but shorter / less complete, since obviously writing half a book for every question is not possible :) ) if I can link back to one longer answer ;)
 
it also saves me from writing the same explanation again and again (but shorter / less complete, since obviously writing half a book for every question is not possible :) ) if I can link back to one longer answer ;)
I know that too well. But my problem is always finding my posts again to link them, so I give up and write it again.
 
