if there is a bug in changed block tracking - you're lost.
true, but then also live migration and moving disks are broken, so that would be a rather dramatic qemu bug.
if there is corruption in a saved block on pbs - you're lost.
yes and no.
if you verify your backups (you probably should, at least from time to time), the snapshot referencing the bad chunk will be marked as "failed verification", and the bitmap will be cleared.
there are two possible cases that follow:
- the current data still produces that chunk: the corruption will be corrected by the next backup
- the current data no longer produces that chunk (the source data has changed): the corruption cannot be corrected, but it is also no longer relevant for this or any future backup snapshot
as you can see, you are not lost in this case - only those snapshots that actually reference the corrupt chunk (i.e., the ones taken before a verification run caught the corruption) are affected. a small sketch of both cases follows below.
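just to make those two cases concrete, here is a minimal, self-contained sketch (made-up names and a toy hash, not the actual PBS or QEMU code) of what the backup after a failed verification conceptually does, assuming verification has dropped the corrupt chunk and cleared the bitmap:

Code:
use std::collections::HashMap;

type Digest = u64; // toy stand-in for the real SHA-256 digest

// toy hash (FNV-1a) just so the sketch is self-contained, *not* what PBS uses
fn digest_of(data: &[u8]) -> Digest {
    data.iter()
        .fold(0xcbf29ce484222325u64, |h, b| (h ^ *b as u64).wrapping_mul(0x100000001b3))
}

// the backup run after a failed verification: the bitmap was cleared, so every
// chunk of the current guest data is read and hashed again
fn next_backup_after_failed_verify(
    source: &[Vec<u8>],                   // current guest data, already split into chunks
    store: &mut HashMap<Digest, Vec<u8>>, // server-side chunk store, corrupt chunk was removed by verify
) -> Vec<Digest> {
    source
        .iter()
        .map(|chunk| {
            let d = digest_of(chunk);
            // case 1: the data still produces the formerly corrupt digest -> the chunk
            //         is uploaded again here and the corruption is corrected
            // case 2: the data changed -> a different digest is produced and the corrupt
            //         chunk is simply not referenced by this or any future snapshot
            store.entry(d).or_insert_with(|| chunk.clone());
            d
        })
        .collect()
}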
if you don't verify, PBS doesn't know about the corrupt chunk (yet). no matter whether the bitmap is used or not, the new snapshot will also be corrupt if the corrupted chunk is still part of the active set:
- with bitmaps, the data is not even read if it wasn't changed, so the corruption is propagated
- without a bitmap (e.g., because it's a stop mode backup), the client still computes the same digest for the chunk, so it will not upload it, but only "register" it with the server, since it already exists there
in this case you are lost, because neither the client nor the server knows about the corruption, and thus neither can handle it.
even if you made a (from the client's point of view) non-incremental backup (e.g., to a different namespace or otherwise into a new group, so that no previous snapshot exists), the server might discard the uploaded chunk if the existing, corrupt chunk has the same size (e.g., if the corruption is a bit flip). only if the corruption is a truncation or otherwise causes the size to no longer match will the uploaded chunk overwrite the existing one and correct the corruption.
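as a rough sketch of that server-side behaviour (made-up names and a heavily simplified on-disk layout - real PBS chunk files have a header and may be compressed, so treat this as an assumption-laden illustration, not the actual implementation):

Code:
use std::fs;
use std::io::Write;
use std::path::Path;

// insert an uploaded chunk into a (very simplified) chunk store directory
fn insert_chunk(store_dir: &Path, digest_hex: &str, data: &[u8]) -> std::io::Result<()> {
    let path = store_dir.join(digest_hex);
    if let Ok(meta) = fs::metadata(&path) {
        if meta.len() == data.len() as u64 {
            // a chunk with this digest and the same size already exists: the upload is
            // discarded, which is why a bit flip in the stored chunk is not healed by
            // re-uploading the same data
            return Ok(());
        }
        // size mismatch (e.g. the stored chunk was truncated): fall through and overwrite
    }
    fs::File::create(&path)?.write_all(data)
}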
so, to avoid all these issues, we'd need to implement a way to force a completely full backup, including overwriting existing chunks on the server side. but the end result would be exactly equivalent to verifying and then doing a non-bitmap incremental backup, except with a lot more network traffic (PVE->PBS) and write ops (on the PBS side).
so IMHO, the only thing that might make sense is a check box that says "clear bitmap" (note that you can already kinda have that today, since the bitmaps are manageable over QMP - you could write a hookscript that clears the bitmap(s) based on some criteria of your choice, e.g., if the current day of the week is Sunday; a rough sketch follows below). this checkbox would have the single purpose of allowing you to downgrade a "fast incremental" backup to a "regular incremental" backup in case you don't (want to) trust the changed block tracking.
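for the QMP part, here is a rough sketch of what such a helper (called from a hookscript, which itself would usually be perl or shell) could look like. the socket path, node name and bitmap name below are placeholders/assumptions, and whether removing the bitmap is the right way to invalidate it for your setup is something to verify against the QEMU/PVE documentation before using anything like this:

Code:
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    let vmid = 100; // hypothetical VM id
    // assumed QMP socket path on a PVE node - adjust if yours differs
    let mut sock = UnixStream::connect(format!("/var/run/qemu-server/{}.qmp", vmid))?;
    let mut reader = BufReader::new(sock.try_clone()?);
    let mut line = String::new();

    reader.read_line(&mut line)?; // QMP greeting banner
    sock.write_all(b"{\"execute\": \"qmp_capabilities\"}\n")?;
    line.clear();
    reader.read_line(&mut line)?; // should be {"return": {}}

    // "drive-scsi0" and "pbs-bitmap" are placeholders for the real node and bitmap names;
    // a robust client would also skip interleaved QMP events instead of reading one line
    sock.write_all(
        b"{\"execute\": \"block-dirty-bitmap-remove\", \
           \"arguments\": {\"node\": \"drive-scsi0\", \"name\": \"pbs-bitmap\"}}\n",
    )?;
    line.clear();
    reader.read_line(&mut line)?;
    println!("bitmap-remove response: {}", line.trim());
    Ok(())
}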
if there is a bug of tracking/applying the changes - you're lost.
this is the part where @aaron said "all backups are full backups", which might be better phrased as "all snapshots are equal". there is no tracking (other than qemu's bitmap) or applying of changes, since there are no "changes" on the PBS side. the only difference between the first "full" and subsequent incremental backups is the following happening on the client side:
Code:
// client_thinks_chunk_is_on_server is either true because the bitmap says it hasn't changed
// or because we read, chunked and hashed the data and got a chunk digest that is already referenced by the last snapshot of the same group
if (client_thinks_chunk_is_on_server) {
    register_chunk_with_server();
} else {
    upload_chunk_to_server(); // this includes the same chunk registration on the server side
}
there is literally no difference in the resulting snapshot metadata or chunks in any fashion. PBS doesn't do "differential" backups like other backup solutions (where incremental snapshots need to be applied, possibly in a chain, to get at the actual backup data). the "chaining" (or rather, the possible propagation of a corrupt chunk) happens entirely transparently because of the underlying, deduplicated chunk store and the way incremental backups work. the only solution to this is verification, with the verification result influencing subsequent backups - which is exactly what PBS already does.
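to illustrate "all snapshots are equal" once more: conceptually a snapshot is nothing but an index of chunk digests plus metadata, and the chunk store is keyed by digest, so whether a chunk was uploaded or merely registered leaves no trace in the snapshot itself. a toy model (made-up types, not the real on-disk format):

Code:
use std::collections::HashMap;

type Digest = u64;

struct Datastore {
    chunks: HashMap<Digest, Vec<u8>>, // deduplicated chunk store, keyed by digest
    snapshots: Vec<Vec<Digest>>,      // each snapshot is only a flat list of digests
}

impl Datastore {
    fn restore(&self, snapshot: usize) -> Vec<u8> {
        // restoring never walks a chain of snapshots - it just resolves each digest
        self.snapshots[snapshot]
            .iter()
            .flat_map(|d| self.chunks[d].clone())
            .collect()
    }
}

fn main() {
    let data = b"some guest data".to_vec();
    let d: Digest = 42; // toy digest
    let store = Datastore {
        chunks: HashMap::from([(d, data)]),
        snapshots: vec![vec![d], vec![d]], // "full" and "incremental" snapshot look identical
    };
    assert_eq!(store.restore(0), store.restore(1));
}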