Hi y'all,
Like several people above, I'm also trying to get used to the chunk & 'every backup is a full backup' concepts.
Let's say I have a VM in my PVE of which I have taken five backups, snapshot mode, to a PBS.
The first one is what's comparable to a full backup in the days of yore: the entire disk of the VM has been turned into chunks, which have been deduplicated where possible.
Backups 2-5 are only the chunks which contain parts of the VM disk that changed since the previous backup, again deduplicated.

No, backups 2-5 each also reference all the chunks needed to recreate the data at that point in time; there are three optimizations/points of deduplication in place.
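To make "every backup is a full backup" concrete, here is a minimal Python sketch of the idea (a toy model, not PBS's actual code): chunks are stored once under their SHA-256 digest, and each snapshot is just an index listing the digests it needs, so any snapshot can be restored on its own while unchanged chunks are shared.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024   # 4 MiB, like the fixed chunks used for image-type backups

chunk_store = {}   # digest -> chunk bytes            (stands in for the datastore)
snapshots = {}     # snapshot name -> list of digests (stands in for the index file)

def backup(name: str, disk_image: bytes) -> None:
    """Split the image into fixed chunks, store only unknown chunks, write an index."""
    index = []
    for offset in range(0, len(disk_image), CHUNK_SIZE):
        chunk = disk_image[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:   # deduplication: known chunks are only referenced
            chunk_store[digest] = chunk
        index.append(digest)
    snapshots[name] = index

def restore(name: str) -> bytes:
    """Each index references *all* chunks of its disk, so any snapshot restores on its own."""
    return b"".join(chunk_store[digest] for digest in snapshots[name])

# Two backups of an 8 MiB disk where only the second half changed:
disk_v1 = bytes(8 * 1024 * 1024)                     # two identical all-zero chunks
disk_v2 = disk_v1[:CHUNK_SIZE] + b"x" * CHUNK_SIZE   # second chunk modified
backup("backup-1", disk_v1)
backup("backup-2", disk_v2)
print(len(chunk_store))   # 2 unique chunks on disk, yet both snapshots are complete
assert restore("backup-1") == disk_v1 and restore("backup-2") == disk_v2
```

So backup 2 is "incremental" in terms of what gets stored, but its index still describes the complete disk.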
What I'm wondering about is: what happens if I were to delete backups 2, 3 & 4? And does it make any difference whether I delete them from the client (PVE) or from PBS?

If you purge snapshots 2, 3 and 4, you only remove the metadata of those snapshots (including the indices that tell PBS which chunks make up the backed-up data). If a chunk is then no longer referenced by *any* snapshot, the next Garbage Collection run will remove the chunk.
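Continuing the toy model from the sketch above (again, just an illustration): forgetting a snapshot only deletes its index, and a separate garbage-collection pass later drops whatever chunks no surviving index references.

```python
def forget(snapshots: dict, name: str) -> None:
    """Pruning/deleting a snapshot removes only its index (metadata); chunks stay put."""
    del snapshots[name]

def garbage_collect(chunk_store: dict, snapshots: dict) -> int:
    """Drop every chunk no remaining index references; return how many were removed."""
    referenced = {digest for index in snapshots.values() for digest in index}
    unreferenced = [digest for digest in chunk_store if digest not in referenced]
    for digest in unreferenced:
        del chunk_store[digest]
    return len(unreferenced)

# Continuing the example above:
# forget(snapshots, "backup-2") leaves the "x" chunk unreferenced,
# and the next garbage_collect(chunk_store, snapshots) removes exactly that chunk.
```

In this model it makes no difference whether the forget is triggered from the PVE storage view or directly on PBS: both only remove the index, and the space only comes back after the next Garbage Collection run. (The real GC is more conservative than this sketch; as far as I know it marks chunks via their access time and only deletes ones that have stayed unreferenced past a grace period.)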
(Side question: not all chunks are the same size, right? How is chunk size determined?)

For VMs (and block devices, and other "image" type backups) the chunk size is fixed (4 MiB of input data). For containers/directory backups, the chunk size is determined by a sliding window algorithm that tries to create re-usable boundaries, so that the next run will find similar chunks if the input stream hasn't changed too much.
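To see why a sliding-window (content-defined) chunker produces re-usable boundaries while fixed-size chunking does not, here is a toy Python sketch. The rolling hash, window size and chunk-size limits below are made up for the example and far smaller than anything PBS uses; the real chunker differs in detail, but the principle is the same: a boundary is cut wherever the hash of the last few dozen bytes hits a pattern, so boundaries depend on content rather than on absolute offsets.

```python
import os

WINDOW = 48                        # sliding-window length in bytes (illustrative)
AVG_BITS = 12                      # cut where the hash's low 12 bits are zero (~4 KiB average)
MIN_SIZE, MAX_SIZE = 1024, 16384   # toy limits; PBS works with MiB-sized chunks
BASE, MOD = 257, (1 << 61) - 1
POW_W = pow(BASE, WINDOW, MOD)     # precomputed so the byte leaving the window can be removed

def cdc_chunks(data: bytes) -> list:
    """Content-defined chunking with a Rabin-Karp style rolling hash."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD                       # byte enters the window
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * POW_W) % MOD   # byte leaves the window
        size = i - start + 1
        if size >= MAX_SIZE or (size >= MIN_SIZE and (h & ((1 << AVG_BITS) - 1)) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

data = os.urandom(200_000)
shifted = b"X" + data   # insert a single byte at the front of the stream
fixed_old = {data[i:i + 4096] for i in range(0, len(data), 4096)}
fixed_new = {shifted[i:i + 4096] for i in range(0, len(shifted), 4096)}
print("fixed-size chunks reused:    ", len(fixed_old & fixed_new))   # typically 0
print("sliding-window chunks reused:", len(set(cdc_chunks(data)) & set(cdc_chunks(shifted))))
```

With fixed-size chunking, the one-byte insert shifts every later boundary, so almost nothing deduplicates; with the sliding window, boundaries realign shortly after the change and most of the chunks are found again.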
And all chunks are compressed via zstd, so the actual resulting size on the storage might be way smaller, depending on how well the data compresses. It's not uncommon for a 4 MiB chunk to only need something like 2 MiB on the datastore.
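And for completeness, a tiny example of the compression step; this assumes the third-party python-zstandard package and only demonstrates the size effect, not the actual chunk file format.

```python
import os
import zstandard   # third-party package: pip install zstandard

chunk = os.urandom(1024) * 4096   # a 4 MiB chunk built from highly repetitive data
compressed = zstandard.ZstdCompressor().compress(chunk)

print(f"chunk of input data:    {len(chunk) / 2**20:.2f} MiB")
print(f"stored size after zstd: {len(compressed) / 2**20:.4f} MiB")

# Restoring simply decompresses the chunk back to the original 4 MiB.
assert zstandard.ZstdDecompressor().decompress(compressed) == chunk
```

Repetitive data like this shrinks far more than the roughly-half example above, while data that is already compressed or encrypted barely shrinks at all; real VM disks usually land somewhere in between.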