Backup VM from PVE to PBS, how to enable incremental backups

In ZFS you cannot simply roll back to an arbitrary snapshot. You can clone it, but what you suggest is not possible - at least not if you are unwilling to destroy all snapshots in between. Snapshots also have to form a list; they cannot be an arbitrary tree. PBS does not care about this at all and is as flexible as one can imagine.

I just have to decide whether I need the newer snapshots at all. If not, I can do a "revert"; if I need them, I have to clone. So everything is possible.
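For the record, that decision looks roughly like this on the command line (pool, dataset, and snapshot names are made up for illustration):

```
# Rolling back to the most recent snapshot is always possible:
zfs rollback tank/vm-100-disk-0@today

# Rolling back to an older snapshot only works with -r,
# which DESTROYS every snapshot newer than the target:
zfs rollback -r tank/vm-100-disk-0@last-week

# To keep the newer snapshots and still reach the old state, clone instead;
# the clone stays dependent on its origin snapshot until you 'zfs promote' it:
zfs clone tank/vm-100-disk-0@last-week tank/vm-100-disk-0-restored
```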

You can use our QEMU Proxmox Backup Server block-backend to access any image

Thanks, this is the part I was missing.
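For anyone finding this later: one user-facing way to get at a single disk image of a PBS snapshot is the `map` subcommand of proxmox-backup-client, which exposes the archive as a local loop device (the repository, snapshot, and archive names below are placeholders):

```
# List the snapshots in a datastore:
proxmox-backup-client snapshots --repository root@pam@pbs.example.org:store1

# Map one disk archive of a VM backup to a local block device:
proxmox-backup-client map vm/100/2021-01-01T12:00:00Z drive-scsi0.img \
    --repository root@pam@pbs.example.org:store1

# ... inspect or mount the reported /dev/loopN device, then release it:
proxmox-backup-client unmap /dev/loop0
```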

you could also just use Proxmox VE Replication (or the more general pve-zsync), which has been able to do this for years.

Replication is very nice and I use it. Unfortunately, it currently does not work with encrypted ZFS (unlike znapzend). If it were possible to keep more than the latest snapshot (which can be accessed from the UI) and have a generation scheme (like znapzend), it would be even better. pve-zsync also has no generation scheme and does not tidy up old snapshots.
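For comparison, a znapzend generation scheme is configured per dataset roughly like this (dataset names, hosts, and retention plans are just examples):

```
# Keep hourly snapshots for a week and daily ones for a month locally,
# and additionally weekly ones for a year on the offsite destination:
znapzendzetup create --recursive --tsformat='%Y-%m-%d-%H%M%S' \
  SRC '7d=>1h,30d=>1d' tank/vmdata \
  DST:offsite '7d=>1h,30d=>1d,1y=>1w' backup@offsite.example.org:backup/vmdata
```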

(as if they ain't synced somewhere safe, it's worth nothing)

With znapzend, snapshots are also a backup, because it keeps old generations and, e.g. if you have two servers, you have a generation-scheme backup at all times. I also sync the snapshots to an offsite remote server.
 
Does incremental reading of the VM only occur if the VM was running at both the last *and* the current backup? I.e., will doing an initial backup with the VM turned off, powering it on afterwards, and then doing a backup not use the dirty bitmap? I did see that with a VM running Windows 10 20H2.
 
Does incremental reading of the VM only occur if the VM was running at both the last *and* the current backup? I.e., will doing an initial backup with the VM turned off, powering it on afterwards, and then doing a backup not use the dirty bitmap? I did see that with a VM running Windows 10 20H2.
Yeah, the incremental dirty-bitmap only lives in memory (RAM); stopping the VM will drop it (but it will be migrated along on live migrations).
 
A backup solution which is not able to do fast incremental backups of unchanged, switched-off VMs (the safest method) is not really a good solution. There should be a way to handle this.
well, i suppose we all have different needs then. I have no big need for incremental backups of switched-off, unchanged VMs, by definition.
 
we want to let the PBS backup job run several times a day for a better RPO, as dirty bitmaps make backups very fast - but shut-down VMs are a real blocker for this, as backing them up in full would put too much load on disk/CPU/network/PBS.

i can understand the safety aspect; always backing them up if there is no safe "change detection" is the really safe method.

wouldn't we need an extension of QEMU to handle this safely/properly?
does somebody know if there has been a discussion in the QEMU community (i can't find one)?

but what if the VM is backed by a file, i.e. raw/qcow2 or vmdk?

isn't the mtime of that file a reliable indicator for "has not changed", and could we perhaps at least skip backups for VMs backed by a virtual disk image file (as a compromise / interim solution)?
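to make that concrete, here is a purely hypothetical sketch of such a check (this is not an existing vzdump/PBS feature; the paths and the stamp file are invented for illustration):

```
#!/bin/sh
# hypothetical pre-backup check: skip a file-backed VM whose image
# has not been modified since the last successful backup
IMG=/var/lib/vz/images/100/vm-100-disk-0.qcow2
STAMP=/var/lib/vz/images/100/.last-backup-stamp

if [ -e "$STAMP" ] && [ ! "$IMG" -nt "$STAMP" ]; then
    echo "image unchanged since last backup, skipping VM 100"
    exit 0
fi

# ... run the actual backup here ...
touch "$STAMP"   # record the time of this backup run
```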
 
what further comes to my mind is some semaphore in /etc/pve/nodes/<host>/qemu-server (or wherever), where the Proxmox API could document each VM's start/stop operations; the backup task could then take a look in there and skip the backup if there is no change in a VM's semaphore file(s).
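again purely hypothetical, just to illustrate the idea (no such semaphore files exist in PVE today):

```
# a start/stop hook would touch a per-VM semaphore file:
SEM=/etc/pve/nodes/$(hostname)/qemu-server/100.lastchange
touch "$SEM"

# the backup task would then compare it against the previous run:
LAST=/var/lib/vz/.last-backup-100
if [ -e "$LAST" ] && [ ! "$SEM" -nt "$LAST" ]; then
    echo "no start/stop since last backup, skipping VM 100"
fi
```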

why should a VM's virtual disk change without the VM being started/stopped!?

ok, maybe there are special scenarios where this happens, but such a "skip offline VMs for backup" feature could be off by default - and enabling it could print some "you have been warned, you know what you are doing" message.
 
but shut-down VMs are a real blocker for this, as backing them up in full would put too much load on disk/CPU/network/PBS.
This would only stress the disk the VM is on. There is no additional traffic for VMs that don't have dirty bitmaps, it just takes longer and needs to read the entire disk.
 
>This would only stress the disk the VM is on

yes, and since VMs use disks in a shared fashion, very often with network-attached storage, it would also have an impact on all the other VMs residing on that disk/storage.

VM infrastructure is very sensitive to IO load - and it's not even really perfectly "robust" if there is too much load on the disks (see here for example: https://bugzilla.proxmox.com/show_bug.cgi?id=1453).

so you don't want to have your disks loaded with IO if it's avoidable.
 
that's true :) You don't want to have too much IO wait.
I just wanted to point out the part about network traffic, as a lot of people think that if the dirty bitmap is destroyed, all the chunks will be uploaded to the PBS server again. That is not what happens.
 
why should a VM's virtual disk change without the VM being started/stopped!?
Snapshots, rollbacks, some other process opening the disk and doing whatever (automated provisioning and the like can do that), someone manually starting the VM and circumventing the PVE stack completely - there are lots of ways ;-)

For lots of production workloads, like ours too (and we have quite some), this is a non-issue in general, as all VMs are running anyway, so the dirty bitmap is always used.

i can understand the safety aspect; always backing them up if there is no safe "change detection" is the really safe method.
As cookiefamily already mentioned: we do not back up what didn't change, so we already have change detection; only changed blocks are sent over the wire and actually written.

wouldn't we need an extension of QEMU to handle this safely/properly?
When QEMU runs, we already use dirty bitmaps, which we have integrated into the PBS backup.

At the time when you'd like to see optimizations, no QEMU process runs for that VM, so it does not matter.

Anyhow, improving this is on our long-term roadmap, but "just using a semaphore in pmxcfs" won't cut it, as at that level we have no issues synchronizing anything anyway...
 
For lots of production workloads, like ours too (and we have quite some), this is a non-issue in general, as all VMs are running anyway, so the dirty bitmap is always used.
Really? Do you know that from other users or do you just guess based on your own use case?

I am not a subscriber (thank you for making this possible!), so I cannot speak for "real" users of course, but for me:
  • at home: less than a third (!) of the configured VMs are up and running 24/7
  • more important: at work (unfortunately not Proxmox) it's approximately two thirds
Just my 2€¢...
 
thank you @Thomas Lamprecht for pointing out the use cases regarding disk modification; those were mostly out of my sight.

>For lots of production workloads, like ours too (and we have quite some), this is a non-issue in general, as all VMs are running anyway, so the dirty bitmap is always used.

mh, i need to disagree here. i think that highly depends on the environment and does NOT apply in general to software development companies.

i have been administering server-based virtual machines for a long time (since VMware GSX) and on different platforms/hypervisors, and i can tell you that it is very different where i work(ed): usually quite a number of VMs always are (and have been) in a disabled state. i think at least a quarter of all VMs have been permanently offline.

it's that typical "can we delete that or is it still needed? oh, not sure, power it off and see if anybody complains" thingie.

when we're not really sure if a VM is safe to delete, we shut it down and let it exist offline for a while, then keep annoying the project maintainer or another responsible person again and again until it's ready for deletion or archiving. and, based on my experience with these people, i can tell you that it's not always easy to get a decision...

we typically keep backups active for such VMs, as it's too dangerous that somebody puts one back into the production/development process and forgets to activate backups...

i don't know how others handle their VM lifecycle management, but i think it's quite common practice in project-driven companies.

2 out of 3 jobs i began working at had a zoo of shut-down VMs, because no admin had time to clarify if deletion/archiving was OK. time is precious, disk is cheap, you know...
 
thank you @Thomas Lamprecht for pointing out the use cases regarding disk modification; those were mostly out of my sight.
[...]
2 out of 3 jobs i began working at had a zoo of shut-down VMs, because no admin had time to clarify if deletion/archiving was OK. time is precious, disk is cheap, you know...
Indeed, it would be good to have a checkbox for online VMs in PVE's backup jobs. However, we use pools for that: for example, one "production" pool and one "production-offline". When we decide to take a prod VM offline for a while, we move it to the offline pool. The backups are done pool-based, meaning the online pool is backed up more often, and since that pool contains only running VMs, it's fast. This keeps good track of which VMs are supposed to be online, and we don't forget to also back up the offline production ones.
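A sketch of that setup with the standard tools (pool names, storage name, and VM IDs are examples):

```
# Create a pool for parked production guests:
pvesh create /pools --poolid production-offline

# Remove VM 100 from the old pool and add it to the offline one:
pvesh set /pools/production --vms 100 --delete 1
pvesh set /pools/production-offline --vms 100

# Back up that pool less frequently, e.g. as a weekly vzdump job;
# vzdump can select all guests of a pool directly:
vzdump --pool production-offline --storage pbs-store --mode snapshot
```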
 
Really? Do you know that from other users or do you just guess based on your own use case?
Via enterprise support we see thousands of use cases; that's where this statement is derived from. Naturally, not all of them are uniform, but I count >= 3/4 of the guests as permanently online, so most of the workload is handled nicely by the very fast online dirty-bitmap tracking, and the rest presumably doesn't need highly frequent backups anyway.
at home: less than a third (!) of the configured VMs are up and running 24/7
Sure, the home use case may be different, and it especially varies so widely that it's hard to make a general statement there.
For example, my home setup doesn't have all guests always up either, but as it's not too critical, I do weekly backups for those which run only occasionally, not daily/hourly ones. Further, I have < 8 TiB of data there, so I can still use SSDs (which for my personal time-cost calculation is a saver over spinners; I know that not all see it this way at current prices yet), so the read load doesn't really impact anything else going on.

i have been administering server-based virtual machines for a long time (since VMware GSX) and on different platforms/hypervisors, and i can tell you that it is very different where i work(ed): usually quite a number of VMs always are (and have been) in a disabled state. i think at least a quarter of all VMs have been permanently offline.

it's that typical "can we delete that or is it still needed? oh, not sure, power it off and see if anybody complains" thingie.
Sure, I can imagine that; we have some cases of that here too, and definitely see it in the wild. What I do for those is just reduce the backup frequency, either roughly matching the period they actually run at, or just using weekly or a similar frequency for stale "just keep them in case" guests, which avoids any bigger impact completely. Using pools, like @oversite mentioned, can be a nice way to implement this, as pools can be used as the backup selection.

But as said, that's just my local case here, and as already stated: we plan to look into improving this area in the longer term too. We know it is important for lots of users, so I see no point in further discussing the "convincing us" part :)
 