Contradictory backup documentation

Cronax

New Member
Mar 3, 2022
Hey community,
having a look at the documentation about backup modes, I stumbled across the following quote:
After the backup is started, the VM goes to full operation mode if it was previously running. Consistency is guaranteed by using the live backup feature.

The documentation about the live backup feature (two paragraphs further down) says:
Also please note that since the backups are done via a background Qemu process, a stopped VM will appear as running for a short amount of time while the VM disks are being read by Qemu. However the VM itself is not booted, only its disk(s) are read.

To me, those two statements are contradictory. The first quote states that the VM will boot into the OS (that's what I experience in real use). Because this contradicts the statement about the live backup feature, I am very confused. Why is the VM being started after the backup job starts, and how is consistency guaranteed then? Backing it up while the VM is running seems like bad practice to me. Is there any way to force Proxmox to leave the VM shut down during the backup if it was running when the backup job started?
 
As far as I can see, backing up a shut-down VM won't boot that VM. But it will start it for a very short time so it can be backed up (which, for example, makes backups impossible if a passed-through device isn't available, because then the VM can't be started). I've never seen any guest logs showing that the VM was actually booted for a backup.
 
Hi Dunuin,
that's what I encounter too. But I am curious about the behavior when the VM was running when the backup job was started. In this case, the VM is shut down ⇾ the backup starts ⇾ the VM starts AND boots into the OS. How is data consistency guaranteed, then?
Let's imagine the following scenario:
1.) The backup of the VM needs some time due to a slow network connection, let's say 30 mins.
2.) While the backup is running, the OS has started again and users are working on it.
3.) User A changes a big file at time point A
4.) At time point B the backup job is backing up the bytes of the big file the user A was changing at time point A.

This backup will then include all the original bytes of the system at backup start time except the big file, due to the fact that the backup required some time to get to the big file to back it up. In this time, user A changed the big file because the system booted up again. There is no guarantee that all backed up bytes are from the same time point, which is crucial in many cases.
Backing up a VM should behave the same as backing up an LXC container, at least when specifying "stop" mode. If a user sees "stop" mode, she/he assumes that the VM will stay stopped until the backup is finished.
 
A few words on how the backup of VMs works. The backups (for VMs) are always done by hooking into the Qemu layer to read the disks of the VM.

If the VM is running, you basically have two options (stop, snapshot) to make sure any data that the VM still has in the cache is written to disk prior to the backup. In stop mode, the VM is shut down, and therefore any data in the cache will be written to disk. Right after the shutdown, it is started again. In snapshot mode you ideally have the guest agent installed and enabled. If so, you will see some `fsfreeze-freeze` and `fsfreeze-thaw` lines in the backup log. This means the guest agent tells the guest OS to flush any outstanding disk operations.

Now, whichever way we made sure that the data on disk is consistent, the disks are being read and stored in the backup. Since the VM will want to write some data while the backup is created, PVE catches those write operations and checks if that part of the disk has already been backed up. If it hasn't been yet, PVE delays the write operation and backs up that part of the disk out of order. Once it is backed up, the write operation is allowed to continue.
This is a rough overview of how backups work and how we can make sure that we have a backup of the disk which is consistent with the disk as it was when the backup started.
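To make the "delay the write, back up that part first" idea more concrete, here is a toy model in Python. It is only a sketch of the principle described above, not how Qemu or PVE actually implement it: the class `LiveBackup`, its method names, and the four-block "disk" are all made up for illustration.

```python
class LiveBackup:
    """Toy model of a copy-before-write backup over a list of disk blocks.

    Hypothetical illustration only; not the real Qemu/PVE implementation.
    """

    def __init__(self, disk):
        self.disk = disk                      # live disk, mutated by guest writes
        self.backup = [None] * len(disk)      # backup target, filled block by block
        self.done = [False] * len(disk)       # which blocks are already saved
        self.cursor = 0                       # next block for the sequential pass

    def _save(self, index):
        # Copy the block as it currently is on disk, at most once.
        if not self.done[index]:
            self.backup[index] = self.disk[index]
            self.done[index] = True

    def guest_write(self, index, data):
        # Intercept the write: save the old content first (out of order),
        # then let the write reach the live disk.
        self._save(index)
        self.disk[index] = data

    def step(self):
        # Advance the sequential pass by one block; return False when finished.
        while self.cursor < len(self.disk) and self.done[self.cursor]:
            self.cursor += 1
        if self.cursor >= len(self.disk):
            return False
        self._save(self.cursor)
        self.cursor += 1
        return True


disk = ["a0", "b0", "c0", "d0"]
snapshot_at_start = list(disk)

job = LiveBackup(disk)
job.step()                      # block 0 is backed up by the sequential pass
job.guest_write(3, "d1")        # block 3 not yet saved: old "d0" is saved first
job.guest_write(0, "a1")        # block 0 already saved: write goes straight through
while job.step():
    pass

print(job.backup)               # ['a0', 'b0', 'c0', 'd0']
print(disk)                     # ['a1', 'b0', 'c0', 'd1']
```

In this model the backup ends up identical to the disk as it was when the job started, while the live disk keeps all the changes the guest made in the meantime. A write to an already saved block never touches the backup, because the old content of that block is already stored.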

Now, since we always need Qemu to access the VM's disks, we come to the second part. A VM that is powered off will be started to the point where Qemu is up and running with all the disks attached. The execution of the VM is halted, though, so it does not actually boot. It can still be shown as running during the backup, because there is a running virtualization process.

I hope this explains it well enough :)

Backing up a VM should behave the same as backing up an LXC container, at least when specifying "stop" mode. If a user sees "stop" mode, she/he assumes that the VM will stay stopped until the backup is finished.
From the documentation about the VM stop mode:
After the backup is started, the VM goes to full operation mode if it was previously running. Consistency is guaranteed by using the live backup feature.
CTs and VMs are very different technologies and therefore offer different possibilities when it comes to creating consistent backups.
 
Hmm… about that delaying of write operations to parts of the disks that have not been backed up: How is consistency guaranteed?

Consider that the VM guest writes a file after the backup started. The filesystem creates an inode and writes the data. In order to do that, it accesses blocks at different locations on the disk. What happens if the filesystem accesses both locations that have already been backed up and locations that have not yet been backed up? For those not yet backed up, it delays the write. What does it do for those it has already backed up? Rewrite that part of the backup?

Also, how does delaying the write affect the running VM? Does, for example, the Linux kernel see that a write operation is stalled? Or does Qemu tell Linux "write is done, go ahead"? In the latter case, how would Qemu guarantee the consistency of an fsync operation inside the guest in case of a power outage or similar? I did some quick research on the Internet, but I did not find anything conclusive regarding these aspects of KVM live backup.
 
