[SOLVED] Snapshot backup mode, inconsistency risk doubts

H25E

Member
Nov 5, 2020
Hello,

I'm reading chapter 16 of the Proxmox documentation, about backups, and the description of snapshot mode for VM backups says:
This mode provides the lowest operation downtime, at the cost of a small inconsistency risk. It works by performing a Proxmox VE live backup, in which data blocks are copied while the VM is running. If the guest agent is enabled (agent: 1) and running, it calls guest-fsfreeze-freeze and guest-fsfreeze-thaw to improve consistency.
It mentions an inconsistency risk that I don't think is fully explained. At least for me, it left a lot of doubts (probably because I'm a noob):
  1. What is a SMALL risk of inconsistency? 1 in 10? 1 in 10 thousand? 1 in 10 million?
  2. What about when the QEMU guest agent is enabled?
  3. Can these inconsistencies corrupt files and/or databases, leading to total or partial data loss?
  4. If I'm unlucky and a backup I later need has inconsistencies, will something tell me "Hey, this backup is faulty, try another one", or can it fail silently?
  5. The documentation for container snapshot backups says nothing about an inconsistency risk. Are containers free of this risk?
  6. Is snapshot mode backup not recommended for VMs running databases?
  7. Does a snapshot mode backup of a VM with databases eliminate the need for DB-specific backup tools like mysqldump or pg_dump? What about backups in stop mode?
I'm using ZFS on the Proxmox host. Does this affect the inconsistency risk in any way?

Thanks for your time.
 
  1. What is a SMALL risk of inconsistency? 1 in 10? 1 in 10 thousand? 1 in 10 million?
depends on the applications inside the guest. if they are written to handle sudden power loss (which is what a snapshot backup without a guest agent looks like to them), then it's fine. otherwise the backup captures exactly what is on disk at that moment, even if the guest kernel still has in-flight writes, for example

  2. What about when the QEMU guest agent is enabled?
then guest-fsfreeze-freeze will be called. what exactly happens depends on the guest os, e.g. on windows it triggers a VSS snapshot, on linux afaik a filesystem freeze (and potentially a sync?)
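To illustrate the difference between the two cases, here is a toy model in Python. It is only a sketch of the idea and not how QEMU or the guest agent is implemented: `ToyGuest`, `snapshot_backup` and the account balances are made-up names for illustration, and the "freeze" is reduced to flushing pending writes to disk.

```python
class ToyGuest:
    """Toy guest: writes land in a RAM cache first and reach the virtual disk on flush."""

    def __init__(self):
        self.disk = {}   # what the host (and therefore the backup) can see
        self.cache = {}  # in-flight writes that exist only in guest RAM

    def write(self, key, value):
        self.cache[key] = value

    def flush(self):
        # Roughly what a freeze/sync forces: push pending writes to the disk.
        self.disk.update(self.cache)
        self.cache.clear()


def snapshot_backup(guest, agent_running):
    # Assumption of this model: with a running agent, pending writes are
    # flushed before the copy; without it, the backup copies the disk as-is.
    if agent_running:
        guest.flush()
    return dict(guest.disk)


def make_guest():
    guest = ToyGuest()
    guest.write("balance_a", 100)
    guest.write("balance_b", 0)
    guest.flush()
    # The application moves 30 from A to B, but only the first write
    # has reached the disk when the backup starts.
    guest.write("balance_a", 70)
    guest.flush()
    guest.write("balance_b", 30)
    return guest


without_agent = snapshot_backup(make_guest(), agent_running=False)
with_agent = snapshot_backup(make_guest(), agent_running=True)
print(without_agent)  # {'balance_a': 70, 'balance_b': 0}
print(with_agent)     # {'balance_a': 70, 'balance_b': 30}
```

In the first backup 30 units have vanished, and nothing on the host side can tell. Note that the model only covers writes stuck in the guest's cache. If the application itself is halfway through a multi-step update when the freeze happens, it still has to cope with that on its own.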

  3. Can these inconsistencies corrupt files and/or databases, leading to total or partial data loss?
again, that depends on how the application inside handles it

  4. If I'm unlucky and a backup I later need has inconsistencies, will something tell me "Hey, this backup is faulty, try another one", or can it fail silently?
those 'inconsistencies' are invisible to the host (it just backs up the virtual disk), so the backup itself will not report them

  5. The documentation for container snapshot backups says nothing about an inconsistency risk. Are containers free of this risk?
there we can call 'fsfreeze' ourselves (because of the shared kernel), so yes, that should not happen there

  6. Is snapshot mode backup not recommended for VMs running databases?
  7. Does a snapshot mode backup of a VM with databases eliminate the need for DB-specific backup tools like mysqldump or pg_dump? What about backups in stop mode?

again, that depends on whether the guest agent is active and on how the database handles such scenarios


I'm using ZFS on the Proxmox host. Does this affect the inconsistency risk in any way?
no, qemu snapshot backups work independently of the storage layer (in other words, they do not use zfs snapshots)
 
Thank you so much for the fast, nice & detailed answer point by point.

So, in a Linux VM with the QEMU guest agent running, fsfreeze is going to be called like in the container case. Does this make snapshot backups of Linux VMs with the guest agent running as safe as container snapshot backups?

EDIT: Also, since the VM disks are stored on ZFS, and ZFS is COW, wouldn't that imply that the QEMU image data is, in the end, safer in a sudden power loss situation?
 
Does this make snapshot backups of Linux VMs with the guest agent running as safe as container snapshot backups?
it should, yes

EDIT: Also, since the VM disks are stored on ZFS, and ZFS is COW, wouldn't that imply that the QEMU image data is, in the end, safer in a sudden power loss situation?
that would be the power loss situation of the *host*, for the vm disk it does not make a difference

however, regardless of which backup solution is used (including non-proxmox tools), you should always do restore tests regularly, since something can always go wrong. for example, the backup can be misconfigured, and bugs are always possible
 
that would be the power loss situation of the *host*, for the vm disk it does not make a difference

Sorry, I didn't understand this bit. The virtual disk sits on top of ZFS, so updating a file in the virtual disk ultimately means ZFS updating the physical disk, and that update is COW.

From my understanding, if the host FS is COW, then all writes to the guests' virtual disks are COW, because the real writes can only be COW.

So, since you know better than me, where am I wrong in my assumptions?
 
So, since you know better than me, where am I wrong in my assumptions?
your assumptions are all correct, but they don't have anything to do with snapshot backups

what i wrote about 'power loss' was that if you boot a vm from a backup, the state of the disk is as if somebody had powered off the vm (regardless of host storage)

maybe it's better explained here what the backup exactly does: https://git.proxmox.com/?p=pve-qemu...10fc76baaa19716aeb06259c2bcd196e4949c;hb=HEAD
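As far as I understand the linked description, the core trick is copy-before-write: when the guest overwrites a block that has not been backed up yet, the old content is saved to the backup first. The Python below is a simplified sketch of that idea under my own assumptions, not the actual QEMU code. `live_backup`, the block list and the write schedule are invented for illustration.

```python
def live_backup(disk, guest_writes):
    """Copy `disk` block by block while the guest keeps writing.

    `guest_writes` maps a step number to a (block_index, new_data) write that
    the guest issues just before the backup copies the block for that step.
    """
    backup = [None] * len(disk)

    def save_old(index):
        # Copy-before-write: keep the old content if it is not in the backup yet.
        if backup[index] is None:
            backup[index] = disk[index]

    for step in range(len(disk)):
        if step in guest_writes:
            index, data = guest_writes[step]
            save_old(index)     # old data goes to the backup first
            disk[index] = data  # then the guest write reaches the disk
        save_old(step)          # regular sequential copy of this block
    return backup


disk = ["a0", "b0", "c0", "d0"]
start_state = list(disk)
backup = live_backup(disk, {0: (3, "d1"), 2: (1, "b1")})
print(backup)  # ['a0', 'b0', 'c0', 'd0']
print(disk)    # ['a0', 'b1', 'c0', 'd1']
```

The backup ends up holding the disk exactly as it was when the backup started, even though the guest kept writing during the copy. That is why restoring it looks like the VM lost power at that instant.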
 
Got it now. Thank you for the conversation. Very instructive.

As a last point: in the case of a snapshot backup of a Windows guest with the QEMU guest agent running and triggering a VSS snapshot, can't we say it's free of inconsistency risk, like the Linux guest case?
 
It always depends on how the services are programmed. As already said, restoring a snapshot mode backup is like booting a VM after a power outage. So the OS and all services will have crashed mid-operation, losing all data they were working on and kept only in RAM.
So the services need to be programmed in a way that lets them recover from such a hard reset without losing too much data.
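One common way for a service to survive a hard reset is to never overwrite its state file in place: write a temporary file, fsync it, then rename it over the old one. Below is a minimal Python sketch of that pattern. The `atomic_write` helper and the `state.json` file name are made up for this example, and the atomic rename guarantee is assumed to hold as on typical POSIX filesystems.

```python
import os
import tempfile


def atomic_write(path, data: bytes):
    """Replace `path` with `data` so a crash leaves either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the new content onto the disk
        os.replace(tmp_path, path)  # swap the new file in with a single rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Also fsync the directory so the rename itself is durable (POSIX only).
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)


with tempfile.TemporaryDirectory() as demo_dir:
    state_file = os.path.join(demo_dir, "state.json")
    atomic_write(state_file, b'{"counter": 1}')
    atomic_write(state_file, b'{"counter": 2}')
    with open(state_file, "rb") as f:
        final_content = f.read()
    leftover_files = os.listdir(demo_dir)
```

A service written like this finds either the previous or the new version of its state after a restore, never a half-written file. Databases achieve the same goal with journals or write-ahead logs.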