Considerations regarding snapshot backup consistency

philten

New Member
Feb 5, 2009
www.philten.com
I recently discovered Proxmox VE and have run many tests with it. I really like it, and will very probably go into production with it (Windows and Linux VMs).

I am now testing VM backups and disaster recovery scenarios. I tried all three backup modes.

I will run VMs hosting web servers, so I am concerned about downtime.
Of course, --snapshot (LVM) is attractive because it involves no downtime.

But, despite what is stated, it does not seem correct to me that it
provides consistency. For example:

- if data is cached in some disk write buffer when the snapshot occurs,
that data would be lost.

- furthermore, with a database, it seems to me that the snapshot could cut
a transaction in half and save the database in an inconsistent state.

In other words, I would say that the snapshot dump provides only O/S crash consistency
(the VM image will boot): no filesystem consistency, no database consistency.

My idea is to add a fourth backup mode to vzdump: snapshot+suspend.

I mean: suspend the VM, take the snapshot, resume the VM, then save the snapshot.
This way there should be a small downtime (a few seconds, I hope), filesystem consistency, and (I think) database consistency.

Regarding the downtime: if it is not too long, connected clients may not even notice.

Any comments or suggestions would be appreciated.

Thanks to the Proxmox team for this great product.

Phil Ten
 
> But, despite what is stated, it does not seem correct to me that it provides consistency. For example:
>
> - if data is cached in some disk write buffer when the snapshot occurs,
> that data would be lost.

Yes, but normal filesystems can easily recover to a consistent state (journaling).

> - furthermore, with a database, it seems to me that the snapshot could cut
> a transaction in half and save the database in an inconsistent state.

No - the transaction is simply not committed. The database is still consistent (that's the whole point of having transactions).
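
The point about uncommitted transactions can be tried out on a small scale. The sketch below is only an illustration, and it rests on assumptions that do not come from this thread: it uses SQLite (through Python's standard sqlite3 module) as a stand-in for a server database, and it imitates a snapshot by copying the database directory while a transaction is still open. The function name and file names are made up for the example. It does not model disk write caches or barriers, so it says nothing about the disk-level concern raised in the first post.

```python
import os
import shutil
import sqlite3
import tempfile


def committed_rows_after_snapshot() -> list:
    """Copy a database mid-transaction and return the rows visible in the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        live_dir = os.path.join(tmp, "live")
        snap_dir = os.path.join(tmp, "snap")
        os.mkdir(live_dir)
        os.mkdir(snap_dir)

        conn = sqlite3.connect(os.path.join(live_dir, "app.db"))
        conn.execute("CREATE TABLE t (v TEXT)")
        conn.execute("INSERT INTO t VALUES ('committed')")
        conn.commit()

        # Open a second transaction and deliberately leave it uncommitted.
        conn.execute("INSERT INTO t VALUES ('in flight')")

        # "Snapshot": copy every on-disk file while the transaction is open
        # (this includes a rollback journal, if SQLite has created one).
        for name in os.listdir(live_dir):
            shutil.copy(os.path.join(live_dir, name), os.path.join(snap_dir, name))

        # "Restore": open the copy the way a freshly booted guest would.
        snap = sqlite3.connect(os.path.join(snap_dir, "app.db"))
        rows = [row[0] for row in snap.execute("SELECT v FROM t")]
        snap.close()
        conn.close()
        return rows


if __name__ == "__main__":
    print(committed_rows_after_snapshot())
```

The copy should contain only the committed row; the in-flight insert is simply absent. A real LVM snapshot of a busy database server involves more moving parts, so treat this as a demonstration of the principle rather than a guarantee.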

> In other words, I would say that the snapshot dump provides only O/S crash consistency
> (the VM image will boot): no filesystem consistency, no database consistency.

It is a snapshot, and it will work in almost any situation. But I am not aware of any better solution for online backup
(tell me one method, and I will show you that it is not consistent at all).

> My idea is to add a fourth backup mode to vzdump: snapshot+suspend.
>
> I mean: suspend the VM, take the snapshot, resume the VM, then save the snapshot.
> This way there should be a small downtime (a few seconds, I hope), filesystem consistency, and (I think) database consistency.

Suspend simply stops the processes, so you do not get more consistent backups.

Instead, you need to stop the VM to make a 100% consistent backup.

However, in practice, snapshots work quite well when you have fast disks and disk utilization is low. Otherwise, suspend mode is better.

- Dietmar
 
Thank you for your post. I googled a little more and your statement
appears to be confirmed by what I found: snapshot backups seem to work well in practice,
and actually much better than physical server backups.

However, please allow me to continue this thread, because I think
it raises some interesting questions, particularly for me, at a point where I need
to decide on a backup strategy.

OK, for the file system, with journaling, it should recover a consistent state.

Regarding database consistency, are you sure?
The main purpose of transactions is to handle SQL failure.
From the database's point of view, the snapshot is more like
a disk hardware failure (some data expected on the disk
does not exist). What if the snapshot occurs in the middle
of a "commit"?
Would you say that a database would ALWAYS recover a consistent
state after a power cut? In my opinion it is the same
question. I would say that a snapshot is similar to a power cut.

I say that suspend/snapshot/resume provides better consistency
(than snapshot alone) because I assume (I may be wrong?) that
when a VM is suspended all its RAM is saved to disk, and this way
the RAM will be included in the VM image in the backup.

If the backup is restored and the guest resumed (not started),
it will continue its processing: for example, finish a database
"commit", flush write caches...

In other words, it is no longer a context similar to a power cut.
Seems safer to me.

Phil Ten
 
> Would you say that a database would ALWAYS recover a consistent state after a power cut? In my opinion it is the same question. I would say that a snapshot is similar to a power cut.

Yes, it's similar to a power cut.

> I say that suspend/snapshot/resume provides better consistency
> (than snapshot alone) because I assume (I may be wrong?) that
> when a VM is suspended all its RAM is saved to disk, and this way
> the RAM will be included in the VM image in the backup.

But the current vzdump code does not include RAM when using --suspend mode.

> If the backup is restored and the guest resumed (not started),
> it will continue its processing: for example, finish a database
> "commit", flush write caches...

And hopefully it does not depend on system time!

> In other words, it is no longer a context similar to a power cut.
> Seems safer to me.

Maybe, but if you want to be sure, use stop/start instead.

For OpenVZ, stop/start is sometimes even faster than suspend/resume (when you use a lot of RAM).

- Dietmar
 
Thank you very much for your replies.

I will definitely use Proxmox to replace 3 (maybe 4) servers we have today.
Regarding my backup strategy, I am not sure yet:
probably snapshot backups on a daily basis, combined with some
other method once a week or so.

Great product!
 
Why not do a mysqldump (or equivalent for your database) inside the vz container then snapshot backup that container some time after that's finished? That's what I do.
 
You can partially get it

Philten,
You can back up the active state of your KVM guest using KVM's live-migration-to-file function. It isn't a part of Proxmox (yet), but you can create the state file through the KVM console.

With your KVM guest running, in the Proxmox web interface, select the 'Monitor' tab for your VM.
Type "stop"
- note: this will effectively 'pause' your running KVM virtual machine
Type "migrate file:////pathtofile/filename"
- This will create a VM state file, roughly equal in size to the amount of RAM you have assigned to the virtual machine

You now need to take a snapshot of the VM's disks. Hopefully you are using a SAN or understand LVM well enough to snapshot the local disks.

Once you have created the state file and storage snapshots, you can un-pause your VM. Back in the 'Monitor' tab:
Type "c" (for continue)

---

As far as using the state file goes, you would need to start the VM manually, either from the command line or from a different machine entirely (a pain). Plus, you would have to point it to the snapshot versions of the disks that you created.

It isn't pretty, but it gets the desired results. A state-saving backup of your system - which is certainly an advantage of having a virtualized guest.
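
The pause/continue part of the procedure above could be scripted along these lines. This is a minimal sketch under stated assumptions: `send` is a hypothetical callable that delivers one command string to the guest's KVM monitor (how it connects is not shown here), and only the "stop" and "c" commands are taken from the steps above. The try/finally is there so that the guest is continued even if saving the state file or snapshotting the disks fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def paused(send: Callable[[str], None]) -> Iterator[None]:
    """Pause the guest on entry and always continue it on exit."""
    send("stop")  # monitor command from the steps above: pause the guest
    try:
        yield  # caller saves the state file and snapshots the disks here
    finally:
        send("c")  # monitor command from the steps above: continue the guest


def demo() -> List[str]:
    """Record the order of operations with a fake monitor (no real VM involved)."""
    log: List[str] = []
    with paused(log.append):
        log.append("<save state file, snapshot disks>")
    return log


if __name__ == "__main__":
    print(demo())
```

Everything placed inside the `with` block counts as downtime for the guest, so only the state save and the snapshot creation belong there; archiving the snapshot can happen after the guest is running again.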

---

When migrating between servers in a Proxmox cluster, I have to think this process already occurs (system is paused, state is saved, drives are synced) when a guest is moved to another host. Too bad there isn't a way to do just the first part, and not migrate.

I am looking forward to KVM's true Live Migration being implemented in the next Proxmox release, and perhaps some hook to make the process described above automated?
 
> Why not do a mysqldump (or equivalent for your database) inside the vz container then snapshot backup that container some time after that's finished? That's what I do.

+1

This is one of the key reasons why databases are used; they are an abstraction over the underlying file system. If you want to back up a database consistently, you need to figure out how to do that in the context of the database you are running. Granted, restoring from a snapshot when everything works fine is easier than setting up the database/server software and then restoring the database data from a "dump". However, if you truly want to be safer (or approaching safest), then you will want to back up both the server image as a whole AND the database schema/data on its own.
 
Most databases allow you to flush the caches to disk and prevent writing when the snapshot is taken.

Our backup strategy is to run a separate backup script for our MySQL VMs that uses table locking to ensure consistent MyISAM tables (http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html).

It does the following:
1. mysql> flush tables with read lock;
2. mysql> flush logs;
3. take the LVM snapshot
4. mysql> unlock tables;
5. Create backup archive from snapshot.

This only applies to MyISAM tables. InnoDB tables are safer to back up with snapshots, since InnoDB uses transactions and other mechanisms, not unlike a filesystem journal, to recover from failure.

I have yet to implement this on Proxmox, but I have used it successfully on other Linux MySQL deployments and expect it to work just as well in this scenario. Just make sure that you have a MySQL client installed in the Dom0 (or whatever it's called in OpenVZ-world) and that it is allowed to connect to your MySQL server to run the commands.
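
The ordering of steps 1 to 4 above can be sketched as follows. This is a hedged outline, not the script described in this post: `run_sql` and `take_snapshot` are hypothetical callables supplied by the caller. One detail worth keeping in mind is that the lock taken by FLUSH TABLES WITH READ LOCK is released when the client session ends, so `run_sql` is assumed to send every statement over one connection that stays open until the snapshot exists.

```python
from typing import Callable, List


def snapshot_with_read_lock(
    run_sql: Callable[[str], None],
    take_snapshot: Callable[[], None],
) -> None:
    """Hold the global read lock only while the snapshot is being created."""
    run_sql("FLUSH TABLES WITH READ LOCK")  # step 1
    try:
        run_sql("FLUSH LOGS")  # step 2
        take_snapshot()  # step 3: e.g. create the LVM snapshot
    finally:
        run_sql("UNLOCK TABLES")  # step 4: runs even if the snapshot fails


def demo() -> List[str]:
    """Record the order of operations with fake callables (no MySQL, no LVM)."""
    log: List[str] = []
    snapshot_with_read_lock(log.append, lambda: log.append("<take LVM snapshot>"))
    return log


if __name__ == "__main__":
    for step in demo():
        print(step)
```

Step 5 (creating the backup archive from the snapshot) is deliberately left outside the function, since it can run after the tables are unlocked and does not need to block writers.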
 
> Yes, but normal filesystems can easily recover to a consistent state (journaling).
> - Dietmar

There is a classic article [1] on LWN about barriers and journaled file systems; it appears that without barriers the journal itself could become corrupted in some situations. Barriers are usually off by default in the filesystem for performance reasons, and furthermore they were not honoured by device mapper until very recent kernels (2.6.33).

Are you aware of any further consistency assurance in the LVM snapshot mechanism other than barriers?

Many Thanks,
Roberto



[1] http://lwn.net/Articles/283161/
 
I asked for much the same thing, with one difference: I think stop+snapshot would be better than suspend+snapshot:
http://forum.proxmox.com/threads/2349-Backup-KVM-with-stop-shapshot
 