ZFS - consistent replication of VMs with database applications

rholighaus

Well-Known Member
Dec 15, 2016
97
8
48
60
Berlin
I want to migrate existing Windows Server 2012R2 VMs from Hyper-V to Proxmox. The VMs in question are either based on Oracle or MS SQL databases running within the VM. Currently, they are backed up via a commercial backup solution that backs up the VM disks on the Hyper-V host. As those VMs are mission crtitical, I cannot afford tomtake any risk of inconsistency in backups.

I‘m also running containers with Mysql inside and wonder if pve ZFS replication would be consistent as I can only see a file system freeze & thaw but no interaction with Mysql.

I love the way ZFS snapshots can be used for incremental backups on file system level but haven’t found anything about consistency of pve ZFS replication with VMs (nor containers).

Has anyone made experience and can share?
 
Best practice is to do a logswitch in your database on the VM, sync the changes to disk (with sync on Linux, maybe something different works on Windows too) and then do the ZFS snapshot. This is similar what you would do with qemu-agent and backup.
 
Hi,

I am some bad and good experiences regarding let say mysql (i guess perconadb is almost the same like mysql).
If your mysql app have a good design, all querry that make a write will use sql transactions. So from mysql perspective, a write is ok if all write operations are OK. This opperations are also recorded in intendlog if I remember ... So let say that in a middle of a transaction, you will start a snapshot, who will be replicated on a remote box. Then your mysql it will be broken. You will start on the remote host your replicated VM/container. Mysql will see that some write operations are not committed to the disk (intendlog). In worst case, mysql will not have on disk your last sql transaction. But any other data is ok.

Now in real world, I use this zfs send/receive in a non Proxmox environment, and I do not have any inconsistent mysql problem(and without any linux flush before snapshot). And I have all the problems that someone can imagine (electricity problems, bad networks, .... ). But I set the zfs flushing time from 5 sec default to 3 sec. So even I power off from server button it was ok , luky me ;). I do zfs send-recive at 5 minutes.

But is one big risk : you must be 100% that your zfs replication is working ... ;)

Another option if you want to be 110% sure, is to use a perconadb cluster insted of mysql( is a perconadb-cluster install and copy/paste mysql db files). In this case nothing can go wrong, but you need at least 3 mysql hosts. I use this scenario for 3 years when I can, because zfs+perconadb-cluster is for me "best of the best", and I can survive for a long time ... even for days when 1 perconadb host is down. This last scenario need many hours until you finish, but you do it one time and then forget.


Last note: the biggest problem is to "make pece" between mysql, proxmox and zfs :) Any others problems are only small details ;)


PS: I am sorry for my astonished english language - it is a nightmare for me :)
 
Last edited:
I want to migrate existing Windows Server 2012R2 VMs from Hyper-V to Proxmox. The VMs in question are either based on Oracle or MS SQL databases running within the VM. Currently, they are backed up via a commercial backup solution that backs up the VM disks on the Hyper-V host. As those VMs are mission crtitical, I cannot afford tomtake any risk of inconsistency in backups.

I‘m also running containers with Mysql inside and wonder if pve ZFS replication would be consistent as I can only see a file system freeze & thaw but no interaction with Mysql.

I love the way ZFS snapshots can be used for incremental backups on file system level but haven’t found anything about consistency of pve ZFS replication with VMs (nor containers).

Has anyone made experience and can share?


If you want to run any DB in a windows enviroment/ VM you can use ... urbackup. More about this if you realy want to use this scenario; )
 
Hi guletz, thank you for your recommendation. I like the idea of urbackup and will look into it.

But my main problem is: Does Proxmox ensure that the snapshots created for the new replication feature will be consistent? If you have a look at znapzend for example (https://github.com/oetiker/znapzend), they provide hooks for scripts to be called before and after a snapshot, allowing you to e.g. "FLUSH TABLES WITH READ LOCK" in MySQL inside the container/VM.

All I can see in the PVE logs is that PVE freezes and thaws the file systems, which may or may not trigger a database flush in e.g. Windows VMs - I don't know enough about what the guest-fsfreeze-freeze / guest-fsfreeze-thaw command do to a database running like e.g. Oracle inside a Windows Server VM.
 
All I can see in the PVE logs is that PVE freezes and thaws the file systems, which may or may not trigger a database flush in e.g. Windows VMs - I don't know enough about what the guest-fsfreeze-freeze / guest-fsfreeze-thaw command do to a database running like e.g. Oracle inside a Windows Server VM.

The freeze and thaw invokes vss on Windows, so if Oracle does hook into that, you're fine. Just check the alert log if a log switch occurred at the time of the snapshot. For a Oracle database, it is generally the best to use it on Oracle Linux, which has the best test coverage due to being the main development platform. In Linux you can simply call hook scripts if a freeze or thaw occurs.
 
I know how to execute commands inside containers and use that when I take snapshots by myself, but PVE takes snapshots automatically before starting replication and I have (yet) not control of how to ensure consistent snapshots for my LXC containers running databases inside... A hook would be great...
 
but PVE takes snapshots automatically before starting replication and I have (yet) not control of how to ensure consistent snapshots for my LXC containers running databases inside... A hook would be great...

Two solutions if you're using pve-zsync
a) just write a wrapper for it that does execute your hook just before your original command
b) file a feature request for pve-zsync to implement a hook mechanism. I skimmed over the source code and did not find anything related to that.
 
ZFS snapshots are always crash consistent. Any application that can't recover from a crash consistent snapshot would be incapable of recovering from anything that caused the server to halt unexpectedly (power failure, kernel panic, hardware fault, etc.).

Databases use sync, fsync, etc. operations to guarantee that data is on disk before a transaction is completed. If you haven't broken sync guarantees at a hardware level (eg. a writeback buffer with no battery backup) or software level (eg. setting sync=disabled on ZFS, disabling sync/fsync in the database config, etc.), then committed data will be present in the snapshot. Transactions that were in progress during the snapshot will be rolled back when the database recovers from the crash consistent state.

Quiesce operations are useful for backup tools that can't capture a filesystem in a crash consistent state, where you want all in-progress transactions to have completed, where you have taken liberties with the integrity of sync, where you want to minimize recovery operations, etc.
 
Last edited:
Hi,

@denos have explain again almost the same process that is happening with zfs snapshot. If you do not care, you could use a hook, like in this link:
https://github.com/proxmox/qemu/blob/master/scripts/qemu-guest-agent/fsfreeze-hook

and you can put your own sql flush/whatever....


But in the end you will get the same consistent DB as without any hook, with only one difference : your full snapshot process will take more time to finish.

And now think at a catastrophic scenarios. You will finish your zfs snapshot now, and after 5 minutes, your DB VM is crash (before your next schedule snapshot let say after 10 minutes). So without any DB hook you will miss 100 sql transactions. Maybe with a sql hook you will miss only 50. The problem is that in any case you will louse something, more/less.
In both cases you will need to see what was lost. If this operation is a must to do, then you will need to reconsider ANY asynchronously replication option. For example if you run a webmail (who use a sql engine) nobody care if you lost some sql inserts (mail heders cache, what users are log on, and so on)
Like I wrote if you Do Not Want Any successfully sql transaction to be lost, then the best solution is to use a SQL cluster.

As a final note, you will need to ask yourself what is ok for you: to lose some or to not lose anything? After this answer you will know what you need.

Good luck !
 
  • Like
Reactions: rholighaus

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!