Can't backup a MariaDB server without breaking it

drjaymz@

Member
Jan 19, 2022
I've asked a similar question but didn't get any response.
I have a container with a large, busy 250 GB database. It's Ubuntu 22.04, fully up to date, running the latest MariaDB 10.6.
Every 20 minutes there are many commits to that database, totalling a few GB per day.

If I create a snapshot of that CT and then clone it, at some point during the cloning process MariaDB becomes unable to write; the first thing you notice is that commits start to back up. The syslog also shows nothing from this point, so it looks like the disk of the live container either becomes read-only or just doesn't return from writes. This has happened EVERY time I have tried it, so it is 100% "a thing" and not a one-off.

Last night I ran a backup at the data centre level at 3am, using snapshot mode. Sure enough, when I checked this morning all the commits were stuck. Again, NOTHING in the logs at all - in fact nothing written to the logs since 3.17am.

So - creating snapshots is fine, but then doing anything with them interferes with the running container. Is this expected behaviour - something everyone else knows about - or have I done something wrong?
The underlying filesystem is ZFS. Proxmox VE 7.2-3.
 
I don't have a solution for you. What is clear: the snapshot/clone operation is disruptive enough for the application to stop working properly. PVE uses standard technologies for LXC, ZFS, etc., so my guess is that this is not PVE specific. Additionally, since linked (backend) clones are only supported via templates in PVE, I suspect you are cloning the snapshot out of band of PVE? What is the purpose of the clone? You should already be able to access the read-only snapshot, which should be good enough for backup.

You should consider that every major (and probably minor) backup solution vendor has invested in a database agent that ensures quiescing of the database and a consistent backup. Here is a good description from, arguably, the backup market leader at this time: https://helpcenter.veeam.com/docs/agentforlinux/userguide/mysql_backup.html?ver=50

I suggest you implement best practices as a starting point, i.e. try to lock the DB before snapshot and clone. I realize that the clone operation should be completely out of band of the original volume, yet here we are.
https://serverfault.com/questions/805257/backing-up-a-mysql-database-via-zfs-snapshots
https://www.pingdom.com/blog/zfs-backup/
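
A minimal sketch of that lock-then-snapshot pattern (hedged: the dataset name and snapshot label below are placeholders - check `zfs list` for your CT's actual subvolume - and the `system` line relies on the mariadb client's shell escape, so the lock is held in the same session while the snapshot is taken):

```shell
# Sketch only: build the SQL that holds a read lock across the ZFS snapshot.
# The dataset name is an assumption -- adjust to your container's volume.
DATASET="rpool/data/subvol-101-disk-0"
SQL=$(cat <<EOF
FLUSH TABLES WITH READ LOCK;
system zfs snapshot ${DATASET}@quiesced
UNLOCK TABLES;
EOF
)
printf '%s\n' "$SQL"
# On the host, feed it to the client in ONE session so the lock
# actually spans the snapshot:
#   printf '%s\n' "$SQL" | mysql -u root
```

The lock typically only needs to be held for the fraction of a second the snapshot takes; the clone or backup can then run against the snapshot afterwards.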

You should also be able to enter the container when it's in a bad state and poke around to learn what really happened (or didn't happen) to the disk/app.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 

Using the UI, select the container, go to Snapshots, then Create Snapshot. This is quick, with no disruption to anything. Next, right-click the still-running container and click Clone. You cannot clone a running container, but you can clone from a previous snapshot. Do this and it breaks your running container. It may well be that it disrupts the filesystem only momentarily, but on a busy MariaDB instance it basically stops writing and connections pile up until you can't do anything with it. If it were quiet you might not notice, and all may be well.

Next, and this is much more of a problem: if you create a backup job for your container, the same thing happens. You have the option to select snapshot mode - it creates a temporary snapshot, then squeezes that into a backup somewhere else. But in the process the exact same thing happens - your running container breaks.

I can reproduce both.
For now I have secondary Galera cluster nodes and I back those up with stop mode - this stops the container, does the backup, then restarts the container, which rejoins the cluster and catches up, so all is well. In a way it's better, because the backup is guaranteed to be consistent, taken from a gracefully shut-down cluster node. This avoids locking the DB, which I can't really do, and still gives us a point in time.
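
For reference, the stop-mode workflow above is what a single vzdump invocation does (the CT id and storage name here are placeholders):

```
vzdump 102 --mode stop --storage backup-nas
```

vzdump stops the CT, backs it up, and boots it again; the Galera node then rejoins and catches up, exactly as described.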

However, it's a bug that you can't back up a snapshot without breaking the container. I don't know if it's Proxmox, some underlying component, or MariaDB that is at fault. The logs tell you nothing at all.
You can of course do exactly this with any other virtualisation tech. What I was trying to do was bring it to the luddites I work for: sticking with the QEMU/KVM they have used for years and just making it much more manageable.

I'll check those links you sent.
 
I did miss the clone functionality of the CT, but I am 99.999% sure that I am not wrong about the clone operation not being a "linked" one.
A linked clone is based on a snapshot, uses backend storage functionality, and takes seconds or even milliseconds.
A full clone is done client side - a full copy of the disk.

Linked clones can only be done from a VM or CT template. Any other clone operation, from a snapshot or not, is a full client-side copy.

Grep for "rsync" on the PVE host during your clone operation.
So a logical candidate for your issue is slow storage or some other underlying storage issue.
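
For example (output only appears while a clone is actually in flight; pidstat comes from the sysstat package):

```shell
# List any rsync processes the clone has spawned on the PVE host.
pgrep -af rsync || echo "no rsync running right now"
# Optionally watch that rsync's per-second disk I/O (sysstat package):
#   pidstat -d -p "$(pgrep -o rsync)" 1
```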


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Indeed, the clone isn't linked.
I don't know if this problem occurs on VMs or is specific to containers.

Really, the problem is squarely down to something in the clone/backup interfering with MariaDB. By the time I realise there is a problem and MySQL connections are stacked up, there doesn't appear to be any problem writing to the disk - I can touch a file in /var/lib/mysql. So it must be that writes fail momentarily and MariaDB doesn't recover (or something like that).
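
A quick way to turn "I can touch a file" into a harder number: touch goes through the page cache and can succeed even while flushes stall, so timing small fsync'd writes (which is roughly what a DB commit depends on) is a better probe. A sketch - run it inside the container against a path on the affected volume while the clone/backup is in progress:

```python
import os
import time

def worst_fsync_latency(path="/tmp/write-probe", rounds=50):
    """Time small fsync'd writes and return the worst latency in seconds."""
    worst = 0.0
    for _ in range(rounds):
        t0 = time.monotonic()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        os.write(fd, b"x" * 4096)
        os.fsync(fd)  # force the write to stable storage, like a DB commit
        os.close(fd)
        worst = max(worst, time.monotonic() - t0)
    os.unlink(path)
    return worst

if __name__ == "__main__":
    print(f"worst fsync latency: {worst_fsync_latency():.4f}s")
```

If that number jumps from milliseconds to seconds (or the probe never returns) while the clone runs, the stall is below MariaDB, not inside it.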
 
yes, linked clones are only ever done for template guests, which by definition cannot be running. a full clone is just copying the volume's data, so it seems that the clone causes more I/O than your storage can handle (in combination with the base I/O of the guest).
 
I doubt it - it's an ultra-wide-bandwidth 8-SSD arrangement. Also, cloning doesn't affect any other containers, only the one the snapshot is from, and it does it consistently.

I have found a few other similar issues relating to snapshotting and ZFS, which could be the actual problem. But I'm having trouble googling what I mean.
 
are the other containers also running busy database workloads? because if not, then "other containers work fine" is not really much of a data point ;)

you can check here what the code in question does, but a full clone of a container (no matter whether from a snapshot or not) consists of:
- creating target volume
- mounting source volume (or snapshot)
- mounting target volume
- running rsync to copy contained data
- unmounting again

repeated for every volume, and some config locking and handling around that (that obviously cannot influence mysql inside the container in any way). so literally the only potential source of problems here is "lots of read I/O from snapshot" or "lots of write I/O to target".
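
If the "too much I/O" theory is right, it should show up directly as pool latency while the clone runs. A hedged sketch (guarded, since zpool only exists on a ZFS host):

```shell
# Watch per-interval pool latency on the PVE host during the clone.
if command -v zpool >/dev/null 2>&1; then
    zpool iostat -ly 2 3   # -l adds latency columns, -y skips the since-boot summary
else
    echo "zpool not available on this machine"
fi
```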
 
That code does pretty much what you'd expect - which looks fine to me. There are 4 containers in total and 1 VM. Of the 4, two are MariaDB cluster nodes - a primary and a secondary.

To put the busy workload in perspective: it is a central till journal that 350 retail shops report back to, so anything in the business that happened in the last 20 minutes comes in via 8 worker threads; it's 90% write at that point. What we see is that those threads get stuck and their inserts just stay on commit indefinitely. If you try to write a single row using SQL, same issue - it just can't write. If you check whether it's read-only, it says no.
There is also another database container which handles reporting, with an almost identical dataset size, but it's far more bursty in nature. It also has maybe 100 simultaneous connections looking at reports via a web front end; some of those queries can be heavy, but it's 90% read like any other database app.
And yes, you can break each one.

I'd agree with you about I/O potentially causing something to time out... up to a point, but I'd consider it a bug if your average user creates a backup and it breaks things in a non-obvious way - particularly if they can't limit the I/O or stop the breakage using the provided tools.
Against it being I/O bottlenecking: the backup also breaks the container when it uses snapshot mode. Our backup is going to a Synology NAS which struggles to write much more than 50Mb/s. Now, I'm not sure exactly what the backup is doing in terms of I/O from the snapshot, but I do see that ionice is used, which presumably means someone thought the backup should share its I/O nicely. That's what I'd expect, but the problem remains.
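
On the limiting side, vzdump does expose knobs for both of these - a bandwidth cap and the ionice priority can be set per job or globally (the values below are illustrative, not recommendations):

```
# /etc/vzdump.conf
bwlimit: 51200   # backup bandwidth cap in KiB/s (~50 MiB/s)
ionice: 7        # lowest best-effort I/O priority
```

Note that ionice priorities only take effect with an I/O scheduler that honours them (BFQ/CFQ); with none or mq-deadline they are a no-op, which could explain why it doesn't seem to help.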

To get around it, I have clustered the databases; I stop the secondary and then clone, snapshot, or back it up - no problem. Which again is a bit weird, because those operations hit the same ZFS storage and don't break it. The intention was to squirt those secondaries into another Proxmox server and then create a 3rd node, then have another 3 offsite.

I'm hoping I haven't got something horribly misconfigured somewhere, but I didn't try to do anything clever at any point.
I still think that MariaDB experiences a write failure from which it never recovers. I was just reading about ZFS bugs and MariaDB atomic writes.
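
For what it's worth, the ZFS-with-InnoDB tuning that usually comes up in that reading (hedged: general guidance from ZFS/MySQL tuning guides, not a confirmed fix for this hang - verify against your versions first):

```
# ZFS dataset holding /var/lib/mysql (dataset name is a placeholder):
#   zfs set recordsize=16k rpool/data/subvol-101-disk-0   # match InnoDB's 16k page
#   zfs set logbias=throughput rpool/data/subvol-101-disk-0
# my.cnf:
#   innodb_doublewrite = 0       # ZFS copy-on-write already writes pages atomically
#   innodb_flush_neighbors = 0   # neighbour flushing buys nothing on CoW storage
```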
 
Our backup is going to a synology NAS which struggles to write much more than 50Mb/s.
Hopefully you mean MB/s

You can limit a lot with bandwidth options while backing up with e.g. KVM, but copying data from one dataset to another is unfortunately not easily 'cappable'. One could argue that e.g. pv is able to do that, but it would have to be integrated everywhere. One thing you could try is patching PVE yourself to check whether that mitigates the problem.
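
To illustrate what pv-style capping looks like (a sketch of the idea only - PVE does not wire pv into the dataset copy today, per the above; guarded because pv may not be installed):

```shell
# Cap a stream at 10 MiB/s with pv; here copying 1 MiB of zeros.
if command -v pv >/dev/null 2>&1; then
    head -c 1M /dev/zero | pv -q -L 10m >/dev/null && echo "capped copy done"
else
    echo "pv not installed"
fi
```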

BTW: I have to say that we're also having sporadic problems with MariaDB inside an LX(C) container on ZFS. It's our last MariaDB and we will migrate it soon, because no other database system we have tried (PostgreSQL or Oracle) has this kind of problem.
 
