Backup MySQL VM with PBS

TwiX

Renowned Member
Feb 3, 2015
311
23
83
Hi,

I noticed that during a PBS backup of a large MySQL VM (with guest agent installed), Mysql is not able to perform requests (select, update, insert)...

I thought that the FS lock triggered by qemu-ga guest-fsfreeze takes only few seconds in order to have a consistent FS.

Is it the normal behavior ? If yes, does it mean that every time I run a PBS backup for that kind of VM, we must facing downtime of production ?

Thanks in advance !

Antoine

1654074300451.png
 
can you post the task log of that specific backup?
how long the fsfreeze takes depends on what exactly is configured inside the guest, if it needs to flush outstanding caches to a slow disk, this could take a bit for example
 
Thanks
Here are the logs

INFO: Starting Backup of VM 20181 (qemu)
INFO: Backup started at 2022-05-31 20:00:02
INFO: status = running
INFO: VM Name: dc-doli-merc01
INFO: include disk 'scsi0' 'kvm_pool:vm-20181-disk-0' 95G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/20181/2022-05-31T18:00:02Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'ab9af76b-caa2-412a-a65f-3a2ede6f7a33'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 1% (1.2 GiB of 95.0 GiB) in 3s, read: 425.3 MiB/s, write: 86.7 MiB/s
INFO: 2% (2.0 GiB of 95.0 GiB) in 7s, read: 192.0 MiB/s, write: 19.0 MiB/s
INFO: 3% (2.9 GiB of 95.0 GiB) in 12s, read: 187.2 MiB/s, write: 32.0 MiB/s
INFO: 4% (3.9 GiB of 95.0 GiB) in 21s, read: 112.4 MiB/s, write: 17.3 MiB/s
INFO: 5% (4.8 GiB of 95.0 GiB) in 26s, read: 191.2 MiB/s, write: 22.4 MiB/s
INFO: 6% (5.8 GiB of 95.0 GiB) in 36s, read: 102.0 MiB/s, write: 30.4 MiB/s
INFO: 7% (6.8 GiB of 95.0 GiB) in 43s, read: 137.7 MiB/s, write: 8.6 MiB/s
INFO: 8% (7.6 GiB of 95.0 GiB) in 49s, read: 142.7 MiB/s, write: 17.3 MiB/s
...
...
INFO: 90% (85.7 GiB of 95.0 GiB) in 19m 4s, read: 90.4 MiB/s, write: 38.0 MiB/s
INFO: 91% (86.7 GiB of 95.0 GiB) in 19m 19s, read: 71.5 MiB/s, write: 28.3 MiB/s
INFO: 92% (87.7 GiB of 95.0 GiB) in 19m 29s, read: 99.6 MiB/s, write: 38.4 MiB/s
INFO: 93% (88.4 GiB of 95.0 GiB) in 19m 38s, read: 80.4 MiB/s, write: 31.6 MiB/s
INFO: 94% (89.3 GiB of 95.0 GiB) in 19m 52s, read: 68.0 MiB/s, write: 25.7 MiB/s
INFO: 95% (90.3 GiB of 95.0 GiB) in 20m 5s, read: 80.3 MiB/s, write: 23.4 MiB/s
INFO: 100% (95.0 GiB of 95.0 GiB) in 20m 7s, read: 2.3 GiB/s, write: 0 B/s
INFO: backup is sparse: 13.78 GiB (14%) total zero data
INFO: backup was done incrementally, reused 60.74 GiB (63%)
INFO: transferred 95.00 GiB in 1207 seconds (80.6 MiB/s)
INFO: Finished Backup of VM 20181 (00:20:10)
INFO: Backup finished at 2022-05-31 20:20:12
Result: {
"data": null
}

...
May 31 20:00:02 dc-doli-merc01 qemu-ga: info: guest-ping called
May 31 20:00:02 dc-doli-merc01 qemu-ga: info: guest-fsfreeze called
 
i mean generally the backup works by saving the blocks before they get written to by the vm (so each write may be a bit delayed during backup),
so if the target storage is not fast enough, it'll slow down the vm writes, but it should not completely stop it from doing its job...

are there any more hints in the guest log maybe? in the mysql logs maybe?
 
Logs are related to external PBS Backup (Tuxis)

However, local PBS backup (@3:00 am) with pretty fast PBS datastore (2 stripped raidz : 10x 2TB SSD) with 10Gb network, shows locks of Mysql too
But backup takes 10 min instead of 20 min via Tuxis.

Mysql logs
lots of :
2022-05-31 20:00:00 140157609531136 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:00 140156477593344 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140159866181376 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140159866181376 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140161208596224 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156270819072 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156270819072 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156478822144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156478822144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156477183744 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156268975872 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140157609735936 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140159866590976 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161209210624 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161179649792 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140160998610688 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140156268361472 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161179035392 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140156479846144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction


VM FS ext4 with nobarrier
hypervisor PVE 7.0 Ceph 16.2.6 (32 fast SSD on 20 Gb dedicated network)
 
honestly that looks like a weird mysql behaviour, why is it deadlocking? is there something at the start of the backup in the mysql logs that maybe gives a hint?
slow disk access should not lockup mysql...
 
Don't know what to say :/
This weird behavior is clearly triggered by PBS snapshots.

I can also mention that it affects large MySQL databases, (same issue on Zabbix Mysql database (~20 Gbytes)) without table partitioning.
 
could you double check whether the guest agent is hooked up to mysql? the freeze action (inside the guest) is hookable, and there are people that flush and lock the DB on freezing for extra consistency..
 
qemu-guest agent is installed with default set-up.

So I don't think guest agent is hooked up to mysql...
 
A workaround could be to uncheck guest agent in PVE GUI for that VM ?
 
A workaround could be to uncheck guest agent in PVE GUI for that VM ?
that would then skip both the freeze and any potential hook scripts, but yeah, as a quick check whether the agent integration/freeze is the culprit that would work as well.
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!