Backup MySQL VM with PBS

TwiX · Jun 1, 2022

Hi,

I noticed that during a PBS backup of a large MySQL VM (with guest agent installed), Mysql is not able to perform requests (select, update, insert)...

I thought that the FS lock triggered by qemu-ga guest-fsfreeze takes only few seconds in order to have a consistent FS.

Is it the normal behavior ? If yes, does it mean that every time I run a PBS backup for that kind of VM, we must facing downtime of production ?

Thanks in advance !

Antoine

dcsapak · Jun 1, 2022

can you post the task log of that specific backup?
how long the fsfreeze takes depends on what exactly is configured inside the guest, if it needs to flush outstanding caches to a slow disk, this could take a bit for example

TwiX · Jun 1, 2022

Thanks
Here are the logs

INFO: Starting Backup of VM 20181 (qemu)
INFO: Backup started at 2022-05-31 20:00:02
INFO: status = running
INFO: VM Name: dc-doli-merc01
INFO: include disk 'scsi0' 'kvm_pool:vm-20181-disk-0' 95G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/20181/2022-05-31T18:00:02Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: enabling encryption
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'ab9af76b-caa2-412a-a65f-3a2ede6f7a33'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 1% (1.2 GiB of 95.0 GiB) in 3s, read: 425.3 MiB/s, write: 86.7 MiB/s
INFO: 2% (2.0 GiB of 95.0 GiB) in 7s, read: 192.0 MiB/s, write: 19.0 MiB/s
INFO: 3% (2.9 GiB of 95.0 GiB) in 12s, read: 187.2 MiB/s, write: 32.0 MiB/s
INFO: 4% (3.9 GiB of 95.0 GiB) in 21s, read: 112.4 MiB/s, write: 17.3 MiB/s
INFO: 5% (4.8 GiB of 95.0 GiB) in 26s, read: 191.2 MiB/s, write: 22.4 MiB/s
INFO: 6% (5.8 GiB of 95.0 GiB) in 36s, read: 102.0 MiB/s, write: 30.4 MiB/s
INFO: 7% (6.8 GiB of 95.0 GiB) in 43s, read: 137.7 MiB/s, write: 8.6 MiB/s
INFO: 8% (7.6 GiB of 95.0 GiB) in 49s, read: 142.7 MiB/s, write: 17.3 MiB/s
...
...
INFO: 90% (85.7 GiB of 95.0 GiB) in 19m 4s, read: 90.4 MiB/s, write: 38.0 MiB/s
INFO: 91% (86.7 GiB of 95.0 GiB) in 19m 19s, read: 71.5 MiB/s, write: 28.3 MiB/s
INFO: 92% (87.7 GiB of 95.0 GiB) in 19m 29s, read: 99.6 MiB/s, write: 38.4 MiB/s
INFO: 93% (88.4 GiB of 95.0 GiB) in 19m 38s, read: 80.4 MiB/s, write: 31.6 MiB/s
INFO: 94% (89.3 GiB of 95.0 GiB) in 19m 52s, read: 68.0 MiB/s, write: 25.7 MiB/s
INFO: 95% (90.3 GiB of 95.0 GiB) in 20m 5s, read: 80.3 MiB/s, write: 23.4 MiB/s
INFO: 100% (95.0 GiB of 95.0 GiB) in 20m 7s, read: 2.3 GiB/s, write: 0 B/s
INFO: backup is sparse: 13.78 GiB (14%) total zero data
INFO: backup was done incrementally, reused 60.74 GiB (63%)
INFO: transferred 95.00 GiB in 1207 seconds (80.6 MiB/s)
INFO: Finished Backup of VM 20181 (00:20:10)
INFO: Backup finished at 2022-05-31 20:20:12
Result: {
"data": null
}

...
May 31 20:00:02 dc-doli-merc01 qemu-ga: info: guest-ping called
May 31 20:00:02 dc-doli-merc01 qemu-ga: info: guest-fsfreeze called

dcsapak · Jun 1, 2022

i mean generally the backup works by saving the blocks before they get written to by the vm (so each write may be a bit delayed during backup),
so if the target storage is not fast enough, it'll slow down the vm writes, but it should not completely stop it from doing its job...

are there any more hints in the guest log maybe? in the mysql logs maybe?

TwiX · Jun 1, 2022

Logs are related to external PBS Backup (Tuxis)

However, local PBS backup (@3:00 am) with pretty fast PBS datastore (2 stripped raidz : 10x 2TB SSD) with 10Gb network, shows locks of Mysql too
But backup takes 10 min instead of 20 min via Tuxis.

Mysql logs
lots of :
2022-05-31 20:00:00 140157609531136 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:00 140156477593344 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140159866181376 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140159866181376 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:12 140161208596224 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156270819072 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156270819072 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156478822144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156478822144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156477183744 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:17 140156268975872 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140157609735936 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140159866590976 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161209210624 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161179649792 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140160998610688 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140156268361472 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140161179035392 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
2022-05-31 20:00:18 140156479846144 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction

VM FS ext4 with nobarrier
hypervisor PVE 7.0 Ceph 16.2.6 (32 fast SSD on 20 Gb dedicated network)

dcsapak · Jun 1, 2022

honestly that looks like a weird mysql behaviour, why is it deadlocking? is there something at the start of the backup in the mysql logs that maybe gives a hint?
slow disk access should not lockup mysql...

TwiX · Jun 1, 2022

Don't know what to say :/
This weird behavior is clearly triggered by PBS snapshots.

I can also mention that it affects large MySQL databases, (same issue on Zabbix Mysql database (~20 Gbytes)) without table partitioning.

fabian · Jun 1, 2022

could you double check whether the guest agent is hooked up to mysql? the freeze action (inside the guest) is hookable, and there are people that flush and lock the DB on freezing for extra consistency..

TwiX · Jun 1, 2022

qemu-guest agent is installed with default set-up.

So I don't think guest agent is hooked up to mysql...

TwiX · Jun 1, 2022

A workaround could be to uncheck guest agent in PVE GUI for that VM ?

fabian · Jun 1, 2022

TwiX said:
A workaround could be to uncheck guest agent in PVE GUI for that VM ?

that would then skip both the freeze and any potential hook scripts, but yeah, as a quick check whether the agent integration/freeze is the culprit that would work as well.

TwiX · Jun 1, 2022

Seems that Debian + qemu-guest-agent + mariadb = trouble

https://forum.proxmox.com/threads/s...est-agent-fs-freeze-gets-timeout.99887/page-2

Search

Search

Backup MySQL VM with PBS

TwiX

Renowned Member

dcsapak

Proxmox Staff Member

TwiX

Renowned Member

dcsapak

Proxmox Staff Member

TwiX

Renowned Member

dcsapak

Proxmox Staff Member

TwiX

Renowned Member

fabian

Proxmox Staff Member

TwiX

Renowned Member

TwiX

Renowned Member

fabian

Proxmox Staff Member

TwiX

Renowned Member