Snapshot backup not working (guest-agent fs-freeze times out)

Trying to take a snapshot backup fails.
It looks like this thread:
https://forum.proxmox.com/threads/vm-hang-during-backup-fs-freeze.80152/

So far I've only seen this on a guest with Debian 11 (and MariaDB from mariadb.org).


Code:
INFO: starting new backup job: vzdump 144 --node kvm02 --remove 0 --mode snapshot --storage local --compress zstd
INFO: Starting Backup of VM 144 (qemu)
INFO: Backup started at 2021-11-17 16:29:37
INFO: status = running
INFO: VM Name: NFY-isengard
INFO: include disk 'scsi0' 'local-zfs:vm-144-disk-0' 60G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-144-2021_11_17-16_29_37.vma.zst'
INFO: issuing guest-agent 'fs-freeze' command
and there it freezes.

In the guest syslog:
Code:
Nov 17 16:29:36 isengard qemu-ga: info: guest-ping called
Nov 17 16:29:37 isengard qemu-ga: info: guest-fsfreeze called
Nov 17 16:32:06 isengard kernel: [  363.556779] INFO: task qemu-ga:370 blocked for more than 120 seconds.
Nov 17 16:32:06 isengard kernel: [  363.556814]       Not tainted 5.10.0-9-amd64 #1 Debian 5.10.70-1
Nov 17 16:32:06 isengard kernel: [  363.556829] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 16:32:06 isengard kernel: [  363.556852] task:qemu-ga         state:D stack:    0 pid:  370 ppid:     1 flags:0x00004000
Nov 17 16:32:06 isengard kernel: [  363.556861] Call Trace:
Nov 17 16:32:06 isengard kernel: [  363.556881]  __schedule+0x282/0x870
Nov 17 16:32:06 isengard kernel: [  363.556886]  schedule+0x46/0xb0
Nov 17 16:32:06 isengard kernel: [  363.556888]  percpu_down_write+0xd2/0xe0
Nov 17 16:32:06 isengard kernel: [  363.556891]  freeze_super+0x7f/0x130
Nov 17 16:32:06 isengard kernel: [  363.556893]  __x64_sys_ioctl+0x62/0xb0
Nov 17 16:32:06 isengard kernel: [  363.556895]  do_syscall_64+0x33/0x80
Nov 17 16:32:06 isengard kernel: [  363.556897]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 17 16:32:06 isengard kernel: [  363.556903] RIP: 0033:0x7f26d5183cc7
Nov 17 16:32:06 isengard kernel: [  363.556905] RSP: 002b:00007ffc4a00b2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 17 16:32:06 isengard kernel: [  363.556907] RAX: ffffffffffffffda RBX: 0000559616e42790 RCX: 00007f26d5183cc7
Nov 17 16:32:06 isengard kernel: [  363.556916] RDX: 0000000000080000 RSI: 00000000c0045877 RDI: 0000000000000006
Nov 17 16:32:06 isengard kernel: [  363.556917] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000002a
Nov 17 16:32:06 isengard kernel: [  363.556918] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 17 16:32:06 isengard kernel: [  363.556927] R13: 00007ffc4a00b3f8 R14: 00000000c0045877 R15: 0000000000000006
Nov 17 16:36:07 isengard kernel: [  605.220798] INFO: task qemu-ga:370 blocked for more than 120 seconds.
Nov 17 16:36:07 isengard kernel: [  605.220837]       Not tainted 5.10.0-9-amd64 #1 Debian 5.10.70-1
Nov 17 16:36:07 isengard kernel: [  605.220860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 17 16:36:07 isengard kernel: [  605.220893] task:qemu-ga         state:D stack:    0 pid:  370 ppid:     1 flags:0x00004000
Nov 17 16:36:07 isengard kernel: [  605.220896] Call Trace:
Nov 17 16:36:07 isengard kernel: [  605.220904]  __schedule+0x282/0x870
Nov 17 16:36:07 isengard kernel: [  605.220906]  schedule+0x46/0xb0
Nov 17 16:36:07 isengard kernel: [  605.220909]  percpu_down_write+0xd2/0xe0
Nov 17 16:36:07 isengard kernel: [  605.220911]  freeze_super+0x7f/0x130
Nov 17 16:36:07 isengard kernel: [  605.220914]  __x64_sys_ioctl+0x62/0xb0
Nov 17 16:36:07 isengard kernel: [  605.220925]  do_syscall_64+0x33/0x80
Nov 17 16:36:07 isengard kernel: [  605.220926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Nov 17 16:36:07 isengard kernel: [  605.220929] RIP: 0033:0x7f26d5183cc7
Nov 17 16:36:07 isengard kernel: [  605.220932] RSP: 002b:00007ffc4a00b2f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 17 16:36:07 isengard kernel: [  605.220934] RAX: ffffffffffffffda RBX: 0000559616e42790 RCX: 00007f26d5183cc7
Nov 17 16:36:07 isengard kernel: [  605.220935] RDX: 0000000000080000 RSI: 00000000c0045877 RDI: 0000000000000006
Nov 17 16:36:07 isengard kernel: [  605.220935] RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000002a
Nov 17 16:36:07 isengard kernel: [  605.220936] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Nov 17 16:36:07 isengard kernel: [  605.220936] R13: 00007ffc4a00b3f8 R14: 00000000c0045877 R15: 0000000000000006
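While it hangs like this, one can at least check from the host whether the agent still answers at all; from memory the commands are roughly (VMID taken from the log above):
Code:
# Quick checks from the PVE host while the backup hangs (sub-commands from memory, VMID from the log above)
qm guest cmd 144 ping              # does the guest agent still answer at all?
qm guest cmd 144 fsfreeze-status   # reports "frozen" or "thawed" if the agent responds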

Host:
Code:
proxmox-ve: 6.4-1 (running kernel: 5.4.143-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve4~bpo10
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1


Any suggestions?
 
Hi,

Same problem here: VM on PVE 7.5.1, Debian 11, MariaDB 10.6.5 installed from the MariaDB repos. Nothing else except an NFS mount point to my NAS.
EDIT: Filebeat is also installed, to push logs to an ELK stack.

Proxmox is fully up to date (according to the repos, anyway), and Debian 11 is fully up to date too, including the QEMU guest agent (unless there is a specific repo to enable?): qemu-guest-agent 1:5.2+dfsg-11+deb11u1
No configuration tweaks on qemu-guest-agent.
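For reference, a quick way to confirm the installed agent version and that the service is running inside the guest (standard Debian commands, nothing specific to this setup):
Code:
# Inside the Debian 11 guest: check the agent package version and that the service is active
dpkg -l qemu-guest-agent
systemctl status qemu-guest-agent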

No other Debian 11 or Ubuntu VM has the issue, only this one.

I've set up manual MySQL backups in the meantime, but help would be much appreciated so I can re-include this VM in the PBS backup jobs :)
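By "manual backups" I essentially mean a scheduled mysqldump inside the guest; a minimal sketch, where the schedule, credentials file and target path are placeholders rather than my exact setup:
Code:
# /etc/cron.d/mariadb-dump -- sketch of a nightly dump inside the guest (placeholder schedule/paths/credentials)
30 1 * * * root /usr/bin/mysqldump --defaults-extra-file=/root/.my.cnf --all-databases \
  --single-transaction | gzip > /var/backups/mariadb/all-$(date +\%F).sql.gz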
 
From what I could read in other threads, disabling the guest agent seems to work, yes. However, I did not try it, as this VM hosts a database.
I would not trust a backup taken from a database VM without properly freezing the FS first. As far as I'm concerned, I prefer to disable the PBS backups and run mysqldump manually on a regular basis.

Furthermore, disabling the guest agent is only a workaround; I'm asking whether anyone has an idea for a proper fix, or what information would be needed to come up with one.
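(For reference, the workaround from the other threads amounts to something like this on the host, at the cost of backups that are only crash-consistent; the VMID is a placeholder:)
Code:
# Workaround, not a fix: disable the guest agent for the VM so vzdump skips fs-freeze/fs-thaw entirely
qm set <vmid> --agent enabled=0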
 
I understand and agree with you.
Maybe you can do both things:
  • Database backups to a directory.
  • A backup of the VM with PBS (make sure it is launched after the database backup; see the hook-script sketch below).
This way, you will have a history of consistent database backups in PBS that you can restore if you need to.
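One way to enforce that ordering would be a vzdump hook script on the host that runs the dump in the backup-start phase; a minimal sketch, assuming key-based SSH from the host into the guest (VMID, hostname and paths are placeholders):
Code:
#!/bin/bash
# Sketch of a vzdump hook script; enable it via "script: /usr/local/bin/vzdump-db-hook.sh" in /etc/vzdump.conf.
# For per-VM phases, vzdump calls the hook as: <script> <phase> <mode> <vmid>.
phase="$1"; mode="$2"; vmid="$3"

DB_VMID="123"                     # placeholder: ID of the database VM
DB_GUEST="db-guest.example.com"   # placeholder: guest address reachable over SSH

if [ "$phase" = "backup-start" ] && [ "$vmid" = "$DB_VMID" ]; then
    # Dump the databases inside the guest before vzdump/PBS snapshots the disk.
    ssh root@"$DB_GUEST" \
        "mysqldump --all-databases --single-transaction | gzip > /var/backups/mariadb/pre-vzdump.sql.gz" \
        || exit 1   # a failure here should abort the backup of this VM
fi
exit 0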
 
Actually, you're describing my current setup :)

Still, workarounds have already been discussed in other threads and I don't think they are of any interest here.
We're looking for a permanent, definitive fix if possible, or some advice on how to investigate further and find that fix, please.
 
Another piece of information: I upgraded PVE yesterday, all packages. I restarted the node and started only the MariaDB VM.

I tried to freeze it, and it worked. I thawed it, and it worked. I started a PVE backup job on this VM only, and it worked as well.
I thought the update had fixed the issue and re-enabled this VM in the global backup job that runs twice a day, but the VM froze again at 2am tonight, during the job.

So, apparently, something "changed" in the VM that made the problem appear. I have no idea what... Maybe MariaDB client connections? But why would that hang the VM when PVE tries to freeze it? There I'm lost...
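For reference, the manual freeze/thaw test was done from the PVE host, roughly along these lines (the VMID is a placeholder for the database VM; exact sub-commands from memory):
Code:
# Manual freeze/thaw test from the PVE host; these go through the same guest-agent calls vzdump issues
qm guest cmd 123 fsfreeze-freeze   # should return the number of frozen filesystems
qm guest cmd 123 fsfreeze-status   # expect "frozen"
qm guest cmd 123 fsfreeze-thaw     # thaw again
qm guest cmd 123 fsfreeze-status   # expect "thawed"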
 
I gave up on Debian 11.

Now I'm testing CentOS 8 Stream and MariaDB from mariadb.org.
Currently I see no problems.
 
Good to know it works well on CentOS...
Have you tried to open a ticket or forum post with MariaDB? Maybe they could explain what's wrong, as it looks like other Debian 11 VMs are not affected?
 
If there are no problems for a few days, I will try to submit a ticket to Debian and MariaDB.

Has anyone tested Debian 11 with MariaDB from the Debian repo?
 
I have the exact same problem: Debian and MariaDB from their own repos.
Debian 11
MariaDB 10.7

I only set up the VM last week; so far no other Debian VM has been affected. I did reinstall the agent on the MariaDB VM, which got it working for a while, but eventually the issue comes back.
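(The reinstall was nothing special, roughly the usual Debian commands inside the guest:)
Code:
# Inside the guest: reinstall and restart the agent (standard Debian commands)
apt-get install --reinstall qemu-guest-agent
systemctl restart qemu-guest-agent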

Have you reported it to Debian/MariaDB yet? I can also add to the ticket if you have :)

Edit: I've reported the issue upstream too https://gitlab.com/qemu-project/qemu/-/issues/881


Backup log: I had to unlock the VM with qm and then force a reset for the VM to become functional again, which I did at 8am.
Code:
103: 2022-02-21 07:00:25 INFO: Starting Backup of VM 103 (qemu)
103: 2022-02-21 07:00:25 INFO: status = running
103: 2022-02-21 07:00:25 INFO: VM Name: mariadb1
103: 2022-02-21 07:00:25 INFO: include disk 'scsi0' 'local-zfs:vm-103-disk-0' 10G
103: 2022-02-21 07:00:25 INFO: backup mode: snapshot
103: 2022-02-21 07:00:25 INFO: ionice priority: 7
103: 2022-02-21 07:00:25 INFO: creating Proxmox Backup Server archive 'vm/103/2022-02-21T07:00:25Z'
103: 2022-02-21 07:00:25 INFO: issuing guest-agent 'fs-freeze' command
103: 2022-02-21 08:00:25 ERROR: VM 103 qmp command 'guest-fsfreeze-freeze' failed - got timeout
103: 2022-02-21 08:00:25 INFO: issuing guest-agent 'fs-thaw' command
103: 2022-02-21 08:00:35 ERROR: VM 103 qmp command 'guest-fsfreeze-thaw' failed - got timeout
103: 2022-02-21 08:00:35 INFO: started backup task '5c3798d2-4a18-45f9-8120-46c47b27ad69'
103: 2022-02-21 08:00:35 INFO: resuming VM again
103: 2022-02-21 08:00:35 ERROR: VM 103 qmp command 'cont' failed - Resetting the Virtual Machine is required
103: 2022-02-21 08:00:35 INFO: aborting backup job
103: 2022-02-21 08:00:35 INFO: resuming VM again
103: 2022-02-21 08:00:35 ERROR: Backup of VM 103 failed - VM 103 qmp command 'cont' failed - Resetting the Virtual Machine is required
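For completeness, the recovery at 8am was basically the following (commands from memory, VMID from the log above):
Code:
# Remove the stale backup lock, then hard-reset the unresponsive VM
qm unlock 103
qm reset 103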
 
Experiencing a similar issue on machines with Ubuntu 20.04 and MariaDB. Backups lock up the machine.
 
mysqld Ver 10.3.34-MariaDB-0ubuntu0.20.04.1 for debian-linux-gnu on x86_64 (Ubuntu 20.04)
 
