Problem with fsfreeze-freeze and qemu guest agent

werenzo

May 21, 2018
Hello,

We're running proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve).

I was trying to test the qemu-guest-agent fsfreeze function in a CloudLinux release 7.7 (3.10.0-962.3.2.lve1.5.26.4.el7.x86_64) guest.

When I issue the command

qm agent <vmid> fsfreeze-freeze

the guest agent stops responding and I get

"QEMU guest agent is not running"

In order to restore functionality I have to restart qemu-guest-agent.service (systemctl restart qemu-guest-agent.service), but the process does not seem to respond, and systemd eventually kills it in order to restart it.

I've noticed some backup-related posts describing similar behaviour. Could you help me debug this problem?

Thank you
 
Does a simple qm guest cmd <vmid> ping work? It should also be visible in the guest syslog.

I suspect that this is rather an issue with the guest agent packaged in the guest. The CloudLinux website was not reachable just now (I got a Cloudflare error page), but Wikipedia says that the 7.x series was initially released in 2015, so depending on the updates for the agent it could be a bit outdated. Maybe check whether there are updates?

If ping and other QGA commands work, it could be related to freezing the rootfs. What runs inside that VM? Not all workloads are easy to freeze: some databases and some container implementations (like older OpenVZ, a long time ago) often have issues with freezing. The guest kernel is also a bit outdated; I'm not sure how good the FS freezer support was back then.
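
To make the check above repeatable, here is a minimal sketch in Python that wraps qm guest cmd so ping and fsfreeze-status can be run from the Proxmox host. The helper names (build_guest_cmd, run_guest_cmd) and the 30-second timeout are my own assumptions, not part of Proxmox; only the qm guest cmd <vmid> <command> form itself comes from the CLI.

```python
import subprocess
from typing import List, Tuple

# Agent subcommands worth checking first (both are accepted by `qm guest cmd`).
QGA_CHECKS = ("ping", "fsfreeze-status")


def build_guest_cmd(vmid: int, command: str) -> List[str]:
    """Return the argv for `qm guest cmd <vmid> <command>` (hypothetical helper)."""
    return ["qm", "guest", "cmd", str(vmid), command]


def run_guest_cmd(vmid: int, command: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Run one agent command on the Proxmox host.

    Returns (exit code, combined stdout/stderr). The timeout value is an
    assumption; subprocess raises TimeoutExpired if qm itself hangs.
    """
    proc = subprocess.run(
        build_guest_cmd(vmid, command),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, (proc.stdout + proc.stderr).strip()


# Example use on the host (not executed here, since it needs the qm binary):
#   for check in QGA_CHECKS:
#       print(check, run_guest_cmd(800, check))
```

If ping already fails with "QEMU guest agent is not running", the problem is in the agent or its channel rather than in the freeze itself.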
 
Hello t.lamprecht,

qm guest ping results in "QEMU guest agent is not running".

Here's the installed version of the guest agent:

qemu-guest-agent.x86_64 10:2.12.0-3.el7 @cloudlinux-x86_64-server-7
 
I believe I have a similar problem. This is what CloudLinux came back to me with:

The issue is not related to CloudLinux directly, but to Qemu agent, which does not freeze the file system(s) correctly. What is actually happening:

When a VM backup is invoked, the Qemu agent freezes the file systems, so that no changes are made during the backup. But the Qemu agent does not respect the loop* devices in its freezing order (we have checked its sources), which leads to the following situation:
1) freeze loopback fs
---> send async reqs to loopback thread
2) freeze main fs
3) the loopback thread wakes up and tries to write data to the main fs, which is still frozen, and this finally leads to a hung task and a kernel crash.
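
To see whether a guest has any loop-backed file systems at all (the precondition for the sequence above), a minimal sketch in Python could scan the contents of /proc/mounts for /dev/loop* devices. The function name and the sample mount table are made up for illustration; this only detects such mounts and does not change the agent's freeze order.

```python
from typing import List, Tuple


def loop_backed_mounts(proc_mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (device, mountpoint, fstype) for every mount on a /dev/loop* device."""
    found = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        if device.startswith("/dev/loop"):
            found.append((device, mountpoint, fstype))
    return found


# Hypothetical mount table, NOT taken from the affected guest.
SAMPLE = """\
/dev/vda1 / ext4 rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid 0 0
/dev/loop0 /tmp ext4 rw,nosuid,noexec 0 0
"""

print(loop_backed_mounts(SAMPLE))
# On a real guest: loop_backed_mounts(open("/proc/mounts").read())
```

An empty result would suggest the loop-device ordering described here is not what is hanging that particular guest.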

I'm afraid we have no further recommendations at this point.
 
I have the same issue:

Details:
Proxmox ver. 6.1-8
VM OS:
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
Kernel 4.4.0-186-generic
qemu-guest-agent package installed on VM: 1:2.5+dfsg-5ubuntu10.44 (amd64)

Failed backup log:
--------------------------------------------------------------------------------------------------
INFO: starting new backup job: vzdump 800 --node elio --storage ftpback-xxxx.net --mode snapshot --mailto xx@xx --compress lzo --quiet 1 --mailnotification failure
INFO: Starting Backup of VM 800 (qemu)
INFO: Backup started at 2020-08-14 22:00:03
INFO: status = running
INFO: update VM 800: -lock backup
INFO: VM Name: xxvm
INFO: include disk 'virtio0' 'local:800/vm-800-disk-0.qcow2' 500G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/ftpback-xxxx.net/dump/vzdump-qemu-800-2020_08_14-22_00_02.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 800 qmp command 'guest-fsfreeze-thaw' failed - got timeout
ERROR: got timeout
INFO: aborting backup job
ERROR: Backup of VM 800 failed - got timeout
INFO: Failed at 2020-08-14 22:00:33
INFO: Backup job finished with errors

TASK ERROR: job errors
-----------------------------------------------------------------------------------------
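
For anyone who wants to catch this pattern automatically, here is a minimal sketch in Python that scans a vzdump task log for a freeze followed by a thaw timeout and returns the affected VM ID. The function name is hypothetical, and the matched strings are simply copied from the log above, so other Proxmox versions may word them differently.

```python
import re
from typing import Optional

# Strings copied from the failed backup log above; wording may differ by version.
FREEZE_ISSUED = "issuing guest-agent 'fs-freeze' command"
THAW_TIMEOUT = re.compile(
    r"VM (\d+) qmp command 'guest-fsfreeze-thaw' failed - got timeout"
)


def vm_with_thaw_timeout(log_text: str) -> Optional[int]:
    """Return the VM ID if fs-freeze was issued and a later thaw timed out."""
    freeze_seen = False
    for line in log_text.splitlines():
        if FREEZE_ISSUED in line:
            freeze_seen = True
            continue
        match = THAW_TIMEOUT.search(line)
        if freeze_seen and match:
            return int(match.group(1))
    return None


# Excerpt of the log lines quoted in this post.
EXCERPT = """\
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 800 qmp command 'guest-fsfreeze-thaw' failed - got timeout
"""

print(vm_with_thaw_timeout(EXCERPT))
```

A hit means the guest may still be frozen, so it is worth checking the VM right away instead of waiting for the next morning.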

In the VM's syslog I see the following lines:
--------------------------------------------------------------------------------------------------
Aug 15 00:00:05 localhost qemu-ga: info: guest-ping called
Aug 15 00:00:05 localhost qemu-ga: info: guest-fsfreeze called
Aug 15 00:00:31 localhost kernel: [365549.008052] TCP: request_sock_TCP: Possible SYN flooding on port 2121. Sending cookies. Check SNMP counters.
Aug 15 00:02:29 localhost systemd[1]: user@1002.service: Start operation timed out. Terminating.
Aug 15 00:02:30 localhost systemd[1]: user@1007.service: Start operation timed out. Terminating.
Aug 15 00:02:30 localhost systemd[1]: user@1004.service: Start operation timed out. Terminating.
Aug 15 00:02:32 localhost systemd[1]: user@1006.service: Start operation timed out. Terminating.
Aug 15 00:02:33 localhost systemd[1]: user@1010.service: Start operation timed out. Terminating.
Aug 15 00:02:37 localhost systemd[1]: user@1005.service: Start operation timed out. Terminating.
Aug 15 00:03:59 localhost systemd[1]: user@1002.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:04:00 localhost systemd[1]: user@1004.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:04:00 localhost systemd[1]: user@1007.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:04:02 localhost systemd[1]: user@1006.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:04:03 localhost systemd[1]: user@1010.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:04:07 localhost systemd[1]: user@1005.service: State 'stop-final-sigterm' timed out. Killing.
Aug 15 00:05:30 localhost systemd[1]: user@1002.service: Processes still around after final SIGKILL. Entering failed mode.
Aug 15 00:05:30 localhost systemd[1]: Failed to start User Manager for UID 1002.
Aug 15 00:05:30 localhost systemd[1]: user@1002.service: Unit entered failed state.
Aug 15 00:05:30 localhost systemd[1]: user@1002.service: Failed with result 'timeout'.
Aug 15 00:05:30 localhost systemd[1]: user@1004.service: Processes still around after final SIGKILL. Entering failed mode.
--------------------------------------------------------------------------------------------------


I was not able to access the console or connect via ssh, so I did a forced restart. The funny thing is that the previous night's backup worked fine with fs-freeze.

Is there a way to increase the qemu-guest-agent timeout? I cannot find a configuration file on Ubuntu.

I have read a lot of threads about this issue, but I did not find any real solution or even the main cause; it seems to happen randomly...

It drives me crazy to think that a backup can block my VM and that my clients will be angry the next morning *facepalm*

Anyway, thanks for the support :)
 
