Guest-agent fs-freeze command breaks the system on backup

In my case it also happened with cPanel + CloudLinux,

but some cPanel + CloudLinux machines work and others don't. Some crash within a day and some fail on random days. It doesn't matter whether I'm backing up to PBS, NFS or local disk. Something crashes internally; that is the disadvantage of maintaining a very old kernel and trying to patch it.

Additionally, I remember that in some cases this also occurred on machines without CloudLinux but with CentOS 7 + cPanel, which means an old kernel was in use there too. On newer Ubuntu + cPanel it works well for me. I don't know whether it fails on CentOS 8 / AlmaLinux based systems.

For me it also fails with CentOS/Alma 8.
 
The stock CentOS/Alma 8 kernel, or CentOS/Alma 8 with the CloudLinux kernel?

If you are using the CloudLinux kernel, then it is the same on CentOS 7, CentOS 8, or CentOS 1 if that existed.
But the answer could indicate whether the cause is really something in cPanel or in CloudLinux.
 
For me this just happened (not for the first time) with a Debian 11 VM. No cpanel. Backup to PBS.

Backing up to a smb/cifs target (OMV) works flawlessly (well, it's a big backup, so I can't tell yet, but at least the VM didn't crash at fsfreeze).

I have been having issues with my PBS lately: timeouts when listing existing backups (the datastore sits on a hdd zpool which, I learned now, is not recommended).

So maybe these two issues are linked. And maybe it is not the fsfreeze command but something that happens (or should happen) directly after that (and that is kept from happening by PBS' timing out)?
 
Happened to me too. The VM runs PBS, and the backup got stuck on issuing the guest-agent 'fs-freeze' command when backing up the PBS VM in PBS. It was fixed once the agent was disabled.
 
Guys who experience the issue with fsfreeze locking the filesystem inside the guest: do you by any chance have /tmp on a loop device?
Like:
/dev/loop0 1,2G 2,9M 1,2G 1% /tmp
I vaguely remember that this has been a known issue for a few years and that the developers are (still?) playing the blame game.
So all my WHM installs have the guest agent disabled.
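
For anyone who wants to check this quickly inside the guest, here is a minimal sketch. It assumes findmnt (util-linux) is available in the guest; the helper name is_loop_source is made up for illustration, and it only looks at the device name, nothing more.

```shell
#!/bin/sh
# Returns 0 if the given mount source looks like a loop device (e.g. /dev/loop0).
# Helper name is hypothetical; it only pattern-matches the device path.
is_loop_source() {
    case "$1" in
        /dev/loop[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Source device backing /tmp; stays empty if /tmp is not a separate mount
# or findmnt is not installed.
tmp_source=""
if command -v findmnt >/dev/null 2>&1; then
    tmp_source=$(findmnt -n -o SOURCE /tmp 2>/dev/null || true)
fi

if is_loop_source "$tmp_source"; then
    echo "/tmp is on a loop device: $tmp_source"
else
    echo "/tmp is not on a loop device (source: ${tmp_source:-none})"
fi
```

If it reports a loop device, you are in the same situation as the df output above.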
 
Hello, I've just found that on CentOS the file /etc/sysconfig/qemu-ga

contains a parameter called:

FSFREEZE_HOOK_PATHNAME=/etc/qemu-ga/fsfreeze-hook

Has anybody investigated this?
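
A rough way to see what that parameter points at and whether the script is actually executable. This is only a sketch: the function name hook_path_from_config is hypothetical, and it assumes the value is written unquoted on a single line, as shown above.

```shell
#!/bin/sh
# Print the value of FSFREEZE_HOOK_PATHNAME from a sysconfig-style file.
# Assumes an unquoted KEY=value line; the last match wins.
hook_path_from_config() {
    sed -n 's/^FSFREEZE_HOOK_PATHNAME=//p' "$1" | tail -n 1
}

config=/etc/sysconfig/qemu-ga
if [ -r "$config" ]; then
    hook=$(hook_path_from_config "$config")
    if [ -n "$hook" ] && [ -x "$hook" ]; then
        echo "hook script is set and executable: $hook"
    else
        echo "hook script missing or not executable: ${hook:-unset}"
    fi
else
    echo "$config not readable on this system"
fi
```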
 
Hi,
If you make sure the guest agent is invoked with the -F flag, the hook script will be run before the freeze and after the thaw, and can be used, for example, to prepare applications for the freeze. You can place custom scripts in fsfreeze-hook.d (don't forget to make them executable); they will be called by the default hook script.
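
As an illustration, a custom script in fsfreeze-hook.d could look roughly like this. It is a sketch, not the packaged script: the file name 10-example.sh and the function name hook_action are made up, and it assumes the default hook script passes "freeze" or "thaw" as the first argument.

```shell
#!/bin/sh
# Hypothetical hook, e.g. saved as /etc/qemu-ga/fsfreeze-hook.d/10-example.sh
# (the file name is an assumption). It is expected to be called with
# "freeze" before the freeze and "thaw" after the thaw.

hook_action() {
    case "$1" in
        freeze)
            # Prepare applications here, e.g. flush caches or pause writers.
            echo "preparing for freeze"
            ;;
        thaw)
            # Undo whatever the freeze branch did.
            echo "resuming after thaw"
            ;;
        *)
            echo "unknown action: $1" >&2
            return 1
            ;;
    esac
}

# Only act when an argument was passed, so the file can be sourced or
# syntax-checked without side effects.
if [ "$#" -gt 0 ]; then
    hook_action "$1"
fi
```

Keep whatever runs in the freeze branch short, since the backup waits on it.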
 
I think I have confirmed that the commit fixes QEMU issue 520, and I have updated the issue: https://gitlab.com/qemu-project/qemu/-/issues/520#note_2446149051

Now I guess people just need to raise bugs downstream to encourage the qemu-guest-agent maintainers in their distro of choice to cherry-pick / backport it. To my relatively inexperienced eye it looks like an easy patch with no dependencies on anything else, and a backport would mean people don't have to wait for QEMU v10+ based packages to reach them.
 
IIRC securetmp works by creating a bind mount, which is what triggers this behaviour, at least if you are not using LVM for your storage. IMO trading a security feature for working backups is not a great choice to make, but it's certainly "easier" than compiling your own qemu-ga.

Users of CloudLinux with LVE / CageFS enabled however will still have problems, so then you really do need to make your own qemu-ga / have the package maintainers compile a patched build.

When I get time, I will try to raise bugs downstream to encourage package maintainers, at least for Debian and AlmaLinux (which should also benefit CloudLinux), to cherry-pick the fix. Anyone else could do this in the meantime if they have time.


Also, make sure if you're backing up things like cPanel servers that you are adding the mysql-flush.sh hook script - not installed by default with qemu-guest-agent as packaged by most distros - to ensure your databases are actually consistent!
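
For what it's worth, dropping a hook script into place can be sketched like this. The helper name install_hook and both paths in the usage comment are assumptions; adjust them to wherever you obtained mysql-flush.sh and wherever your distro keeps fsfreeze-hook.d.

```shell
#!/bin/sh
# Copy a hook script into a hook directory and mark it executable.
# install_hook is a hypothetical helper; it refuses to run if the source
# file does not exist.
install_hook() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "source not found: $src" >&2; return 1; }
    mkdir -p "$dest_dir" &&
        install -m 0755 "$src" "$dest_dir/$(basename "$src")"
}

# Example usage (paths are assumptions, run as root inside the guest):
#   install_hook ./mysql-flush.sh /etc/qemu-ga/fsfreeze-hook.d
```

After that, a test backup should show whether the hook actually ran and whether the database credentials it needs are in place.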
 