ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout

Glaciar Errante

New Member
Dec 2, 2020
These timeouts occur on my cluster as well, roughly once per month: qmp command 'guest-fsfreeze-thaw' failed - got timeout, followed by qmp command 'backup' failed - got timeout. Initially I restarted all affected VMs, but there seems to be a workaround. Apparently it's possible to unfreeze the VMs like this:
Code:
qm agent <vmid> fsfreeze-status
It answers with "thawed", and processes stuck on I/O requests continue immediately. (fsfreeze-thaw might work as well; I haven't tried it so far.)
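For anyone who wants to script the workaround, here is a minimal sketch. It assumes that `qm agent <vmid> fsfreeze-status` prints either "thawed" or "frozen" (possibly quoted) and that `qm agent <vmid> fsfreeze-thaw` is the matching thaw subcommand, which is untested in this thread. `thaw_if_frozen` is a hypothetical helper name, not part of Proxmox.

```shell
#!/bin/sh
# Sketch only: query the guest agent's freeze state and thaw if needed.
# Assumptions: the status output contains "thawed" or "frozen";
# thaw_if_frozen is a made-up helper name for illustration.

thaw_if_frozen() {
    vmid="$1"

    # Ask the guest agent for the current freeze state.
    status="$(qm agent "$vmid" fsfreeze-status 2>/dev/null)" || {
        echo "VM $vmid: guest agent did not answer" >&2
        return 1
    }

    case "$status" in
        *frozen*)
            echo "VM $vmid: frozen, sending fsfreeze-thaw"
            qm agent "$vmid" fsfreeze-thaw
            ;;
        *thawed*)
            echo "VM $vmid: already thawed"
            ;;
        *)
            echo "VM $vmid: unexpected status: $status" >&2
            return 1
            ;;
    esac
}
```

Calling `thaw_if_frozen 100` on the node hosting VM 100 would run the check once. Note that in the case reported here the status query alone was enough to get stuck processes moving again, so the thaw branch may never be reached.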
In another thread (https://forum.proxmox.com/threads/q...ze-thaw-failed-got-timeout.77195/#post-344117) someone claimed "there is a bug in Qemu Guest Agent or in kernel", which could very well be the case. However, does anyone know why both qmp commands appear to time out at the same time? I took a quick look at the source, but it's not clear to me.
 

DerDanilo

Well-Known Member
Jan 21, 2017
If it happens once a month, it might be an issue with the storage backend, which e.g. runs a scrub and has bad I/O during that time. Bad I/O can also cause problems for PBS backups.
 

Glaciar Errante

New Member
Dec 2, 2020
Indeed, it happens when the PBS storage is scrubbing; however, the backup itself runs on a different node with independent storage. So the timeout of the backup command is maybe understandable, but thaw failing afterwards is not expected.

I tried to reproduce this by starting the backup manually while hacking the source (/usr/share/perl5/PVE/QMPClient.pm) to not send the backup command, in order to force a timeout. The backup command times out as expected, but now thaw works; I'm not sure what the difference is. I also added some debugging output, so maybe I have to wait until next month. :/

(If someone else is digging deeper here and is confused that the timing changed: The timeout for the backup command was recently increased from 60s to 125s: https://git.proxmox.com/?p=qemu-ser...pm;h=46b676c0b127028d057f82c47b18df830fa26a49)

What do you mean by "Bad I/O can also cause PBS backups to be problematic."?
 
