Guest freeze timeout - feature request

nethfel

Member
Dec 26, 2014
151
0
16
Hi all,

I have a few guests that show this in the backup logs:
qmp command 'guest-fsfreeze-freeze' failed - got timeout

qmp command 'guest-fsfreeze-thaw' failed - got timeout

This is obviously dealing with the qemu guest agent, and the corresponding vss functionality (when dealing with a windows client in this case) - I notice in the source code:

Code:
if (!$timeout) {
	# hack: monitor sometime blocks
	if ($cmd->{execute} eq 'query-migrate') {
	    $timeout = 60*60; # 1 hour
	} elsif ($cmd->{execute} =~ m/^(eject|change)/) {
	    $timeout = 60; # note: cdrom mount command is slow
	} elsif ($cmd->{execute} eq 'guest-fsfreeze-freeze' ||
		 $cmd->{execute} eq 'guest-fsfreeze-thaw') {
	    $timeout = 10;
	} elsif ($cmd->{execute} eq 'savevm-start' ||
		 $cmd->{execute} eq 'savevm-end' ||
		 $cmd->{execute} eq 'query-backup' ||
		 $cmd->{execute} eq 'query-block-jobs' ||
		 $cmd->{execute} eq 'backup-cancel' ||
		 $cmd->{execute} eq 'query-savevm' ||
		 $cmd->{execute} eq 'delete-drive-snapshot' || 
		 $cmd->{execute} eq 'guest-shutdown' ||
		 $cmd->{execute} eq 'snapshot-drive'  ) {
	    $timeout = 10*60; # 10 mins ?
	} else {
	    $timeout = 3; # default
	}
    }

of qemu-server, the timeout is 10 seconds. normally I assume this would be a good time - these two machines happen to be WSUS servers so they have a lot of data and a lot of small files and I'm thinking the VSS functionality is just taking too long and timing out. Is there any way (either existing or for the future) to add in the .conf to make it possible to increase that timeout period? That way for machines that don't experience the problem, it would just use the default, but for those few cases, you could increase it if necessary without increasing it for all of the machines....

Right now I'm not even sure of the state of those VMs due to the two timeouts (freeze and thaw) right after each other where it may be in a state where it may have not properly thawed (assuming it ever finished freezing). For those two vms right now I'll have to turn off the guest agent functionality to prevent any future issues, but it'd be nice if one could extend the timeout if needed.
 
of qemu-server, the timeout is 10 seconds. normally I assume this would be a good time - these two machines happen to be WSUS servers so they have a lot of data and a lot of small files and I'm thinking the VSS functionality is just taking too long and timing out.

Please can you test how long it takes (by increasing the timeout). Maybe we can set longer timeouts by default.
 
Well, I ran tests on one of the two servers that threw up that note in the backup log - I couldn't get it to timeout - so I'm not sure what the problem was. This was the beginning of one of the actual backup reports:

Code:
113: May 20 00:00:02 INFO: Starting Backup of VM 113 (qemu)
113: May 20 00:00:02 INFO: status = running
113: May 20 00:00:02 INFO: update VM 113: -lock backup
113: May 20 00:00:03 INFO: backup mode: snapshot
113: May 20 00:00:03 INFO: ionice priority: 7
113: May 20 00:00:03 INFO: creating archive '/mnt/pve/Syn411/dump/vzdump-qemu-113-2015_05_20-00_00_02.vma.lzo'
113: May 20 00:00:13 ERROR: VM 113 qmp command 'guest-fsfreeze-freeze' failed - got timeout
113: May 20 00:00:23 ERROR: VM 113 qmp command 'guest-fsfreeze-thaw' failed - got timeout
113: May 20 00:00:23 INFO: started backup task '32db9234-efbd-4cc0-8a46-b2dbfa3b09d8'

It does claim to have backup correctly according to the status indicator at the top of the email. I don't know what generates this log, I'm just wondering if there is a more detailed log somewhere with regard to the qmp commands...
 
me too, proxmox up to date, it was yesterday night.:
qmp command 'guest-fsfreeze-freeze' failed - got timeout
on all my linux vm
 
@dietmar always the same problem about the timeout. Especially on a planified task which run every week.
what do you want me to give you as information? Seems to happen on several vm or maybe all of them.

rrd

rrd

rrd

rrd


INFO: starting new backup job: vzdump 100 101 102 104 105 106 107 111 --mode suspend --compress lzo --mailnotification failure --quiet 1 --storage backupsyno
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: update VM 100: -lock backup
INFO: backup mode: suspend
INFO: ionice priority: 7
INFO: suspend vm
INFO: snapshots found (not included into backup)
INFO: creating archive '/mnt/pve/backupsyno/dump/vzdump-qemu-100-2016_02_14-03_00_01.vma.lzo'
ERROR: VM 100 qmp command 'guest-fsfreeze-freeze' failed - got timeout
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
 
I see this a lot on Windows VMs which are running MSSQL. MSSQL subscribes to VSS notifications and checkpoints itself when it is notified a backup snapshot has been triggered. This can take much longer than 10 seconds. I'm sure there are a lot of other variables as to how long it takes VSS to generate the snapshot, so ideally this would be configurable.
 
  • Like
Reactions: chrone
I see this a lot on Windows VMs which are running MSSQL. MSSQL subscribes to VSS notifications and checkpoints itself when it is notified a backup snapshot has been triggered. This can take much longer than 10 seconds. I'm sure there are a lot of other variables as to how long it takes VSS to generate the snapshot, so ideally this would be configurable.
Actually, it not only on Windows VM, it appear at Linux VM. Bellow my log:
Code:
101: Dec 16 00:00:02 INFO: Starting Backup of VM 101 (qemu)
101: Dec 16 00:00:02 INFO: status = running
101: Dec 16 00:00:03 INFO: update VM 101: -lock backup
101: Dec 16 00:00:03 INFO: VM Name: svr20.campus.com
101: Dec 16 00:00:03 INFO: include disk 'virtio0' 'Campus_Raid:101/vm-101-disk-1.qcow2'
101: Dec 16 00:00:03 INFO: backup mode: snapshot
101: Dec 16 00:00:03 INFO: ionice priority: 7
101: Dec 16 00:00:03 INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-101-2016_12_16-00_00_02.vma.lzo'
101: Dec 16 01:00:03 ERROR: VM 101 qmp command 'guest-fsfreeze-freeze' failed - got timeout
101: Dec 16 01:00:13 ERROR: VM 101 qmp command 'guest-fsfreeze-thaw' failed - got timeout
101: Dec 16 01:00:13 INFO: started backup task '2cd9899c-6d2b-4c11-b351-9833ff7bc086'
101: Dec 16 01:00:16 INFO: status: 8% (3048538112/34359738368), sparse 7% (2432135168), duration 3, 1016/205 MB/s
101: Dec 16 01:00:19 INFO: status: 9% (3227123712/34359738368), sparse 7% (2437697536), duration 6, 59/57 MB/s
101: Dec 16 01:00:22 INFO: status: 10% (3439591424/34359738368), sparse 7% (2437697536), duration 9, 70/70 MB/s
101: Dec 16 01:00:28 INFO: status: 11% (3782082560/34359738368), sparse 7% (2437758976), duration 15, 57/57 MB/s
101: Dec 16 01:00:36 INFO: status: 12% (4139384832/34359738368), sparse 7% (2437779456), duration 23, 44/44 MB/s
101: Dec 16 01:00:40 INFO: status: 13% (4543676416/34359738368), sparse 7% (2626998272), duration 27, 101/53 MB/s
101: Dec 16 01:00:43 INFO: status: 16% (5634129920/34359738368), sparse 10% (3702812672), duration 30, 363/4 MB/s
101: Dec 16 01:00:49 INFO: status: 17% (5876547584/34359738368), sparse 10% (3702812672), duration 36, 40/40 MB/s
101: Dec 16 01:00:54 INFO: status: 18% (6227689472/34359738368), sparse 10% (3702812672), duration 41, 70/70 MB/s
101: Dec 16 01:01:02 INFO: status: 19% (6587416576/34359738368), sparse 10% (3702816768), duration 49, 44/44 MB/s
101: Dec 16 01:01:05 INFO: status: 30% (10616438784/34359738368), sparse 22% (7672291328), duration 52, 1343/19 MB/s
101: Dec 16 01:01:08 INFO: status: 31% (10815471616/34359738368), sparse 22% (7675830272), duration 55, 66/65 MB/s
101: Dec 16 01:01:11 INFO: status: 32% (11052384256/34359738368), sparse 22% (7679934464), duration 58, 78/77 MB/s
101: Dec 16 01:01:16 INFO: status: 33% (11388059648/34359738368), sparse 22% (7680720896), duration 63, 67/66 MB/s
101: Dec 16 01:01:22 INFO: status: 34% (11684544512/34359738368), sparse 22% (7692029952), duration 69, 49/47 MB/s
101: Dec 16 01:01:30 INFO: status: 35% (12093620224/34359738368), sparse 22% (7708798976), duration 77, 51/49 MB/s
101: Dec 16 01:01:37 INFO: status: 36% (12381061120/34359738368), sparse 22% (7713583104), duration 84, 41/40 MB/s
101: Dec 16 01:01:44 INFO: status: 42% (14598799360/34359738368), sparse 28% (9657790464), duration 91, 316/39 MB/s
101: Dec 16 01:01:47 INFO: status: 54% (18563006464/34359738368), sparse 39% (13552480256), duration 94, 1321/23 MB/s
101: Dec 16 01:01:53 INFO: status: 55% (18918014976/34359738368), sparse 39% (13556228096), duration 100, 59/58 MB/s
101: Dec 16 01:02:05 INFO: status: 56% (19274399744/34359738368), sparse 39% (13557170176), duration 112, 29/29 MB/s
101: Dec 16 01:02:10 INFO: status: 57% (19590807552/34359738368), sparse 39% (13558095872), duration 117, 63/63 MB/s
101: Dec 16 01:02:16 INFO: status: 58% (19963314176/34359738368), sparse 39% (13561212928), duration 123, 62/61 MB/s
101: Dec 16 01:02:21 INFO: status: 59% (20332019712/34359738368), sparse 39% (13562327040), duration 128, 73/73 MB/s
101: Dec 16 01:02:26 INFO: status: 60% (20621099008/34359738368), sparse 39% (13562327040), duration 133, 57/57 MB/s
101: Dec 16 01:02:35 INFO: status: 63% (21944664064/34359738368), sparse 42% (14562889728), duration 142, 147/35 MB/s
101: Dec 16 01:02:38 INFO: status: 77% (26478379008/34359738368), sparse 55% (19020390400), duration 145, 1511/25 MB/s
101: Dec 16 01:02:45 INFO: status: 78% (26839482368/34359738368), sparse 55% (19020623872), duration 152, 51/51 MB/s
101: Dec 16 01:02:50 INFO: status: 79% (27205828608/34359738368), sparse 55% (19020685312), duration 157, 73/73 MB/s
101: Dec 16 01:02:55 INFO: status: 80% (27512274944/34359738368), sparse 55% (19021811712), duration 162, 61/61 MB/s
101: Dec 16 01:03:03 INFO: status: 81% (27888844800/34359738368), sparse 55% (19129757696), duration 170, 47/33 MB/s
101: Dec 16 01:03:09 INFO: status: 82% (28215476224/34359738368), sparse 55% (19130437632), duration 176, 54/54 MB/s
101: Dec 16 01:03:16 INFO: status: 83% (28596961280/34359738368), sparse 55% (19130437632), duration 183, 54/54 MB/s
101: Dec 16 01:03:21 INFO: status: 84% (28885123072/34359738368), sparse 55% (19130437632), duration 188, 57/57 MB/s
101: Dec 16 01:03:24 INFO: status: 100% (34359738368/34359738368), sparse 71% (24593018880), duration 191, 1824/4 MB/s
101: Dec 16 01:03:24 INFO: transferred 34359 MB in 191 seconds (179 MB/s)
101: Dec 16 01:03:24 INFO: archive file size: 3.63GB
101: Dec 16 01:03:24 INFO: delete old backup '/var/lib/vz/dump/vzdump-qemu-101-2016_12_15-00_00_02.vma.lzo'
101: Dec 16 01:03:25 INFO: Finished Backup of VM 101 (01:03:23)
This VM Running CentOS 7
 
Hi all,

I have a few guests that show this in the backup logs:
qmp command 'guest-fsfreeze-freeze' failed - got timeout

qmp command 'guest-fsfreeze-thaw' failed - got timeout

This is obviously dealing with the qemu guest agent, and the corresponding vss functionality (when dealing with a windows client in this case) - I notice in the source code:

Code:
if (!$timeout) {
    # hack: monitor sometime blocks
    if ($cmd->{execute} eq 'query-migrate') {
        $timeout = 60*60; # 1 hour
    } elsif ($cmd->{execute} =~ m/^(eject|change)/) {
        $timeout = 60; # note: cdrom mount command is slow
    } elsif ($cmd->{execute} eq 'guest-fsfreeze-freeze' ||
         $cmd->{execute} eq 'guest-fsfreeze-thaw') {
        $timeout = 10;
    } elsif ($cmd->{execute} eq 'savevm-start' ||
         $cmd->{execute} eq 'savevm-end' ||
         $cmd->{execute} eq 'query-backup' ||
         $cmd->{execute} eq 'query-block-jobs' ||
         $cmd->{execute} eq 'backup-cancel' ||
         $cmd->{execute} eq 'query-savevm' ||
         $cmd->{execute} eq 'delete-drive-snapshot' ||
         $cmd->{execute} eq 'guest-shutdown' ||
         $cmd->{execute} eq 'snapshot-drive'  ) {
        $timeout = 10*60; # 10 mins ?
    } else {
        $timeout = 3; # default
    }
    }

of qemu-server, the timeout is 10 seconds. normally I assume this would be a good time - these two machines happen to be WSUS servers so they have a lot of data and a lot of small files and I'm thinking the VSS functionality is just taking too long and timing out. Is there any way (either existing or for the future) to add in the .conf to make it possible to increase that timeout period? That way for machines that don't experience the problem, it would just use the default, but for those few cases, you could increase it if necessary without increasing it for all of the machines....

Right now I'm not even sure of the state of those VMs due to the two timeouts (freeze and thaw) right after each other where it may be in a state where it may have not properly thawed (assuming it ever finished freezing). For those two vms right now I'll have to turn off the guest agent functionality to prevent any future issues, but it'd be nice if one could extend the timeout if needed.

Hello,
which file is this, i cannot find it ? I'd like to increase timeout, have same problem with Windows2012+MSSQL vm.
Thank you
 
I think I experience the same issue:

Code:
INFO: starting new backup job: vzdump 1031 --mailnotification failure --maxfiles 14 --storage VM-Name --mode snapshot --quiet 1 --fullbackup 7 --compress gzip
INFO: doing differential backup against '/mnt/pve/VM-Name/dump/vzdump-qemu-1031-2017_01_12-00_00_02.vma.gz'
INFO: Starting Backup of VM 1031 (qemu)
INFO: status = running
INFO: update VM 1031: -lock backup
INFO: VM Name: VM-Name
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/VM-Name/dump/vzdump-qemu-1031-2017_01_12-00_00_02.vma.gz--differential-2017_01_18-00_00_03.vcdiff'
ERROR: VM 1031 qmp command 'guest-fsfreeze-freeze' failed - got timeout
ERROR: VM 1031 qmp command 'guest-fsfreeze-thaw' failed - unable to connect to VM 1031 qga socket - timeout after 101 retries
INFO: started backup task 'd8fa1c7d-d011-4e86-b71f-53132005df3f'
INFO: status: 0% (409206784/412316860416), sparse 0% (63393792), duration 3, 136/115 MB/s

Any progress/update on this? What information can I provide to help?

Packege Ver:

Code:
proxmox-ve: 4.3-66 (running kernel: 4.4.19-1-pve)
pve-manager: 4.3-3 (running version: 4.3-3/557191d3)
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-46
qemu-server: 4.0-91
pve-firmware: 1.1-9
libpve-common-perl: 4.0-75
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-66
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.2-2
pve-container: 1.0-78
pve-firewall: 2.0-31
pve-ha-manager: 1.0-35
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.5-1
lxcfs: 2.0.4-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
openvswitch-switch: 2.5.0-1
 
it was bumped to 60min in qemu-server 4.0-98 , please update to the current version.
 
  • Like
Reactions: chrone
Hm, still seeing this with the current Proxmox version every now and then, any other ideas? Backups seem to succeed. This example was a Win10 workstation.

Code:
103: Jun 17 00:45:59 INFO: Starting Backup of VM 103 (qemu)
103: Jun 17 00:45:59 INFO: status = running
103: Jun 17 00:46:05 INFO: update VM 103: -lock backup
103: Jun 17 00:46:05 INFO: VM Name: charles.domain.tld
103: Jun 17 00:46:05 INFO: include disk 'virtio3' 'local:103/vm-103-disk-2.qcow2' 64G
103: Jun 17 00:46:07 INFO: backup mode: snapshot
103: Jun 17 00:46:07 INFO: ionice priority: 7
103: Jun 17 00:46:07 INFO: creating archive '/var/lib/vz/dump/vzdump-qemu-103-2017_06_17-00_45_59.vma'
103: Jun 17 01:46:07 ERROR: VM 103 qmp command 'guest-fsfreeze-freeze' failed - got timeout
103: Jun 17 01:46:17 ERROR: VM 103 qmp command 'guest-fsfreeze-thaw' failed - got timeout
103: Jun 17 01:46:17 INFO: started backup task 'eefbd13a-ea84-4d40-99c5-dcd02e6479a1'
103: Jun 17 01:46:20 INFO: status: 0% (456916992/68719476736), sparse 0% (97533952), duration 3, 152/119 MB/s
103: Jun 17 01:46:28 INFO: status: 1% (693043200/68719476736), sparse 0% (101101568), duration 11, 29/29 MB/s
...
103: Jun 17 02:36:36 INFO: status: 100% (68719476736/68719476736), sparse 1% (858796032), duration 3019, 20/20 MB/s
103: Jun 17 02:36:36 INFO: transferred 68719 MB in 3019 seconds (22 MB/s)
103: Jun 17 02:36:36 INFO: archive file size: 63.21GB
103: Jun 17 02:36:36 INFO: delete old backup '/var/lib/vz/dump/vzdump-qemu-103-2017_05_13-05_27_40.vma'
103: Jun 17 02:40:25 INFO: Finished Backup of VM 103 (01:54:26)
 
Same error occur with virtio-win-0.1.141 drivers on PVE 5.1-35
Windows as well as Linux VM's was involved.
Probably this is non drivers problem, because during same backup Linux VM's also got timeout. Those VM's uses different virtio drivers.

Details:

INFO: starting new backup job: vzdump 100 101 103 102 --storage backup1 --mode snapshot --quiet 1 --compress lzo --node pve1
INFO: Starting Backup of VM 100 (qemu)
INFO: status = running
INFO: update VM 100: -lock backup
INFO: VM Name: SRV1
INFO: include disk 'virtio0' 'local:100/vm-100-disk-1.raw' 50G
INFO: include disk 'virtio1' 'local:100/vm-100-disk-2.raw' 100G
INFO: include disk 'efidisk0' 'local:100/vm-100-disk-3.raw' 128K
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/backup-vm/pve/dump/vzdump-qemu-100-2017_11_09-23_00_01.vma.lzo'
ERROR: VM 100 qmp command 'guest-fsfreeze-freeze' failed - got timeout
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task '51043ddd-38e8-4161-b919-8c444939a098'
INFO: status: 2% (3470393344/161061404672), sparse 0% (302067712), duration 3, read/write 1156/1056 MB/s
INFO: status: 3% (4964220928/161061404672), sparse 0% (322658304), duration 9, read/write 248/245 MB/s
INFO: status: 4% (6507462656/161061404672), sparse 0% (334442496), duration 15, read/write 257/255 MB/s
INFO: status: 5% (8130527232/161061404672), sparse 0% (371150848), duration 22, read/write 231/226 MB/s
 
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!