ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout

hawk128

Member
May 22, 2017
Hi all,

I upgraded some clusters a week ago from the test repository.
Since then, various Windows guests regularly get stuck.

Code:
INFO: Starting Backup of VM 100 (qemu)
INFO: Backup started at 2020-04-05 04:00:02
INFO: status = running
INFO: VM Name: sexp-win10
INFO: include disk 'scsi0' 'mm-2-1_vm:vm-100-disk-0' 128G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/pve4-nfs/dump/vzdump-qemu-100-2020_04_05-04_00_02.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
ERROR: got timeout
ERROR: Backup of VM 100 failed - got timeout
INFO: Failed at 2020-04-05 04:00:48

This error is different from similar ones described on the Internet. The guest cannot unfreeze its filesystem and gets stuck; only stopping and starting the VM helps.
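Since the failure pattern is always the same pair of log lines, a small script can scan vzdump logs and list which VMs are likely stuck. This is a sketch of my own, not a Proxmox tool; the regexes assume the exact log format shown above:

```python
import re

# Matches the 'fs-freeze' line that precedes a failed thaw.
FREEZE_RE = re.compile(r"issuing guest-agent 'fs-freeze'")
# Captures the VM ID and the failure reason from the thaw error line.
THAW_FAIL_RE = re.compile(r"VM (\d+) qmp command 'guest-fsfreeze-thaw' failed - (.+)")

def find_stuck_vms(log_text):
    """Return a list of (vmid, reason) for VMs whose fs-thaw failed after a freeze."""
    stuck = []
    froze = False
    for line in log_text.splitlines():
        if FREEZE_RE.search(line):
            froze = True
        m = THAW_FAIL_RE.search(line)
        if m and froze:
            stuck.append((int(m.group(1)), m.group(2)))
            froze = False
    return stuck

log = """\
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
"""
print(find_stuck_vms(log))  # [(100, 'got timeout')]
```

Pointing it at the nightly vzdump task logs would tell you each morning which VMs need attention.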

Hosts are fast. Load is low.

Any ideas?
 

hawk128

Member
May 22, 2017
This happens quite often, not only on clusters but on single hosts as well.
It is annoying to have to restart some Windows VMs every morning after the nightly backups...

Any ideas?
 

Progratron

Member
Feb 27, 2019
Is it happening with PVE 6.1? For me the problem went away with that release...

BTW, you don't need to restart VMs when they are stuck after a backup: qm unlock 100
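To add to that: when a VM hangs like this, a few qm subcommands help distinguish a stale backup lock from a truly hung agent. These are standard PVE 6.x commands run on the host, and they will themselves time out if the agent inside the guest is really dead:

```shell
# Release a stale backup lock (does not touch the guest itself):
qm unlock 100

# Is the guest agent responding at all?
qm agent 100 ping

# Query the freeze state and, if needed, try to thaw manually:
qm agent 100 fsfreeze-status
qm agent 100 fsfreeze-thaw
```

If even the ping times out, the agent process inside the guest is hung and only a guest reboot will recover it.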
 

hawk128

Member
May 22, 2017
Yes, I am on Virtual Environment 6.1-8.
qm unlock 100 helps when the VM is merely locked.
In this case I cannot connect to or see anything via qm or the console.
It looks like a bug in qemu-kvm...
 

TeTeHacko

New Member
Apr 14, 2020
Hi, after the update all backed-up VMs failed with the same error:

Code:
INFO: starting new backup job: vzdump --compress lzo --mode snapshot --pool *** --node stor --storage zfs_dir --mailto *** --mailnotification failure --quiet 1
INFO: skip external VMs: 102, 119, 103
INFO: Starting Backup of VM 114 (qemu)
INFO: Backup started at 2020-04-14 05:30:02
INFO: status = running
INFO: VM Name: ***
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/zfs_dir/dump/vzdump-qemu-114-2020_04_14-05_30_02.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 114 not running
ERROR: client closed connection
ERROR: Backup of VM 114 failed - client closed connection
INFO: Failed at 2020-04-14 05:30:02
INFO: Starting Backup of VM 120 (qemu)
INFO: Backup started at 2020-04-14 05:30:02
INFO: status = running
INFO: VM Name: ***
INFO: include disk 'scsi1' 'zpool:vm-120-disk-0' 2G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/zfs_dir/dump/vzdump-qemu-120-2020_04_14-05_30_02.vma.lzo'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 120 not running
ERROR: client closed connection
ERROR: Backup of VM 120 failed - client closed connection
INFO: Failed at 2020-04-14 05:30:02
INFO: Backup job finished with errors

TASK ERROR: job errors

Code:
root@athos.rfa.cz ~ # pveversion --verbose
proxmox-ve: 6.1-2 (running kernel: 5.4.27-1-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-5.4: 6.1-8
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.27-1-pve: 5.4.27-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-19
libpve-guest-common-perl: 3.0-6
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.0-2
lxcfs: 4.0.2-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-5
pve-container: 3.1-1
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-5
pve-qemu-kvm: 4.2.0-1
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-13
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

As a workaround I set up an HA policy to restart the VM after a failure. Are there any other logs to look at? How can I enable some debug logging? It started a few days ago after the update to these versions; I think it's not kernel related, because I tested both 5.3 and 5.4. VM storage is ZFS, backup storage is a directory on another ZFS pool.

After some testing, it seems to be related to the IO thread option: when it is enabled, the backup fails; when it is disabled, everything works as expected.
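For anyone who wants to check whether the iothread option is involved on their node, the setting is visible in the VM configs. These are the standard PVE config paths and qm commands, nothing custom:

```shell
# List VMs on this node that have iothread enabled on any disk:
grep -l 'iothread=1' /etc/pve/qemu-server/*.conf

# Or inspect a single VM, e.g. 114:
qm config 114 | grep -i iothread
```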
 

tom

Proxmox Staff Member
Aug 29, 2006
Hi, after the update all backed-up VMs failed with the same error:

A known issue - please note that you installed packages from pvetest (the beta repository).

=> Downgrade your pve-qemu-kvm to the latest 4.1.x.
 

TeTeHacko

New Member
Apr 14, 2020
Hm, I do not use IO thread, but my error is slightly different.
I see you got a timeout; I got the error immediately (the VM crashed). Maybe your issue is related to qemu-guest-utils? I don't have Windows; all my VMs are Linux.
 


DerDanilo

Well-Known Member
Jan 21, 2017
This issue still seems to exist somehow. We have some VMs or services within VMs crashing during backup with stuck kernel tasks.

- PBS is running the stable non-subscription release:
Code:
proxmox-backup: 1.0-4 (running kernel: 5.4.65-1-pve)
proxmox-backup-server: 1.0.5-1 (running version: 1.0.5)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
ifupdown2: 3.0.0-1+pve3
libjs-extjs: 6.0.1-10
proxmox-backup-docs: 1.0.4-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-xtermjs: 4.7.0-3
smartmontools: 7.1-pve2
zfsutils-linux: 0.8.5-pve1


- VMs were shut down and started again so that they use the current QEMU version and allow proper PBS backups

Code:
INFO: Starting Backup of VM 7000 (qemu)
INFO: Backup started at 2020-11-27 21:10:16
INFO: status = running
INFO: VM Name: drdemon7000
INFO: include disk 'scsi0' 'NVME:vm-7000-disk-0' 64G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/7000/2020-11-27T20:10:16Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 7000 qmp command 'guest-fsfreeze-thaw' failed - got timeout
ERROR: VM 7000 qmp command 'backup' failed - got timeout
ERROR: Backup of VM 7000 failed - VM 7000 qmp command 'backup' failed - got timeout
INFO: Failed at 2020-11-27 21:11:26

The qmp/agent command seems to cause issues within the VM. In this example, a mysqld service was blocked until it hit a timeout.

Code:
Nov 27 21:16:02 vm7000 kernel: [5091174.201310] INFO: task mysqld:20402 blocked for more than 120 seconds.
Nov 27 21:16:02 vm7000 kernel: [5091174.201936] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
Nov 27 21:16:02 vm7000 kernel: [5091174.202603] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.


All nodes are currently running 6.2.x; the upgrade to 6.3 is planned within this week. The qemu-guest-agent package on all VMs will be updated this week as well.

Code:
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
ceph: 14.2.12-1
ceph-fuse: 14.2.11-1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve2
ksmtuned: 4.20150325+b1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-12
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-1
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1

@tom
Is there something one should do to avoid/fix these errors?
 

Blaiserman

Member
Mar 16, 2019
Same issue as @DerDanilo, on 6.3:

Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.3-3 (running version: 6.3-3/eee5f901)
pve-kernel-5.4: 6.3-3
pve-kernel-helper: 6.3-3
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.78-1-pve: 5.4.78-1
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph: 14.2.16-pve1
ceph-fuse: 14.2.16-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.7
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-2
libpve-guest-common-perl: 3.1-4
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.6-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-2
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-8
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 

DerDanilo

Well-Known Member
Jan 21, 2017
I am not sure this is the root cause, but it seems that if the PBS is busy, taking snapshots of VMs somehow takes longer, which blocks processes until applications or the VM itself freeze. This should not happen in any case.
I am writing this because we upgraded the hardware of the most problematic PBS to far more CPU power and RAM; with that setup, these kinds of problems have only occurred once in a while (for now).

Still waiting for official feedback on how to keep PBS from slowing down VMs when it needs longer.
 
Hi,
I'm having problems like these with a freshly updated 3-node Proxmox/Ceph cluster and the latest version of Proxmox Backup Server.

It seems that with virtual machines that have the QEMU Guest Agent option enabled, when I make a snapshot backup to PBS storage, I get the following errors and the virtual machine freezes:
Code:
...
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 100 qmp command 'guest-fsfreeze-thaw' failed - got timeout
...
 
As you can see in these screenshots, the last failed backups have a size of 1 B, and on the PBS they keep running endlessly.
I can't remove these 1 B backups and I can't stop the endlessly running tasks on the PBS; even if I restart the PBS, the backup tasks are still there.

[Screenshot 2021-01-31 at 16.10.53]

[Screenshot 2021-01-31 at 16.10.27]
 
It seems that in my case the problems were due to the PBS version.
My version was 1.0.1 and the latest release is 1.0.6.

After upgrading to the latest PBS release I can remove the bad 1-byte virtual machine backups, and it seems that I can make backups without further problems.
 

cglmicro

New Member
Oct 12, 2020
Mine was solved by setting the QEMU Guest Agent option to disabled.
What is the impact of leaving this setting disabled?
Thanks.
 

DerDanilo

Well-Known Member
Jan 21, 2017
If the VM runs any databases, it is recommended to use the agent and tell the DBs that there will be a short system freeze; otherwise the DBs could end up inconsistent. Though I have only had this issue with Windows VMs so far. Then again, heavily used DBs will experience issues on any system.
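For Linux guests, telling the database about the freeze is usually done with a guest-agent freeze/thaw hook inside the VM. A minimal skeleton, assuming your qemu-guest-agent build has the fsfreeze hook enabled (the default main script runs everything under /etc/qemu/fsfreeze-hook.d/); the log path and the actual flush commands are placeholders you would fill in for your database:

```shell
#!/bin/sh
# Hypothetical /etc/qemu/fsfreeze-hook.d/10-db inside the guest.
# qemu-guest-agent calls hooks with "freeze" just before the filesystem
# freeze and "thaw" right after. Keep the freeze branch fast: the
# filesystems stay frozen until it returns.
LOG=/var/log/fsfreeze-hook.log
case "$1" in
    freeze)
        echo "$(date) freeze: flushing application buffers" >> "$LOG"
        # e.g. trigger a DB checkpoint/flush here
        ;;
    thaw)
        echo "$(date) thaw: resuming normal operation" >> "$LOG"
        ;;
esac
```

A slow freeze hook lengthens the window in which the guest is frozen, which is exactly when the thaw timeouts discussed in this thread bite, so whatever runs in the freeze branch should complete in a second or two.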
 
