ERROR: VM 110 qmp command 'guest-fsfreeze-freeze' failed - got timeout

Lenu

Jan 14, 2019
Hi,

We often see hung-task kernel panics in one of our VMs.
Does anyone have an idea?

Backup Log:
Code:
110: 2019-01-07 00:01:57 INFO: Starting Backup of VM 110 (qemu)
110: 2019-01-07 00:01:57 INFO: status = running
110: 2019-01-07 00:01:57 INFO: update VM 110: -lock backup
110: 2019-01-07 00:01:57 INFO: VM Name: hostname
110: 2019-01-07 00:01:57 INFO: include disk 'scsi0' 'nvme:110/vm-110-disk-0.qcow2' 60G
110: 2019-01-07 00:01:57 INFO: backup mode: snapshot
110: 2019-01-07 00:01:57 INFO: ionice priority: 7
110: 2019-01-07 00:01:57 INFO: creating archive '/mnt/pve/Backup_Server/dump/vzdump-qemu-110-2019_01_07-00_01_57.vma.lzo'
110: 2019-01-07 01:01:57 ERROR: VM 110 qmp command 'guest-fsfreeze-freeze' failed - got timeout
110: 2019-01-07 01:01:58 INFO: started backup task '1bada98d-4884-494d-9a9b-6595ec87f908'
110: 2019-01-07 01:02:01 INFO: status: 2% (1702887424/64424509440), sparse 1% (993382400), duration 3, read/write 567/236 MB/s
110: 2019-01-07 01:02:04 INFO: status: 3% (2235039744/64424509440), sparse 1% (995135488), duration 6, read/write 177/176 MB/s
110: 2019-01-07 01:02:09 INFO: status: 4% (2816606208/64424509440), sparse 1% (1003827200), duration 11, read/write 116/114 MB/s
110: 2019-01-07 01:02:12 INFO: status: 5% (3637182464/64424509440), sparse 1% (1006587904), duration 14, read/write 273/272 MB/s
110: 2019-01-07 01:02:15 INFO: status: 6% (4425973760/64424509440), sparse 1% (1009418240), duration 17, read/write 262/261 MB/s
.....

Email with subject "[abrt] : Kernel panic - not syncing: hung_task: blocked tasks":
Code:
reason:         Kernel panic - not syncing: hung_task: blocked tasks
component:      kernel
hostname:       xxx.hostname.com
count:          1
analyzer:       vmcore
architecture:   x86_64
event_log:   
kernel:         3.10.0-962.3.2.lve1.5.24.7.el7.x86_64
kernel_tainted_long: O - Out-of-tree module has been loaded.
kernel_tainted_short: GO
last_occurrence: 1546819513
os_release:     CloudLinux release 7.6 (Vladimir Lyakhov)
runlevel:       N 3
time:           Mon 07 Jan 2019 01:05:13 CET
type:           vmcore
uid:            0
username:       root
uuid:           5b37bc38aeb82309605dff75d34f96fd7306f37e

backtrace:
:Kernel panic - not syncing: hung_task: blocked tasks
:CPU: 1 PID: 21 Comm: khungtaskd ve: 0 Kdump: loaded Tainted: G           O   ------------   3.10.0-962.3.2.lve1.5.24.7.el7.x86_64 #1 61.16
:Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
:Call Trace:
: [<ffffffff83f2872d>] dump_stack+0x19/0x1b
: [<ffffffff83f22f6d>] panic+0xe8/0x21f
: [<ffffffff8395399e>] watchdog+0x26e/0x2c0
: [<ffffffff83953730>] ? reset_hung_task_detector+0x20/0x20
: [<ffffffff838bf701>] kthread+0xd1/0xe0
: [<ffffffff838bf630>] ? create_kthread+0x60/0x60
: [<ffffffff83f3a677>] ret_from_fork_nospec_begin+0x21/0x21
: [<ffffffff838bf630>] ? create_kthread+0x60/0x60
machineid:
:systemd=02956c2999504428ac7a94e90f0b6386
:sosreport_uploader-dmidecode=3ce185c3ba2e23cc941959203608a05db9b77422cdaa5ee3ebc792281f466b78
not-reportable:
:A kernel problem occurred, but your kernel has been tainted (flags:GO). Explanation:
:O - Out-of-tree module has been loaded.
:Kernel maintainers are unable to diagnose tainted reports.

This is a newly created VM on Proxmox, and we installed the QEMU guest agent without any modifications.

CloudLinux support says that "the issue is that FS is being frozen by the qemu agent".
 
My guess at what happens here: the host issues an FS freeze to the guest, and the guest performs it but does not reply in time (110: 2019-01-07 01:01:57 ERROR: VM 110 qmp command 'guest-fsfreeze-freeze' failed - got timeout). The host then assumes the freeze was not successful and leaves the VM in a locked state, never unfreezing the filesystem. At that point you have to hard-reboot the guest or unfreeze its filesystem manually.
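For the manual unfreeze, a minimal sketch run as root on the Proxmox host could look like the following. It assumes that `qm guest cmd` is available (older releases expose the same agent commands as `qm agent`), that the guest agent still answers, and that the status output contains the word "frozen" when the filesystem is frozen. The helper names `agent_command` and `thaw_if_frozen` are hypothetical.

```python
import subprocess
from typing import List

VMID = 110  # VM ID taken from the backup log in the first post


def agent_command(vmid: int, command: str) -> List[str]:
    """Build a `qm guest cmd` invocation (assumed to exist on the PVE host)."""
    return ["qm", "guest", "cmd", str(vmid), command]


def thaw_if_frozen(vmid: int, timeout: float = 30.0) -> bool:
    """Query the guest agent's freeze state and thaw if it reports 'frozen'.

    Returns True if a thaw was issued, False if the guest was not frozen.
    Raises if the agent does not answer within `timeout` seconds.
    """
    status = subprocess.run(
        agent_command(vmid, "fsfreeze-status"),
        capture_output=True, text=True, timeout=timeout, check=True,
    )
    # Assumption: the printed status contains "frozen" when the FS is frozen.
    if "frozen" not in status.stdout:
        return False
    subprocess.run(
        agent_command(vmid, "fsfreeze-thaw"), timeout=timeout, check=True
    )
    return True
```

Calling `thaw_if_frozen(VMID)` on the host would issue the thaw only when the agent reports a frozen state. If the agent itself no longer responds, the call times out and a hard reboot is the remaining option.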

In my experience this usually happens when the host is starved of storage I/O.
You can probably stop the freezes if you disable backup for this guest.
 
Thanks! This system is new, with only three VMs running on NVMe-attached storage. The other VMs (also with Plesk and CloudLinux) have no such problems. Only this VM (also newly created) has problems. We didn't make any changes, nor is anything different compared to the other VMs. It is just a normal CL + Plesk setup.

What do you mean by "disabling backup for this guest"? Do you mean disabling the backup for this VM? I need the backup.
Turning something off is not a solution, in my honest opinion.
 
To solve your problems, you must find the reason why it does not unfreeze the FS.
I already told you what my guess is, now it is up to you.
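One possible starting point is the backup log itself. The sketch below is only an illustration: it parses the timestamps of a few lines copied from the log in the first post and reports the longest silence between consecutive entries. The function names `parse_line` and `largest_gap` are hypothetical, and reading the gap as time spent waiting on the freeze call is an inference, not something the log states.

```python
from datetime import datetime

# Excerpt copied from the backup log in the first post.
LOG = """\
110: 2019-01-07 00:01:57 INFO: ionice priority: 7
110: 2019-01-07 00:01:57 INFO: creating archive '/mnt/pve/Backup_Server/dump/vzdump-qemu-110-2019_01_07-00_01_57.vma.lzo'
110: 2019-01-07 01:01:57 ERROR: VM 110 qmp command 'guest-fsfreeze-freeze' failed - got timeout
110: 2019-01-07 01:01:58 INFO: started backup task '1bada98d-4884-494d-9a9b-6595ec87f908'
"""


def parse_line(line):
    """Split 'VMID: YYYY-MM-DD HH:MM:SS LEVEL: message' into (timestamp, message)."""
    _vmid, rest = line.split(": ", 1)
    stamp, message = rest[:19], rest[20:]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), message


def largest_gap(log_text):
    """Return (seconds, message) for the entry preceded by the longest silence."""
    entries = [parse_line(l) for l in log_text.splitlines() if l.strip()]
    gaps = [
        ((cur[0] - prev[0]).total_seconds(), cur[1])
        for prev, cur in zip(entries, entries[1:])
    ]
    return max(gaps, key=lambda g: g[0])


seconds, message = largest_gap(LOG)
print(f"{seconds:.0f} s of silence before: {message}")
```

For this excerpt the longest gap is the hour between "creating archive" and the fsfreeze error. Comparing that gap with host storage load at the same time of night may help confirm or rule out the I/O starvation guess.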
 
