Windows Guest hangs during Backup

poxin

I've installed a Windows Server 2016 VM and followed the instructions to a T for enabling the additional drivers.

The QEMU agent is running and responding okay, as is the Ballooning service.

I'm having an issue where the guest will go completely unresponsive in the console, RDP, and even to ping requests when the server is being backed up. I've tried both VirtIO and SCSI for the storage bus, no difference.

Code:
INFO: starting new backup job: vzdump 105 --remove 0 --compress 0 --storage nfs1-backup --node vs100 --mode snapshot
INFO: Starting Backup of VM 105 (qemu)
INFO: status = running
INFO: update VM 105: -lock backup
INFO: VM Name: wintestvm
INFO: include disk 'scsi0' 'local-zfs:vm-105-disk-1' 30G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating archive '/mnt/pve/nfs1-backup/dump/vzdump-qemu-105-2017_08_17-15_59_34.vma'
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'a181514b-b5db-4e8e-a4f7-9e57ff8da94c'
INFO: status: 100% (32212254720/32212254720), sparse 68% (22117900288), duration 126, read/write 816/0 MB/s
INFO: transferred 32212 MB in 126 seconds (255 MB/s)
INFO: archive file size: 9.41GB
INFO: Finished Backup of VM 105 (00:02:19)
INFO: Backup job finished successfully
TASK OK

I do see the error 'ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout', but I'm not sure whether this is an issue. The backup seems to finish okay.

IO load on the host also looks fine during backup. The node has 128 GB of ECC RAM and a 12-core Intel Xeon and is barely used.
 
Did you also install the qemu drivers?
Code:
vssadmin list providers
Do you see a "QEMU Guest Agent VSS Provider"?
 
Code:
>sc query "BalloonService" | find "RUNNING"
        STATE              : 4  RUNNING

>sc query "QEMU-GA" | find "RUNNING"
        STATE              : 4  RUNNING

>sc query "QEMU Guest Agent VSS Provider" | find "RUNNING"
        STATE              : 4  RUNNING

>vssadmin list providers
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool
(C) Copyright 2001-2013 Microsoft Corp.

Provider name: 'QEMU Guest Agent VSS Provider'
   Provider type: Software
   Provider Id: {3629d4ed-ee09-4e0e-9a5c-6d8ba2872aef}
   Version: 0.12.1

This looks all okay to me. I'm not sure what else to check.
 
Did you check with qm agent <vmid> ping whether communication with the agent inside the guest is working?
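Something along these lines would cover it. This is only a rough sketch, meant to be run on the Proxmox node; the function name check_agent is made up for illustration, and it assumes your qm version also offers the fsfreeze-status agent command in addition to ping.

```shell
#!/bin/sh
# Sketch: check that the guest agent answers and show the freeze state.
# Assumes this runs on the Proxmox node with `qm` in PATH.
# check_agent is a hypothetical helper name, not part of Proxmox.
check_agent() {
    vmid="$1"
    # `qm agent <vmid> ping` exits non-zero when the agent does not answer.
    if ! qm agent "$vmid" ping >/dev/null 2>&1; then
        echo "agent on VM $vmid: no response"
        return 1
    fi
    echo "agent on VM $vmid: responding"
    # Ask the agent whether the guest filesystems are currently frozen.
    # Outside of a backup this should not report a frozen state.
    qm agent "$vmid" fsfreeze-status
}

# Example: check_agent 105
```

If the ping works but the freeze state stays frozen after a backup, that would point at the thaw timeout rather than at agent communication in general.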
 
Yes, that's responding okay from the node - anything else I can check? The guest itself seems to be running fine, issuing shutdown from proxmox works okay so QEMU is working okay there too.
 
QEMU runs into a timeout when fsfreeze is called; maybe the Windows event log shows something. Is VSS activated in Windows?
 
Seems this is on the right track, still investigating. I appreciate the help thus far.

Odd, since the services are all running as Local System, which has full access.
I even added 'Network Service' to the COM Security settings.

Code:
Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface.  hr = 0x80070005, Access is denied.
. This is often caused by incorrect security settings in either the writer or requestor process.

Operation:
   Gathering Writer Data

Context:
   Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
   Writer Name: System Writer
   Writer Instance ID: {9a88ad08-e083-47ba-a597-a22f26ddac8a}
 
Was able to solve that permission error. Now during a backup nothing shows up in the Windows Event Logs, but I still have the same lockup problem. We have about 350 Windows guests to move over, but may have to just leave those in Hyper-V.
 
Just to double check the obvious, the "Qemu Agent" checkbox is checked under proxmox gui>VM105>under options?

Also, what happens when you type the command "vssadmin list writers"? Do the last 2 lines say "stable" and "no error"? The link below said to try Windows Server Backup if VSS gives errors. And if you get "access denied", it could be a permissions error.

https://kb.vmware.com/selfservice/m...nguage=en_US&cmd=displayKC&externalId=1007696

If you get an access denied message when running Windows Server Backup using VSS, the solution was:

"you do not have permission to access files under GLOBALROOT\Device\HarddiskVolumeShadowCopy3\VM1 folder. Try to take ownership and add your current account to Security tab and give Full Control permission to see the result."

https://social.technet.microsoft.co...-windows-2008-server-sp2?forum=winserverfiles

Hope this helps
 
Is it working, without the qemu agent? Latest virtio drivers (0.1.141) installed?
 
Qemu Agent is checked in proxmox gui. 'vssadmin list writers' output is okay.

I'll try with latest virtio drivers, I was using the stable branch.
 
Make sure to take a full backup or snapshot of the VM before changing VirtIO drivers. I had a case where I forgot to back up, and the VM wouldn't start due to a combination of beta VirtIO drivers and the PVEtest repository. Luckily it was just a test VM.
 
Got around to trying this again. Still not resolved using the latest VirtIO drivers. There are no errors in the Windows Event Log, VirtIO is up to date, QEMU installed and all services running for that, ballooning, vss, etc. Made sure that VSS has permissions and no access denied errors.

When I initiate a backup the guest completely freezes and stops responding until the backup is complete. Ping to the server even stops. This is a large problem if we ever hope to move our Hyper-V infrastructure to Proxmox. Currently over 500 machines at latest count.

Code:
INFO: creating archive '/mnt/pve/nfs1-backup/dump/vzdump-qemu-106-2017_08_25-13_28_37.vma'
ERROR: VM 106 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'f2df9063-37d8-44e7-bf3d-aea27652b2cb'

Disabling the QEMU agent for testing produces nearly the same result:
Code:
INFO: creating archive '/mnt/pve/nfs1-backup/dump/vzdump-qemu-106-2017_08_25-13_35_12.vma'
INFO: started backup task '3276b621-86c0-4305-a106-7cf6c16df193'

There's no error regarding 'guest-fsfreeze-thaw' but the Windows Guest still goes unresponsive until backup is done.

I've also tried disabling ballooning in Proxmox and unregistering the service in Windows, and using 'host' for the CPU type. No fix there either.
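To put numbers on how long the guest is unreachable, a small loop like the following could be left running on another machine while the backup runs. It is only a sketch: watch_guest is a made-up helper name, the guest address is a placeholder (192.0.2.10 is a documentation address, not the real one), and the ping flags assume the Linux iputils version.

```shell
#!/bin/sh
# Sketch: print a timestamped line whenever the guest stops or resumes
# answering ping. watch_guest is a hypothetical helper name.
# Usage: watch_guest <ip> [probes]   (probes=0 or omitted: run until Ctrl-C)
watch_guest() {
    ip="$1"
    count="${2:-0}"
    state=up          # assume the guest is reachable when we start
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        # One probe with a 1 second timeout (Linux iputils flags).
        if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
            new=up
        else
            new=down
        fi
        # Only log transitions, so the output marks the start and end of an outage.
        if [ "$new" != "$state" ]; then
            echo "$(date '+%F %T') guest $ip is $new"
            state="$new"
        fi
        i=$((i + 1))
        sleep 1
    done
}

# Example (placeholder address): watch_guest 192.0.2.10
```

Comparing the logged down/up times with the timestamps in the vzdump task log would show whether the freeze covers the whole backup or only part of it.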
 
I just noticed something: are you backing up to an NFS target hosted outside the server? I just ran a full Proxmox backup (snapshot mode) as a test on an old XP VM (40 GB) on a server with local LVM storage. While the backup is running, I can remote into the VM and everything seems normal (ping times of 1 ms, files and folders open, the start menu is responsive, programs open). What OS is your NFS target using?
 
How is your local system behaving during this backup operation? Any signs of slow storage, full RAM, etc.?
 
I am backing up to an NFS server running CentOS 7.3. Is that the cause of the issue? @Alwin, does "the local system" refer to the host node? It's behaving normally during backup.
 
If it's not too much trouble, can you set up a Samba share on that CentOS target, add it to Proxmox storage, retest, and post the results?
 
What is behaving normally? I suspect that either your local or your nfs storage is "slow" (or maybe just a config issue) while the backup is running and bringing down the VM.

Can you please post the VM config ("qm config <vmid>") and what your "/etc/pve/storage.cfg" looks like?
 
Code:
~# qm config 106
agent: 1
balloon: 1
bootdisk: virtio0
cores: 4
memory: 4096
name: wintest
net0: virtio=A6:F7:3A:AF:ED:5E,bridge=vmbr1
numa: 0
ostype: win10
scsihw: virtio-scsi-pci
smbios1: uuid=b0297f0f-1436-4e79-b59f-9640467b94f9
sockets: 1
virtio0: local-zfs:vm-106-disk-1,discard=on,size=30G

/etc/pve/storage.cfg
Code:
nfs: nfs1-backup
    export /mnt/powervault_01/backup
    path /mnt/pve/nfs1-backup
    server 10.100.0.131
    content backup
    maxfiles 5
    options vers=3

/etc/exports from the CentOS 7 NFS server, which has 10GbE, 32 GB RAM, and a 12-core Xeon
Code:
/mnt/powervault_01/backup    10.100.0.0/16(rw,sync,no_root_squash)
 
How is your local-zfs storage set up? Please also post that part. How is your write performance onto the NFS share (e.g. with dd) and onto your local-zfs? Also try lzo compression, as it might speed things up: without compression, every zero goes over the wire too.
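For the dd test, a rough sketch like this should be enough. write_test is a made-up helper name, the flags assume GNU dd, and /rpool/data is only a guess at where your local-zfs dataset is mounted, so adjust it. Keep in mind that zeros compress very well, so a ZFS dataset with compression enabled will report an inflated rate, and a file test on the dataset is only an approximation for zvol-backed VM disks.

```shell
#!/bin/sh
# Sketch: rough sequential write test with dd (GNU dd flags assumed).
# write_test is a hypothetical helper name.
# Usage: write_test <directory> [size_in_MiB]
write_test() {
    dir="$1"
    mb="${2:-1024}"
    f="$dir/ddtest.$$"
    # conv=fdatasync makes dd flush the data to storage before it reports
    # the rate, so the page cache does not hide a slow target.
    # dd prints its summary on stderr; keep only the last line.
    dd if=/dev/zero of="$f" bs=1M count="$mb" conv=fdatasync 2>&1 | tail -n 1
    # Remove the test file again.
    rm -f "$f"
}

# Examples:
#   write_test /mnt/pve/nfs1-backup 4096   # NFS path from storage.cfg
#   write_test /rpool/data 4096            # ASSUMED local-zfs mountpoint
#
# Retest the backup with lzo compression (the job above used --compress 0):
#   vzdump 106 --mode snapshot --compress lzo --storage nfs1-backup
```

If the NFS number comes out far below what 10GbE should deliver, the "sync" export option on the CentOS side would be one thing to look at.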
 
