SOLVED: Backup won't work while clients are active / qmp gets a timeout

furby

Nov 3, 2021
Hello to everyone,

I am still testing Proxmox to use it as the main hypervisor in our small company. To give it a try, I have installed my 2 physical "servers" at home (Debian machines with some services) as virtual machines on a more powerful PC with Proxmox 7 underneath (pveversion attached). Both guests have the QEMU Guest Agent turned on with the "Guest-trim" option active. Now I am struggling with the backup of my guest machines.

The documentation is very good and straightforward to understand. I used this site to set up the basics: Backup and Restore. I tried the different backup modes (stop, suspend and snapshot) and use the main storage as the backup target for now, but the result is always the same: the backup only works when the guests are down! When the guests are running, the backup process tears them down and isn't able to start them up again, which I fear will end in data loss when virtualizing production servers.

I created a very small test system (Debian 11 with a 4 GB hard drive) and the backup works fine. To be as close as possible to our real-life servers, I set up my home machines with 4 TB hard drives. Perhaps that's the reason why I get a timeout. How can I "borrow more time" for the process?

Here you can see what happens when the backup-process runs:
Code:
root@virt-base:~# vzdump 105 --compress 0 --mode snapshot --storage Daten --node virt-base
INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2021-11-03 04:40:57
INFO: status = running
INFO: VM Name: Test-Debian11-Docker
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/Daten/dump/vzdump-qemu-105-2021_11_03-04_40_57.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'b2439da6-8357-4e13-89c2-4088f9cc7a2c'
INFO: resuming VM again
ERROR: VM 105 qmp command 'cont' failed - got timeout
INFO: aborting backup job
ERROR: VM 105 qmp command 'backup-cancel' failed - client closed connection
INFO: resuming VM again
ERROR: Backup of VM 105 failed - VM 105 not running
INFO: Failed at 2021-11-03 04:41:29
INFO: Backup job finished with errors
job errors
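
For a quick overview of which QMP commands failed in a log like the one above, the ERROR lines can be filtered out. This is only a minimal sketch: the function name `vzdump_qmp_failures` is made up for illustration, and it assumes the log was saved as plain text in the format shown here.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): list failed QMP commands.
# Reads a saved vzdump log on stdin and prints one
# "<qmp command>: <reason>" line per failed command.
vzdump_qmp_failures() {
    sed -n "s/^[[:space:]]*ERROR: VM [0-9][0-9]* qmp command '\([^']*\)' failed - \(.*\)\$/\1: \2/p"
}
```

Called as `vzdump_qmp_failures < vzdump-105.log` (file name hypothetical), the log above would give three lines: `guest-fsfreeze-thaw: got timeout`, `cont: got timeout` and `backup-cancel: client closed connection`.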

Thanks for your help

furby
 

INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
...
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T

I might be wrong here, but to my knowledge raw disks don't support snapshots, which is why they only back up when the VM is off. You can convert the raw disk to qcow2 (which supports snapshots) from the PVE UI with the Move Disk button. Also make sure the guest has the QEMU guest tools installed.
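
Whether the agent inside the guest actually answers can be checked from the node. A minimal sketch, assuming the `qm guest cmd` subcommand of the PVE command line tools is available; the helper name `agent_responds` is invented here and is not a Proxmox command.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) when the QEMU guest
# agent of the given VM answers a ping, fail otherwise.
agent_responds() {
    # $1 = VM ID, e.g. 105
    qm guest cmd "$1" ping >/dev/null 2>&1
}
```

Usage on the node would look like `agent_responds 105 && echo "agent ok"`. If the ping fails, it is worth checking inside the guest that the agent package is installed and its service is running.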
 
Hello C.G.B. Spender,

I tried to convert the raw disks to the qcow2 format as you suggested, but this was not possible from the web frontend. That reminded me that I did something very stupid when aiming for maximum stability: I created the storage partition with the experimental btrfs filesystem! Perhaps that explains why I can only create raw disk images: the underlying filesystem takes care of the snapshots and of the actual amount of storage space used by the raw disks?

I'll first give fabian's idea a try, but if this doesn't work, I will export my virtual machines and recreate the storage partition with an officially supported filesystem.

Thank you for your help

furby
 
Hello fabian,

you're right, my problem sounds similar to the bug you mentioned. As I already told C.G.B. Spender, I did something very stupid when aiming for stability: I used btrfs for my storage partition. But before recreating my storage partition, I would like to try the suggestions you made in the conversation in that bug report. Perhaps it helps you to move btrfs as a storage filesystem from experimental to stable in the future.
  1. My machine's config looks like this:
    Code:
    agent: 1,fstrim_cloned_disks=1
    balloon: 2048
    boot: order=scsi0
    cores: 1
    ide2: none,media=cdrom
    lock: backup
    memory: 4096
    name: Test-Debian11-Docker
    numa: 0
    ostype: l26
    scsi0: Daten:105/vm-105-disk-0.raw,size=4T
    scsihw: virtio-scsi-pci
    smbios1: uuid=b3b15243-16fe-4e4c-a59b-63190401318b
    sockets: 1
    vmgenid: ad74810a-71d5-404c-8f9a-c753a64d8f30
  2. I updated the PVE/VZDump/QemuServer.pm manually with the different statements ("timeout = 45" and so on) and rebooted the node. Although there are still errors, the backup now starts and the virtual machine stays online. The output now looks like this:
    Code:
    root@virt-base:~# vzdump 105 --compress 0 --mode snapshot --storage Daten --node virt-base
    INFO: starting new backup job: vzdump 105 --node virt-base --mode snapshot --storage Daten --compress 0
    INFO: Starting Backup of VM 105 (qemu)
    INFO: Backup started at 2021-11-04 03:58:13
    INFO: status = running
    INFO: VM Name: Test-Debian11-Docker
    INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T
    INFO: backup mode: snapshot
    INFO: ionice priority: 7
    INFO: creating vzdump archive '/Daten/dump/vzdump-qemu-105-2021_11_04-03_58_13.vma'
    INFO: issuing guest-agent 'fs-freeze' command
    INFO: issuing guest-agent 'fs-thaw' command
    ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout
    INFO: started backup task 'aabb0547-115b-49a6-b2e5-4f2a2c08ae33'
    INFO: resuming VM again
    INFO:   0% (1.2 GiB of 4.0 TiB) in 12s, read: 105.7 MiB/s, write: 92.7 MiB/s
    ^CERROR: interrupted by signal
    INFO: aborting backup job
    ERROR: VM 105 qmp command 'backup-cancel' failed - client closed connection
    INFO: resuming VM again
    ERROR: Backup of VM 105 failed - VM 105 not running
    INFO: Failed at 2021-11-04 04:05:14
    INFO: Backup job finished with errors
    job errors
  3. The virtual machine stays active during the backup until I cancel the backup process (or it finishes?).
  4. I restored the original QemuServer.pm, rebooted the node and the original error was there again. So the bugfix helped to start the backup, even though the virtual machine is torn down in the end.
Do you have another suggestion for what I can try before reformatting the storage volume?
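
When hand-editing a packaged file such as QemuServer.pm and later restoring it, as in steps 2 and 4, it helps to keep a pristine copy next to it. The following is only a sketch: `keep_original` and `restore_original` are hypothetical helper names, and the path in the usage note is an assumption about where the module lives on a PVE node.

```shell
#!/bin/sh
# Hypothetical helpers for trying out a manual patch safely.

# Save a pristine copy of a file before editing it. The copy is made
# only once, so a later call never overwrites it with a patched version.
keep_original() {
    # $1 = file that is about to be edited
    [ -e "$1.orig" ] || cp -p -- "$1" "$1.orig"
}

# Undo the manual edit by copying the pristine version back.
restore_original() {
    # $1 = file that was edited
    cp -p -- "$1.orig" "$1"
}
```

For example, `keep_original /usr/share/perl5/PVE/VZDump/QemuServer.pm` before editing and `restore_original` with the same path afterwards (path assumed, please verify it on your node). A package update can overwrite a manual edit like this at any time.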

Thank you for your help

furby
 
for VMs, 'snapshot' backup mode is not related to storage snapshots, so no, it works fine with raw images as well.
 
It looks to me like there is still some issue that needs to be investigated in QEMU. If the VM continues to run normally despite the 'thaw' timing out, then you don't need to cancel the backup but can let it finish.
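
One way to check from the node that the VM really keeps running and that its filesystems are no longer frozen while the backup continues is sketched below. It assumes that `qm status` prints `status: running` for a running VM and that the output of `qm guest cmd <vmid> fsfreeze-status` contains the word `thawed` or `frozen`; the function name `vm_ok_during_backup` is made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the VM is running and the guest
# agent reports its filesystems as thawed.
vm_ok_during_backup() {
    vmid=$1
    # The VM must still be running on the node.
    qm status "$vmid" | grep -q '^status: running$' || return 1
    # The agent reports either "thawed" or "frozen"; the exact output
    # formatting is an assumption, so only the keyword is matched.
    case "$(qm guest cmd "$vmid" fsfreeze-status 2>/dev/null)" in
        *thawed*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: `vm_ok_during_backup 105 && echo "VM 105 looks fine, let the backup finish"`. If the guest should still report `frozen` after the thaw timeout, `qm guest cmd 105 fsfreeze-thaw` can be tried by hand.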
 
Thank you all. I'll give it a try and report back.
 
After testing for some days, the bugfix works for me. It doesn't matter whether I do a snapshot backup or a stopped or suspended one: the guest machines now start and stop as they should and are not torn down. Thank you all for your help.
 