SOLVED: Backup won't work while clients are active / qmp gets a timeout

furby · Nov 3, 2021

Hello everyone,

I am still testing Proxmox as a candidate for the main virtualization platform in our small company. To give it a try, I have installed my 2 physical "servers" at home (Debian machines with some services) as virtual machines on a more powerful PC with Proxmox 7 underneath (pveversion attached). Both guests have the QEMU Guest Agent turned on with the "Guest-trim" option active. Now I am struggling with the backup of my guest machines.

The documentation is very good and straightforward to understand. I used this page to set up the basics: Backup and Restore. I tried the different backup modes (stop, suspend and snapshot) and use the main storage as the backup target for now, but the result is always the same: the backup only works when the guests are down! When the guests are running, the backup process tears them down and isn't able to start them up again, which I fear will end in data loss when virtualizing production servers.

I created a very small test system (Debian 11 with a 4 GB hard drive) and the backup works fine. To be as close as possible to our real-life servers, I set up my home machines with 4 TB hard drives. Perhaps that's the reason why I get a timeout. How can I "borrow more time" for the process?

Here you can see what happens when the backup-process runs:

Code:

root@virt-base:~# vzdump 105 --compress 0 --mode snapshot --storage Daten --node virt-base
INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2021-11-03 04:40:57
INFO: status = running
INFO: VM Name: Test-Debian11-Docker
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/Daten/dump/vzdump-qemu-105-2021_11_03-04_40_57.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'b2439da6-8357-4e13-89c2-4088f9cc7a2c'
INFO: resuming VM again
ERROR: VM 105 qmp command 'cont' failed - got timeout
INFO: aborting backup job
ERROR: VM 105 qmp command 'backup-cancel' failed - client closed connection
INFO: resuming VM again
ERROR: Backup of VM 105 failed - VM 105 not running
INFO: Failed at 2021-11-03 04:41:29
INFO: Backup job finished with errors
job errors
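
To make the failing step easier to spot, the QMP commands that ran into a timeout can be pulled out of a saved copy of the log. This is only a minimal sketch: `qmp_timeouts` is a made-up helper name, and it assumes the log lines look exactly like the ones above.

```shell
# Hypothetical helper: read a vzdump log on stdin and print the name of every
# QMP command that failed with "got timeout" (other failures are ignored).
qmp_timeouts() {
    sed -n "s/^ERROR: VM [0-9][0-9]* qmp command '\([^']*\)' failed - got timeout\$/\1/p"
}
```

Fed with the log above (for example `qmp_timeouts < vzdump-105.log`, where the file name is just an example), it prints `guest-fsfreeze-thaw` and `cont`, the two steps that timed out before the VM went down.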

Thanks for your help

furby

C.G.B. Spender · Nov 3, 2021

INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
...
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T

I might be wrong here, but to my knowledge raw disks don't support snapshots, which is why they only back up when the VM is off. You can convert the raw disk to qcow2 (which supports snapshots) from the PVE UI with the Move Disk button. Also make sure the guest has the QEMU guest tools installed.

fabian · Nov 3, 2021

sounds similar to https://bugzilla.proxmox.com/show_bug.cgi?id=3693

furby · Nov 4, 2021

C.G.B. Spender said:
INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
...
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T

I might be wrong here, but to my knowledge raw disks don't support snapshots, which is why they only back up when the VM is off. You can convert the raw disk to qcow2 (which supports snapshots) from the PVE UI with the Move Disk button. Also make sure the guest has the QEMU guest tools installed.

Hello C.G.B. Spender,

I tried to convert the raw disks to the qcow2 format as you suggested, but this was not possible from the web frontend. That reminded me that I did something very stupid while aiming for maximum stability: I created the storage partition with the experimental btrfs filesystem! Perhaps that explains why I can only create raw disk images: the underlying filesystem takes care of the snapshots and of the space actually used by the raw images?

I'll first give fabian's idea a try, but if that doesn't work, I will export my virtual machines and recreate the storage partition with an officially supported filesystem.

Thank you for your help

furby

furby · Nov 4, 2021

fabian said:
sounds similar to https://bugzilla.proxmox.com/show_bug.cgi?id=3693

Hello fabian,

you're right, my problem sounds similar to the bug you mentioned. As I already told C.G.B. Spender, I did something very stupid while aiming for stability: I used btrfs for my storage partition. But before recreating it, I would like to try the suggestions you made in the discussion on that bug report. Perhaps this helps you move btrfs as a storage filesystem from experimental to stable in the future.

My machine's config looks like this:

Code:

agent: 1,fstrim_cloned_disks=1
balloon: 2048
boot: order=scsi0
cores: 1
ide2: none,media=cdrom
lock: backup
memory: 4096
name: Test-Debian11-Docker
numa: 0
ostype: l26
scsi0: Daten:105/vm-105-disk-0.raw,size=4T
scsihw: virtio-scsi-pci
smbios1: uuid=b3b15243-16fe-4e4c-a59b-63190401318b
sockets: 1
vmgenid: ad74810a-71d5-404c-8f9a-c753a64d8f30

I manually updated PVE/VZDump/QemuServer.pm with the changes from the bug report ("timeout = 45" and so on) and rebooted the node. Although there are still errors, the backup now starts and the virtual machine stays online. The output now looks like this:

Code:

root@virt-base:~# vzdump 105 --compress 0 --mode snapshot --storage Daten --node virt-base
INFO: starting new backup job: vzdump 105 --node virt-base --mode snapshot --storage Daten --compress 0
INFO: Starting Backup of VM 105 (qemu)
INFO: Backup started at 2021-11-04 03:58:13
INFO: status = running
INFO: VM Name: Test-Debian11-Docker
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/Daten/dump/vzdump-qemu-105-2021_11_04-03_58_13.vma'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
ERROR: VM 105 qmp command 'guest-fsfreeze-thaw' failed - got timeout
INFO: started backup task 'aabb0547-115b-49a6-b2e5-4f2a2c08ae33'
INFO: resuming VM again
INFO:   0% (1.2 GiB of 4.0 TiB) in 12s, read: 105.7 MiB/s, write: 92.7 MiB/s
^CERROR: interrupted by signal
INFO: aborting backup job
ERROR: VM 105 qmp command 'backup-cancel' failed - client closed connection
INFO: resuming VM again
ERROR: Backup of VM 105 failed - VM 105 not running
INFO: Failed at 2021-11-04 04:05:14
INFO: Backup job finished with errors
job errors

The virtual machine stays active during the backup until I cancel the backup process (or it finishes?).
I restored the original QemuServer.pm, rebooted the node, and the original error was back. So the bugfix helped to start the backup, even though the virtual machine is torn down in the end.

Do you have any other suggestions for what I can try before reformatting the storage volume?

Thank you for your help

furby

fabian · Nov 4, 2021

C.G.B. Spender said:
INFO: starting new backup job: vzdump 105 --storage Daten --compress 0 --node virt-base --mode snapshot
...
INFO: include disk 'scsi0' 'Daten:105/vm-105-disk-0.raw' 4T

I might be wrong here, but to my knowledge raw disks don't support snapshots, which is why they only back up when the VM is off. You can convert the raw disk to qcow2 (which supports snapshots) from the PVE UI with the Move Disk button. Also make sure the guest has the QEMU guest tools installed.

For VMs, 'snapshot' mode is not related to storage snapshots, so no: snapshot-mode backups work fine with raw images as well.

fabian · Nov 4, 2021

furby said:
Do you have any other suggestions for what I can try before reformatting the storage volume?

It looks to me like there is still some issue in QEMU that needs to be investigated. If the VM continues to run normally despite the 'thaw' timing out, you don't need to cancel the backup and can let it finish.
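
One way to check whether the VM really keeps running while the backup is in progress is to ask the node for its status and ping the guest agent. This is a rough sketch rather than an official procedure: `vm_alive` is a made-up helper name, and it assumes the `qm` CLI on the PVE node and that `qm status` reports a line of the form `status: running`.

```shell
# Hypothetical helper: report whether VM $1 is running and whether its
# guest agent answers. Returns 0 if both are fine, 1 or 2 otherwise.
vm_alive() {
    vmid=$1
    # Assumption: `qm status <vmid>` prints "status: running" for a running VM.
    if ! qm status "$vmid" | grep -q '^status: running'; then
        echo "VM $vmid is not running"
        return 1
    fi
    # `qm guest cmd <vmid> ping` asks the QEMU guest agent inside the VM to answer.
    if ! qm guest cmd "$vmid" ping >/dev/null 2>&1; then
        echo "VM $vmid is running, but the guest agent does not respond"
        return 2
    fi
    echo "VM $vmid is running and the guest agent responds"
}
```

Running `vm_alive 105` from a second shell while vzdump is copying data shows whether the guest survived the failed thaw. Since it was the thaw that timed out, it may also be worth looking at `qm guest cmd 105 fsfreeze-status` to see whether the guest filesystems are still frozen.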

furby · Nov 5, 2021

fabian said:
It looks to me like there is still some issue in QEMU that needs to be investigated. If the VM continues to run normally despite the 'thaw' timing out, you don't need to cancel the backup and can let it finish.

Thank you all. I'll give it a try and report back.

furby · Nov 8, 2021

After testing for some days, I can confirm the bugfix works for me. It doesn't matter whether I do a snapshot, stop, or suspend backup: the guest machines now start and stop as they should and are no longer torn down. Thank you all for your help.
