Proxmox backup fails because the VM does not stop within the timeout

Tomasz Krzywicki

New Member
Apr 15, 2021
Hi.
I have a problem with backups. I keep the VMs on a local disk in a Dell server. A sample backup cron entry:
0 3 * * 0 root [ `/bin/date +\%d` -ge 8 ] && [ `/bin/date +\%d` -le 14 ] && vzdump 100 --storage gda-nas-03-10g --mailnotification failure --mailto admingda@domain.com --compress zstd --mode stop --quiet 1 --prune-backups keep-last=1,keep-monthly=1
I never run more than one backup at the same time. The backup of the next VM starts some hours after the previous one.
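For readers puzzling over the cron entry: the day-of-week field 0 limits the job to Sundays, and the two date tests limit it to days 8 to 14 of the month, so together they select the second Sunday of each month. A minimal sketch of the same check as a function (the name in_second_week is made up for illustration):

```shell
# Hypothetical helper: succeeds when the given day of month is in the range 8..14.
# The test builtin reads "08" as decimal 8, so the zero-padded output of
# date +%d can be passed in directly.
in_second_week() {
    day="$1"
    [ "$day" -ge 8 ] && [ "$day" -le 14 ]
}

# Usage with the real clock
if in_second_week "$(date +%d)"; then
    echo "day is within 8-14, the backup would run"
else
    echo "day is outside 8-14, the backup would be skipped"
fi
```

Inside a crontab the % sign has to be escaped as \%, as in the entry above; in a normal script it is written unescaped.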

Generally it works fine. From time to time I see this in the backup log:
Info: stopping vm
Info: VM quit/powerdown failed
Error : Backup of VM 100 failed - command 'qm shutdown 100 --skiplock --keepActive --timeout 600' failed : exit code 255

Everything else works fine. The VM runs correctly; it just sometimes needs more than 600 seconds to shut down.
If the VM stops within 600 seconds, the backup runs, finishes successfully, and the backup system starts the VM again.
If the VM needs more than 600 seconds, the backup does not run and the backup process aborts. Some time later (maybe 1000 seconds) the VM finishes shutting down and stops. Nothing starts it again after that, because the backup job already ended at 600 seconds.
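One possible stopgap for the "nothing starts it again" case, not something suggested in this thread, is a small wrapper that polls the VM state after a failed backup and starts the VM once it has finally stopped. This is only a sketch: the function name, the poll count and the delay are assumptions, and it relies on qm status printing a line like "status: stopped".

```shell
# Hypothetical helper: poll `qm status` until the VM reports "stopped",
# then start it again. Returns non-zero if the VM never stops in time.
restart_when_stopped() {
    vmid="$1"
    tries="${2:-60}"   # assumed default: 60 polls
    delay="${3:-30}"   # assumed default: 30 seconds between polls
    while [ "$tries" -gt 0 ]; do
        if qm status "$vmid" | grep -q 'status: stopped'; then
            qm start "$vmid"
            return $?
        fi
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# Possible usage in a backup script (commented out, needs a Proxmox host):
# vzdump 100 --mode stop --storage gda-nas-03-10g || restart_when_stopped 100
```

Raising the shutdown timeout is still the cleaner fix, because this wrapper brings the VM back without a backup having been taken.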

Please give me a hint where I can change the 600-second timeout to a longer value. The system will work fine if the backup waits longer.

It should not be needed, but I have very often seen requests for:
Code:
pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.140-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-helper: 6.4-8
pve-kernel-5.4: 6.4-7
pve-kernel-5.3: 6.1-6
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.143-1-pve: 5.4.143-1
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.98-1-pve: 5.4.98-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.22-pve1~bpo10+1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-4
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.3-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.6-pve1~bpo10+1
 
Hi,
If the VM needs more than 600 seconds, the backup does not run and the backup process aborts. Some time later (maybe 1000 seconds) the VM finishes shutting down and stops. Nothing starts it again after that, because the backup job already ended at 600 seconds.

Please give me a hint where I can change the 600-second timeout to a longer value.
You can pass --stopwait MINUTES to vzdump, e.g.:
Code:
vzdump 100 --storage gda-nas-03-10g --mailnotification failure --mailto admingda@domain.com --compress zstd --mode stop --quiet 1 --prune-backups keep-last=1,keep-monthly=1 --stopwait 15
This makes it wait up to 15 minutes (900 seconds) for the VM to stop.
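Note that --stopwait is given in minutes, while the error in the log shows the timeout in seconds; the documented default of 10 minutes matches the 600 seconds seen there. A small sketch for converting an observed shutdown time into a value for the option, rounding up (the function name is made up):

```shell
# Hypothetical helper: convert a shutdown time in seconds to whole minutes,
# rounding up so the timeout is never shorter than the measured time.
stopwait_minutes() {
    echo $(( ($1 + 59) / 60 ))
}

stopwait_minutes 1000   # the "maybe 1000 seconds" case; prints 17
```

In practice you would add some headroom on top of that value. As far as I know the same setting can also be made node-wide as a stopwait line in /etc/vzdump.conf, but check the vzdump man page for your version before relying on that.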

But IMO it is odd that the VM needs that long. It is not totally unheard of with slower storage and big services running in the VM, but it is still a long time. Maybe check what takes so long during shutdown inside the VM; ideally that can be improved.
 
Thank you .
This can happen when a service is split across several machines, such as a separate SQL server, queue manager, web service ... A clean shutdown has to close the service properly and leave the other parts in a correct state. That can take time. Sometimes it is not possible within 600 s.
 
Understandable.

Mentioning this for the sake of completeness: you could try using the QEMU guest agent (install it in the VM, then enable it for that VM in PVE under the VM's Options). It ensures a consistent state by freezing the filesystem for the backup and then thawing it again.
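On the command line this could look roughly like the sketch below. The wrapper function and its arguments are made up for illustration; qm set --agent 1 corresponds to the checkbox under Options, and the agent package still has to be installed and running inside the guest (the VM may need a full stop and start before the option takes effect).

```shell
# Hypothetical wrapper: enable the guest agent option, check that the agent
# answers, then run a snapshot-mode backup instead of a stop-mode one.
snapshot_backup_with_agent() {
    vmid="$1"
    storage="$2"
    # Same as ticking "QEMU Guest Agent" under the VM's Options.
    qm set "$vmid" --agent 1 || return 1
    # Fails if the agent inside the guest is not reachable.
    qm guest cmd "$vmid" ping || return 1
    vzdump "$vmid" --storage "$storage" --mode snapshot --compress zstd
}

# Possible usage on a Proxmox host (commented out):
# snapshot_backup_with_agent 100 gda-nas-03-10g
```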

For DBs you may want some extra hook step, e.g., like:
https://github.com/qemu/qemu/blob/m...t-agent/fsfreeze-hook.d/mysql-flush.sh.sample
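To give a rough idea of the shape of such a hook: as far as I know the guest agent's hook runner calls each script in its hook directory with a single argument, either freeze or thaw. The skeleton below is only a sketch under that assumption; the function name and the echo placeholders are made up, and the real database handling is in the linked sample.

```shell
#!/bin/sh
# Hypothetical fsfreeze hook skeleton. Assumption: the hook runner passes
# exactly one argument, "freeze" before the filesystem freeze and "thaw" after.
handle_fsfreeze() {
    case "$1" in
        freeze)
            # Placeholder: flush or pause the application here.
            echo "preparing application for freeze"
            ;;
        thaw)
            # Placeholder: resume the application here.
            echo "resuming application after thaw"
            ;;
        *)
            echo "usage: freeze|thaw" >&2
            return 1
            ;;
    esac
}

# When installed as a hook file, dispatch on the argument from the runner.
if [ $# -gt 0 ]; then
    handle_fsfreeze "$1"
fi
```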

It is definitely something you should check in a test environment first to be sure you get everything right, but after that backups can be far less intrusive, as you no longer need stop mode to get a clean state.
 
