Cancel a running backup job?

hotwired007 · Feb 29, 2012

I have a backup that is stuck... is there any way of cancelling it without killing the server?

tom · Feb 29, 2012

Any details on why it hangs?

hotwired007 · Feb 29, 2012

I was testing the backup routine and set a backup to run, but it suspended one of the servers instead of taking a snapshot, so the server was unavailable.

I sorted it by migrating the rest of the servers off, forcing a restart of the Proxmox box, and running qm unlock <vmid>.

vcp_ai · Jun 13, 2012

I have a problem with the network used to back up machines, and a backup (snapshot mode) lasts for several hours.

Can you point us to a graceful way to cancel a backup job of several machines?

Stop mode in the Task Viewer backup window ends with "Error unexpected status".

The system is unresponsive. SSH works.

The last thing I can observe in syslog is:

Code:

Jun 13 07:35:21 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.122.20.backups4)  failed with status -1.  (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.122.20.backups4:  illegal attempt to update using time 1339565706 when last update time is  1339565706 (minimum one second step))
Jun 13 07:37:46 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-vm/8105) failed with status -1.  (/var/lib/rrdcached/db/pve2-vm/8105: illegal attempt to update using  time 1339565559 when last update time is 1339565632 (minimum one second  step))
Jun 13 07:37:46 servidor177 rrdcached[1425]:  queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-vm/8491)  failed with status -1. (/var/lib/rrdcached/db/pve2-vm/8491: illegal  attempt to update using time 1339565559 when last update time is  1339565632 (minimum one second step))
Jun 13 07:37:46 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-vm/8010) failed with status -1.  (/var/lib/rrdcached/db/pve2-vm/8010: illegal attempt to update using  time 1339565559 when last update time is 1339565632 (minimum one second  step))
Jun 13 07:37:46 servidor177 rrdcached[1425]:  queue_thread_main: rrd_update_r (/var/lib/rrdcached/db/pve2-vm/8492)  failed with status -1. (/var/lib/rrdcached/db/pve2-vm/8492: illegal  attempt to update using time 1339565559 when last update time is  1339565632 (minimum one second step))
Jun 13 07:37:46 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-vm/8001) failed with status -1.  (/var/lib/rrdcached/db/pve2-vm/8001: illegal attempt to update using  time 1339565559 when last update time is 1339565632 (minimum one second  step))
Jun 13 07:37:46 servidor177 rrdcached[1425]:  queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.122.21.backups4)  failed with status -1.  (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.122.21.backups4:  illegal attempt to update using time 1339565569 when last update time is  1339565706 (minimum one second step))
Jun 13 07:37:46 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.201.187) failed with  status -1. (/var/lib/rrdcached/db/pve2-storage/servidor177/Of.201.187:  illegal attempt to update using time 1339565569 when last update time is  1339565706 (minimum one second step))
Jun 13 07:37:46 servidor177  rrdcached[1425]: queue_thread_main: rrd_update_r  (/var/lib/rrdcached/db/pve2-storage/servidor177/local) failed with  status -1. (/var/lib/rrdcached/db/pve2-storage/servidor177/local:  illegal attempt to update using time 1339565569 when last update time is  1339565706 (minimum one second step))

Code:

root@servidor177:~# pveversion -v
pve-manager: 2.1-1 (pve-manager/2.1/f9b0f63a)
running kernel: 2.6.32-12-pve
proxmox-ve-2.6.32: 2.1-68
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-12-pve: 2.6.32-68
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-39
pve-firmware: 1.0-16
libpve-common-perl: 1.0-27
libpve-access-control: 1.0-21
libpve-storage-perl: 2.0-18
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
root@servidor177:~#

dietmar · Jun 13, 2012

vcp_ai said:
I have a problem with the network used to back up machines, and a backup (snapshot mode) lasts for several hours.

Seems the system time is wrong (that would explain the errors in syslog).

Try standard unix tools to kill the backup task (ps, kill).
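
A minimal sketch of what that could look like from an SSH session. It assumes the backup runs as a process whose command line contains `vzdump` (the Proxmox backup tool); the helper name `backup_pids` is made up for illustration and is not a Proxmox command. It only filters `ps` output, and the actual `kill` is left as a comment so nothing is terminated by accident.

```shell
# Hypothetical helper (not a Proxmox command): reads `ps -eo pid,args`
# output on stdin and prints the PID of every process whose command line
# contains the given string (default: vzdump).
backup_pids() {
    pattern="${1:-vzdump}"
    awk -v pat="$pattern" '{
        pid = $1
        $1 = ""                      # keep only the command line in $0
        if (pid ~ /^[0-9]+$/ && index($0, pat)) print pid
    }'
}

# On the host, capture the process list first so the filter itself
# does not show up in it:
#   ps -eo pid,args > /tmp/ps.out
#   backup_pids vzdump < /tmp/ps.out
#   kill <PID>        # plain SIGTERM first, kill -9 only as a last resort
```

Whether the VM lock is released after the task is killed is not guaranteed; earlier in this thread `qm unlock <vmid>` was needed afterwards.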

vcp_ai · Jun 13, 2012

dietmar said:
Seems the system time is wrong (that would explain the errors in syslog).

System time is correct. Could there be a difference (of some seconds) between PVE and the NFS server (Openfiler)?

dietmar · Jun 13, 2012

vcp_ai said:
Could there be a difference (of some seconds) between PVE and the NFS server (Openfiler)?

No, those errors only occur if the system time goes backwards (on the local host). Do you still get those errors?
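
One way to check this against the log is to compare the two timestamps inside each rrdcached message. The following is a rough sketch only; the function name `rrd_time_delta` is made up here, and it handles just the "illegal attempt to update" message format quoted earlier in this thread.

```shell
# Hypothetical helper: for each rrdcached "illegal attempt to update" line
# on stdin, print (attempted update time - last update time) in seconds.
# A negative value means the update carried an older timestamp than the
# one already stored; 0 means the same second was submitted twice.
rrd_time_delta() {
    tr -s ' ' |    # squeeze repeated spaces so the pattern below matches
        sed -n 's/.*update using time \([0-9]*\) when last update time is \([0-9]*\).*/\1 \2/p' |
        awk '{ print $1 - $2 }'
}

# Usage on the host (log path as on a default Debian install):
#   grep rrdcached /var/log/syslog | rrd_time_delta
```

For the Jun 13 lines above, the pve2-vm entries at 07:37:46 come out as -73 and the pve2-storage entries at the same time as -137, while the 07:35:21 entry gives 0.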

kofik · Jun 19, 2012

I actually have a similar problem: once a week a (temporarily) big VM of 200GB on one disk needs to be backed up. We had to change hardware settings for this VM and unlocked the VM to do so (yes, I know I shouldn't have).

The next daily backups were logged with errors because the global lock could not be acquired.
After 2 days of this hanging backup I (p)killed the vmtar process, which made the backup process move on to the next VMID.

I haven't found a way to actually tell Proxmox to stop a backup job as such.
It would actually be quite nice to either know how or be able to.

I have found that a GZIP backup of a big VMDK disk not only seems to take a huge amount of processor power (and is only single-threaded) but also takes really long to move the data.

tom · Jun 19, 2012

Yes, gzip uses a lot of CPU power.

2.x uses lzo compression, which is much better.

vcp_ai · Jun 19, 2012

dietmar said:
No, those errors only occur if the system time goes backwards (on the local host). Do you still get those errors?

Today I have the same problem. A backup is running for 35 minutes per machine; usually a backup lasts 3-4 minutes per machine (10 GB disk).

It seems the network is 'slow' today. Syslog says:

Code:

Jun 19 11:03:00 servidor176 pvestatd[729170]: status update time (6.139 seconds)
Jun 19 11:03:06 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.isos' failed: got timeout
Jun 19 11:03:08 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups2' failed: got timeout
Jun 19 11:03:10 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups4' failed: got timeout
Jun 19 11:03:10 servidor176 pvestatd[729170]: status update time (6.145 seconds)
Jun 19 11:03:16 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.isos' failed: got timeout
Jun 19 11:03:18 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups2' failed: got timeout
Jun 19 11:03:20 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups4' failed: got timeout
Jun 19 11:03:20 servidor176 pvestatd[729170]: status update time (6.149 seconds)
Jun 19 11:03:30 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.isos' failed: got timeout
Jun 19 11:03:32 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups2' failed: got timeout
Jun 19 11:03:34 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.backups4' failed: got timeout
Jun 19 11:03:34 servidor176 pvestatd[729170]: status update time (9.573 seconds)
Jun 19 11:03:36 servidor176 pvestatd[729170]: WARNING: command 'df -P -B 1 /mnt/pve/Of.201.20.isos' failed: got timeout

It seems that the vzdump program is constantly running 'df -P -B 1 /mnt ...... ' to check for space on ALL mounts (the backup is running against /mnt/pve/Of.201.20.backups4).
As I have three mounts on the same NFS server (one for ISOs and two for backups with different numbers of copies), and the network is slow during this backup, df gets a timeout and it 'gets in a loop'.

Maybe increasing the df timeout, OR running it less often, OR running df only against the destination of the backup, would help.
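
As an illustration of the last option, here is a sketch of a free-space check that queries a single mount point with an explicit time limit. It is not how Proxmox does it internally; `df_available` and `free_bytes` are made-up names, and the wrapper assumes GNU coreutils `timeout` and `df` are installed.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# "Available" column of the first data line.
df_available() {
    awk 'NR == 2 { print $4 }'
}

# Hypothetical wrapper: available bytes on the filesystem holding path $1,
# giving up after $2 seconds (default 5). Prints nothing if df did not
# answer in time.
free_bytes() {
    timeout "${2:-5}" df -P -B 1 "$1" | df_available
}

# Usage on the host, checking only the backup destination:
#   free_bytes /mnt/pve/Of.201.20.backups4 10
```

Note that on a hard-mounted NFS share a blocked df may not react to the signal sent by `timeout`, so the limit is best effort.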

Also, please remember the origin of this thread:
Is there any way to gracefully stop a running backup of several machines?

tom · Jun 19, 2012

This thread is in the 1.x forum, please do not confuse people by posting 2.x logs here.

vcp_ai · Jun 19, 2012

tom said:
This thread is in the 1.x forum, please do not confuse people by posting 2.x logs here.

Sorry, I did not realize that (I did a search for backup problems and did not check ...).
I will try to find a similar problem in the 2.x forums and append there, or open a new thread.
