Snapshot backup failure aftermath

sruffilli

Guest
Hello,
two days after a scheduled backup of a virtual machine, I noticed I hadn't received the usual "vzdump backup status" email.

I checked /var/log/vzdump/qemu-VMID.log and, indeed:

Code:
May 03 00:01:01 INFO: Starting Backup of VM 126 (qemu)
May 03 00:01:01 INFO: status = running
May 03 00:01:03 INFO: backup mode: snapshot
May 03 00:01:03 INFO: ionice priority: 7
May 03 00:01:04 INFO:   Logical volume "vzsnap-proxmox02-0" created
May 03 00:01:04 INFO: creating archive '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tar.gz'
May 03 00:01:04 INFO: adding '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
May 03 00:01:04 INFO: adding '/dev/cise-san-disk-2/vzsnap-proxmox02-0' to archive ('vm-disk-virtio0.raw')
May 04 16:15:12 INFO: lvremove failed - trying again in 8 seconds
May 04 16:15:20 INFO: lvremove failed - trying again in 16 seconds
May 04 16:15:36 INFO: lvremove failed - trying again in 32 seconds
May 04 16:16:08 ERROR: command 'lvremove -f /dev/cise-san-disk-2/vzsnap-proxmox02-0' failed: interrupted by signal
May 04 16:16:08 ERROR: Backup of VM 126 failed - command '/usr/lib/qemu-server/vmtar  '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tmp/qemu-server.conf' 'qemu-server.conf' '/dev/cise-san-disk-2/vzsnap-proxmox02-0' 'vm-disk-virtio0.raw'|gzip >/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tar.dat' failed: interrupted by signal

At 16:15:12 I issued a "kill vzdump_pid", and after a couple of minutes the process exited.
(Please note that 40 hours had passed since the beginning of the backup process.)
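
For the record, this is roughly how I found and killed the stuck job (VZDUMP_PID is just a placeholder for the PID that ps reported):

Code:
# find the hanging vzdump worker and the vmtar|gzip pipeline it spawned
root@proxmox02:~# ps aux | grep -E 'vzdump|vmtar'
# plain TERM first (the lvremove retries in the log above started right after this)
root@proxmox02:~# kill VZDUMP_PID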

Alarmed by the lvremove error, I ran lvscan, which greeted me with:

Code:
root@proxmox02:~# lvscan
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 210453331968: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 210453389312: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 4096: Input/output error
[...]
  inactive Original '/dev/cise-san-disk-2/vm-126-disk-1' [196.00 GiB] inherit
  inactive Snapshot '/dev/cise-san-disk-2/vzsnap-proxmox02-0' [1.00 GiB] inherit
[...]


The LVM snapshot is clearly messed up, but what makes me nervous is the status of the VM disk itself: "inactive Original".
Several mailing-list threads (e.g. http://lists.debian.org/debian-user/2006/09/msg02538.html) suggest that lvremove-ing the snapshot should solve the problem. Since the LVM volume lives on a clustered-LVM SAN over iSCSI, I'm not entirely sure my scenario allows me to simply fire an lvremove on the snapshot.

Furthermore, the VM is running fine at the moment (or it seems to be; what does "inactive Original" mean, then?), and I need it to stay up no matter what.
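
If it helps, the only things I'm willing to run right now are read-only checks like the following (volume names taken from the lvscan output above), just to see whether anything still holds the snapshot open:

Code:
# LV attributes: the 6th attr flag is 'o' when the device is currently open
root@proxmox02:~# lvs -o lv_name,vg_name,lv_attr cise-san-disk-2
# per-LV details, including the open count and snapshot status
root@proxmox02:~# lvdisplay /dev/cise-san-disk-2/vzsnap-proxmox02-0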

Does anyone have a suggestion? Thank you in advance.

My configuration:

4 node cluster
LVM volume on a SAN over iSCSI
Version 2.0-57/ff6cd700 (upgrading to latest stable in 2 weeks)
 
Post pveversion -v
 
Code:
root@proxmox01:~# pveversion -v
pve-manager: 2.0-57 (pve-manager/2.0/ff6cd700)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-65
pve-kernel-2.6.32-11-pve: 2.6.32-65
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-37
pve-firmware: 1.0-15
libpve-common-perl: 1.0-25
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
 
Thank you tom. As I said, I have scheduled the update for the 19th of May, during a planned downtime, but I'm not sure that updating alone will fix my LVM volumes: I assume the update will only prevent the problem from happening again.

Should I do an lvremove of the snapshot before the update?
 
Yes, remove the snapshot manually. Also make sure that you unlock your VMs (qm unlock VMID).
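
Something like this, using the names from your log (just a sketch, double-check against your setup):

Code:
# remove the stale snapshot volume
root@proxmox02:~# lvremove -f /dev/cise-san-disk-2/vzsnap-proxmox02-0
# clear the backup lock on the VM
root@proxmox02:~# qm unlock 126
# verify the status of the original volume afterwards
root@proxmox02:~# lvscan | grep vm-126-disk-1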
 