Snapshot backup failure aftermath

sruffilli

Guest
Hello,
two days after a scheduled backup of a virtual machine, I noticed I hadn't received the usual "vzdump backup status" email.

I checked /var/log/vzdump/qemu-VMID.log and, indeed:

Code:
May 03 00:01:01 INFO: Starting Backup of VM 126 (qemu)
May 03 00:01:01 INFO: status = running
May 03 00:01:03 INFO: backup mode: snapshot
May 03 00:01:03 INFO: ionice priority: 7
May 03 00:01:04 INFO:   Logical volume "vzsnap-proxmox02-0" created
May 03 00:01:04 INFO: creating archive '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tar.gz'
May 03 00:01:04 INFO: adding '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tmp/qemu-server.conf' to archive ('qemu-server.conf')
May 03 00:01:04 INFO: adding '/dev/cise-san-disk-2/vzsnap-proxmox02-0' to archive ('vm-disk-virtio0.raw')
May 04 16:15:12 INFO: lvremove failed - trying again in 8 seconds
May 04 16:15:20 INFO: lvremove failed - trying again in 16 seconds
May 04 16:15:36 INFO: lvremove failed - trying again in 32 seconds
May 04 16:16:08 ERROR: command 'lvremove -f /dev/cise-san-disk-2/vzsnap-proxmox02-0' failed: interrupted by signal
May 04 16:16:08 ERROR: Backup of VM 126 failed - command '/usr/lib/qemu-server/vmtar  '/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tmp/qemu-server.conf' 'qemu-server.conf' '/dev/cise-san-disk-2/vzsnap-proxmox02-0' 'vm-disk-virtio0.raw'|gzip >/mnt/pve/BUP/dump/vzdump-qemu-126-2012_05_03-00_01_01.tar.dat' failed: interrupted by signal

At 16:15:12 I issued a "kill vzdump_pid", and after a couple of minutes the process exited.
(Please note that 40 hours had passed since the beginning of the backup process.)
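
For the record, this is roughly how I found and killed the stuck job (VZDUMP_PID is just a placeholder for the PID that ps reported):

Code:
# find the hanging vzdump worker and the vmtar|gzip pipeline it spawned
root@proxmox02:~# ps aux | grep -E 'vzdump|vmtar'
# plain TERM first (the lvremove retries in the log above started right after this)
root@proxmox02:~# kill VZDUMP_PID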

Alarmed by the lvremove error, I ran lvscan, which greeted me with:

Code:
root@proxmox02:~# lvscan
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 210453331968: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 210453389312: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 0: Input/output error
  /dev/cise-san-disk-2/vzsnap-proxmox02-0: read failed after 0 of 4096 at 4096: Input/output error
[...]
  inactive Original '/dev/cise-san-disk-2/vm-126-disk-1' [196.00 GiB] inherit
  inactive Snapshot '/dev/cise-san-disk-2/vzsnap-proxmox02-0' [1.00 GiB] inherit
[...]


The LVM snapshot is clearly messed up, but what makes me nervous is the status of the VM disk itself: "inactive Original".
Several mailing-list threads (e.g. http://lists.debian.org/debian-user/2006/09/msg02538.html) suggest that lvremove-ing the snapshot should solve the problem. Since the LVM volume lives on a clustered-LVM SAN over iSCSI, I'm not entirely sure my scenario allows me to simply fire an lvremove on the snapshot.

Furthermore, the VM is running fine at the moment (or it seems to be; what does "inactive Original" mean, then?), and I need it to stay up no matter what.
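
If it helps, the only things I'm willing to run right now are read-only checks like the following (volume names taken from the lvscan output above), just to see whether anything still holds the snapshot open:

Code:
# LV attributes: the 6th attr flag is 'o' when the device is currently open
root@proxmox02:~# lvs -o lv_name,vg_name,lv_attr cise-san-disk-2
# per-LV details, including the open count and snapshot status
root@proxmox02:~# lvdisplay /dev/cise-san-disk-2/vzsnap-proxmox02-0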

Does anyone have a suggestion? Thank you in advance.

My configuration:

4 node cluster
LVM volume on a SAN over iSCSI
Version 2.0-57/ff6cd700 (upgrading to latest stable in 2 weeks)
 
Post pveversion -v
 
Code:
root@proxmox01:~# pveversion -v
pve-manager: 2.0-57 (pve-manager/2.0/ff6cd700)
running kernel: 2.6.32-11-pve
proxmox-ve-2.6.32: 2.0-65
pve-kernel-2.6.32-11-pve: 2.6.32-65
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve2
clvm: 2.02.88-2pve2
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-2
pve-cluster: 1.0-26
qemu-server: 2.0-37
pve-firmware: 1.0-15
libpve-common-perl: 1.0-25
libpve-access-control: 1.0-17
libpve-storage-perl: 2.0-17
vncterm: 1.0-2
vzctl: 3.0.30-2pve2
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-9
ksm-control-daemon: 1.1-1
 
Thank you tom. As I said, I have scheduled the update for the 19th of May, during a planned downtime, but I'm not sure that updating alone will fix my LVM volumes: I assume the update will only prevent the problem from happening again.

Should I do an lvremove of the snapshot before the update?
 
Yes, remove the snapshot manually. Also make sure that you unlock your VMs (qm unlock VMID).
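
Something like this, using the names from your log (just a sketch, double-check against your setup):

Code:
# remove the stale snapshot volume
root@proxmox02:~# lvremove -f /dev/cise-san-disk-2/vzsnap-proxmox02-0
# clear the backup lock on the VM
root@proxmox02:~# qm unlock 126
# verify the status of the original volume afterwards
root@proxmox02:~# lvscan | grep vm-126-disk-1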
 