cannot restore to glusterfs

vkhera

I had a VM's disk become corrupted after a live migration from one host to another. Repairing the corruption was too much work, so I decided to restore from the backup taken earlier in the day.

The problem is that I cannot restore it directly back to the gluster storage. It fails immediately with this:

Code:
restore vma archive: vma extract -v -r /var/tmp/vzdumptmp9100.fifo /mnt/pve/filer/dump/vzdump-qemu-103-2016_04_02-03_15_01.vma /var/tmp/vzdumptmp9100
CFG: size: 606 name: qemu-server.conf
DEV: dev_id=1 size: 10737418240 devname: drive-scsi0
DEV: dev_id=2 size: 107374182400 devname: drive-scsi1
CTIME: Sat Apr 2 03:15:02 2016
[2016-04-02 22:19:41.341953] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-04-02 22:19:41.719619] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-04-02 22:19:41.996203] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://pve1-cluster/datastore/images/103/vm-103-disk-1.qcow2', fmt=qcow2 size=10737418240 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2016-04-02 22:19:42.267284] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
new volume ID is 'datastore:103/vm-103-disk-1.qcow2'
map 'drive-scsi0' to 'gluster://pve1-cluster/datastore/images/103/vm-103-disk-1.qcow2' (write zeros = 0)
[2016-04-02 22:19:42.460444] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-04-02 22:19:42.797402] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2016-04-02 22:19:43.361414] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://pve1-cluster/datastore/images/103/vm-103-disk-2.qcow2', fmt=qcow2 size=107374182400 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2016-04-02 22:19:43.755407] E [afr-common.c:4168:afr_notify] 0-datastore-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
new volume ID is 'datastore:103/vm-103-disk-2.qcow2'
map 'drive-scsi1' to 'gluster://pve1-cluster/datastore/images/103/vm-103-disk-2.qcow2' (write zeros = 0)

** (process:9101): ERROR **: vma_reader_register_bs for stream drive-scsi1 failed - unexpected size 107390828544 != 107374182400
temporary volume 'datastore:103/vm-103-disk-1.qcow2' sucessfuly removed
temporary volume 'datastore:103/vm-103-disk-2.qcow2' sucessfuly removed
TASK ERROR: command 'vma extract -v -r /var/tmp/vzdumptmp9100.fifo /mnt/pve/filer/dump/vzdump-qemu-103-2016_04_02-03_15_01.vma /var/tmp/vzdumptmp9100' failed: got signal 5

Right now it is restoring to "local" storage and I'll move it back to gluster later.
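Once that restore finishes, something like qm move_disk should be able to move the disks back onto the gluster storage (the VMID, disk names and the "datastore" storage name below are just taken from the log above; --delete removes the copy left on "local" after a successful move):

Code:
qm move_disk 103 scsi0 datastore --delete
qm move_disk 103 scsi1 datastore --delete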

Code:
# pveversion -v
proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-3.19.8-1-pve: 3.19.8-3
pve-kernel-4.2.8-1-pve: 4.2.8-39
pve-kernel-4.1.3-1-pve: 4.1.3-7
pve-kernel-4.2.0-1-pve: 4.2.0-13
pve-kernel-4.2.1-1-pve: 4.2.1-14
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
 
Same problem here (and thanks for the bug report).

There's an easier way to work around it: add the gluster FUSE mount as a shared directory storage, restore the backup to that, then edit the VM's config file and change the storage entries back to the main gluster storage.
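Roughly like this, assuming the gluster volume is the "datastore" storage from the first post and is already FUSE-mounted at /mnt/pve/datastore (the directory storage name "gluster-fuse" below is just an example):

Code:
# /etc/pve/storage.cfg - temporary directory storage on top of the FUSE mount
dir: gluster-fuse
        path /mnt/pve/datastore
        content images
        shared 1

Restore the backup to "gluster-fuse"; the images then land in images/<vmid>/ under the same mount the gluster storage uses. Then rename the storage part of each disk entry in /etc/pve/qemu-server/<vmid>.conf, e.g. scsi0: gluster-fuse:103/vm-103-disk-1.qcow2,size=10G becomes scsi0: datastore:103/vm-103-disk-1.qcow2,size=10G, and finally remove the temporary directory storage again.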
 
I think I'm giving up on gluster. I am getting disk file corruption on reboot even when the VM is not running. If I'm lucky enough to have a file system snapshot, rolling back to it seems to be safe (it worked in at least one instance).

I'm filing a bug report for this as well.
 
I'm getting a similar issue. I did a VM backup and now, trying to restore to the same gluster storage, I'm getting the following output:
http://pastebin.com/raw/81AK3zAe

Additionally, why is Proxmox using Gluster 3.3? I have 3.8 installed here, and on the server as well.

Code:
# pveversion -v
proxmox-ve: 4.2-48 (running kernel: 4.4.6-1-pve)
pve-manager: 4.2-2 (running version: 4.2-2/725d76f0)
pve-kernel-4.4.6-1-pve: 4.4.6-48
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-72
pve-firmware: 1.1-8
libpve-common-perl: 4.0-59
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-14
pve-container: 1.0-62
pve-firewall: 2.0-25
pve-ha-manager: 1.0-28
ksm-control-daemon: 1.2-1
glusterfs-client: 3.8.4-1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
 
A similar issue arises when trying to take a snapshot.
It seems that PVE is unable to communicate with gluster when snapshotting or restoring, but it is perfectly able to create new VMs or start them.
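In case it helps to narrow this down, a few things worth checking (the volume name "datastore" and VMID 103 are just the examples from the first post, substitute your own):

Code:
# on the PVE node: is the storage seen as active at all?
pvesm status
# reproduce the snapshot from the CLI to capture the full error
qm snapshot 103 testsnap
# on one of the gluster servers: are all peers and bricks up?
gluster peer status
gluster volume status datastore
# client-side gluster logs on the PVE node
tail -n 50 /var/log/glusterfs/*.log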