Problem with disk migration to GlusterFS

Melanxolik

Need help.
We have a cluster of seven servers; every machine has a purchased Proxmox VE Community Subscription (1 CPU/year) license.
There are two storage servers running GlusterFS, which provide the following volume:

Code:
[root@gluster01 ~]# gluster volume info data0
 
Volume Name: data0
Type: Distributed-Replicate
Volume ID: 01398656-a824-43d8-84f6-152eaf14f83c
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gluster01:/data/1TB01/vol01
Brick2: gluster02:/data/1TB01/vol01
Brick3: gluster01:/data/1TB02/vol02
Brick4: gluster02:/data/1TB02/vol02
Brick5: gluster01:/data/1TB03/vol03
Brick6: gluster02:/data/1TB03/vol03
Brick7: gluster01:/data/1TB04/vol04
Brick8: gluster02:/data/1TB04/vol04
[root@gluster01 ~]#

The volume is mounted on the Proxmox node:
Code:
root@cl7:~# mount|grep gluster
192.168.126.231:data0 on /mnt/pve/backup01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
root@cl7#
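
For reference, the storage definition that Proxmox keeps in /etc/pve/storage.cfg for this mount should look roughly like the following (storage ID, server address and volume name are taken from the mount output above; the exact option lines are an assumption):
Code:
glusterfs: backup01
        server 192.168.126.231
        volume data0
        content images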

The storage was added through the web interface.
The problem occurs when trying to move the disk of a running VM from LVM to GlusterFS; creating a new VM directly on Gluster works without problems.

Code:
root@cl7:~# qm move_disk 506 virtio0 backup01
create full clone of drive virtio0 (LVM1:vm-506-disk-1)
[2014-05-15 19:40:28.889763] E [afr-common.c:3959:afr_notify] 0-data0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-05-15 19:40:28.889811] E [afr-common.c:3959:afr_notify] 0-data0-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-05-15 19:40:28.889826] E [afr-common.c:3959:afr_notify] 0-data0-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://192.168.126.231/data0/images/506/vm-506-disk-1.raw', fmt=raw size=34359738368 
[2014-05-15 19:40:28.889840] E [afr-common.c:3959:afr_notify] 0-data0-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
unable to connect to VM 506 socket - timeout after 31 retries
storage migration failed: mirroring error: VM 506 qmp command 'query-block-jobs' failed - interrupted by signal
root@cl7:~#
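
The "All subvolumes are down" messages come from qemu's built-in gfapi client rather than from the FUSE mount, so a useful check (a diagnostic sketch, not output from this setup) is whether the node can reach glusterd and the brick ports of the Gluster servers directly:
Code:
# Query the volume state through the management port (24007) of one of the Gluster servers
gluster --remote-host=192.168.126.231 volume status data0
# Brick ports start at 49152 in GlusterFS 3.4+; verify they are reachable from the Proxmox node
nc -zv 192.168.126.231 24007
nc -zv 192.168.126.231 49152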

Creating a test file on the GlusterFS mount works fine:
Code:
root@cluster-1-7:/mnt/pve/backup01# dd if=/dev/zero of=test.file bs=1MB count=1024 conv=sync
1024+0 records in
1024+0 records out
1024000000 bytes (1.0 GB) copied, 22.9457 s, 44.6 MB/s
root@cluster-1-7:/mnt/pve/backup01#
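
Note that dd only exercises the FUSE mount at /mnt/pve/backup01, while qm move_disk writes through qemu's libgfapi client (as the gluster:// URL in the Formatting line shows). A closer test of that path (the image name is just an example) would be:
Code:
# Create and inspect a raw image over libgfapi, bypassing the FUSE mount
qemu-img create -f raw gluster://192.168.126.231/data0/images/506/gfapi-test.raw 1G
qemu-img info gluster://192.168.126.231/data0/images/506/gfapi-test.raw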

Code:
root@cl7:/mnt/pve/backup01# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 3.10.0-2-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-3.10.0-2-pve: 3.10.0-8
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.0-1
root@cl7:/mnt/pve/backup01#
Code:
root@cl7:/mnt/pve/backup01# dpkg -l |grep gluster
ii  glusterfs-client                 3.5.0-1                       amd64        clustered file-system (client package)
ii  glusterfs-common                 3.5.0-1                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                 3.5.0-1                       amd64        clustered file-system (server package)
root@cl7#

The GlusterFS version had to be upgraded because of problems with the version shipped with the original system.
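
Since pve-qemu-kvm is built against the GlusterFS libraries shipped with Proxmox, one thing that may be worth checking after the manual upgrade to 3.5.0 (an assumption, not verified here) is which libgfapi the kvm binary actually loads:
Code:
# Show which libgfapi/libglusterfs the PVE qemu binary is linked against
ldd /usr/bin/kvm | grep -E 'gfapi|glusterfs'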

*****
Excuse me, dear moderators, please move this topic to the appropriate Proxmox version section.
 
Hi, I have repeated the same test as you describe and it works without problems.
I'm using PVE 3.2-4 and the GlusterFS version from the PVE repositories:
Code:
root@servidor169:~# dpkg -l | grep gluster
ii  glusterfs-client                 3.4.2-1                       amd64        clustered file-system (client package)
ii  glusterfs-common                 3.4.2-1                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                 3.4.2-1                       amd64        clustered file-system (server package)
root@servidor169:~#
Moving the disk from LVM to Gluster (machine stopped):
Code:
root@servidor179:~# qm move_disk 8107 virtio0  GLUS.169
create full clone of drive virtio0 (lvm-vol-datos:vm-8107-disk-1)
Formatting 'gluster://192.168.122.169/volGLUS169/images/8107/vm-8107-disk-2.raw', fmt=raw size=6442450944 
transferred: 0 bytes remaining: 6442450944 bytes total: 6442450944 bytes progression: 0.00 %
transferred: 130137509 bytes remaining: 6312313435 bytes total: 6442450944 bytes progression: 2.02 %
..........
transferred: 6370939738 bytes remaining: 71511206 bytes total: 6442450944 bytes progression: 98.89 %
transferred: 6442450944 bytes remaining: 0 bytes total: 6442450944 bytes progression: 100.00 %
root@servidor179:~#
Then I moved the disk back to LVM (same correct result).
Then I removed the 'unused disks' from the machine (using the GUI Hardware tab).
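The CLI equivalent (assuming the detached disk shows up as unused0) would be:
Code:
# Remove the first unused disk entry from the VM config
qm set 8107 -delete unused0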
Then I started the machine and moved the disk again:
Code:
root@servidor179:~# qm move_disk 8107 virtio0  GLUS.169
create full clone of drive virtio0 (lvm-vol-datos:vm-8107-disk-1)
Formatting 'gluster://192.168.122.169/volGLUS169/images/8107/vm-8107-disk-2.raw', fmt=raw size=6442450944 
transferred: 0 bytes remaining: 6442450944 bytes total: 6442450944 bytes progression: 0.00 %
transferred: 20971520 bytes remaining: 6421479424 bytes total: 6442450944 bytes progression: 0.33 %
transferred: 62914560 bytes remaining: 6379536384 bytes total: 6442450944 bytes progression: 0.98 %
............

transferred: 6437863424 bytes remaining: 4587520 bytes total: 6442450944 bytes progression: 99.93 %
transferred: 6442450944 bytes remaining: 0 bytes total: 6442450944 bytes progression: 100.00 %
root@servidor179:~#
Same result moving the disk back from Gluster to LVM.

I have glusterfs-server installed on servers 168 and 169 (replica 2).
I have a cluster with servers 168, 169, 173, 174 and 175.
All of them boot from a 60 GB SSD, and all of them have VMs on GlusterFS (no second disk on 173, 174, 175).
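
For reference, a replica-2 volume like this one would have been created with something along these lines (the hostname servidor168 follows the naming pattern above, and the brick paths are just an example):
Code:
gluster peer probe servidor169
gluster volume create volGLUS169 replica 2 servidor168:/bricks/brick1 servidor169:/bricks/brick1
gluster volume start volGLUS169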

I made the tests from a machine (179) that is not part of the cluster (I have no LVM inside the cluster).

Regards