Problem with disk migration to GlusterFS

Melanxolik

Need help.
We have a cluster of seven servers; all machines have a purchased license (Proxmox VE Community Subscription, 1 CPU/year).
There are two GlusterFS storage servers, which host the following volume:

Code:
[root@gluster01 ~]# gluster volume info data0
 
Volume Name: data0
Type: Distributed-Replicate
Volume ID: 01398656-a824-43d8-84f6-152eaf14f83c
Status: Started
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: gluster01:/data/1TB01/vol01
Brick2: gluster02:/data/1TB01/vol01
Brick3: gluster01:/data/1TB02/vol02
Brick4: gluster02:/data/1TB02/vol02
Brick5: gluster01:/data/1TB03/vol03
Brick6: gluster02:/data/1TB03/vol03
Brick7: gluster01:/data/1TB04/vol04
Brick8: gluster02:/data/1TB04/vol04
[root@gluster01 ~]#

The volume is mounted in Proxmox:
Code:
root@cl7:~# mount|grep gluster
192.168.126.231:data0 on /mnt/pve/backup01 type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
root@cl7#

The storage is connected through the web interface.
The problem occurs when trying to move the disk of a running VM from LVM to GlusterFS; creating a new VM directly on Gluster works without problems.
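
For reference, the corresponding storage definition in /etc/pve/storage.cfg should look roughly like this (a sketch reconstructed from the mount output above; the content line is an assumption):
Code:
glusterfs: backup01
        server 192.168.126.231
        volume data0
        content images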




Code:
root@cl7:~# qm move_disk 506 virtio0 backup01
create full clone of drive virtio0 (LVM1:vm-506-disk-1)
[2014-05-15 19:40:28.889763] E [afr-common.c:3959:afr_notify] 0-data0-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-05-15 19:40:28.889811] E [afr-common.c:3959:afr_notify] 0-data0-replicate-1: All subvolumes are down. Going offline until atleast one of them comes back up.
[2014-05-15 19:40:28.889826] E [afr-common.c:3959:afr_notify] 0-data0-replicate-2: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://192.168.126.231/data0/images/506/vm-506-disk-1.raw', fmt=raw size=34359738368 
[2014-05-15 19:40:28.889840] E [afr-common.c:3959:afr_notify] 0-data0-replicate-3: All subvolumes are down. Going offline until atleast one of them comes back up.
unable to connect to VM 506 socket - timeout after 31 retries
storage migration failed: mirroring error: VM 506 qmp command 'query-block-jobs' failed - interrupted by signal
root@cl7:~#
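
Since qm move_disk writes through qemu's native gluster (libgfapi) driver rather than the FUSE mount, a useful check is whether the volume is reachable that way at all. A sketch of such a check (the test file name is only an example, this is not what was actually run):
Code:
# on one of the gluster nodes: check that all bricks and their ports are up
gluster volume status data0
# on the PVE node: create and inspect a raw image over gluster:// (libgfapi)
qemu-img create -f raw gluster://192.168.126.231/data0/images/506/test-libgfapi.raw 1G
qemu-img info gluster://192.168.126.231/data0/images/506/test-libgfapi.raw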

Creating a test file on the GlusterFS mount works:
Code:
root@cluster-1-7:/mnt/pve/backup01# dd if=/dev/zero of=test.file bs=1MB count=1024 conv=sync
1024+0 records in
1024+0 records out
1024000000 bytes (1.0 GB) copied, 22.9457 s, 44.6 MB/s
root@cluster-1-7:/mnt/pve/backup01#





Code:
root@cl7:/mnt/pve/backup01# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 3.10.0-2-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-27-pve: 2.6.32-121
pve-kernel-3.10.0-2-pve: 3.10.0-8
pve-kernel-2.6.32-29-pve: 2.6.32-126
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.0-1
root@cl7:/mnt/pve/backup01#
Code:
root@cl7:/mnt/pve/backup01# dpkg -l |grep gluster
ii  glusterfs-client                 3.5.0-1                       amd64        clustered file-system (client package)
ii  glusterfs-common                 3.5.0-1                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                 3.5.0-1                       amd64        clustered file-system (server package)
root@cl7#

The Gluster version had to be upgraded because of problems with the version shipped with the original system.
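
If the upgraded 3.5.0 client turns out to be the culprit, going back to the repository packages would look roughly like this (the exact version string is an assumption, check what apt-cache policy actually offers first):
Code:
apt-cache policy glusterfs-client glusterfs-common glusterfs-server
apt-get install glusterfs-client=3.4.2-1 glusterfs-common=3.4.2-1 glusterfs-server=3.4.2-1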

*****
Excuse me, dear moderators, please move this topic to the appropriate Proxmox version subforum.
 
Hi, I have repeated the same test you describe and it works without problems.
I'm using PVE 3.2.4 and the GlusterFS version from the PVE repositories:
Code:
root@servidor169:~# dpkg -l | grep gluster
ii  glusterfs-client                 3.4.2-1                       amd64        clustered file-system (client package)
ii  glusterfs-common                 3.4.2-1                       amd64        GlusterFS common libraries and translator modules
ii  glusterfs-server                 3.4.2-1                       amd64        clustered file-system (server package)
root@servidor169:~#
Moving the disk from LVM to Gluster (machine stopped):
Code:
root@servidor179:~# qm move_disk 8107 virtio0  GLUS.169
create full clone of drive virtio0 (lvm-vol-datos:vm-8107-disk-1)
Formatting 'gluster://192.168.122.169/volGLUS169/images/8107/vm-8107-disk-2.raw', fmt=raw size=6442450944 
transferred: 0 bytes remaining: 6442450944 bytes total: 6442450944 bytes progression: 0.00 %
transferred: 130137509 bytes remaining: 6312313435 bytes total: 6442450944 bytes progression: 2.02 %
..........
transferred: 6370939738 bytes remaining: 71511206 bytes total: 6442450944 bytes progression: 98.89 %
transferred: 6442450944 bytes remaining: 0 bytes total: 6442450944 bytes progression: 100.00 %
root@servidor179:~#
Then I moved the disk back to LVM (same correct result).
Then I removed the 'unused disks' from the machine (using the GUI Hardware tab).
Then I started the machine and moved it again:
Code:
root@servidor179:~# qm move_disk 8107 virtio0  GLUS.169
create full clone of drive virtio0 (lvm-vol-datos:vm-8107-disk-1)
Formatting 'gluster://192.168.122.169/volGLUS169/images/8107/vm-8107-disk-2.raw', fmt=raw size=6442450944 
transferred: 0 bytes remaining: 6442450944 bytes total: 6442450944 bytes progression: 0.00 %
transferred: 20971520 bytes remaining: 6421479424 bytes total: 6442450944 bytes progression: 0.33 %
transferred: 62914560 bytes remaining: 6379536384 bytes total: 6442450944 bytes progression: 0.98 %
............

transferred: 6437863424 bytes remaining: 4587520 bytes total: 6442450944 bytes progression: 99.93 %
transferred: 6442450944 bytes remaining: 0 bytes total: 6442450944 bytes progression: 100.00 %
root@servidor179:~#
Same result when moving back from Gluster to LVM.
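
As a side note, the cleanup of the leftover source disk that I did through the GUI can probably also be done in one step from the CLI; a sketch, assuming this qemu-server version already supports the delete option of move_disk:
Code:
# copy the disk back to LVM and drop the source volume after a successful move
qm move_disk 8107 virtio0 lvm-vol-datos --delete 1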



I have glusterfs-server installed on servers 168 and 169 (replica 2).
I have a cluster with servers 168, 169, 173, 174, and 175.
All of them boot from a 60 GB SSD, and all of them have VMs on GlusterFS (no second disk on 173, 174, 175).

I made the tests from a machine (179) that is not part of the cluster (I have no LVM inside the cluster).

Regards
 
