Updated gluster possible?

_dist

New Member
Aug 1, 2014
First off let me explain my setup briefly

We are running a 3-node cluster (HA turned off presently). All three nodes use a Gluster replicate volume for VM storage; the reason I chose proxmox is that (at the time of choosing) it was the only UI that fully supported libgfapi. Everything works pretty well, but there are two major issues we are experiencing related to gluster bugs:

1) Heal status is inaccurate for large, busy files (VMs especially); this is due to a couple of issues that are fixed in 3.5.x
2) Modification of volume structure (add/replace/remove brick) isn't entirely stable in 3.4.x; as an example --> https://bugzilla.redhat.com/show_bug.cgi?id=1104861

I've been warmed (not yet burned) by both of these issues a few times since going live, and I am currently fighting a fire caused by #2.

http://download.proxmox.com/debian/dists/wheezy/pve-no-subscription/binary-amd64/ <-- contains 3.4.2-1 (at newest)

I don't have an enterprise subscription, so I can't tell whether that repository contains something newer or a backported fix. I'm also not sure what configure options were used when compiling qemu/gluster (and on that note, I would love the dbg debs as well).

I can appreciate that proxmox builds are likely vetted and that a 3.5.2 vetting may be taking place already. However, issue #1 is especially critical for a VM hosting platform. Please let me know if I'm mistaken anywhere above, or if there is anything I can do to help.
 
I can appreciate that proxmox builds are likely vetted and that a 3.5.2 vetting may be taking place already.

Many users reported problems with 3.5.x, so I am quite unsure whether we should update now. But you can always use the packages from gluster directly - we do not compile them ourselves.
 
Thanks, as a virtual storage host we are very eager to move to 3.5.x, because volume heal info just doesn't work in 3.4.x. The problem isn't that you may have compiled glusterfs-server (it's good to know you use the stock debs), but that I suspect you must have compiled qemu, and it will need to be re-compiled against the new api/src files in gluster 3.5.x. Rather than having to trial-and-error it myself, do you think you could provide the ./configure options that were used to compile?

http://download.proxmox.com/debian/...ion/binary-amd64/pve-qemu-kvm_1.7-8_amd64.deb

Which I assume, based on kvm --version, came from http://wiki.qemu-project.org/download/qemu-1.7.1.tar.bz2 (and no source modifications were made?). Obviously I could just try a re-compile from source once gluster 3.5.2 is present, but I'd feel better knowing that my recompile is as close to stock proxmox as possible.

Thanks!
 
So here's my setup today

Node 1 (running glusterfs 3.4.2 and qemu 1.7.1 stock)
Node 2 (same as above)
Node 3 (running glusterfs 3.5.2 from the gluster repo but qemu 1.7.1 stock)

My best guess is that qemu's libgfapi simply points at the server (in my case I use localhost), because qemu on nodes 1 & 2 can run VMs just fine on a volume that is wholly located on node 3's gluster. However, node 3 has trouble even starting VMs on its own localhost volume. My theory is that the proxmox qemu's libgfapi is out of date and incompatible with the new server API. However, this is just a guess; to verify I'll have to compile my own version against the new api/src files. I'm assuming a straight compile is good enough, but let me know if you have any input and I'll keep this thread up to date.
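For context, here is a rough C sketch of what a libgfapi client (which is effectively what qemu's gluster:// block driver is) does to reach a volume. The volume name and localhost are placeholders from my setup, and this only illustrates the API surface qemu links against, not the actual proxmox/qemu code:

Code:
/* Rough sketch of a libgfapi client, as used by qemu's gluster:// driver.
 * Assumptions: a volume named "testvol" served from localhost on the
 * default management port 24007.
 * Build: gcc gfapi_check.c -lgfapi -o gfapi_check
 */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("testvol");            /* volume name, not a path */
    if (!fs)
        return 1;

    /* qemu turns gluster://host[:port]/volume/image into roughly this call */
    glfs_set_volfile_server(fs, "tcp", "localhost", 24007);

    if (glfs_init(fs) != 0) {                    /* fetch volfile, connect to bricks */
        perror("glfs_init");
        glfs_fini(fs);
        return 1;
    }

    printf("connected to volume\n");
    glfs_fini(fs);
    return 0;
}

If the client side of that API drifted between 3.4.x and 3.5.x, that would fit the symptoms above - but again, just a guess.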

Edit: NM I just found https://git.proxmox.com/?p=pve-qemu-kvm.git;a=shortlog <-- thanks
 
I'm not sure specifically where the problem is, but the symptom is that qm reports a timeout even though the machine starts properly. This problem prevents live migration as well.

This post describes something very similar (not the same error, though):
http://forum.proxmox.com/threads/18526-The-problem-with-the-disk-migration-on-glusterfs

The error that I'm getting is "failed: got timeout"

If I run verbatim the kvm command that qm prints, it works fine (just as the machine starts properly under qm as well, but it gives an error anyway). What makes qm decide that "failed: got timeout" occurred? Perhaps it's a simple config issue?

Edit: Perhaps we should merge this with http://forum.proxmox.com/threads/18486-qm-start-fails-with-timeout-error-but-vm-is-actually-started <-- this is the exact issue I'm having.
 
Hello.

We've been talking about this problem on the gluster IRC channel.
I forgot to mention that I tried to figure it out with strace.
This is what I found.

During the first run of the process, I saw the following system calls.

Command line : qm start 161

PID : 38662

Code:
...
pipe([6, 7])                            = 0
pipe([8, 10])                           = 0
...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f19bef0e9d0) = 38663
...
close(7)                                = 0
close(8)                                = 0
read(6, "UPID:prox2:00009707:001037D4:53F"..., 4096) = 60
...
open("/var/log/pve/tasks/C/UPID:prox2:00009707:001037D4:53F3541C:qmstart:161:root@pam:", O_WRONLY|O_CREAT|O_EXCL, 0640) = 7
...
write(10, "OK", 2)                      = 2
close(10)                               = 0
// Then later it tries to read something from its child process once more, but it gets a timeout about 30 times.
...
select(8, [6], NULL, NULL, {1, 0})      = 0 (Timeout)

PID : 38663 (PPID : 38662)

Code:
...
write(1, "UPID:prox2:00009707:001037D4:53F"..., 60) = 60
read(8, "OK", 4096)                     = 2
...
pipe([15, 16])  
pipe([13, 14])  
...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f19bef0e9d0) = 38666
close(14)                               = 0
close(16)                               = 0
...
select(16, [13 15], NULL, NULL, {1, 0}) = 0 (Timeout)
select(16, [13 15], NULL, NULL, {1, 0}) = 1 (in [13], left {0, 412061})
--- SIGCHLD (Child exited) @ 0 (0) ---
// 30 times as well
select(16, [15], NULL, NULL, {1, 0})    = 0 (Timeout)
// Here I cannot see how it is supposed to get something from a dead process ...

PID : 38666 (PPID : 38663)

Code:
...
execve("/usr/bin/kvm",  ["/usr/bin/kvm", "-id", "161", "-chardev",  "socket,id=qmp,path=/var/run/qemu"..., "-mon",  "chardev=qmp,mode=control", "-vnc",  "unix:/var/run/qemu-server/161.vn"..., "-pidfile",  "/var/run/qemu-server/161.pid", "-daemonize", "-name", "7-test", "-smp",  "sockets=1,cores=4", ...], [/* 16 vars */]) = 0
// Here we are in the KVM program. 
dup2(16, 2)                             = 2
dup(16)                                 = 12
fcntl(16, F_GETFD)                      = 0x1 (flags FD_CLOEXEC)
dup2(12, 16)                            = 16
fcntl(16, F_SETFD, FD_CLOEXEC)          = 0
close(12)                               = 0
fcntl(2, F_SETFD, 0)                    = 0
// We are supposed to wait for something on file descriptor 2 or 16, but I haven't found anything for the moment ...
...
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fa845e1bbd0) = 38667
// And then in the following process I never see any outputs or operations on file descriptors 2 or 16.

EDIT :
I forgot to post what the first process run is supposed to produce, compared with a successful start.

Code:
select(8, [6], NULL, NULL, {1, 0})      = 1 (in [6], left {0, 694391})
read(6, "TASK OK\n", 4096)              = 8
write(7, "TASK OK\n", 8)                = 8
select(8, [6], NULL, NULL, {1, 0})      = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
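To make the timeout mechanism explicit, here is a condensed C sketch of the pattern these traces show: a parent select()ing on a pipe with a 1-second timeout and giving up after about 30 tries. This only illustrates the mechanism (the retry count and "TASK OK" string come from the traces above), it is not the actual qm/PVE code:

Code:
/* The parent keeps select()ing on the read end of a pipe with a 1 second
 * timeout and declares a timeout when nothing arrives.  The pipe only
 * reports EOF once every copy of its write end is closed, so a grandchild
 * (e.g. a daemonized kvm) that inherits the write end and never closes it
 * keeps the parent waiting.  Sketch of the mechanism, not the real qm code.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <sys/wait.h>

int main(void)
{
    int pfd[2];
    if (pipe(pfd) != 0)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {                        /* child: pretend to be the worker */
        close(pfd[0]);
        /* a further fork() that keeps pfd[1] open would reproduce the hang */
        if (write(pfd[1], "TASK OK\n", 8) < 0)
            _exit(1);
        close(pfd[1]);                     /* closing the write end unblocks the parent */
        _exit(0);
    }

    close(pfd[1]);                         /* the parent must drop its own write end too */

    char buf[4096];
    int tries;
    for (tries = 0; tries < 30; tries++) { /* ~30 one-second timeouts, as in the trace */
        fd_set rfds;
        struct timeval tv = { 1, 0 };
        FD_ZERO(&rfds);
        FD_SET(pfd[0], &rfds);
        if (select(pfd[0] + 1, &rfds, NULL, NULL, &tv) > 0) {
            ssize_t n = read(pfd[0], buf, sizeof(buf) - 1);
            if (n <= 0)
                break;                     /* EOF: all write ends closed */
            buf[n] = '\0';
            printf("child said: %s", buf);
        }
    }
    if (tries == 30)
        printf("failed: got timeout\n");   /* what qm reports in this thread */

    waitpid(pid, NULL, 0);
    return 0;
}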
 
Yes, for sure.
That's why, for the moment, I'm keeping the latest version of the 3.4 branch of gluster on my proxmox servers.
 
I just pulled the latest glusterfs from git.proxmox.com, then built and installed the packages:
glusterd -V
glusterfs 3.5.2 built on Aug 21 2014 23:10:43
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2013 Red Hat, Inc. <http://www.redhat.com/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.

Then, trying to migrate a CT gives the following result:
Aug 21 23:15:08 starting migration of CT 137 to node 'esx2' (172.16.3.9)
Aug 21 23:15:08 container is running - using online migration
Aug 21 23:15:08 container data is on shared storage 'gfs1'
Aug 21 23:15:08 start live migration - suspending container
Aug 21 23:15:08 dump container state
Aug 21 23:15:09 # vzctl --skiplock chkpnt 137 --dump --dumpfile /mnt/pve/gfs1/dump/dump.137
Aug 21 23:15:08 Setting up checkpoint...
Aug 21 23:15:08 join context..
Aug 21 23:15:08 dump...
Aug 21 23:15:09 Can not dump container: Invalid argument
Aug 21 23:15:09 Error: BUG: no socket index
Aug 21 23:15:09 ERROR: Failed to dump container state: Checkpointing failed
Aug 21 23:15:09 aborting phase 1 - cleanup resources
Aug 21 23:15:09 start final cleanup
Aug 21 23:15:09 ERROR: migration aborted (duration 00:00:01): Failed to dump container state: Checkpointing failed
TASK ERROR: migration aborted

Exactly the same error as with gluster-3.4.2!!

What to do?
 
Maybe it has significance, maybe not, but is the migrated CT 64-bit or 32-bit?
 
Yes, I know. I commented on it in your thread. I thought it might have some connection to that problem (64-bit inode pointers don't always work in a 32-bit CT).
 
It seems the qemu option -daemonize does not work correctly when using gluster://, as it is not closing stdout ...
will try to find a fix
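If that's the case, the usual daemonize idiom is for the detached process to re-point its standard fds at /dev/null once startup has succeeded, so the parent's pipe reaches EOF instead of sitting in select() until the timeout. A minimal sketch of that generic idiom (not the actual qemu -daemonize code path):

Code:
/* Re-point stdin/stdout/stderr at /dev/null.  Once a daemonized process has
 * done this, any pipe its parent is still reading from reaches EOF instead
 * of hanging until a timeout.  Generic idiom, not the qemu patch itself. */
#include <fcntl.h>
#include <unistd.h>

static int detach_stdio(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd == -1)
        return -1;

    dup2(fd, STDIN_FILENO);
    dup2(fd, STDOUT_FILENO);   /* the step that releases the parent's pipe */
    dup2(fd, STDERR_FILENO);

    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}

int main(void)
{
    /* ... fork(), setsid(), tell the parent that startup succeeded ... */
    return detach_stdio() == 0 ? 0 : 1;
}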
 
I'm trying to compile the pve-qemu-kvm project.
I don't know which ./configure options to set for a proper installation.

Anyway, I just wanted to do a test build to begin with, but it seems that a dependency is missing.
I can't find where the GSequence type is supposed to be defined.

Here is the make error output.

Code:
qemu-img.c: In function ‘compare_data’:
qemu-img.c:61:5: warning: implicit declaration of function ‘g_strcmp0’ [-Wimplicit-function-declaration]
qemu-img.c:61:5: warning: nested extern declaration of ‘g_strcmp0’ [-Wnested-externs]
qemu-img.c: In function ‘add_format_to_seq’:
qemu-img.c:71:5: error: unknown type name ‘GSequence’
qemu-img.c:73:5: warning: implicit declaration of function ‘g_sequence_insert_sorted’ [-Wimplicit-function-declaration]
qemu-img.c:73:5: warning: nested extern declaration of ‘g_sequence_insert_sorted’ [-Wnested-externs]
qemu-img.c: In function ‘help’:
qemu-img.c:159:5: error: unknown type name ‘GSequence’
qemu-img.c:162:5: warning: implicit declaration of function ‘g_sequence_new’ [-Wimplicit-function-declaration]
qemu-img.c:162:5: warning: nested extern declaration of ‘g_sequence_new’ [-Wnested-externs]
qemu-img.c:162:9: warning: assignment makes pointer from integer without a cast [enabled by default]
qemu-img.c:164:5: warning: implicit declaration of function ‘g_sequence_foreach’ [-Wimplicit-function-declaration]
qemu-img.c:164:5: warning: nested extern declaration of ‘g_sequence_foreach’ [-Wnested-externs]
qemu-img.c:166:5: warning: implicit declaration of function ‘g_sequence_free’ [-Wimplicit-function-declaration]
qemu-img.c:166:5: warning: nested extern declaration of ‘g_sequence_free’ [-Wnested-externs]
make: *** [qemu-img.o] Erreur 1
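GSequence (along with g_strcmp0, g_sequence_new and friends) comes from GLib (glib-2.0; GSequence has been there since 2.14 and g_strcmp0 since 2.16), so this usually means the GLib development headers are missing or too old on the build box, or configure didn't pick up the glib-2.0 pkg-config flags. A minimal standalone check, assuming libglib2.0-dev is installed:

Code:
/* Minimal check that GSequence is available from GLib; this roughly mirrors
 * what qemu-img.c does with format names per the errors above.
 * Build: gcc gseq_check.c $(pkg-config --cflags --libs glib-2.0) -o gseq_check
 */
#include <glib.h>
#include <stdio.h>

static gint cmp_str(gconstpointer a, gconstpointer b, gpointer user_data)
{
    (void)user_data;
    return g_strcmp0(a, b);                 /* NULL-safe strcmp from GLib */
}

static void print_str(gpointer data, gpointer user_data)
{
    (void)user_data;
    printf("%s\n", (const char *)data);
}

int main(void)
{
    GSequence *seq = g_sequence_new(NULL);  /* GSequence comes from <glib.h> */
    g_sequence_insert_sorted(seq, (gpointer)"qcow2", cmp_str, NULL);
    g_sequence_insert_sorted(seq, (gpointer)"raw", cmp_str, NULL);
    g_sequence_insert_sorted(seq, (gpointer)"vmdk", cmp_str, NULL);
    g_sequence_foreach(seq, print_str, NULL);
    g_sequence_free(seq);
    return 0;
}

If that compiles and runs, the missing piece is more likely in the configure flags of the pve-qemu-kvm build than in GLib itself.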
 
I'm having the same issue with glusterfs 3.5.2. Proxmox gives a timeout, but the VM actually starts.

TASK ERROR: start failed: command '/usr/bin/kvm -id 127 -chardev 'socket,id=qmp,path=/var/run/qemu-server/127.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/127.vnc,x509,password -pidfile /var/run/qemu-server/127.pid -daemonize -name gfs-HD-on-114 -smp 'sockets=1,cores=2' -nodefaults -boot 'menu=on' -vga cirrus -cpu host,+x2apic -k en-us -m 1024 -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -drive 'if=none,id=drive-ide2,media=cdrom,aio=native' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=gluster://stor1/HA-FAST-PVE1-150G/images/127/vm-127-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,aio=native,cache=none' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap127i0,script=/var/lib/qemu-server/pve-bridge,vhost=on' -device 'virtio-net-pci,mac=E2:97:5E:02:25:A5,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: got timeout

Is a fixed .deb available?
 
