Running a glusterfs 3.12 server (Red Hat, latest stable version). I have a volume set up that a whole cluster of updated PVE 5.1 hosts use. Any attempt to use qemu-img to create a qcow2 image immediately crashes the gluster server and brings the volume offline. Any attempt to connect to an existing qcow2 image also results in an immediate crash.
Info:
On the Proxmox side, I can reproduce this from the command line like so:
Code:
# qemu-img create -f qcow2 gluster://10.1.2.14/proxmoxvms/myimage2.img 5G
Formatting 'gluster://10.1.2.14/proxmoxvms/myimage2.img', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2018-04-05 22:44:06.065959] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f38f4dfae83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f38f4bc2b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f38f4bc2c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f38f4bc42e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f38f4bc4bb4] ))))) 0-proxmoxvms-client-0: forced unwinding frame type(GlusterFS 3.3) op(SEEK(48)) called at 2018-04-05 22:44:02.130372 (xid=0xc)
qemu-img: gluster://10.1.2.14/proxmoxvms/myimage2.img: Could not refresh total sector count: Transport endpoint is not connected
# qemu-img --version
qemu-img version 2.9.1pve-qemu-kvm_2.9.1-9
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
The actual local gluster mount point on Proxmox works fine; the problem only happens when the libgfapi connection from KVM comes into play.
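To illustrate the contrast, here is roughly the control test I mean (the FUSE mount path is assumed from the default Proxmox storage layout; adjust to your setup):

```shell
# Through the FUSE mount (works fine):
qemu-img create -f qcow2 /mnt/pve/proxmoxvms/images/test.qcow2 5G

# Through libgfapi (crashes the brick, as shown above):
qemu-img create -f qcow2 gluster://10.1.2.14/proxmoxvms/test.qcow2 5G
```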
This is the version that is running; in the volume status output below you can see that the brick goes completely offline.
Code:
$ glusterfsd --version
glusterfs 3.12.6
Repository revision: url-removed
Copyright (c) 2006-2016 Red Hat, Inc. <url-removed/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
$ sudo gluster volume status
Status of volume: proxmoxvms
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.14:/mnt/gbrick/proxmoxvms N/A N/A N N/A
Task Status of Volume proxmoxvms
------------------------------------------------------------------------------
There are no active volume tasks
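If anyone wants to dig further, the brick log should contain the crash backtrace (log filename assumed from the brick path, per gluster's usual naming convention of the brick directory with slashes replaced by dashes):

```shell
# Look for the crash signature / backtrace near the timestamp
# of the failing qemu-img command
sudo tail -n 100 /var/log/glusterfs/bricks/mnt-gbrick-proxmoxvms.log
```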
I don't know if this is a gluster problem or a Proxmox problem. Obviously, if this were common it would have been reported by a lot of gluster users, considering 3.12 is the latest stable version. My guess is that it's a combination of something unique in the Proxmox build of KVM with regard to libgfapi and some kind of undiscovered bug in gluster, but I'm not really sure what to think.
The only way to get the brick back online is to force-start the volume from the server side:
Code:
$ sudo gluster volume start proxmoxvms force
volume start: proxmoxvms: success
$ sudo gluster volume status
Status of volume: proxmoxvms
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.14:/mnt/gbrick/proxmoxvms 49152 0 Y 32440
Task Status of Volume proxmoxvms
------------------------------------------------------------------------------
There are no active volume tasks
EDIT: Just wanted to add that this was discovered after upgrading some nodes in a PVE 4.x cluster to 5.x. The VMs run completely stable on the 4.x nodes in the cluster, but trying to run a VM on a 5.x node does not work. All of the nodes are on 5.x now, but NONE of them can create a glusterfs-based qcow2 disk image.
EDIT2: I have also discovered that it doesn't work with raw images either. The qemu-img command succeeds, but the 'kvm' command that accesses the image at the gluster:// protocol location causes the crash again.
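For reference, this is roughly what I mean (trimmed to the relevant drive argument; the full Proxmox-generated kvm command line is much longer, and the image name here is just an example):

```shell
# Creating the raw image over libgfapi succeeds:
qemu-img create -f raw gluster://10.1.2.14/proxmoxvms/test.raw 5G

# But starting a guest that opens it via libgfapi crashes the brick:
kvm -drive file=gluster://10.1.2.14/proxmoxvms/test.raw,format=raw,if=virtio
```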