Running a glusterfs 3.12 server (Red Hat, latest stable version). I have a volume set up that a whole cluster of updated PVE 5.1 hosts use. Any attempt to use qemu-img to create a qcow2 image immediately crashes the gluster server and brings the volume offline. Any attempt to connect to an existing qcow2 image also results in an immediate crash.
Info:
On the Proxmox side, I can reproduce this from the command line like so:
Code:
# qemu-img create -f qcow2 gluster://10.1.2.14/proxmoxvms/myimage2.img 5G
Formatting 'gluster://10.1.2.14/proxmoxvms/myimage2.img', fmt=qcow2 size=5368709120 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
[2018-04-05 22:44:06.065959] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib/x86_64-linux-gnu/libglusterfs.so.0(_gf_log_callingfn+0x1a3)[0x7f38f4dfae83] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_unwind+0x1d1)[0x7f38f4bc2b61] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f38f4bc2c7e] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x89)[0x7f38f4bc42e9] (--> /usr/lib/x86_64-linux-gnu/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f38f4bc4bb4] ))))) 0-proxmoxvms-client-0: forced unwinding frame type(GlusterFS 3.3) op(SEEK(48)) called at 2018-04-05 22:44:02.130372 (xid=0xc)
qemu-img: gluster://10.1.2.14/proxmoxvms/myimage2.img: Could not refresh total sector count: Transport endpoint is not connected
# qemu-img --version
qemu-img version 2.9.1pve-qemu-kvm_2.9.1-9
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
The actual local gluster mount point on Proxmox works fine; the problem only happens when the libgfapi connection from KVM comes into play.
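To illustrate the contrast, here is roughly the control test I mean (the FUSE mount path is assumed from the default Proxmox storage layout; adjust to your setup):

```shell
# Through the FUSE mount (works fine):
qemu-img create -f qcow2 /mnt/pve/proxmoxvms/images/test.qcow2 5G

# Through libgfapi (crashes the brick, as shown above):
qemu-img create -f qcow2 gluster://10.1.2.14/proxmoxvms/test.qcow2 5G
```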
This is the version that is running; in the volume status output below you can see that the brick goes completely offline.
Code:
$ glusterfsd --version
glusterfs 3.12.6
Repository revision: url-removed
Copyright (c) 2006-2016 Red Hat, Inc. <url-removed/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser
General Public License, version 3 or any later version (LGPLv3
or later), or the GNU General Public License, version 2 (GPLv2),
in all cases as published by the Free Software Foundation.
$ sudo gluster volume status
Status of volume: proxmoxvms
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.14:/mnt/gbrick/proxmoxvms N/A N/A N N/A
Task Status of Volume proxmoxvms
------------------------------------------------------------------------------
There are no active volume tasks
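If anyone wants to dig further, the brick log should contain the crash backtrace (log filename assumed from the brick path, per gluster's usual naming convention of the brick directory with slashes replaced by dashes):

```shell
# Look for the crash signature / backtrace near the timestamp
# of the failing qemu-img command
sudo tail -n 100 /var/log/glusterfs/bricks/mnt-gbrick-proxmoxvms.log
```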
I don't know if this is a gluster problem or a Proxmox problem. Obviously, if this were common it would have been reported by a lot of gluster users, considering 3.12 is the latest stable version. My guess is that it's a combination of something unique in the Proxmox build of KVM with regard to libgfapi and some kind of undiscovered bug in gluster, but I'm not really sure what to think.
The only way to get the brick back online is to force-start the volume from the server side:
Code:
$ sudo gluster volume start proxmoxvms force
volume start: proxmoxvms: success
$ sudo gluster volume status
Status of volume: proxmoxvms
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick 10.1.2.14:/mnt/gbrick/proxmoxvms 49152 0 Y 32440
Task Status of Volume proxmoxvms
------------------------------------------------------------------------------
There are no active volume tasks
EDIT: Just wanted to add that this was discovered after upgrading some nodes in a PVE 4.x cluster to 5.x. The VMs run completely stable on the 4.x nodes in the cluster, but trying to run a VM on a 5.x node does not work. All of the nodes are on 5.x now, but NONE of them can create a glusterfs-based qcow2 disk image.
EDIT2: I have also discovered that it doesn't work with raw images either. The qemu-img command succeeds, but the 'kvm' command that accesses the image at the gluster:// protocol location causes the crash again.
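For reference, this is roughly what I mean (trimmed to the relevant drive argument; the full Proxmox-generated kvm command line is much longer, and the image name here is just an example):

```shell
# Creating the raw image over libgfapi succeeds:
qemu-img create -f raw gluster://10.1.2.14/proxmoxvms/test.raw 5G

# But starting a guest that opens it via libgfapi crashes the brick:
kvm -drive file=gluster://10.1.2.14/proxmoxvms/test.raw,format=raw,if=virtio
```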