Gluster storage

mijohnst · Jan 21, 2017

I'm new to ProxMox and so far it's been fairly strait forward...up to the point of building my first VM. I'm trying to get my storage setup but I'm having issues getting NAS or Gluster working. I have two HP DL380 servers with 5TB in each that I have in a gluster replica. I also have an 8TB Buffalo NAS. Both servers and the NAS have a pair of 1g Ethernet configured into an LAPC trunk on my Cisco switch.

I've created an NFS and Gluster shares in my cluster and they look like they're reading just fine. The gluster pair I have setup is working and the peers are talking but when I try to create anything I see the errors below. I've turned the firewall off but still having same issues. Also, I've checked the status of the bricks after each failure and they're up running fine.

I haven't found any good guides for setting up gluster on Proxmox...and I don't even know what to think about my NFS issue because I use NFS for other things in my home network with no issues at all. Any links or suggestions would be appreciated.

Task viewer: VM 101 - Create
OutputStatus
Stop
[2017-01-21 07:23:11.290547] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:11.370076] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:11.378799] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:11.572154] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:11.580568] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:13.922711] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:13.931323] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
Formatting 'gluster://artemis/vmhost/images/101/vm-101-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 07:23:13.950269] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK

If I try to use NAS I see these errors:

Task viewer: VM 100 - Create
OutputStatus
Stop
TASK ERROR: create failed - unable to create image: got lock timeout - aborting command

mijohnst · Jan 21, 2017

I just blew my gluster share away and rebuild it. Still says it's down even though it's not...

[2017-01-21 08:22:19.292270] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:19.410851] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:22.292166] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:22.490044] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:25.292491] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:27.594093] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:28.292269] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
Formatting 'gluster://artemis/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 08:22:28.311493] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK

root@artemis:~# gluster volume status
Status of volume: vmhost
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick artemis:/bricks/vmhost 49153 Y 5462
Brick apollo:/bricks/vmhost 49153 Y 5128
NFS Server on localhost 2049 Y 5481
Self-heal Daemon on localhost N/A Y 5486
NFS Server on apollo 2049 Y 5142
Self-heal Daemon on apollo N/A Y 5147

Task Status of Volume vmhost
------------------------------------------------------------------------------
There are no active volume tasks

mijohnst · Jan 21, 2017

I did find that that I had a bad IP in the hosts of one of my nodes. I fixed that but still no love...

()

Task viewer: VM 100 - Create
OutputStatus
Stop
[2017-01-21 18:06:36.015340] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 18:06:36.197805] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 18:06:36.490047] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://apollo/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 18:06:36.521323] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK

mijohnst · Jan 21, 2017

Sorry for all thee posts. I just put up what I'm trying in case others have the same issue later.

I just tried changing the format of the underlying FS that gluster is using. I was using XFS but I changed to ext4. I also updated gluster to 3.8.8. Still not able to create a VM.

[2017-01-21 20:47:51.095096] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 20:47:53.211673] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 20:47:55.216092] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://artemis/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 20:47:57.004899] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK

From the Gluster log file:

root@artemis:~# tail -f /var/log/glusterfs/bricks/bricks-vmhost.log
[2017-01-21 20:47:54.988505] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-vmhost-server: accepted client from apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0 (version: 3.8.8)
[2017-01-21 20:47:55.007263] E [MSGID: 113107] [posix.c:1051osix_seek] 0-vmhost-posix: seek failed on fd 21 length 196608 [No such device or address]
[2017-01-21 20:47:55.007291] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-vmhost-server: 18: SEEK-2 (3f0e6eae-0cf7-41ce-9f28-9ee46ccf64aa) ==> (No such device or address) [No such device or address]
[2017-01-21 20:47:55.225532] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-vmhost-server: disconnecting connection from apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0
[2017-01-21 20:47:55.225591] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-vmhost-server: Shutting down connection apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0
[2017-01-21 20:47:56.994035] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-vmhost-server: accepted client from apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0 (version: 3.8.8)
[2017-01-21 20:47:57.012746] E [MSGID: 113107] [posix.c:1051osix_seek] 0-vmhost-posix: seek failed on fd 21 length 26847870976 [No such device or address]
[2017-01-21 20:47:57.012787] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-vmhost-server: 18: SEEK-2 (3f0e6eae-0cf7-41ce-9f28-9ee46ccf64aa) ==> (No such device or address) [No such device or address]
[2017-01-21 20:47:57.014440] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-vmhost-server: disconnecting connection from apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0
[2017-01-21 20:47:57.014476] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-vmhost-server: Shutting down connection apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0

dcsapak · Jan 23, 2017

did you try to use the gluster images nonetheless?

i just tried it here, and although i get the same "error" all seems to be working, maybe this is just some erroneous logging from gluster?

mijohnst · Jan 23, 2017

Thanks for the reply dcsapak. I played with this all weekend and still had no luck. I can get an image on the file system and it appears to be the right size, but the VM never shows up in the list of objects. I can see it via the CLI but it doesn't show up in the GUI at all. The storage is there (NAS and Gluster) but the VM's never show up after I run through the creation wizard. I assumed it was because of the errors I was seeing in the logs.

As for my NAS storage, I took one of my servers and rebuilt it with Xenserver and was easily able to create a VM on my NFS storage. I don't want to use Zen because all the VM's that I have saved are qcow2 and I don't want to have to remake or convert them. I also like the KVM structure more. Proxmox is the coolest hypervisor that I've found...now if I can just get it to work!

mijohnst · Jan 24, 2017

Ok, I have this resolved. I'm still getting the errors when creating a VM on gluster, but I changed my browser from Chrome to Opera and it all seems be working fine. I was able to mount ISO files, build a VM and migrate it around. Thanks for test it out dcsapak... Hopefully this helps someone.

dcsapak · Jan 24, 2017

mijohnst said:
Ok, I have this resolved. I'm still getting the errors when creating a VM on gluster, but I changed my browser from Chrome to Opera and it all seems be working fine. I was able to mount ISO files, build a VM and migrate it around. Thanks for test it out dcsapak... Hopefully this helps someone.

glad you have a solution, but it should work on chrome also. maybe you just need to clear the browser cache

mijohnst · Jan 24, 2017

Thanks, I'll try that. I wish I'd tried it this weekend and saved myself a lot of time.

Do you happen to know if Proxmox is using libgfapi or is that something we need to add manually? I guess since Proxmox doesn't really setup Gluster, only supports it, that's something we have to do, correct?

JTY · Jan 24, 2017

Proxmox uses libgfapi

mijohnst · Jan 24, 2017

Thanks JTY... I upgraded to the latest stable version of gluster and all seems well even before I set the '
server.allow-insecure on' option.

vkhera · Jan 25, 2017

Personally I have had just the most horrible slow performance with Gluster. So much so that I decided to stick with local storage and lose the benefits of the shared storage. Also, with two nodes, you really have no safety if one of your nodes dies, and that node happens to be the first node you added to the gluster.

mijohnst · Jan 25, 2017

vkhera, What version of Gluster were you using? I don't think gluster has a 'primary' node. I've test my systems with either one of my hosts going down and all my VM's and shares stayed with nothing more than a few second delay. It just depends on how configure your redundancy.

fabian · Jan 26, 2017

mijohnst said:
vkhera, What version of Gluster were you using? I don't think gluster has a 'primary' node. I've test my systems with either one of my hosts going down and all my VM's and shares stayed with nothing more than a few second delay. It just depends on how configure your redundancy.

IIRC, gluster uses the first node as tie-breaker if the total number of nodes is even. so for two nodes, that means the first one alone is always quorate, and the second one does not count at all

mmenaz · Jan 26, 2017

So you mean that I can't create a 2 storage node with a 3rd cheap node just for proxmox quorum, as I did with drbd9 (that I have to abandon due to licensing changes)? What alternative (complexity and cost wise) do I have? People here just need the possibility to have redondant nodes/storage, HA is not needed, just be able to remove the broken node and keep woring on the survived one until the first node is repaired and can be "plugged" again.

mijohnst · Jan 26, 2017

I'm always learning something news... thanks Fabian.

I also agree with mmenaz. I'm playing with some home servers (although I do have several Linux clusters at work and I'm evaluating Proxmox for that), I just like having the ability to move VM's around and work on the hosts without interruption. I don't really need HA although being able to watch Plex is very important to me.

I am now considering buying a cheap 3rd server though for the quorum... I'm going to tell my wife that it's mmenaz fault.

vkhera · Jan 26, 2017

mmenaz said:
So you mean that I can't create a 2 storage node with a 3rd cheap node just for proxmox quorum, as I did with drbd9 (that I have to abandon due to licensing changes)? What alternative (complexity and cost wise) do I have? People here just need the possibility to have redondant nodes/storage, HA is not needed, just be able to remove the broken node and keep woring on the survived one until the first node is repaired and can be "plugged" again.

Gluster has its very own quorum computation. It is unrelated to proxmox quorum. The latest versions of gluster allow you to use a non-data storage node to add to the quorum decision. If you have a third small node for proxmox quorum votes, then you can use that as well for gluster.

vkhera · Jan 26, 2017

fabian said:
IIRC, gluster uses the first node as tie-breaker if the total number of nodes is even. so for two nodes, that means the first one alone is always quorate, and the second one does not count at all

I think this is a special case for 2 node cluster. You can still have quorum in a 4 node cluster when one node goes away... but you do risk split brain when there is a partition of two and two. Not sure how gluster works in that situation.

But basically, gluster with two nodes is a disaster waiting to happen. Don't do it.

mijohnst · Jan 26, 2017

vkhera said:
Gluster has its very own quorum computation. It is unrelated to proxmox quorum. The latest versions of gluster allow you to use a non-data storage node to add to the quorum decision. If you have a third small node for proxmox quorum votes, then you can use that as well for gluster.

Dumb question, but making the 3rd quorum system as a VM wouldn't work or is a bad idea?

fabian · Jan 27, 2017

vkhera said:
I think this is a special case for 2 node cluster. You can still have quorum in a 4 node cluster when one node goes away... but you do risk split brain when there is a partition of two and two. Not sure how gluster works in that situation.

But basically, gluster with two nodes is a disaster waiting to happen. Don't do it.

yes, of course the tie-braker is only active if the quorum is "tied". not sure if they only use it for two node or also for odd node clusters?

Gluster storage

New Member

New Member

New Member

New Member

Proxmox Staff Member

New Member

New Member

Proxmox Staff Member

New Member

Renowned Member

New Member

Member

New Member

Proxmox Staff Member

Renowned Member

New Member

Member

Member

New Member

Proxmox Staff Member

We value your privacy