Gluster storage

mijohnst
Jan 21, 2017
I'm new to Proxmox and so far it's been fairly straightforward... up to the point of building my first VM. I'm trying to get my storage set up, but I'm having issues getting either the NAS or Gluster working. I have two HP DL380 servers with 5TB in each that I have in a Gluster replica. I also have an 8TB Buffalo NAS. Both servers and the NAS have a pair of 1Gb Ethernet links configured into an LACP trunk on my Cisco switch.
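For reference, the replica volume was created roughly like this (the volume and brick names match the 'gluster volume status' output further down; this is just the outline, not a full walkthrough):

# from artemis, after glusterd is running on both servers
gluster peer probe apollo
gluster volume create vmhost replica 2 artemis:/bricks/vmhost apollo:/bricks/vmhost
gluster volume start vmhost
gluster volume info vmhost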

I've created NFS and Gluster shares in my cluster and they look like they're reading just fine. The Gluster pair I have set up is working and the peers are talking, but when I try to create anything I see the errors below. I've turned the firewall off but I'm still having the same issues. Also, I've checked the status of the bricks after each failure and they're up and running fine.

I haven't found any good guides for setting up Gluster on Proxmox, and I don't even know what to think about my NFS issue, because I use NFS for other things on my home network with no issues at all. Any links or suggestions would be appreciated.
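For completeness, the storage definitions in /etc/pve/storage.cfg look roughly like this (the storage IDs, NAS address and export path are placeholders, not the exact values I use):

glusterfs: gluster-vmhost
        volume vmhost
        server artemis
        server2 apollo
        content images

nfs: buffalo-nas
        server <nas-ip>
        export /mnt/array1/share
        path /mnt/pve/buffalo-nas
        content images,iso
        options vers=3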

Task viewer: VM 101 - Create
[2017-01-21 07:23:11.290547] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:11.370076] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:11.378799] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:11.572154] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:11.580568] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-01-21 07:23:13.922711] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 07:23:13.931323] E [client-handshake.c:1760:client_query_portmap_cbk] 0-vmhost-client-1: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
Formatting 'gluster://artemis/vmhost/images/101/vm-101-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 07:23:13.950269] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK



If I try to use the NAS I see this error:

Task viewer: VM 100 - Create
TASK ERROR: create failed - unable to create image: got lock timeout - aborting command
 
I just blew my Gluster share away and rebuilt it. It still says it's down even though it's not...

[2017-01-21 08:22:19.292270] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:19.410851] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:22.292166] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:22.490044] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:25.292491] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
[2017-01-21 08:22:27.594093] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 08:22:28.292269] E [socket.c:2178:socket_connect_finish] 0-vmhost-client-1: connection to 192.168.2.80:49153 failed (No route to host)
Formatting 'gluster://artemis/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 08:22:28.311493] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK


root@artemis:~# gluster volume status
Status of volume: vmhost
Gluster process                              Port    Online  Pid
------------------------------------------------------------------------------
Brick artemis:/bricks/vmhost                 49153   Y       5462
Brick apollo:/bricks/vmhost                  49153   Y       5128
NFS Server on localhost                      2049    Y       5481
Self-heal Daemon on localhost                N/A     Y       5486
NFS Server on apollo                         2049    Y       5142
Self-heal Daemon on apollo                   N/A     Y       5147

Task Status of Volume vmhost
------------------------------------------------------------------------------
There are no active volume tasks

 
I did find that I had a bad IP in the hosts file of one of my nodes. I fixed that, but still no love...
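For anyone hitting the same "No route to host" errors, this is roughly how I checked name resolution and brick reachability from each node (49153 is the brick port reported by 'gluster volume status', and 24007 is the glusterd management port):

getent hosts artemis apollo
grep -E 'artemis|apollo' /etc/hosts
ping -c1 apollo
gluster peer status
gluster volume status vmhost
# brick port and management port must be reachable between the nodes
nc -zv apollo 24007
nc -zv apollo 49153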




Task viewer: VM 100 - Create
[2017-01-21 18:06:36.015340] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 18:06:36.197805] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 18:06:36.490047] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://apollo/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 18:06:36.521323] E [afr-common.c:4168:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK
 
Sorry for all the posts. I just put up what I'm trying in case others have the same issue later.

I just tried changing the format of the underlying FS that Gluster is using: I was using XFS, but I changed it to ext4 (the rough rebuild steps are at the end of this post). I also updated Gluster to 3.8.8. Still not able to create a VM.

[2017-01-21 20:47:51.095096] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 20:47:53.211673] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
[2017-01-21 20:47:55.216092] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
Formatting 'gluster://artemis/vmhost/images/100/vm-100-disk-1.qcow2', fmt=qcow2 size=26843545600 encryption=off cluster_size=65536 preallocation=metadata lazy_refcounts=off refcount_bits=16
[2017-01-21 20:47:57.004899] E [MSGID: 108006] [afr-common.c:4404:afr_notify] 0-vmhost-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
TASK OK

From the Gluster log file:

root@artemis:~# tail -f /var/log/glusterfs/bricks/bricks-vmhost.log
[2017-01-21 20:47:54.988505] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-vmhost-server: accepted client from apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0 (version: 3.8.8)
[2017-01-21 20:47:55.007263] E [MSGID: 113107] [posix.c:1051:posix_seek] 0-vmhost-posix: seek failed on fd 21 length 196608 [No such device or address]
[2017-01-21 20:47:55.007291] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-vmhost-server: 18: SEEK-2 (3f0e6eae-0cf7-41ce-9f28-9ee46ccf64aa) ==> (No such device or address) [No such device or address]
[2017-01-21 20:47:55.225532] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-vmhost-server: disconnecting connection from apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0
[2017-01-21 20:47:55.225591] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-vmhost-server: Shutting down connection apollo-18447-2017/01/21-20:47:54:966386-vmhost-client-0-0-0
[2017-01-21 20:47:56.994035] I [MSGID: 115029] [server-handshake.c:692:server_setvolume] 0-vmhost-server: accepted client from apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0 (version: 3.8.8)
[2017-01-21 20:47:57.012746] E [MSGID: 113107] [posix.c:1051:posix_seek] 0-vmhost-posix: seek failed on fd 21 length 26847870976 [No such device or address]
[2017-01-21 20:47:57.012787] E [MSGID: 115089] [server-rpc-fops.c:2007:server_seek_cbk] 0-vmhost-server: 18: SEEK-2 (3f0e6eae-0cf7-41ce-9f28-9ee46ccf64aa) ==> (No such device or address) [No such device or address]
[2017-01-21 20:47:57.014440] I [MSGID: 115036] [server.c:548:server_rpc_notify] 0-vmhost-server: disconnecting connection from apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0
[2017-01-21 20:47:57.014476] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-vmhost-server: Shutting down connection apollo-18447-2017/01/21-20:47:56:971161-vmhost-client-0-0-0
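In case it helps anyone later, the rebuild on ext4 went roughly like this (the device name is just a placeholder for whatever backs your brick):

# once, from either node
gluster volume stop vmhost
gluster volume delete vmhost
# on each node: reformat and remount the brick
umount /bricks/vmhost
mkfs.ext4 /dev/sdb1              # placeholder device
mount /dev/sdb1 /bricks/vmhost   # and update /etc/fstab to match
# once, from either node: recreate and start the volume
gluster volume create vmhost replica 2 artemis:/bricks/vmhost apollo:/bricks/vmhost
gluster volume start vmhost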

 
did you try to use the gluster images nonetheless?

i just tried it here, and although i get the same "error" all seems to be working, maybe this is just some erroneous logging from gluster?
 
Thanks for the reply dcsapak. I played with this all weekend and still had no luck. I can get an image on the file system and it appears to be the right size, but the VM never shows up in the list of objects. I can see it via the CLI, but it doesn't show up in the GUI at all. The storage is there (NAS and Gluster), but the VMs never show up after I run through the creation wizard. I assumed it was because of the errors I was seeing in the logs.

As for my NAS storage, I took one of my servers and rebuilt it with XenServer and was easily able to create a VM on my NFS storage. I don't want to use Xen because all the VMs that I have saved are qcow2 and I don't want to have to remake or convert them. I also like the KVM structure more. Proxmox is the coolest hypervisor that I've found... now if I can just get it to work! ;)
 
Ok, I have this resolved. I'm still getting the errors when creating a VM on Gluster, but I changed my browser from Chrome to Opera and it all seems to be working fine. I was able to mount ISO files, build a VM and migrate it around. Thanks for testing it out, dcsapak... Hopefully this helps someone.
 
Ok, I have this resolved. I'm still getting the errors when creating a VM on Gluster, but I changed my browser from Chrome to Opera and it all seems to be working fine. I was able to mount ISO files, build a VM and migrate it around. Thanks for testing it out, dcsapak... Hopefully this helps someone.
glad you have a solution, but it should work on chrome also. maybe you just need to clear the browser cache
 
Thanks, I'll try that. I wish I'd tried it this weekend and saved myself a lot of time. :)

Do you happen to know if Proxmox uses libgfapi, or is that something we need to add manually? I guess since Proxmox doesn't really set up Gluster, only supports it, that's something we have to do, correct?
 
Thanks JTY... I upgraded to the latest stable version of Gluster and all seems well even before I set the 'server.allow-insecure on' option.
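For anyone else who needs it, the option JTY mentioned is set roughly like this (plus the matching glusterd option on each server):

gluster volume set vmhost server.allow-insecure on
# and in /etc/glusterfs/glusterd.vol on each server add:
#   option rpc-auth-allow-insecure on
# then restart the gluster daemon on that server
systemctl restart glusterfs-server   # service may be named glusterd depending on the package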
 
Personally I have had just horribly slow performance with Gluster. So much so that I decided to stick with local storage and lose the benefits of shared storage. Also, with two nodes you really have no safety if one of your nodes dies and that node happens to be the first node you added to the Gluster cluster.
 
vkhera, what version of Gluster were you using? I don't think Gluster has a 'primary' node. I've tested my systems with either one of my hosts going down, and all my VMs and shares stayed up with nothing more than a few seconds' delay. It just depends on how you configure your redundancy.
 
vkhera, what version of Gluster were you using? I don't think Gluster has a 'primary' node. I've tested my systems with either one of my hosts going down, and all my VMs and shares stayed up with nothing more than a few seconds' delay. It just depends on how you configure your redundancy.

IIRC, gluster uses the first node as tie-breaker if the total number of nodes is even. so for two nodes, that means the first one alone is always quorate, and the second one does not count at all ;)
 
So you mean that I can't create a 2-node storage setup with a 3rd cheap node just for Proxmox quorum, as I did with DRBD9 (which I have to abandon due to licensing changes)? What alternative (complexity- and cost-wise) do I have? People here just need the possibility to have redundant nodes/storage; HA is not needed, just being able to remove the broken node and keep working on the surviving one until the first node is repaired and can be "plugged in" again.
 
I'm always learning something new... thanks Fabian. :)

I also agree with mmenaz. I'm playing with some home servers (although I do have several Linux clusters at work and I'm evaluating Proxmox for those), and I just like having the ability to move VMs around and work on the hosts without interruption. I don't really need HA, although being able to watch Plex is very important to me.

I am now considering buying a cheap 3rd server though, for the quorum... I'm going to tell my wife that it's mmenaz's fault. ;)
 
So you mean that I can't create a 2-node storage setup with a 3rd cheap node just for Proxmox quorum, as I did with DRBD9 (which I have to abandon due to licensing changes)? What alternative (complexity- and cost-wise) do I have? People here just need the possibility to have redundant nodes/storage; HA is not needed, just being able to remove the broken node and keep working on the surviving one until the first node is repaired and can be "plugged in" again.

Gluster has its very own quorum computation. It is unrelated to proxmox quorum. The latest versions of gluster allow you to use a non-data storage node to add to the quorum decision. If you have a third small node for proxmox quorum votes, then you can use that as well for gluster.
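Roughly, that looks like this (the arbiter host name is a placeholder; the arbiter brick only stores metadata, not VM data, so a small disk is enough):

# new volume with a dedicated arbiter brick
gluster volume create vmhost replica 3 arbiter 1 \
    artemis:/bricks/vmhost apollo:/bricks/vmhost arbiter-node:/bricks/vmhost-arbiter

# or, on recent gluster releases, convert an existing 2-way replica
gluster volume add-brick vmhost replica 3 arbiter 1 arbiter-node:/bricks/vmhost-arbiter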
 
IIRC, gluster uses the first node as tie-breaker if the total number of nodes is even. so for two nodes, that means the first one alone is always quorate, and the second one does not count at all ;)

I think this is a special case for a 2-node cluster. You can still have quorum in a 4-node cluster when one node goes away... but you do risk split-brain when there is a partition of two and two. Not sure how Gluster works in that situation.

But basically, gluster with two nodes is a disaster waiting to happen. Don't do it.
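If you do run a plain replica anyway, at least turn on the quorum options so writes get refused instead of silently diverging, something like:

gluster volume set vmhost cluster.quorum-type auto
gluster volume set vmhost cluster.server-quorum-type server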
 
Gluster has its very own quorum computation. It is unrelated to proxmox quorum. The latest versions of gluster allow you to use a non-data storage node to add to the quorum decision. If you have a third small node for proxmox quorum votes, then you can use that as well for gluster.

Dumb question, but would making the 3rd quorum system a VM work, or is that a bad idea?
 
I think this is a special case for a 2-node cluster. You can still have quorum in a 4-node cluster when one node goes away... but you do risk split-brain when there is a partition of two and two. Not sure how Gluster works in that situation.

But basically, gluster with two nodes is a disaster waiting to happen. Don't do it.

yes, of course the tie-breaker is only active if the quorum is "tied". not sure if they only use it for two-node clusters or also for larger even-node clusters?
 
