[SOLVED] Proxmox and GlusterFS

RRJ

Hello, dear community.

I've been using Proxmox for a while (a total of 5 years :rolleyes: at the moment) and have now decided to attach GlusterFS storage to my Proxmox nodes.

GlusterFS configuration: a replicated volume of two bricks on different servers. The servers are connected over a 10G network and use LSI hardware RAID.
GlusterFS storage configuration on Proxmox: mounted using the Proxmox GUI.
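For reference, a replicated two-brick volume like this is usually created along these lines (a sketch only: the brick paths and the second server name "stor2" are assumptions, since only stor1 and the volume name appear in my config below):
Code:
# run on one of the Gluster servers; brick paths and "stor2" are assumed
gluster volume create HA-Proxmox-TT-fast-150G replica 2 \
    stor1:/export/fast-150G stor2:/export/fast-150G
gluster volume start HA-Proxmox-TT-fast-150G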
The problem:
after restarting the server that was used in the Proxmox GUI to add the GlusterFS volume, the guest qemu VM drops these messages into its logs:
Code:
[ 3462.988707] end_request: I/O error, dev vda, sector 25458896
[ 3462.989474] end_request: I/O error, dev vda, sector 25458896
[ 3763.435225] end_request: I/O error, dev vda, sector 25458896
[ 3987.913744] end_request: I/O error, dev vda, sector 26304696
[ 3987.917413] end_request: I/O error, dev vda, sector 26304720
[ 3987.917716] end_request: I/O error, dev vda, sector 26304752
[ 3987.917728] end_request: I/O error, dev vda, sector 26304792
[ 3987.917728] end_request: I/O error, dev vda, sector 26304848
[ 3987.917728] end_request: I/O error, dev vda, sector 26304880
[ 3987.917728] end_request: I/O error, dev vda, sector 26304896
[ 3987.917728] end_request: I/O error, dev vda, sector 26304912
[ 3987.917728] end_request: I/O error, dev vda, sector 26304960
[ 3987.917728] end_request: I/O error, dev vda, sector 26304992
[ 3987.917728] end_request: I/O error, dev vda, sector 26297448
[ 3987.917728] end_request: I/O error, dev vda, sector 26297408
[ 3987.917728] end_request: I/O error, dev vda, sector 26297384
[ 3987.917728] end_request: I/O error, dev vda, sector 26297312
[ 3987.917728] end_request: I/O error, dev vda, sector 26297272
[ 3987.921830] end_request: I/O error, dev vda, sector 26304696
[ 3997.914129] end_request: I/O error, dev vda, sector 17097768
[ 3997.914982] end_request: I/O error, dev vda, sector 17097768
[ 3997.915640] end_request: I/O error, dev vda, sector 17097768

and acts like this:

Code:
cat: /var/log/syslog: Input/output error
-bash: /sbin/halt: Input/output error

I know that the GlusterFS client basically needs only one server to fetch the volume configuration file from, but it seems like Proxmox doesn't know about the second server?

storage config:
Code:
glusterfs: FAST-HA-150G
        volume HA-Proxmox-TT-fast-150G
        path /mnt/pve/FAST-HA-150G
        content images,rootdir
        server stor1
        nodes pve1
        maxfiles 1
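For reference, the GUI-managed definition above roughly corresponds to a manual FUSE mount like this (a sketch; the exact options Proxmox passes may differ, and qemu itself may also access the images via libgfapi):
Code:
mount -t glusterfs stor1:/HA-Proxmox-TT-fast-150G /mnt/pve/FAST-HA-150G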

vmconfig:
Code:
#debian7
bootdisk: virtio0
cores: 2
cpu: host
ide2: none,media=cdrom
memory: 512
name: cacti
net0: virtio=42:01:8D:5A:2C:6C,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 1
virtio0: FAST-HA-150G:116/vm-116-disk-1.raw,size=17G
and pveversion:
Code:
root@pve1:~# pveversion -v
proxmox-ve-2.6.32: 3.2-132 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-31-pve: 2.6.32-132
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

The GlusterFS server version is 3.4.4.



I was looking for answers on the forums, Google and the wiki, but was not able to find one.

This page says that one has to mount the volume manually in order to use both servers: http://www.jamescoyle.net/how-to/533-glusterfs-storage-mount-in-proxmox

Could someone shed some light on this problem?

[Added later]
It seems like someone has the same problem:
http://forum.proxmox.com/threads/18221-Gluster-KVM-problem-on-gluster-node-reboot

The fix is here:
http://forum.proxmox.com/threads/19058-SOLVED-Proxmox-and-GlusterFS?p=98280#post98280
 
Re: Proxmox and GlusterFS

Seems like the problem was that I didn't give the GlusterFS servers enough time to sync before restarting the other server :).

So it seems like everything is working as expected.
 
Re: Proxmox and GlusterFS

Oh, that's not true.
It seems that after the previous restart the VM was actually being served from GlusterFS server2, so the server1 restart did not affect it. This time, before restarting server2, I checked whether there were any sync jobs pending. There were none, so I restarted the second Gluster server, and the result is the same.
Code:
cat: /var/log/kern.log: Input/output error
So Proxmox VE does see the second Gluster peer, but it seems like the VM can't fail over to it?
Any ideas?

More information:
After the VM restart, fsck did its job and cleaned the root FS.
 
Re: Proxmox and GlusterFS

He-he,
and when I manually mount the GlusterFS volume on the VE host and add the new storage as a directory in the VE GUI, I run into the cache=none problem, and the workaround is to manually edit the qemu config file and again set the path with only one Gluster URI :)

Do I understand correctly that it is not possible to run a KVM guest on a redundant GlusterFS volume in such a way that it would automagically fail over when one of the servers is down?
 
Re: Proxmox and GlusterFS

Yes, it seems it is not possible without problems.
http://joejulian.name/blog/keeping-...hen-encountering-a-ping-timeout-in-glusterfs/

The only uses of HA NFS or GlusterFS storage for VMs are:
1. a full Proxmox HA cluster (which is overkill for most small enterprises);
2. fast recovery after one of the storage servers goes down (one may remount the GFS storage from the second storage server and start the machines in the state they were in before the first storage server went down).

So basically, the main purpose of GFS is distributed storage.

Don't forget about backups, backups, backups.
 
Re: Proxmox and GlusterFS

What about using the "backupvolfile-server=" mount option?

Code:
gluster1:/datastore /mnt/datastore glusterfs defaults,_netdev,backupvolfile-server=gluster2 0 0




And you can change Gluster's ping-timeout... See
http://gluster.org/community/docume...:_Setting_Volume_Options#network.ping-timeout
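For example (a sketch using the volume name from the first post; choose a timeout that suits your environment):
Code:
gluster volume set HA-Proxmox-TT-fast-150G network.ping-timeout 10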

-Josh

Hi joshin, and thank you for your time.

If I mount GlusterFS via fstab, then I can't start any KVM guest with cache=none, as GlusterFS does not support direct I/O :). I have to choose some other cache method and lose performance, and it is not the default setting either. And there is actually no need to do so, because Proxmox does see both servers when I mount GlusterFS via the GUI: it downloads the brick configuration file from the server and knows where the second one is. By the way, I have tried this with cache=writethrough too. Same result. It just won't go. After one HA storage node fails, one has to manually start the VMs from the other HA storage node. At least this way I don't need to remount the GlusterFS storage via the GUI using the second HA server's name in order to start the VM :)
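For reference, setting a cache mode on the disk looks roughly like this in the VM config (a sketch based on the vmconfig posted above; writethrough is just one possible mode):
Code:
# the virtio0 line from the VM config above, with an explicit cache mode added
virtio0: FAST-HA-150G:116/vm-116-disk-1.raw,cache=writethrough,size=17G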

I changed the ping-timeout to 3 seconds. No luck.

Another way I've tried to solve this:

I installed a third machine with the GlusterFS volume mounted on it and shared it to Proxmox via NFS. The result is the same: if one GlusterFS server fails (the one the VM is running from, of course), the VM remounts its root FS read-only and keeps crying about I/O errors :)

So... At this moment the only point of any HA storage for me is a reasonably fast restore of failed VMs (remount the storage or restart the VM, depending on the method one is using). Pretty much the same is said in the Proxmox wiki about DRBD: https://pve.proxmox.com/wiki/Two-Node_High_Availability_Cluster

For this testing configuration, two DRBD resources were created, one for VM images and another one for VM user data. Thanks to DRBD (if properly configured), a mirror RAID is created through the network (be aware that, although possible, using WANs would mean high latencies). As VMs and data are replicated synchronously on both nodes, if one of them fails, it will be possible to restart "dead" machines on the other node without data loss.

And I've found this one. Pretty interesting.

http://supercolony.gluster.org/pipermail/gluster-users/2014-April/039959.html

It is also interesting that there are no replies from the Proxmox team here. At this moment, this behavior looks like a bug to me.

I updated the Proxmox Gluster client to version 3.4.4 (matching the servers) from the Gluster Debian repo. Still no luck.
It seems like GlusterFS and Proxmox do not provide an HA storage model :(

By the way: after the VM locks its FS read-only (following the failure of one of the GlusterFS nodes), it won't come up by simply resetting it via the GUI. I have to stop it and start it! This looks like the initialization problem mentioned earlier on the Gluster forums. In the console I see "no boot device!" even though the downed Gluster node is back up and synced.
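In other words, only a full stop/start cycle brings the VM back, for example (a sketch using the VM ID from the config above; the GUI stop/start buttons do the same):
Code:
qm stop 116
qm start 116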


My next step was:
I created a VM on local storage, added a virtio disk from the Gluster storage via the VE GUI and mounted it. Then I rebooted the Gluster node this disk was served from. Same situation: end_request: I/O error. While trying to remount, it says:
mount: special device UUID=a60204f8-b9ac-472f-ba10-0930db1f626b does not exist
and the VM sees it again only after a full restart and after the Gluster storage is brought back (or after changing the Gluster node name in the GUI).

The next step was:
a. mount GlusterFS locally on the VE host
b. set cache=writethrough
c. boot the previously created local VM with the Gluster storage attached via VE
d. disconnect one of the Gluster servers from the network
Result: the same.
Code:
Buffer I/O error on device vdb1,
root@gluster-vm-local-test:~# umount /mnt/stor
root@gluster-vm-local-test:~# mount -a
mount: special device UUID=a60204f8-b9ac-472f-ba10-0930db1f626b does not exist
After using the reset button:
Code:
mount: special device UUID=a60204f8-b9ac-472f-ba10-0930db1f626b does not exist
After using the stop/start buttons: I got my VM back and didn't have to remount the Gluster storage with the second Gluster node's IP/hostname (thanks to the backupvolfile-server option in fstab).
 
Re: Proxmox and GlusterFS

So two conclusions come out of my tests:

1. GlusterFS and KVM = no go for live failover. VMs keep locking their FS read-only after the GlusterFS node they were booted from goes down (workaround: errors=continue in the guest's fstab, but that is dangerous). To get them back I have to stop the VM, change the GlusterFS node name to one that is UP, and start it again. Reset does not work either; after a reset it says that there is no such storage attached. Of course, the Proxmox GUI allows adding only one Gluster server. Setting ping-timeout to 2 seconds on the Gluster volumes does not help either. Someone should take care of this. It's definitely an initialization BUG, and it seems to be in libgfapi or KVM.
2. Using the Proxmox GUI to add GlusterFS storage, one has to manually remount it with the second Gluster node's name in the GUI to get the machines back if the other node fails. Mounting GlusterFS from fstab with the backupvolfile-server option works like a charm (same when using a Gluster config file in fstab); see the sketch below. The Proxmox team should fix this.
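A sketch of the host fstab entry mentioned in point 2, adapted to the names in this thread ("stor2" and the mount point are assumptions; only stor1 appears in my storage config):
Code:
# /etc/fstab on the Proxmox host; stor2 is an assumed name for the second Gluster server
stor1:/HA-Proxmox-TT-fast-150G /mnt/gluster-fast glusterfs defaults,_netdev,backupvolfile-server=stor2 0 0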
 
Re: Proxmox and GlusterFS

Hey,

I've almost broken my brain.
Has anyone ever managed to set up GlusterFS replicated storage in such a way that it works with automatic failover?

I mean this situation:
1. install a replicated storage
2. mount it in Proxmox
3. create a VM on this storage
4. shut one of the Gluster bricks down
5. the created VM keeps running
6. power the downed brick back on
7. wait for both bricks to sync
8. shut the other brick down
9. the VM keeps running

Anyone? Please share your setup configuration.
 
Re: Proxmox and GlusterFS

If someone ever finds themselves in the same situation, the answer is:
gluster volume set <volname> cluster.self-heal-daemon off on the volume where your VM disks are.
BTW, don't test VMs with files generated from /dev/zero. Due to some kind of bug, such files remain unsynced and you may lose your VM (qemu won't be able to read the qcow2 file headers).
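Applied to the volume named earlier in this thread, that would look something like this (a sketch):
Code:
gluster volume set HA-Proxmox-TT-fast-150G cluster.self-heal-daemon off
# verify that the option shows up under "Options Reconfigured"
gluster volume info HA-Proxmox-TT-fast-150G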

 
Re: Proxmox and GlusterFS


- Can somebody say whether the behaviour has changed with a setup like the one in this thread, with Proxmox 4 and GlusterFS 3.5.2?
- Is it working stably now?
- What is the preferred configuration for a KVM guest's disk options, default or writethrough?
 
