Storage Clustering / Shared File system

somecallmemike

New Member
May 27, 2009
Proxmox team,

Our group has been looking into clustering for some time now, and I have a thought about what we might do with Proxmox coming up. We plan on using the Glusterfs file system, which has a client/server setup for shared storage across multiple servers. There is a special "translator" built into Glusterfs that allows a computer to be both a server and a client for the shared storage, kind of like a bunch of computers connecting to each other in a RAID 5 fashion, striping their data across a similar directory, set of drives, or LVM partition. We imagine creating a shared storage system on a number of Proxmox servers, where the vz directories are stored in the shared file system and connected to all the machines, creating a true cluster of servers. We are not sure how to get OpenVZ to play nicely with this setup yet, but any thoughts would be appreciated.
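For readers unfamiliar with Glusterfs, the server-and-client-on-every-node setup described above was configured with volume files in that era. The following is only a rough sketch from memory of what a two-node replicated setup might look like; the paths, addresses, and translator names are illustrative assumptions, not a tested configuration:

```
# Hypothetical server volfile on each node (e.g. /etc/glusterfs/server.vol):
# export a local directory as a storage "brick".
volume posix
  type storage/posix
  option directory /data/export          # assumed local backing directory
end-volume

volume server
  type protocol/server
  option transport-type tcp
  subvolumes posix
  option auth.addr.posix.allow *         # wide-open auth; tighten in practice
end-volume

# Hypothetical client volfile (e.g. /etc/glusterfs/client.vol): connect to
# both nodes and mirror them with the replicate (AFR) translator, so that
# each machine acts as both server and client.
volume node1
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.1         # assumed address
  option remote-subvolume posix
end-volume

volume node2
  type protocol/client
  option transport-type tcp
  option remote-host 192.168.0.2         # assumed address
  option remote-subvolume posix
end-volume

volume mirror
  type cluster/replicate
  subvolumes node1 node2
end-volume
```

With a volfile like this, every file written through the mount lands on both bricks, which is the "RAID-like" behaviour the post describes (replication here rather than RAID 5 striping).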
 
We will support shared storage (iscsi, nfs) with the next release. What advantage do you have with Glusterfs?

Also, OpenVZ needs ext3 - otherwise quotas do not work.
 
Hi Dietmar,

We will support shared storage (iscsi, nfs) with the next release.
That is great to hear. Shared storage (iSCSI, NFS, SMB) is a mainstream
feature in all the other free VM solutions (VMware, XenServer).

What advantage do you have with Glusterfs?
Hm, I think somecallmemike was aiming at a solution similar to what we
have been discussing lately. Think of it as replicated shared storage.

The idea is to have the same VMs in identical state available on every
VM server. Of course this means storing the same data multiple times, but
in some cases (cheap servers) it might be less cost-intensive to simply put
another two 1 TB SATA HDDs into each server than to buy a big SAN with
FibreChannel.

I like the idea, but on the other hand I am afraid that even with a dedicated
1 GBit "storage network" you'll see big speed penalties when replicating all
the changing virtual HDDs over your cluster. I think a timed replication during
the night using a simple cron job with rsync will do a much better job than
the cluster filesystem approach.
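The nightly-replication alternative described here could be as simple as a cron entry like the following (the paths, schedule, and standby hostname are assumptions for illustration only):

```
# Hypothetical /etc/cron.d/vz-replicate: push the OpenVZ private areas
# to a standby node every night at 03:00 over SSH.
0 3 * * *  root  rsync -a --delete /var/lib/vz/private/ standby:/var/lib/vz/private/
```

The trade-off is that this is asynchronous: after a failure you could lose up to a day of changes, whereas the cluster-filesystem approach replicates every write as it happens.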

Just my two cents,
Holger
 
Hm, I think somecallmemike was aiming at a solution similar to what we have been discussing lately. Think of it as replicated shared storage.

I doubt that glusterfs has a stable data replication technology. Where can I find technical docs/specs about that?
 
I doubt that glusterfs has a stable data replication technology.
Where can I find technical docs/specs about that?
Unfortunately I cannot comment on this one. That's why I wrote "similar"
but not "identical" :p

I haven't evaluated any of the filesystems available for now :(

All in all I was thinking about file-based replication only, not a "networked" block device.
 
Hi Dietmar,
I can't give useful hints because I don't use Glusterfs yet. I have only bookmarked the web pages, because a while ago I was searching for cluster filesystems.
Perhaps somecallmemike can tell us more about it?

Udo
 
They simply copy to all replicated volumes?!

No, not exactly.
When used for replication, it's kind of a synchronous replication system at file level, not asynchronous like rsync.

If you use it between 2 servers, you can create a mount point on both systems that consists of a mirrored directory on both (or other) systems. It's kind of like a RAID 1 between 2 directories on 2 servers. The advantage of this file-level replication is that you can still access your files in one of the mirrored directories in case the glusterfs mount point breaks for some reason.
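As a sketch of the two-node mirror described above (the volfile path, mount point, and backing directory are assumed names, not verified commands):

```
# Mount the replicated volume via a client volfile; writes to /mnt/shared
# are sent to the backing directories on both servers.
mount -t glusterfs /etc/glusterfs/client.vol /mnt/shared

# If the glusterfs mount breaks, the files are still plain files in the
# local backing directory and can be read directly:
ls /data/export
```

This direct-access fallback is the point being made: unlike a block-level mirror, the replicas remain ordinary files on an ordinary filesystem.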

It still misses some stability in some cases but I think PVE in combination with glusterfs could make a very unique system.
 
I think such a combination makes perfect sense for scenarios where you would like to have the advantage of "shared storage" without really using a dedicated machine as NAS / SAN. I'm still unsure about the additional CPU load and performance penalty caused by replicating each I/O access to the "shared storage".
 
No, not exactly.
When used for replication, it's kind of a synchronous replication system at file level, not asynchronous like rsync.

Exactly. They do a synchronous copy at block level. But how do they do that? Error detection? Split-brain detection? ...
 
Hi Dietmar,

while I understand your concerns about error correction methods in Glusterfs, what I do not understand is your focus on Glusterfs in this context.

I think the idea of using a cluster filesystem as VM storage is the key point, and which FS is / should be used comes in second place. There are "tons" of cluster FSes out there to be used with Linux - DRBD (http://www.drbd.org/) and XtreemFS (http://www.xtreemfs.org/), just to name a few. In the first place, the usage of such a system would need to be integrated into the Proxmox server installation process. Before this we of course have some evaluation to do. Another option would be to let the user choose between 1 to n options.
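Of the alternatives named here, DRBD is strictly speaking a block-level mirror rather than a cluster filesystem. For comparison with the Glusterfs approach, a minimal resource definition might look roughly like this (device names, hostnames, and addresses are illustrative assumptions):

```
# Hypothetical /etc/drbd.conf resource: mirror one partition between two
# nodes synchronously. The "on" names must match each node's hostname.
resource r0 {
  protocol C;                    # write completes only after both nodes have it
  on node1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;         # assumed backing partition
    address   192.168.0.1:7788;
    meta-disk internal;
  }
  on node2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   192.168.0.2:7788;
    meta-disk internal;
  }
}
```

Note that /dev/drbd0 is a block device: to use it from both nodes at once you would still need a cluster-aware filesystem on top of it, whereas Glusterfs replicates at file level and presents a ready-to-use mount.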

Best regards,
Holger
 
while I understand your concerns about error correction methods in Glusterfs, what I do not understand is your focus on Glusterfs in this context.

The initial post is about using Glusterfs.

I think the idea of using a cluster filesystem as VM storage is the key point, and which FS is / should be used comes in second place. There are "tons" of cluster FSes out there to be used with Linux - DRBD (http://www.drbd.org/) and ..

I know how DRBD works, so I do not need to ask.

But I do not know how Glusterfs works - that is why I ask.

- Dietmar
 
Exactly. They do a synchronous copy at block level. But how do they do that? Error detection? Split-brain detection? ...

At file level, you mean, I think? No block level is used with Glusterfs.
They do it with a userland filesystem based on FUSE.

It's been a while since I looked at Glusterfs, but error detection, split-brain handling and resync are things that are either dealt with or a work in progress.

I understand your concern, but with block-level replication like DRBD you have just the same concerns. It's just at block level in that case.
 
Wow, I haven't checked the forum in months (I recently became a father, so no time to be a nerd!) but I'm glad to see this conversation has grown!

Initially our intention was to build a number of boxes with a second array of drives on each box, configured to use the Gluster file system, and to present the disk to each box as shared storage for the OpenVZ clients. I never quite got it working, as the Proxmox interface did not like the concept of the "same" container on two machines in cluster mode. I had to build a "new" machine on the second cluster box and point the location of the files from the original container to the "new" container's location.

The boxes I did this on had single-core procs and a small amount of memory, so I was unable to really get a good feel for the overhead of the file system transfers and replication, but it didn't seem to slow performance of the container in any perceptible way.

Unfortunately we never fully tested/developed this setup, as our project migrated toward VMware (the management was enamored with the fancy features), so I cannot speak to the issues discussed about overhead, security, split-brain, error detection, or file locking problems. Has anyone on the forum attempted this scenario?
 