Ceph and file storage backend capabilities

Oct 24, 2009
I am planning a buildout and trying to get a feel for which virtualization software I want to use (I know it is going to be KVM-based). DRBD does not seem capable of scaling the way I would need it to.

I am trying to get rid of the requirement for a SAN, and Ceph looks like it can fill that gap quite well: it can do striped blocks across multiple hosts as well as plain file sharing. I am wondering two things.

1. Are you looking at RBD (the block device part of Ceph) at all as a possible storage backend (like LVM)?

2. If the answer to 1 is 'no', is it possible to live migrate from one host to another with the disk image as a shared file?
 
ad 1: yes, it looks interesting, but it is still experimental. Sheepdog is also a candidate; the goal is to support all such promising storage backends.
ad 2: live migration works, you just need shared storage (or DRBD).
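For example, once the disk image sits on storage that both nodes can see, an online migration is just one command. This is only a rough sketch: the VM id and node name are placeholders, and the exact option syntax varies between Proxmox versions.

# VM 101's disk lives on shared storage defined on both nodes,
# so only the RAM and device state have to move during migration
qm migrate 101 node2 --online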
 
I have never used any of them, but basically yes, in theory.
 
This is something I may test down the line then (2-3 months). The only other thing I can think of that would prevent me from using Proxmox is user management (I am looking at Ganeti too).
 
Zero configuration, uses corosync (Proxmox 2.0 is also based on corosync).
Hi Dietmar, thanks for your reply.

What about the performance of Sheepdog for storing the virtual disks of virtual machines (KVM)? Do you have any results to post (e.g. test results from an iozone run, or the hardware configuration used)?
I am a little concerned about the disclaimer at http://www.osrg.net/sheepdog/#id1: "There is no guarantee that this software will be included in future software releases, and it probably will not be included" :-|

I'm testing GlusterFS (the latest version):
- it is possible to configure all nodes (peers) and volumes with a single command (one executable configures the whole system; see the sketch right after this list)
- no use of metadata (best performance)
- GlusterFS does not use its own on-disk structure to store files on each node's filesystem (if I run ls on any cluster node, I can access the stored files directly on that node's local filesystem ... sorry for my poor English!)
- scalable (both in terms of disk space and performance)
- not experimental
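As a rough sketch of that single-tool setup (the host names and brick paths here are made up, and the exact syntax depends on the GlusterFS version):

# join the second node to the trusted pool
gluster peer probe gfs2
# create a 2-way replicated volume from one brick per node
gluster volume create vmstore replica 2 gfs1:/data/brick gfs2:/data/brick
gluster volume start vmstore
# mount it on a client; the files also stay visible as plain files inside each brick directory
mount -t glusterfs gfs1:/vmstore /mnt/vmstore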

What do you think of GlusterFS as a solution?

thanks,
 
Hi,

Sadly I haven't had time to build my lab yet, but GlusterFS is my first choice ATM for VM and data storage.
Some enlightening reading has lately convinced me that file storage is the way to go for VHDs (at least in my use cases: SMEs).
GlusterFS seems definitely the most scalable AND easy-to-manage clustered NAS technology I know of; at least compared to clustered NFS setups, which never give you the same set of features anyway.
It gives a unique opportunity to virtualize and aggregate the whole storage space available on all nodes with unequalled ease, while at the same time letting you design and build all kinds of replication and distribution scenarios.
I really like the fact that it sits on our good ol' filesystems and keeps our good ol' tools at hand.
With monitoring tools and remote replication coming in the 3.2 series, I can hardly think of anything missing!

With the announced features of Proxmox 2.0, mainly multi-master and HA, I really see it as a perfect match.

My 2 cts
 
What about the performance of Sheepdog for storing the virtual disks of virtual machines (KVM)? Do you have any results to post (e.g. test results from an iozone run, or the hardware configuration used)?
I am a little concerned about the disclaimer at http://www.osrg.net/sheepdog/#id1: "There is no guarantee that this software will be included in future software releases, and it probably will not be included"

What is your opinion?
thanks.
 
Any special reason why you are asking the same thing again? Please do not double post.

If you need details about Sheepdog, post on their forums/lists; they also have some info about performance. But feel free to do your own tests and give feedback.
 
Why do you think that "no use of metadata" gives you the best performance?

I use http://www.moosefs.org/ and I can recommend it; it's easy to set up and quite fast (I'm getting 50-60 MB/s for reads in a VM).
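For reference, a quick-and-dirty way to check sequential read speed inside a VM (not necessarily how the number above was measured; the test file path is just an example):

# drop the page cache first so the result is not just RAM speed
sync; echo 3 > /proc/sys/vm/drop_caches
# read 1 GB from an existing file; dd reports the throughput at the end
dd if=/var/tmp/testfile of=/dev/null bs=1M count=1024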

Sorry, I expressed myself badly.
GlusterFS "creates on the fly" the metadata, thera are not a server dedicated for metadata like moosefs.
There are not a single point of failure.

I've tested MooseFS and I think it is a great project.

Some questions:
- have you installed MooseFS directly on the Proxmox hosts?
- or are you using separate servers for MooseFS (how many servers)?
- are you using MooseFS with KVM or OpenVZ on Proxmox?

thank you.
 
Metadata is not the only thing that affects performance, and I had a bad experience with GlusterFS (some time ago, around v2.0, so it may be different now). Having no dedicated central server may seem like a good thing, but it also creates problems:
- split-brain may occur and can lead to data loss; this can't happen with a single central server
- with no central server you may need to look up metadata on several nodes; MooseFS keeps all metadata on a single machine, stored in memory, so all file lookups are fast
MooseFS does not have HA; you need to recover the master manually, but there are tools to speed this up. You can (and should) run mfsmetalogger on another node; it acts as a kind of slave node, and all metadata changes are replicated to it. If mfsmaster crashes, you can start it on the secondary node (the one running mfsmetalogger) and everything will continue to run. Some people use heartbeat to automate this, but the few times my master server went down due to hardware failures I was able to bring it back online within 15 minutes. For me it has always been rock solid.
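A sketch of what that manual failover can look like, assuming default paths and the MooseFS 1.6-era tools (file names and locations differ between versions):

# on the metalogger node: rebuild a current metadata.mfs from the last backup
# plus the replicated changelogs, then place it in the master's data directory
mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.*.mfs
# start the master here; chunkservers and clients find it via the "mfsmaster"
# host name, so that DNS entry (or a floating IP) must now point at this node
mfsmaster start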

I've got 4 MFS clusters, ranging from 4 to ~60 nodes. Proxmox does not work as an MFS storage node; it only mounts MFS and stores the KVM images there. What is good is that I can make an online copy-on-write snapshot with mfsmakesnapshot, and I can scale easily without restarting anything: just add a new MFS storage node and run mfschunkserver on it. The bad thing is that mfsmount is FUSE-based, so it may eat more CPU under high load than NFS or other native filesystems.
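As a sketch, growing the cluster and snapshotting look roughly like this (the paths and image name are placeholders, and config file locations differ between installs):

# on the new storage node: tell the chunkserver which local directory to use for chunks
echo "/mnt/disk1" >> /etc/mfshdd.cfg      # often /etc/mfs/mfshdd.cfg, depending on the package
mfschunkserver start                      # registers with the master and starts receiving chunks
# on a client with MFS mounted: online copy-on-write snapshot of a running VM's image
mfsmakesnapshot /mnt/mfs/images/vm-101.raw /mnt/mfs/snapshots/vm-101.raw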
 
Ok.
You have 4 MFS clusters; how many chunk servers did you use to get 50-60 MB/s for reads in a VM?
I understand that performance depends on many things (number and type of hard disks, RAID controller, how many Ethernet links per server...).
thanks.
 
Proxmox is using an 8-node cluster of HP DL180 G5 servers, each with 1TB 7200 RPM SATA drives (no RAID volumes; each disk is a single volume).
MFS splits files (VM images in this case) into 64MB chunks that are spread across the MFS cluster. Each chunk is replicated according to the goal set for the given file or folder (in my case goal 3, so 3 copies of each chunk). When you read a file from MFS you are reading multiple chunks from different MFS nodes, so the reads are spread across the whole cluster, but not all disks are used at a time. This is also important because with iSCSI or other shared block storage a single VM can use 100% of the storage I/O; with MFS you get lower performance for a single VM, but in my opinion it scales better: you can have more VMs doing heavy I/O at the same time without hogging the entire storage. Of course it depends on the number of nodes, the disks, and how the volumes are configured (RAID or LVM from multiple devices). mfsmount does some caching, but the memory it uses never exceeds ~200MB in my case (I remember reading somewhere on moosefs.org about how much it is expected to use).
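For completeness, this is roughly how a goal of 3 is applied and checked (the directory and file names here are just examples):

# keep 3 copies of every chunk for everything under the images directory
mfssetgoal -r 3 /mnt/mfs/images
# verify the goal and see which chunkservers hold the chunks of one image
mfsgetgoal /mnt/mfs/images
mfsfileinfo /mnt/mfs/images/vm-101.raw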
 
Thank you very much.
Only one question: how many virtual machines are running on this system?
 
