Ceph vs GlusterFS

Proxmox India

Member
Oct 16, 2017
Bangalore
Hi,

We have been using GlusterFS with Proxmox VE on a 3-host cluster for close to a year now for low-IOPS VMs without any issues.

Now we plan to build a new Proxmox cluster for a customer. Each server will have a 20-core Xeon E5-200 v4, 256 GB DDR4 RAM, a 120 GB SSD ZFS mirror for Proxmox, 4 x 1.92 TB SSDs for VM storage plus 8 x 1.8 TB 10K RPM SAS drives for the storage pools, and 40 Gbps QDR InfiniBand for the cluster/storage traffic.

Which would perform better, Ceph or GlusterFS?

Now that Gluster supports sharded volumes and VM files can be split into smaller chunks, the old problem of syncing a single large file on heal is no longer there.
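(For reference, this is roughly how we enabled sharding on our current cluster; "vmstore" is just a placeholder for the volume name, and the 64 MB shard size is simply the value we picked, not a recommendation:)

  gluster volume set vmstore features.shard on
  gluster volume set vmstore features.shard-block-size 64MB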

Which is easier/faster to recover from after a disk crash?
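(With Gluster today our recovery flow after swapping a failed disk looks roughly like the lines below; "vmstore" and the brick paths are placeholders. I assume the Ceph side is mostly marking the failed OSD out and watching recovery, but that is exactly the part I would like confirmed.)

  gluster volume replace-brick vmstore node2:/bricks/failed node2:/bricks/new commit force
  gluster volume heal vmstore info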

How much of a performance penalty do erasure-coded volumes have compared to replicated volumes?

What about maintaining multiple snapshots of the VMs on secondary storage outside the storage cluster (Ceph or GlusterFS), along the lines of pve-zsync (using ZFS)? For example: snapshots every 15 minutes kept for a day, every 4 hours kept for a week, weekly kept for a month, and so on.
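(Something like the pve-zsync job below is what I have in mind; "backuphost" and the target dataset are placeholders, and the tiered retention would need separate jobs or a wrapper script, since a single job only keeps one --maxsnap count. The default 15-minute schedule with 96 snapshots would roughly cover the "every 15 mins for 1 day" tier:)

  pve-zsync create --source 100 --dest backuphost:tank/pve-backups --name every15min --maxsnap 96 --verbose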
 
When you get this set up and tested, please let us know the results. Most of what has been written about this interesting subject is quite old.
 
It's been a while since I last gave Gluster a go, but recovering from faults is why I stopped using it. When things go bad, they can get very bad very fast. My experience was bad enough that I haven't yet been willing to rely on it again.

I've been running Ceph for a couple of years now, starting with Hammer and now on Luminous. It can be a PITA to set up, but I've been through several faults (disk failures, sudden catastrophic power loss, general pilot error during host upgrades, etc.), and in all cases I was able to recover relatively painlessly. Performance isn't excellent, but it is consistent as you grow.

One thing I will note is the overhead of replicated storage. Erasure-coded pools are now fully supported, even for RBD (block storage for VMs). But getting effective use out of erasure-coded pools AND maintaining host-level resiliency(*) requires fairly significant deployment sizes, too large for most lab and SOHO use cases. And if you are larger than that, you are probably not on this forum asking this question this way :)

(*) Maintaining host-level redundancy means being able to lose a host and still have a fully functional, accessible pool. If you are willing to accept losing access to your pool, but not losing data, then EC pools can be done on smaller systems.
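For what it's worth, the rough shape of an EC-backed RBD setup on Luminous is sketched below; the pool names and the k/m values are only illustrative. Note that with crush-failure-domain=host, k=4 m=2 already wants six hosts before you allow any rebuild headroom, which is the sizing problem I meant:

  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
  ceph osd pool create ecpool 128 128 erasure ec42
  ceph osd pool set ecpool allow_ec_overwrites true
  ceph osd pool application enable ecpool rbd
  # RBD metadata still lives in a replicated pool; only the data objects land in the EC pool
  rbd create vm-100-disk-1 --size 100G --pool rbd --data-pool ecpool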
 
LizardFS could also be an alternative. It's way easier to configure (and understand) than Ceph and probably much more flexible.
 
