HA cluster with ceph or what else?

Kei

Active Member
May 29, 2016
Hello,
I want to set up a Proxmox HA cluster based on three nodes, where one of them is virtualized on a different host, since it's just for quorum purposes.
I would like to have local redundant storage on both of the two main nodes (and maybe even the third, if needed for Ceph), so I'm trying to understand what I would be better off using, assuming the options are Ceph and DRBD9 (the latter maybe still being too "young", if I'm not wrong).
My questions are: is Ceph a good choice for HA clustering? Are there better options? I don't mind it being difficult to set up, since I'm not afraid to learn; in fact this project is more of a "home lab", although I would like to build it the way I would for a customer.
 
Ceph is a very interesting technology, but it has higher requirements for a performant setup. DRBD is simple and optimized for a two-host setup, whereas Ceph is optimized for at least three nodes up to ... whatever's in your pocket.

Personally, I haven't tried DRBD9 yet, but I've been running clusters with DRBD 8.x for the better part of a decade and it has never failed me.
 
Is Ceph a good choice for HA clustering?

Yes, it is, but Ceph is designed for a lot of nodes and OSDs, so it shows its advantages more there than in a three (or two-and-a-half) node setup; not that it doesn't have its perks there too. That said, you can also run a single-node Ceph instance and use it just fine.
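
As a rough back-of-the-envelope illustration of why more OSDs help (hypothetical numbers, not from any particular setup): when an OSD fails, Ceph re-replicates its data across the remaining OSDs, so the recovery work per surviving OSD shrinks as the cluster grows.

Code:
# Back-of-the-envelope sketch with hypothetical numbers: recovery work per
# surviving OSD after a single OSD failure, assuming the data from the failed
# OSD is re-replicated evenly across the remaining OSDs.
def recovery_per_osd(total_osds, osd_capacity_tb, fill_ratio=0.5):
    data_to_recover = osd_capacity_tb * fill_ratio   # TB that sat on the failed OSD
    return data_to_recover / (total_osds - 1)        # TB each survivor has to rebuild

for n in (4, 12, 48):
    print(n, "OSDs ->", round(recovery_per_osd(n, 4.0), 2), "TB per surviving OSD")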

I'd also look into GlusterFS or Sheepdog. GlusterFS has worked quite reliably as far as I've used it; performance-wise it's OK, could be better, but OK. Sheepdog is made for VMs (Qemu/KVM), and with our setup it works pretty much out of the box; we currently build and test packages for the new version released about a week ago.
DRBD, as already mentioned, is also an option.

If it's a home lab and you're willing to learn, I would suggest you try all of them: look a little into each one and then use the one that you like best/works best for you.
Ceph is, imo, one of the most interesting to learn, as you are able to configure it in many complex ways; its replication and its other design choices are also nice to know.
 
I've been running clusters with DRBD 8.x for the better part of a decade and it has never failed me.
Maybe I should've mentioned that I plan on using VE 4.2, and I believe that support for DRBD8 is discontinued.
Ceph is, imo, one of the most interesting to learn, as you are able to configure it in many complex ways; its replication and its other design choices are also nice to know.
Ceph does look interesting indeed; the problem is that it's a really new concept for me and I'm struggling to find where I should start learning it. I've purchased the latest book, "Mastering Proxmox - Second Edition", where Ceph is discussed a lot, but I still find it very hard to grasp some of the concepts.
 
Ceph does look interesting indeed; the problem is that it's a really new concept for me and I'm struggling to find where I should start learning it. I've purchased the latest book, "Mastering Proxmox - Second Edition", where Ceph is discussed a lot, but I still find it very hard to grasp some of the concepts.

Just try all methods as @t.lamprecht suggested. GlusterFS is also a working setup. I haven't used GlusterFS with Proxmox itself, but with PostgreSQL, and it also works fine.
 
Personally, I haven't tried DRBD9 yet, but I've been running clusters with DRBD 8.x for the better part of a decade and it has never failed me.

Could you please share some details about your DRBD infrastructure?
How do you manage/avoid split-brain?
If you've used DRBD for about 10 years with no issues, you must have a very good infrastructure.
 
DRBD is only for two servers (in the free version), so no big infrastructure is necessary. We used the setup in the beginning with desktop-grade hardware. We had 2 dedicated interconnects and two DRBD volumes. Each server had 10 disks with software RAID. Our whole stack was

Disks -> Software RAID (2x RAID 5) -> 2x DRBD -> 2 Volume Groups -> Volumes

On top was XEN, later Proxmox, and the interconnect was directly connected without a switch. We ran in multi-master mode with clustered LVM on top and Heartbeat with XEN to have HA. We only had split-brains because of tests in the beginning. In the end we had the same infrastructure as we have now with our multi-node Proxmox cluster with an FC-based SAN: clustered LVM. The problem of accessing the same physical block from different machines is still there, but it never occurred in real life. A split-brain on the disks also never occurred, because of the direct connection; this was only interrupted when one host was down. DRBD automatically resyncs even after a hard reset or power failure, so that was never a problem.
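
To put rough numbers on that stack (the 5+5 disk split and the 2 TB disk size are my assumptions for illustration, not details from the setup above):

Code:
# Hypothetical sizing: 10 disks per server, split into two RAID 5 arrays of
# 5 disks each, and the two servers mirrored host-to-host with DRBD.
disk_tb = 2.0          # assumed disk size
disks_per_array = 5    # assumed 5+5 split of the 10 disks

raid5_array_tb = (disks_per_array - 1) * disk_tb   # RAID 5 loses one disk per array
per_server_tb = 2 * raid5_array_tb                 # two arrays per server
cluster_usable_tb = per_server_tb                  # DRBD keeps a full copy on each server

print("usable per server:", per_server_tb, "TB")
print("usable across the 2-node cluster:", cluster_usable_tb, "TB")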

Today, we still run Proxmox on two APU boxes from PC Engines as an HA firewall solution integrated into our big cluster. Each box has one mSATA SSD, which is mirrored to the other box with DRBD. The boxes have been running for over a year now without any problems.
 
No, we now have a real SAN with multiple controllers, multipath switches, etc. ... the whole package, but for the two APU boxes we still use DRBD. I just integrated the two machines into the big cluster to have one management interface. We have different HA groups for each VM and also different storage groups for different parts of our cluster.
 
Sorry, just a clarification: at minimum, Ceph requires three nodes, is that correct? So this means I can't have the third "lightweight" server just for quorum; it must also replicate the Ceph cluster, or at least a part of it. Am I right?
 
You also have to get quorum with Ceph (monitor nodes), so yes.
I understand. On the other hand, DRBD9 would allow me to have redundant storage on two nodes only, but it also seems to require a separate drive, meaning you cannot have Proxmox and the storage on the same physical media, which would not be great for me. Is it the same for Ceph too?
 
Technically you can, but for performance reasons it is not the best option. It's the same, or even worse, for Ceph.

Ceph and DRBD will be fast with a mirrored SSD as cache and some data disks. The SSD can also be used for the OS. If you only have, e.g., two disks per server with BBU hardware RAID 1, you can create volumes or partitions for DRBD; that's technically not a problem.
 
Sorry, just a clarification: at minimum, Ceph requires three nodes, is that correct? So this means I can't have the third "lightweight" server just for quorum; it must also replicate the Ceph cluster, or at least a part of it. Am I right?

You can have a third light node for the mon service only, without any storage, and use the 2 other nodes with storage, with replication x2 (OSD daemons) + a mon daemon.
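
To spell out why that third mon matters, here is a minimal sketch of the usual majority rule for Ceph monitors (nothing setup-specific):

Code:
# Ceph monitors need a strict majority to keep quorum, so a cluster of n mons
# survives n - (n // 2 + 1) monitor failures.
def tolerated_mon_failures(mons):
    majority = mons // 2 + 1
    return mons - majority

for mons in (2, 3, 5):
    print(mons, "mons -> majority of", mons // 2 + 1,
          "-> tolerates", tolerated_mon_failures(mons), "failure(s)")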
 
You can have a third light node for the mon service only, without any storage, and use the 2 other nodes with storage, with replication x2 (OSD daemons) + a mon daemon.
Hi,
but a replica-2 Ceph cluster is not really recommendable... if one OSD dies in each node, you will have data loss (because Ceph spreads all data over all OSDs). The only valid replica-2 Ceph cluster is one with RAIDed OSDs (which has other drawbacks...).
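
To make that concrete, here is a small simulation (hypothetical numbers; it assumes every placement group keeps one copy on an OSD of each node, which is what a replica-2 pool with a per-host failure domain does):

Code:
# Replica-2 pool on two nodes with 4 OSDs each: every PG stores one copy on a
# random OSD of node A and one on a random OSD of node B. If one OSD dies on
# each node, every PG that had both copies on exactly those two OSDs is gone.
import random

random.seed(0)
osds_a, osds_b, pgs = 4, 4, 256
placement = [(random.randrange(osds_a), random.randrange(osds_b)) for _ in range(pgs)]

dead_a, dead_b = 0, 0   # one OSD failing on each node
lost = sum(1 for a, b in placement if a == dead_a and b == dead_b)
print(f"{lost} of {pgs} PGs lost both copies (~{100 * lost / pgs:.1f}% of the data)")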

Udo
 
