HA and multiple nodes

Alessandro 123

Well-Known Member
May 22, 2016
Hi all, I'm new here and coming from XenServer.

Currently I have 8 XenServer nodes with no shared storage and about 100-150 virtual machines.

I would like to move everything to Proxmox/KVM with shared storage (DRBD?) for HA and live migration.

Would it be possible, with DRBD 9, to create an 8-node cluster (plus a ninth as a spare) with no network storage?
I would like to keep local storage as it is now, but replicated to allow automatic failover.

Avoiding access to network storage should decrease latency and network requirements (having all 9 servers access the same storage would require at least multiple 10Gb connections to avoid bottlenecks).

What do you suggest? Should I use DRBD or Gluster? LVM or qcow images?
 
I don't want to run VMs on the same nodes as Ceph. I prefer to keep Ceph separate from the rest.
Ceph requires too many resources (for example, 1 GB of RAM per OSD), and with 12 OSDs per server that means spending 12 GB of RAM just on OSDs.
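The RAM overhead mentioned here follows the common rule of thumb of roughly 1 GB of RAM per OSD daemon. A trivial sketch (the helper name is mine, not anything from Ceph or Proxmox):

```python
def osd_ram_overhead_gb(osds_per_node: int, gb_per_osd: float = 1.0) -> float:
    """Rough RAM reserved for Ceph OSD daemons on one node,
    using the ~1 GB-per-OSD rule of thumb cited above."""
    return osds_per_node * gb_per_osd

# 12 OSDs per server, as in the example above
print(osd_ram_overhead_gb(12))  # 12.0 GB just for OSD daemons
```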
 
Anyway, aren't SATA disks too slow for a small environment?
To get good performance with SATA disks, a lot of disks are needed.
Currently I have 8 servers, each with 8 SAS 15k disks in RAID 6; moving everything to Ceph with the suggested hardware and SATA disks would slow everything down.
 
You can improve latency by using 1 SSD for every 4 or 5 spinners.

But the hardware will limit you with DRBD and with Gluster too.

Ceph scales out: more disks, more speed; more nodes, more speed (theoretically). :)
I think your limit is the gigabit network.
 
DRBD should be faster, as it doesn't have to distribute data, split it into chunks, and so on.
It just replicates and nothing more, and I would use SAS disks with it.
 
Additionally, with DRBD I can create multiple two-node clusters with DRBD on each server.
Proxmox1 has DRBD volume1 as master and DRBD volume2 as slave.
Proxmox2 has DRBD volume1 as slave and DRBD volume2 as master.

Half of the VMs will run on Proxmox1, the other half on Proxmox2.
In case of a failure, I can automatically migrate the 'failed' VMs to the other node.

Normally, both nodes would use local storage (replicated by DRBD) and the network is not involved. I could also use dual 10GbE for DRBD with no switches, just direct connections between the nodes.
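As a rough sketch of that cross-replication layout, each volume would be a separate DRBD resource over its own back-to-back link. Hostnames, devices, volume names, and addresses below are hypothetical, DRBD 8-style syntax:

```
# /etc/drbd.d/vol1.res -- hypothetical example, not a tested config
resource vol1 {
  protocol C;                    # synchronous replication
  device    /dev/drbd0;
  disk      /dev/vg0/vol1;       # local LVM backing device
  meta-disk internal;
  on proxmox1 { address 10.0.0.1:7788; }   # primary for vol1
  on proxmox2 { address 10.0.0.2:7788; }   # secondary for vol1
}
# vol2.res would mirror this with the roles reversed, so each
# node is primary for one volume and secondary for the other.
```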

What I don't understand is how to grow this infrastructure to 3 nodes as in https://pve.proxmox.com/wiki/DRBD9. If I understand properly, that setup only increases the replica count, not the available space. All VMs are replicated across all nodes, right?
 
Probably, a 3-node cluster with DRBD could be useful to balance VMs across all nodes.
For example:
With 2 nodes and 30 VMs:
15 VMs on node1, 15 on node2.
In case of a server failure, all 15 VMs must be moved to the surviving host, which has to handle double the load (30 VMs instead of 15).

With 3 nodes and 30 VMs:
10 VMs on node1, 10 on node2, 10 on node3.
In case of a server failure, only 10 VMs are migrated: 5 to node1 and 5 to node2. Node1 and node2 each have to handle 15 VMs, not 30.
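The arithmetic above generalizes: with N evenly loaded nodes, a single failure leaves each survivor running total/(N-1) VMs. A quick sketch (the helper name is mine, not anything from Proxmox):

```python
def load_after_failure(nodes: int, vms: int) -> float:
    """VMs each surviving node must run after one node fails,
    assuming the failed node's VMs are spread evenly over the rest."""
    if nodes < 2:
        raise ValueError("need at least 2 nodes for failover")
    return vms / (nodes - 1)

# 2-node cluster, 30 VMs: the survivor runs all 30
print(load_after_failure(2, 30))  # 30.0
# 3-node cluster, 30 VMs: each survivor ends up with 15
print(load_after_failure(3, 30))  # 15.0
```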


Is Proxmox smart enough to migrate VMs automatically across the whole pool?
 
No one? Is DRBD the suggested storage for a 3-node cluster? What if I have to grow the cluster to 5 or 10 nodes? Is DRBD 9 still the best solution, or should I look at Ceph/Gluster?

I would like to build this cluster in the next few months, starting with 3 (or probably 5) nodes and growing to 10 or more.
Our cluster *must be* 100% HA and redundant. Any suggestions?

Anyone willing to share info, configurations, and experience for a cluster with at least 5 nodes?
 
DRBD 9 is currently only available as technology preview and thus not covered by the subscription support. I would not recommend using it in production in its current state.
 
DRBD 9 is currently only available as technology preview and thus not covered by the subscription support. I would not recommend using it in production in its current state.

Any date for its support?
Other suggestions without DRBD?
 
ZFS if asynchronous replication is enough for you, NFS or iSCSI if you want shared storage, Ceph if you want distributed storage. You can take a look at various forum threads here, which should give you an idea of the individual pros and cons.
 
Async replication is not good enough; we need HA, not disaster recovery. You can't do HA with async replication, because if the primary node fails, you won't be able to start the VM on a secondary node due to outdated data.

NFS and iSCSI are not storage, they are protocols. I'm trying to find out which is the best storage system for Proxmox and HA.
Is Ceph better than Gluster for this? The last time I used Ceph it was very hard to manage: lots of logs to check, full of debugging text. Gluster is much easier and logs useful information. If anything goes bad with Ceph, you lose hours just finding out what is happening.
 
One quick question to answer first: is GlusterFS still a technology preview in Proxmox? Maybe the wiki is outdated, but I stumbled upon it and it definitely says "technology preview" in the current version: https://pve.proxmox.com/wiki/Storage:_GlusterFS

The last time I used Ceph it was very hard to manage: lots of logs to check, full of debugging text.
Ceph is completely integrated into the Proxmox web interface - you shouldn't see much of the debugging text (same for Gluster). And Ceph explicitly says in its documentation that doing disaster recovery with Ceph is no fun. The question is: why would you need to? Once one node is lost, just replace/repair it, add it back to the cluster and you're fine (?). When your whole rack catches fire and all 8 machines get destroyed or damaged, you hopefully have a separate backup ;-)

And to mention/confirm it finally: DRBD is definitely not an option at its current development stage (been there, tried that). But to mention it as well: they are trying to catch up with other distributed solutions by supporting more than 2 nodes for replication.

As I don't use either of them but have read a lot about both, I would tend toward Ceph, as it was from the beginning an object storage system, designed especially for hosting VMs on top of it in HA mode. Gluster added object support later and was from the start more or less a distributed filesystem (for big, redundant storage over multiple servers). Also, the option to add some SSDs for an easy and massive performance increase speaks for Ceph.

I think your limit is the gigabit network.
With 8 nodes - absolutely!

But in the end I think there is no definitive answer, no clearly better or worse solution. You can find as many articles/tests as you want, like this one: http://www.networkcomputing.com/storage/gluster-vs-ceph-open-source-storage-goes-head-head/8824853

EDIT: I think I read somewhere that Linbit plans to reach final state for DRBD 9 within this month. Unfortunately I cannot find it again.

Cheers,
Johannes
 
Currently I don't need object storage, as I only have to store VM images.
Ceph is an object store, Gluster is a distributed filesystem.

Gluster seems more suitable for VM hosting, as there is no need to split a huge file into small objects.

Ceph is integrated into the web interface, but the cluster must be created manually, and it is much more complicated than Gluster, exposing a huge number of unneeded features; if anything goes bad, you still have to check the log files to see what is happening. The web interface doesn't help you with this, and Ceph logs are full of debugging text.
The last time I used it, I wasn't able to find out why a node had failed.
Logging too much is the same as logging nothing.
 
Currently I don't need object storage, as I only have to store VM images.
Ceph is an object store, Gluster is a distributed filesystem...
Hi, Ceph is an object store, yes - but the VM doesn't care about this (the VM disk is spread over many, many 4 MB chunks).
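To make the 4 MB chunk point concrete: RBD stripes an image into fixed-size objects (4 MiB by default), so the rough object count for a fully provisioned disk is easy to estimate. A small sketch (the helper name is mine, not a Ceph API):

```python
def rbd_object_count(image_size_gib: int, object_size_mib: int = 4) -> int:
    """Approximate number of RADOS objects a fully provisioned RBD
    image is striped into, assuming the default 4 MiB object size."""
    return (image_size_gib * 1024) // object_size_mib

# a 100 GiB VM disk becomes thousands of small objects
print(rbd_object_count(100))  # 25600
```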

Ceph is HA (if you do everything right ;-) ). The single-thread performance is not as good as with DRBD 8.

I have 8 Ceph OSD nodes running (not with PVE) and the Ceph mons on the PVE nodes.

Udo
 
