Cluster setup advice wanted

tb303

Member
Dec 23, 2021
Hi,

I'm looking for some advice. We are a small ISP and VoIP provider.

In our DCs we are currently running ~280 VMs on VMware with vCenter and vSAN across 12 servers, and we are thinking of migrating to Proxmox Enterprise. As the current hardware is still fairly new, we would at least like to try to reuse our current servers as much as possible.

Each of the 12 servers has roughly the following specs:

1U Server
2 x Intel Xeon Gold (most of them are 5220R)
16 x certified 32 GB DDR 2666 MHz ECC REG (some have 8x 32 GB)
6 x Intel 1.92 TB S4510 SATA SSD + 1.9 TB NVMe PCIe SSD (cache disk)
1 x Intel X520-DA2 dual 10 Gb SFP+

So each host contains its own storage, bundled into vSAN. We are able to expand each host with 4 additional 1.92 TB disks.

There are roughly 6 hosts in each of our 2 server DCs.
We have virtually "unlimited" access to 10 Gb DWDM (redundant routes) between the DCs.

The VMs are placed in separate VLANs; some clusters of VMs (in the same VLAN) are located in one DC, some in the other.

Redundancy, as well as VoIP (SIP/UDP/RTP), is obviously quite important for us.

What would be the best way to approach this with Proxmox in our specific situation, both network-wise (in regard to Open vSwitch) and storage-wise?
 
Well...
First: get at least a subscription which includes support directly from Proxmox.
We made nearly the same move a couple of months ago:
Hyper-V -> Proxmox 7.0 (now 7.1-7)

First we cleared some of our 13 Hyper-V nodes and installed a new 3-node Proxmox cluster.

We migrated the VMs after configuring everything. We used Ceph with some specific CRUSH maps to keep data on specific nodes with identical SSDs, like we had before with S2D in a 4-way mirror.
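If you want to go the same way, something along these lines should do it (the OSD IDs, the device-class name "s4510" and the pool name "vm-pool" are only examples, adapt them to your setup):

  # tag the OSDs that sit on the identical-SSD nodes with their own device class
  ceph osd crush rm-device-class osd.0 osd.1 osd.2
  ceph osd crush set-device-class s4510 osd.0 osd.1 osd.2
  # replicated rule that only places data on that class, host = failure domain
  ceph osd crush rule create-replicated rule-s4510 default host s4510
  # point the pool at the rule
  ceph osd pool set vm-pool crush_rule rule-s4510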

We ended up with a 13-node cluster, divided across the 2 datacenters. This was stable until one day we had a corosync hiccup (a known bug in an older version). However, we decided that splitting it into smaller clusters and waiting for the (not yet ready) multi-cluster manager would be the more stable approach, because if anything in your large cluster goes wrong, your whole datacenter can be down. So we went the "small island" way.

So my recommendation: 12 servers makes 4x 3-node clusters with Ceph. Use a fast Proxmox Backup Server to move VMs between clusters if necessary, because you cannot live-migrate between cluster islands... but that is on the roadmap, I guess.
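Moving a VM between the islands then roughly looks like this (the storage ID "pbs-fast", the PBS address, datastore and the VMID are only placeholders):

  # add the same PBS datastore on both clusters
  pvesm add pbs pbs-fast --server 10.0.0.5 --datastore store1 --username backup@pbs --password <secret> --fingerprint <fingerprint>
  # source cluster: back up the VM
  vzdump 168178001 --storage pbs-fast --mode snapshot
  # target cluster: find the archive and restore it onto the local Ceph storage
  pvesm list pbs-fast
  qmrestore pbs-fast:backup/vm/168178001/<timestamp> 168178001 --storage <ceph-storage>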

Get dedicated LANs for Corosync, storage and VM/client traffic, and if you have the ports, one for management/backup.
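A sketch of how the dedicated corosync networks end up on the CLI (cluster name and addresses are assumptions; corosync supports several redundant links):

  # first node, two dedicated corosync networks
  pvecm create cluster-a --link0 10.10.1.11 --link1 10.10.2.11
  # every further node joins with its own addresses on those networks
  pvecm add 10.10.1.11 --link0 10.10.1.12 --link1 10.10.2.12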

When you create 3-node clusters with Ceph, you can cross-connect the NICs (full mesh). This is described on the Proxmox wiki, so there is no need for any switch on the storage network.
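One variant of that full mesh is a broadcast bond over the two direct links; very roughly (interface names and IPs are examples, and on Proxmox you would persist this in /etc/network/interfaces rather than type it by hand):

  # node 1 of 3; ens19 goes directly to node 2, ens20 directly to node 3
  ip link add bond1 type bond mode broadcast
  ip link set ens19 down && ip link set ens19 master bond1
  ip link set ens20 down && ip link set ens20 master bond1
  ip addr add 10.15.15.11/24 dev bond1
  ip link set ens19 up && ip link set ens20 up && ip link set bond1 up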

And test your first deployment... really, test it. Crunch disks, unplug a node, push the memory usage to its real limits. Then you will get a very good idea of where the limits of Proxmox are. There aren't many, but better to test and be sure.
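For example, a few things you can keep running while breaking stuff on purpose (the OSD number is just an example):

  # watch cluster and HA state while you break things
  watch ceph -s
  ha-manager status
  # simulate a dying disk: take one OSD out, stop it, watch the recovery
  ceph osd out osd.5
  systemctl stop ceph-osd@5
  # simulate node loss: pull the power or the corosync links of one node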

Oh, and there are dozens of parameters whose defaults work fine, but if you really want to push it to the limits, there are hundreds of changes to make Proxmox and even Ceph unbelievably snappy... ;-)
 
Oh, and we learned: if your VMs all have different IPs/subnets, using the last three octets for the VM ID works great, even when moving VMs "offline" between the cluster islands. No duplicate IDs, ever. Like 168178001 (for 192.168.178.001) or 010222005 (for 10.10.222.005).
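If you script your deployments, that mapping is a one-liner (note that VMIDs are plain integers, so the leading zero in something like 010222005 is simply dropped):

  # derive the VMID from the last three octets of the VM's primary IP
  ip=192.168.178.1
  vmid=$(echo "$ip" | awk -F. '{ printf "%d%03d%03d", $2, $3, $4 }')
  echo "$vmid"   # -> 168178001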
 
And even for a small ISP it's no shame to get professional help with this... we did as well.
 
How is the latency between those 2 datacenters?
This is the basis for any further recommendations we can make. A PVE cluster requires rather low latency (< 2-4ms) between all nodes of the cluster.
If you plan on using Ceph, the latency should be even lower.
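A quick way to sanity-check this before committing to a design (the address is a placeholder):

  # round-trip time between two prospective cluster nodes; look at the avg value
  ping -c 100 -i 0.2 10.10.1.12 | tail -n 2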
 
The latency between our DCs is even below 1 ms, so we are good there. We light up and mux our own dark fibers and we have 2 fully redundant DF paths between the DCs.

My main questions are:
1. Could we do without a centralised Ceph storage cluster and keep using the disks already in the servers, and is that a wise thing to do?
2. itNGO recommends creating separate clusters. This has some disadvantages, but I can understand why he recommends it. Redundancy is an important factor for us. What impact does this have on Open vSwitch?
3. What is the recommended way of networking this hardware across the 2 DCs, and what if we do build separate clusters?

Of course we will purchase Enterprise licenses for all servers. These are more or less pre-sales questions; we are not even running a POC yet. I want our engineers to draw this up first and use best-practice methods.
 
Hi,
1. Yes, you can work with local storage without using Ceph, but if you migrate VMs between nodes it will take much longer, as the storage has to be moved to the target node too. With Ceph your storage is shared, and in a 3-node config you have a 3-way mirror. In my opinion the hyper-converged Ceph/Proxmox cluster is a very good way to go; no need for a dedicated extra Ceph storage backend cluster. (A rough setup sketch follows below, after point 3.)

2. That's because we had some bad experience with a big cluster (but maybe we were just too dumb in configuring it!?). You can also do 2x 6 nodes or even 12 nodes. However, due to the majority/quorum logic it would be very, very wise to have an odd number of nodes per cluster, to reach quorum more easily.
Regarding Open vSwitch, have a look here: https://icicimov.github.io/blog/vir...tenant-isolation-in-Proxmox-with-OpenVSwitch/ Maybe that gives you a hint.

3. If you put two 3-node clusters in each DC, the link between your DCs is then only needed for moving storage and backups between the DCs. If you build one big cluster, you need at least the same bandwidth and the same low latency between the DCs as within a single DC: at least 10 Gbit, but I would really go with 25 or 40 Gbit when you are running 200+ VMs.
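As promised under point 1, roughly how a hyper-converged 3-node Ceph setup looks on the CLI (the network, disk paths and pool name are examples):

  # on every node: install the Ceph packages
  pveceph install
  # once, on the first node: define the Ceph network
  pveceph init --network 10.15.15.0/24
  # on each node: a monitor plus the local OSDs
  pveceph mon create
  pveceph osd create /dev/sdb
  pveceph osd create /dev/sdc
  # once: a replicated pool, 3 copies, keep serving I/O with 2
  pveceph pool create vm-pool --size 3 --min_size 2 --add_storages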

If you can get 3 of your 12 nodes empty, you can build a POC and use this first "beta setup" to decide whether to go for larger or smaller clusters.

PS: you can PM me if you want to see comparable datacenters in a remote session... we are running several.
 
Hi itNGO, that's a very nice offer. After the holidays I might take you up on that :)

1. We want to use Ceph or Gluster (Ceph seems to be the recommended approach), but I was wondering if I could use the local disks of each system, or if a centralised Ceph storage would be better.

2. OK, clear. Thanks for that.

3. We are now using VMware with 6 machines in one hardware DC, 6 machines in the other, and a witness/management server in the middle (our connectivity DC).
For the VMs there is now 2x 10 Gb available and that seems to be more than enough for VMware. We have a lot of room for extra 10 Gb DWDM links.
 
Hi, you are welcome.

1. Even with Ceph you can use the "same" local disks. In a hyper-converged setup they are used by Ceph, and you configure the type of replication: 3- or 4-way mirror and so on.
2. You are welcome.
3. That's something Proxmox is missing a little bit: a dedicated "witness" like VMware or Hyper-V can have. But if you have a "middle DC", you can place a very small but comparably connected server there to be your odd witness/quorum member. You need a dedicated one per cluster, though. Or, maybe a bit risky and possibly not really supported: run a corosync vote node as a VM on one 3- or 6-node cluster and a second one on the other cluster, and so on, so each cluster uses a vote VM from another cluster. Just a crazy idea if you don't want more hardware...
 
Ceph requires low latency, so you'll have to test it to see if you get acceptable performance.

If you have a separate 13th node that can act as QDevice/Ceph Monitor, you should be good to go creating a single cluster.
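Setting up that external vote is straightforward; a sketch (the IP of the 13th machine is a placeholder):

  # on the external 13th machine (Debian-based)
  apt install corosync-qnetd
  # on all cluster nodes
  apt install corosync-qdevice
  # once, from any cluster node
  pvecm qdevice setup 10.0.0.13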

Ceph can be quite flexible, and some users have 4/2 (size/min_size) setups configured. Together with an additional monitor located in neither of the 2 DCs, this lets you make sure that VMs can be started in the other DC if one DC goes offline.
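A hedged sketch of what such a 4/2 setup could look like on the Ceph side (bucket, host, pool and rule names are examples; the rule that forces 2 copies per DC has to be added by editing the CRUSH map, since create-replicated cannot express it):

  # model the two DCs in the CRUSH tree and move the hosts under them
  ceph osd crush add-bucket dc1 datacenter
  ceph osd crush add-bucket dc2 datacenter
  ceph osd crush move dc1 root=default
  ceph osd crush move dc2 root=default
  ceph osd crush move node1 datacenter=dc1
  # (repeat the move for the remaining hosts)
  # custom rule "stretch_rule": step choose firstn 2 type datacenter / step chooseleaf firstn 2 type host
  ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt
  # (edit crush.txt, add the rule, then compile and inject it)
  crushtool -c crush.txt -o crush.new && ceph osd setcrushmap -i crush.new
  # 4 copies in total, keep serving I/O with 2
  ceph osd pool set vm-pool crush_rule stretch_rule
  ceph osd pool set vm-pool size 4
  ceph osd pool set vm-pool min_size 2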

For a cluster of this size, 10G might not be enough for the best Ceph performance.
 
