Advice needed for a 3-node Proxmox HA cluster

Qtchn

Jan 14, 2023
Hi everybody,

I am planning to implement a home-lab environment that requires fault tolerance to avoid downtime. I don't have a large budget compared to the cost of a "professional" installation for high availability.

I was able to get 3 second-hand SuperMicro SuperServer 5019-SL servers.
I also have the following new SSDs:
  • 3x Samsung 980 Pro (NVMe, 1 TB)
  • 3x Crucial CT1000P1SSD8 (NVMe, 1 TB)
  • 3x Samsung 870 EVO (SATA, 500 GB)
For storage, I plan to use the NVMe SSDs for Ceph (2 OSDs per node) and a SATA SSD to install the system (Proxmox).
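As a rough sanity check on capacity, here is my own back-of-the-envelope Python, assuming the default replicated pool with size=3 and staying below the default 0.85 nearfull ratio; the numbers are illustrative only:

```python
# Rough usable-capacity estimate for the planned Ceph layout.
# Assumptions: replicated pool with size=3 (the Proxmox/Ceph default) and
# staying below the default nearfull ratio of 0.85.

nodes = 3
osds_per_node = 2          # one 980 Pro + one CT1000P1SSD8 per node
osd_size_tb = 1.0          # 1 TB per NVMe drive
replicas = 3               # copies kept of every object
nearfull_ratio = 0.85      # practical fill limit before Ceph starts warning

raw_tb = nodes * osds_per_node * osd_size_tb
after_replication_tb = raw_tb / replicas
practical_tb = after_replication_tb * nearfull_ratio

print(f"raw: {raw_tb:.1f} TB")                                  # 6.0 TB
print(f"after 3x replication: {after_replication_tb:.1f} TB")   # 2.0 TB
print(f"practical (<85% full): {practical_tb:.1f} TB")          # ~1.7 TB
```

So roughly 1.7 TB of comfortably usable space, which is enough for my VMs.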
For the network, I plan to use Mellanox InfiniBand equipment, which seems to have the best bandwidth/cost ratio. With this in mind, I thought of two solutions:

Option 1: Full-mesh network between my 3 nodes:
  • 3x Mellanox MCX354A-FCBT PCIe cards
  • 6x MC2207130-001 cables
Pros:
+ more cost-effective
Cons:
- the architecture is not scalable (see the sketch after Option 2)

Option 2: With an interconnect switch:
  • 2x Mellanox SX6036
Pros:
+ the architecture is scalable; I could add dedicated storage nodes in the future and keep the current servers as compute nodes
Cons:
- I have to deal with the licence to set the switch to Ethernet mode
- increases the total cost
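To make the scalability difference concrete, here is a small sketch (my own illustration, not from either vendor) of how many direct links and ports per node a full mesh needs as nodes are added, compared to a switched setup:

```python
# Link/port count comparison: full mesh vs. a central switch.
# The dual-port MCX354A-FCBT gives each node exactly 2 ports, so a direct
# mesh only works up to 3 nodes; a switch only needs 1-2 uplinks per node.

def full_mesh(nodes: int) -> tuple[int, int]:
    """(total point-to-point links, ports needed per node) for a full mesh."""
    return nodes * (nodes - 1) // 2, nodes - 1

def switched(nodes: int, uplinks_per_node: int = 1) -> tuple[int, int]:
    """(total links, ports needed per node) when everything goes via a switch."""
    return nodes * uplinks_per_node, uplinks_per_node

for n in (3, 4, 5, 6):
    mesh_links, mesh_ports = full_mesh(n)
    sw_links, sw_ports = switched(n)
    print(f"{n} nodes: mesh needs {mesh_links} links / {mesh_ports} ports per node, "
          f"switch needs {sw_links} links / {sw_ports} port per node")
```

With 3 nodes the mesh already uses both ports of each card; a fourth node would need a third port per node, which is why the mesh option is capped at the current cluster size.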

My first problem is that the motherboard has the following PCIe slots (and the CPU is limited to 16 PCIe 3.0 lanes):
  • 1 PCIe 3.0 x8 (in an x16 slot) => I plan to install one Mellanox MCX354A-FCBT card
  • 1 PCIe 3.0 x8 => I plan to install one Crucial CT1000P1SSD8 with a PCIe-to-M.2 adapter
  • 1 PCIe 3.0 x4 (in an x8 slot) => I plan to install one Samsung 980 Pro with a PCIe-to-M.2 adapter

According to my research, and especially this post, a PCIe 3.0 x8 slot will not be sufficient to guarantee the maximum bandwidth of the dual-port QSFP+ Mellanox cards.
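Here are my own rough numbers behind that, ignoring further protocol overhead:

```python
# Can one PCIe 3.0 x8 slot feed both QSFP+ ports of the MCX354A-FCBT?
# PCIe 3.0 = 8 GT/s per lane with 128b/130b line coding; further protocol
# overhead (TLP headers, flow control) is ignored, so reality is a bit worse.

lanes = 8
gt_per_lane = 8.0                 # GT/s per lane for PCIe gen 3
line_coding = 128 / 130           # 128b/130b encoding efficiency

slot_gbps = lanes * gt_per_lane * line_coding
print(f"PCIe 3.0 x8 payload: ~{slot_gbps:.0f} Gbit/s")   # ~63 Gbit/s

for mode, port_gbps in (("40GbE / QDR", 40), ("FDR InfiniBand", 56)):
    both = 2 * port_gbps
    verdict = "fits" if both <= slot_gbps else "exceeds the slot"
    print(f"{mode}: one port = {port_gbps} Gbit/s, both ports = {both} Gbit/s ({verdict})")
```

So a single port should still run at line rate; only driving both ports of one card at full speed is slot-limited, which mainly concerns the mesh option where both ports are active.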

Q1: Given this limitation, is one of the two envisaged architectures (mesh or switch) preferable, or does it not matter?
Q2: Assuming I manage to obtain 40 Gb/s of bandwidth between my 3 nodes, will the envisaged configuration give me decent performance in your opinion, or should I abandon my plans for HA? :)

Thank you for your advice.
 
If you don't really want to deal with a switch, I suggest a 4 x 10GbE network card.

I run a 3-node Ceph cluster on 12-year-old servers using a full-mesh 4 x 1GbE broadcast network: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server#Broadcast_Setup

I use the last two ports on the server NICs for Corosync and Ceph public & private traffic (not considered best practice, but it works, and again this is 12-year-old hardware). The first port is used for the GUI and the second port is for VM traffic. Works very well. The key point here is that the Corosync, public, and private traffic goes over a broadcast network, so every server gets a copy of this traffic.

As you know, Proxmox eats consumer SSDs (research it online). From my research, the EVO is a good low-cost option as an OS drive due to its high write endurance. I don't run SSDs in my setup because it's 12 years old; I run SAS HDDs with no issues.
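If you want a feel for the numbers, here is a rough lifetime estimate. The 300 TBW figure is the rated endurance of the 500 GB 870 EVO (double-check the datasheet for your exact model); the daily write rates are made up:

```python
# Rough SSD lifetime estimate from rated endurance (TBW) and daily host writes.
# 300 TBW = rated endurance of the 500 GB Samsung 870 EVO (check the datasheet);
# the daily write figures below are made up, and Proxmox logging, HA state and
# Ceph metadata can push the real number up fast.

rated_tbw = 300  # terabytes written

for daily_gb in (20, 50, 150, 500):
    years = rated_tbw * 1000 / daily_gb / 365
    print(f"{daily_gb:4d} GB/day written -> ~{years:4.1f} years to hit {rated_tbw} TBW")
```

You can read the real write volume off the drives with smartctl (on Samsung SATA drives it shows up as the Total_LBAs_Written attribute) and plug that in.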
 
Hello @jdancer,

Thank you for your feedback. I will consider the 10GBASE-T option, but all the 4-port cards I found are quite expensive, such as the Intel X710-T4.

I'm worried that my setup won't have good performance because of the small number of nodes and OSDs, so I think it would be better to go for a switch architecture so that I can add more nodes in the future. But the cost of the 10GBASE-T architecture seems just as high as 56 Gb/s InfiniBand, although I would probably have fewer configuration problems with the first option.

Regarding SSDs, unfortunately I can't invest in enterprise-grade models, but some recent consumer SSD models seem to have acceptable endurance.
 
I'm worried that my setup won't have good performance because of the small number of nodes and OSDs

I can't comment on the number of nodes and OSDs, but besides those factors, the thing that will kill your performance in the first place is your consumer SSDs.
You definitely want enterprise SSDs with PLP for Ceph and even need them for adequate performance:
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2020-09-hyper-converged-with-nvme.76516 (Especially the comparison table on page 3.)
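If you want to check your own drives before committing, the usual quick test is 4k synchronous writes, since that is where missing PLP shows up. A minimal sketch that shells out to fio (assuming fio is installed; careful, it writes to the file you point it at):

```python
# Minimal 4k synchronous write test via fio -- the access pattern where drives
# without power-loss protection collapse. Requires fio to be installed.
# WARNING: this writes 1 GB to TEST_FILE; point it at a scratch file on the
# drive under test, never at a disk or partition holding data you care about.
import json
import subprocess

TEST_FILE = "/mnt/testdrive/fio-sync-test.bin"  # hypothetical path, adjust

result = subprocess.run(
    [
        "fio",
        "--name=sync-4k-write",
        f"--filename={TEST_FILE}",
        "--size=1G",
        "--bs=4k",
        "--rw=write",
        "--ioengine=libaio",
        "--direct=1",
        "--sync=1",            # flush every write, similar to Ceph journal/WAL behaviour
        "--iodepth=1",
        "--numjobs=1",
        "--runtime=60",
        "--time_based",
        "--output-format=json",
    ],
    capture_output=True,
    text=True,
    check=True,
)

job = json.loads(result.stdout)["jobs"][0]["write"]
print(f"sync 4k write: {job['iops']:.0f} IOPS, {job['bw'] / 1024:.1f} MiB/s")
```

The difference between PLP and non-PLP drives in this particular test is usually dramatic.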
 
Hi @Neobin,

Thank you very much for your reply. I have been doing a bit more research on the subject of disks over the last few days, and I have come across this very interesting thread.
I'm thinking of moving to enterprise SSDs. Assuming I install a single disk (Samsung PM983 3.84 TB M.2) per node, can I expect decent performance? A single OSD per node on a 3-node cluster doesn't seem to be a good approach.
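My rough reasoning on why a single OSD per node makes me uncomfortable (my own back-of-the-envelope sketch, assuming the default replicated size=3 pool, host-level failure domain, equal-sized OSDs about half full, and the default 0.85 nearfull ratio):

```python
# With the default CRUSH failure domain of "host" and a replicated pool of
# size 3, every copy must live on a different host. On a 3-host cluster, the
# data of a failed OSD can only be rebuilt onto the remaining OSDs of the
# same host -- so with a single OSD per host, any disk failure leaves the pool
# degraded until that disk is replaced.

def heals_after_disk_failure(osds_per_host: int, osd_fill: float = 0.5) -> bool:
    """Can the host that lost one disk absorb its data locally?

    Assumes equal-sized OSDs that are `osd_fill` full, and that the surviving
    OSDs must stay below the default 0.85 nearfull ratio after recovery.
    """
    survivors = osds_per_host - 1
    if survivors == 0:
        return False
    fill_after_recovery = osd_fill + osd_fill / survivors
    return fill_after_recovery < 0.85

for osds in (1, 2, 4):
    verdict = ("self-heals" if heals_after_disk_failure(osds)
               else "stays degraded until the disk is replaced")
    print(f"{osds} OSD(s) per host, one disk fails: {verdict}")
```

At roughly 50% fill, even two OSDs per host cannot absorb a failed disk, so more and/or larger OSDs per node buy real healing headroom.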
 
A single OSD per node on a 3-node cluster doesn't seem to be a good approach.

From what I have heard so far, yes.
But, as I said, I cannot comment on all of this, because I unfortunately simply have no experience with Ceph.

The recommendation from Proxmox is:
We recommend a Ceph cluster with at least three nodes and at least 12 OSDs, evenly distributed among the nodes.
https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_osds

So with 3 nodes, this would obviously be 4 OSDs per node.
Whether, and by how much, you could stretch this for a homelab, I do not know, sorry.
 
