PVE HCI Network and Storage Overview

baligh.bedewi

New Member
Mar 3, 2021
Dear everyone,

I'm building an HCI setup using Proxmox as the hypervisor and a Ceph cluster as storage. My environment should run around 40-60 production VMs; my uplink is 100 Mbps.

Proxmox is totally new to me, so I need some advice, clarification, and recommendations for my setup.

My network runs in full failover (active/passive) with cross-cabling to provide full redundancy.

For this setup I'm using the following:

  • 2x ToR switches, each with 2 Tbps of bandwidth, stacked over 100 Gbps, supporting 1/10/40/100 Gbps
  • 4x identical nodes, each with the following
    • Dell PowerEdge R740xd
      • 2x 28-core Intel Gold CPU
      • 12x 64GB RAM
      • 2x 300GB SSD (RAID1) for the Proxmox OS
      • 6x 7.68TB SSD (non-RAID) for the Ceph cluster
      • 4x 1Gbps NIC
      • 4x 25Gbps NIC

My concerns are the following:

1- Networking (quick diagram attached for your reference)
  • Proxmox MGMT: is management the only use of this interface? Can I bond two cables, with one active and the second passive? My plan is 2x 1Gbps interfaces.
  • Proxmox Cluster (node sync only?): while joining nodes I saw that I can use more than one IP. Which is better: an active/backup bond with a single IP, or two separate interfaces with two IPs? My plan is 2x 1Gbps interfaces.
  • Ceph Cluster/Private (heartbeats, replication, OSD... what else?): I'm planning a 2x 25Gbps active/backup bond.
  • Ceph Public (is this what the VMs use to access their storage?): I'm planning a 2x 25Gbps active/backup bond.
I want a clearer picture of these 4 points, and to know whether I can go with this setup or need to consider a different speed, a different bond, or even extra interfaces.

I want to run a very stable and fast environment with as little latency as possible.

In the diagram, the solid lines connect to the active switch, while the dotted lines connect to the passive switch.

2- Storage
  • When configuring the Ceph cluster, what is replication? I'm familiar with RAID and am considering the equivalent of RAID1 across my 24 disks. What replication number and minimum should I set if I want to use half of my raw space, and what is the best practice for this use case?
  • Some of the running VMs require hot storage, which will be hosted on the Ceph OSD SSDs, and cold storage (spinning disks), which will be hosted on separate storage. What type of storage is recommended for the cold tier, e.g. SAN or NAS? Do I need to configure RAID at the hardware level, or can I use Proxmox software RAID? What connectivity speed should I use with redundancy, and on which VLAN of the 4 points mentioned above should I connect it? If possible, which Dell model could I use?
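
To make the replication question concrete, here is a rough sketch of the capacity arithmetic. It assumes a replicated Ceph pool, where every object is stored `size` times, so usable space is roughly raw capacity divided by `size`. The node and disk counts are taken from the hardware list above; the function name `usable_tb` and the `fill_ratio` parameter are made up for illustration, and the 0.85 fill target is only an assumed safety margin, not a recommendation.

```python
def usable_tb(nodes, osds_per_node, osd_tb, size, fill_ratio=1.0):
    """Approximate usable capacity (TB) of a replicated pool.

    size       -- number of copies kept of each object (pool "size")
    fill_ratio -- assumed fraction of the pool you plan to fill (hypothetical)
    """
    raw_tb = nodes * osds_per_node * osd_tb
    return raw_tb / size * fill_ratio


if __name__ == "__main__":
    # 4 nodes x 6 OSDs x 7.68 TB, as listed in the hardware section.
    for size in (2, 3):
        full = usable_tb(4, 6, 7.68, size)
        planned = usable_tb(4, 6, 7.68, size, fill_ratio=0.85)
        print(f"size={size}: {full:.2f} TB usable, {planned:.2f} TB at 85% fill")
```

A `size` of 2 corresponds to the "half of my space" case in the question; `size` 3 is shown only for comparison.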

3- Backup
Is Proxmox Backup a separate solution that requires its own server installation, or is it a feature to be enabled on Proxmox VE itself?

Sorry if any of these questions are duplicates. We are going to invest more than $300K and need to make sure we are not going with the wrong setup.

Thanks
 

Attachments

  • Design.jpg
Hi
Proxmox Backup is a dedicated solution. You can run it on bare metal on a physical server or in a virtual machine.
If you run it in a virtual machine, keep in mind that this can be hazardous if Proxmox VE crashes ...
Proxmox Backup is not a feature of Proxmox VE. It is a dedicated OS for backup purposes.

Regarding bonding, there are several types. The most efficient (and standardized) is 802.3ad, also known as LACP (https://fr.wikipedia.org/wiki/IEEE_802.3ad). An LACP bond aggregates several physical ports into one logical port and load-balances traffic across the links in that logical port. If one physical link fails, the logical port continues working on the links that are still up. Thus, LACP uses the full traffic capacity of all the physical ports while also providing failover. But to use it properly, your switch has to support the 802.3ad specification.
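
As an illustration of the difference between the two bond types discussed in this thread, here is a small sketch that renders an ifupdown-style bond stanza of the kind Proxmox VE keeps in /etc/network/interfaces. The helper `render_bond`, the interface names `eno1`/`eno2`, and the address `192.0.2.10/24` are placeholders and not taken from the poster's setup; the hash policy shown is just one possible choice.

```python
def render_bond(name, slaves, mode, address=None):
    """Return an ifupdown-style stanza for a Linux bond as a string.

    mode is a Linux bonding mode name, e.g. "active-backup" or "802.3ad".
    All names and addresses passed in are the caller's own (hypothetical here).
    """
    method = "static" if address else "manual"
    lines = [f"auto {name}", f"iface {name} inet {method}"]
    if address:
        lines.append(f"    address {address}")
    lines.append(f"    bond-slaves {' '.join(slaves)}")
    lines.append("    bond-miimon 100")  # link check interval in ms
    lines.append(f"    bond-mode {mode}")
    if mode == "802.3ad":
        # The transmit hash policy only matters for load-balancing modes.
        lines.append("    bond-xmit-hash-policy layer2+3")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_bond("bond0", ["eno1", "eno2"], "802.3ad", "192.0.2.10/24"))
    print()
    print(render_bond("bond1", ["eno3", "eno4"], "active-backup"))
```

With 802.3ad, the switch ports on the other end have to be configured as a matching LACP group; with active-backup, no switch-side configuration is needed.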

Regards
 
Hi Pierre-Yves,

Thanks a lot for your reply,

For bonding I'm going to use LACP, since I'm running an active-active setup. Regarding backup, I'm considering dedicated hardware. Can you please advise which VLAN the backup should operate on: Proxmox cluster, Ceph cluster, or Ceph public? Or could it be on a separate VLAN to reduce latency on the network?

Regards
 
Usually, the SAN (Ceph in your case) operates on dedicated switches.
Your network design seems OK to me.
But it does not show the switches.
Switches are very important in cluster design, as they carry all the information between compute hosts and storage hosts (Ceph).

That traffic must be fast, smooth, and redundant.
The Corosync network is important too, as it carries the administrative tasks of the cluster.

Be careful when selecting the network switches. You must consider total bandwidth, compute capacity, and motherboard failover, as well as the number of ports and their speed, which are often the main technical criteria. If your 24-port switch cannot operate at full capacity when all of its ports are using maximum bandwidth, it is not the right choice. In that context, major brands (Cisco, HPE, ...) are often a good choice if you pick the right model for your needs.
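
The full-capacity check described above can be sketched as simple arithmetic. This assumes the vendor quotes switching capacity as the sum of both directions (full duplex), which is common but should be verified on the datasheet. The function name `fabric_demand_gbps` and the 24x 25 Gbps port layout are hypothetical; only the 2 Tbps figure comes from the first post.

```python
def fabric_demand_gbps(ports):
    """Worst-case fabric load if every port runs at line rate, full duplex.

    ports -- list of (count, speed_gbps) tuples describing the port layout
    """
    one_direction = sum(count * speed for count, speed in ports)
    return 2 * one_direction  # traffic in and out on every port


def is_non_blocking(ports, fabric_gbps):
    """True if the quoted switching capacity covers the worst-case demand."""
    return fabric_demand_gbps(ports) <= fabric_gbps


if __name__ == "__main__":
    layout = [(24, 25)]  # hypothetical: 24 ports at 25 Gbps
    print(fabric_demand_gbps(layout), "Gbps needed")
    print("non-blocking at 2 Tbps:", is_non_blocking(layout, 2000))
```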

Regards
 
