Moving from VMware to Proxmox - Need advice on the hardware configuration

sabari

New Member
Nov 14, 2023
Hi

We have a 7-machine cluster running vSAN with the following configuration.

HP ProLiant DL380 G10 24-Bay Server with 2.5'' Bays
- 2x 3.10GHz Gold 6254 18-Core Processors - Total of 36x Cores
- 24x 32GB PC4-2933Y RAM - Total of 768GB Memory
- 3x 1.92TB SSD SAS 2.5'' 12G + 21x 2.4TB 10K SAS 2.5'' 12G
- 2x 800W Platinum Power Supplies with 2x Power Cords
- P408i-a RAID Controller 2GB Cache
- No Optical Drive
- Network Interface: 1x Dual-Port 10GB SFP+ (HP_548SFP+)

We now have to migrate these to Proxmox. We will get a POC of a similar configuration with 3 nodes and set up and configure Ceph. After going through the forums, I see that the P408i is not a good fit for Ceph. Can you suggest a good HBA that would let me connect the 24 drives? Would the HPE H240 support my requirements?

I am also planning to add an HP NS204i-p OS boot device with 2x 480GB M.2 NVMe SSDs for the Proxmox install, which would eventually let me use the hot-swap drives entirely for Ceph.

The above configuration is for vSAN with 3 disk groups, each with one cache SSD and the remaining drives as capacity. How do I structure the same for Ceph, or does it require any modification?
 
This is how I would set it up:
1. Direct disks, no RAID card, for the Ceph disks (OSDs) - see the sketch below
2. A minimum of 5 dedicated Ceph nodes - I call them storage nodes
3. 3 or more compute nodes
4. Each node has at least 2x 100G and 1x 1G (for a second corosync network)
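To illustrate point 1: once the controller presents the disks directly, the OSDs are created straight on the raw devices with the Proxmox tooling. A minimal sketch, where the device names are placeholders you need to adjust for your own system:

Code:
# wipe leftover RAID/vSAN metadata before handing a disk to Ceph
ceph-volume lvm zap /dev/sdb --destroy        # /dev/sdb is a placeholder

# create an OSD directly on the raw disk (no RAID volume in between)
pveceph osd create /dev/sdb

# optionally put the RocksDB/WAL of an HDD OSD on a faster SSD,
# which is roughly the Ceph counterpart of the vSAN cache-tier idea
pveceph osd create /dev/sdc --db_dev /dev/nvme0n1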
 
Migrations like this are such a headache… When I was dealing with one before, I found a pretty handy tool called Vinchin or something—it could migrate VMs along with their data directly to a new environment without having to shut down the services. They offer a trial version for free on their website.
 
This is not an answer to the OP's question, which was about a Ceph HCI setup. I know you are a big fan of Vinchin (since most of your posts here recommend them for migration), but in fact nobody needs a third-party tool for migration (although, to be fair, if one already has a Vinchin or Veeam license it might be helpful). The Proxmox VE wiki has a good overview of the available options for migrating with the integrated tools:
https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE
https://pve.proxmox.com/wiki/Advanced_Migration_Techniques_to_Proxmox_VE

@sabari I'm not sure whether it's possible with your RAID adapter, but maybe it can be flashed to "IT-mode" or something like that to disable RAID?
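For what it's worth, on the Gen10 Smart Array controllers (like the P408i) there is usually no separate IT-mode firmware to flash; instead, recent firmware exposes an HBA mode that can be toggled with HPE's ssacli tool. This is only a rough sketch from memory, so please verify it against the HPE documentation before relying on it:

Code:
ssacli ctrl all show config            # find the slot number and the current mode
ssacli ctrl slot=0 modify hbamode=on   # slot=0 is an assumption; a reboot is required afterwards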

The most important thing to consider for Ceph is the network, since the performance of the whole Ceph cluster depends on it. Because any write has to land on at least three nodes in the default configuration, an undersized network can seriously impact your performance.
A single network adapter for everything won't do you any favours.
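For reference, the "three nodes" comes from the default settings of replicated pools (size 3, min_size 2); you can check them on an existing pool, where the pool name below is just a placeholder:

Code:
ceph osd pool get <poolname> size       # number of copies kept, 3 by default
ceph osd pool get <poolname> min_size   # copies required to keep accepting writes, 2 by default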

The reference documentation has a chapter on how to set up an HCI cluster with Ceph; it recommends at least three independent networks:
We recommend a network bandwidth of at least 10 Gbps, or more, to be used exclusively for Ceph traffic. A meshed network setup [4] is also an option for three to five node clusters, if there are no 10+ Gbps switches available.
The volume of traffic, especially during recovery, will interfere with other services on the same network, especially the latency sensitive Proxmox VE corosync cluster stack can be affected, resulting in possible loss of cluster quorum. Moving the Ceph traffic to dedicated and physical separated networks will avoid such interference, not only for corosync, but also for the networking services provided by any virtual guests.
For estimating your bandwidth needs, you need to take the performance of your disks into account. While a single HDD might not saturate a 1 Gb link, multiple HDD OSDs per node can already saturate 10 Gbps too. If modern NVMe-attached SSDs are used, a single one can already saturate 10 Gbps of bandwidth, or more. For such high-performance setups we recommend at least 25 Gbps, while even 40 Gbps or 100+ Gbps might be required to utilize the full performance potential of the underlying disks.
If unsure, we recommend using three (physical) separate networks for high-performance setups:
  • one very high bandwidth (25+ Gbps) network for Ceph (internal) cluster traffic.
  • one high bandwidth (10+ Gbps) network for Ceph (public) traffic between the ceph server and ceph client storage traffic. Depending on your needs this can also be used to host the virtual guest traffic and the VM live-migration traffic.
  • one medium bandwidth (1 Gbps) exclusive for the latency sensitive corosync cluster communication.
https://pve.proxmox.com/wiki/Deploy...r#_recommendations_for_a_healthy_ceph_cluster
https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_recommendations_for_a_healthy_ceph_cluster
This tutorial also contains information on other considerations for the cluster hardware.
You should also read the rest of the admin guide; it contains a lot of information on how to approach this.
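As a rough sketch of what that separation could look like for your POC (all subnets, addresses and names below are made up, and the 25+ Gbps Ceph network would need extra NICs beyond your dual-port 10G card): the Ceph public/cluster split is set when Ceph is initialised, and corosync gets its own dedicated link when the Proxmox cluster is created. For the bandwidth estimate, 21x 10k SAS drives at roughly 150-200 MB/s each already adds up to around 3 GB/s of raw throughput per node, which is more than a single 10 Gbps (~1.25 GB/s) link can carry during recovery.

Code:
# separate Ceph public and cluster networks (the subnets are assumptions)
pveceph init --network 10.10.10.0/24 --cluster-network 10.10.20.0/24

# give corosync its own dedicated 1 Gbps link when creating the cluster
pvecm create mycluster --link0 10.0.0.1      # cluster name and addresses are placeholders
# and when joining the other nodes:
pvecm add 10.0.0.1 --link0 10.0.0.2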
@UdoB did a great writeup on small clusters (like your POC), which is a more hands-on tutorial/guide: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/

It might also be an option to contact one of the Proxmox partner companies to assist you: https://forum.proxmox.com/threads/proxmox-partners-remote-and-international.165756/
 