Hardware sizing guidance for Proxmox production cluster

Nadee

New Member
May 19, 2026
Hello everyone,


I am planning to design a production Proxmox VE cluster and would like to get guidance on appropriate hardware sizing and best practices for the following expected workload:


  • vCPU requirement: ~1500 vCPUs
  • Memory requirement: ~3.5 TB RAM (3500 GB)
  • Storage requirement: ~180 TB usable capacity

I would appreciate recommendations on:


  1. Minimum and recommended number of nodes for a stable cluster
  2. CPU sizing per node (Intel/AMD generation, core count, overcommit guidance)
  3. Memory distribution strategy across nodes
  4. Storage design (Ceph vs ZFS vs external SAN) to support ~180 TB usable capacity
  5. Network requirements (10/25/40/100 GbE considerations)
  6. High availability and failure domain best practices
  7. Any known limitations or design pitfalls for this scale in Proxmox VE

The goal is to ensure a highly available, scalable, and production-grade design with room for future growth.


Any reference architectures or real-world examples would be greatly appreciated.


Thank you in advance for your support.
 
This is not the kind of question that can be answered well in a forum. To answer it properly, it is best to contact a Proxmox partner for a tailored setup; there is simply not enough information here. Your budget limits would also be very important. CEPH is generally more expensive than buying a single HA SAN solution with, e.g., dedicated storage paths. CEPH also needs a good network infrastructure, which is costly as well.

I will try to address the easier questions with some assumptions:

Minimum and recommended number of nodes for a stable cluster
3 is the minimum number of nodes for a Proxmox VE cluster. As far as I know, stability is more a question of the maximum number of nodes, which depends heavily on the network used and the latency between the nodes. If all nodes are in one DC, that helps, and you will not have problems up to 15 nodes. The number of nodes should be odd.

5 would be the minimum number of nodes for a CEPH cluster.
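
On the odd-node-count point for the Proxmox VE cluster itself, here is a minimal sketch of the majority arithmetic. It assumes one vote per node and no external vote holder; the function names are made up for illustration.

```python
def quorum_votes(nodes: int) -> int:
    """Votes needed for a strict majority, assuming one vote per node."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while the remainder still holds a majority."""
    return nodes - quorum_votes(nodes)


for n in range(3, 10):
    print(f"{n} nodes: quorum={quorum_votes(n)}, can lose {tolerated_failures(n)}")
```

Under these assumptions an even node count tolerates no more failures than the odd count just below it (4 nodes can lose 1, same as 3), which is the usual argument for odd sizes.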

CPU sizing per node (Intel/AMD generation, core count, overcommit guidance)
AMD generally has more NUMA nodes, but also higher core counts and more PCIe lanes, so in general better performance on paper. More NUMA nodes may need some pinning and optimization. You can also run into licensing problems for your VMs if you, e.g., run something like an Oracle database on EPYC with Standard Edition licensing.

The number of cores per CPU depends on the minimum frequency you need per core. If you require fast cores, you may be limited to, e.g., 16-core EPYC CPUs. With a requirement of, for example, 256 real cores (512 threads) and dual-socket nodes, that gives 256 / 16 / 2 = 8 nodes, or 9 to keep the node count odd. This assumes 512 threads are sufficient for the number of vCPUs. Without real utilization data, no one can plan.
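
The node-count arithmetic from this example can be sketched as follows. The dual-socket layout and the rounding up to an odd count are assumptions taken from the example, and the helper names are hypothetical.

```python
import math


def nodes_for_cores(physical_cores: int, cores_per_cpu: int, sockets_per_node: int = 2) -> int:
    """Nodes needed to provide the requested number of physical cores."""
    return math.ceil(physical_cores / (cores_per_cpu * sockets_per_node))


def next_odd(n: int) -> int:
    """Round up to the next odd number (odd values are returned unchanged)."""
    return n if n % 2 == 1 else n + 1


nodes = nodes_for_cores(physical_cores=256, cores_per_cpu=16)
threads = 256 * 2                  # SMT: two threads per core
vcpu_per_thread = 1500 / threads   # overcommit implied by ~1500 vCPUs

print(nodes, next_odd(nodes), round(vcpu_per_thread, 2))
```

Whether the resulting overcommit of roughly 3 vCPUs per thread is acceptable depends entirely on real utilization, which the thread does not state.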

Storage design (Ceph vs ZFS vs external SAN) to support ~180 TB usable capacity
ZFS with replication will not be easy to set up (lots of scripting) in bigger environments, and I would generally not recommend it. You would need either a fine-grained ZFS replication setup or 180 TB on each node, which is totally impractical. Go with a KISS solution.

The easiest method would be a SAN with Proxmox VE support so that you have ALL the features you need. This is ESSENTIAL. I've never seen Blockbridge in action myself, but I have read only good things about it. Don't use storage with just LVM integration.

If you go with CEPH, you need as fast a network as possible for that amount of space. I would recommend multiple 100 Gb connections or faster. Local storage should be NVMe, and with a recent PCIe 5.0 card you will easily outperform a single 100 Gb link.
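
A rough sketch of why a single 100 Gb link can become the bottleneck. It reads the PCIe 5.0 remark as referring to an x4 NVMe device, which is an assumption, and it compares theoretical ceilings only (32 GT/s per lane with 128b/130b encoding), not measured drive or network throughput.

```python
PCIE5_GT_PER_S = 32.0            # raw transfer rate per PCIe 5.0 lane
PCIE5_ENCODING = 128.0 / 130.0   # 128b/130b line-encoding efficiency


def pcie5_gbytes_per_s(lanes: int) -> float:
    """Theoretical PCIe 5.0 bandwidth in GB/s for a given lane count."""
    return lanes * PCIE5_GT_PER_S * PCIE5_ENCODING / 8.0


def link_gbytes_per_s(gbit_per_s: float) -> float:
    """Theoretical Ethernet line rate in GB/s."""
    return gbit_per_s / 8.0


nvme_x4 = pcie5_gbytes_per_s(4)
link_100g = link_gbytes_per_s(100)
print(f"PCIe 5.0 x4: {nvme_x4:.2f} GB/s, 100 Gb link: {link_100g:.2f} GB/s")
```

On paper a single x4 device already exceeds one 100 Gb link, so several such drives per node would need multiple links to avoid being network-bound.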

Assuming your 180 TB requirement with the default 3 copies in CEPH, you need 540 TB raw in total. Spread over 9 nodes, that is 540 / 9 = 60 TB per node, which is roughly at least 8 x 7.68 TB SSDs.
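
The same capacity estimate as a small sketch. The `fill_target` parameter is a hypothetical addition for leaving free-space headroom; it defaults to 1.0 so that the numbers above are reproduced as written.

```python
import math


def drives_per_node(usable_tb: float, nodes: int, drive_tb: float,
                    replicas: int = 3, fill_target: float = 1.0) -> int:
    """Drives needed per node for a replicated pool.

    fill_target is the fraction of raw capacity you are willing to fill
    (hypothetical knob; 1.0 means no headroom at all).
    """
    raw_tb = usable_tb * replicas
    per_node_tb = raw_tb / nodes / fill_target
    return math.ceil(per_node_tb / drive_tb)


print(drives_per_node(180, nodes=9, drive_tb=7.68))                   # no headroom
print(drives_per_node(180, nodes=9, drive_tb=7.68, fill_target=0.8))  # with headroom
```

The first call matches the estimate of 8 drives per node. Any realistic headroom for rebalancing after a node failure pushes the count higher, so treat 8 as a floor.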
 
Obligatory reminder regarding Ceph: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/

Please note that Udo's writeup is a little extreme, since he assumes that you want to survive the outage of two nodes (due to two node failures, or one node in maintenance and another failing). There are reports of people running Ceph in a three-node cluster who are quite happy with it, since their use case can tolerate the risk involved in a potential outage of two nodes. For them it is enough that a three-node cluster can survive the outage of one node. You will still need suitable hardware for storage (enterprise SSDs with power-loss protection) and network (10 Gbit minimum, 25 Gbit recommended; @Falk R. mentioned that for new setups he always goes with 100 Gbit now to allow for future business growth).

For more information on Ceph requirements: https://pve.proxmox.com/wiki/Deploy...r#_recommendations_for_a_healthy_ceph_cluster
The wiki has a description of a full mesh network, which might be of benefit if you can live with the limits of a three-node cluster and want to avoid the cost of dedicated switches for the cluster networks: https://pve.proxmox.com/wiki/Full_Mesh_Network_for_Ceph_Server
 
Hi Nadee,

This is a pretty common design pattern we see for production Proxmox clusters at scale for folks coming from VMware. The good news is that what you're describing is absolutely achievable. That said, before anyone can give you truly dialed-in recommendations, there are some key constraints you'll want to nail down first.

Budget: Are you buying new hardware, repurposing existing gear, or considering a mix? At 180 TB usable + 3.5 TB RAM, you're looking at meaningful CapEx either way. Most of the people we see tend to migrate from one platform to another to save on CapEx (i.e., partially reuse). One thing worth double-checking is your vCPU to memory ratio. At 1500 vCPUs and 3.5 TB RAM you are sitting at roughly 2.3 GB per vCPU, which is on the low side for most production workloads. If that ratio is intentional based on your workload character, great. If not, your true memory requirement may be higher, which would have a material impact on node count, chassis selection, and overall budget.
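
The ratio check above, as a quick sketch. The 4 GB per vCPU comparison value is purely illustrative and not a recommendation from this thread.

```python
def gb_per_vcpu(ram_gb: float, vcpus: int) -> float:
    """Average memory available per vCPU."""
    return ram_gb / vcpus


def ram_for_ratio(vcpus: int, target_gb_per_vcpu: float) -> float:
    """Total RAM needed to hit a target memory-per-vCPU ratio."""
    return vcpus * target_gb_per_vcpu


current = gb_per_vcpu(ram_gb=3500, vcpus=1500)
illustrative = ram_for_ratio(vcpus=1500, target_gb_per_vcpu=4)  # hypothetical target

print(f"current: {current:.2f} GB/vCPU, RAM at 4 GB/vCPU: {illustrative:.0f} GB")
```

If the real workload turns out to need something closer to the illustrative ratio, total memory would grow from 3.5 TB to about 6 TB, which changes node count and chassis choice noticeably.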

Power and cooling envelope: Do you have a fixed rack space or power budget?

Rack space: How many racks are available? This directly affects your failure domain strategy and node density decisions.

Workload: Are these VMs mostly compute-heavy, memory-heavy, or IO-heavy? Is storage throughput or IOPS the bigger concern? 180 TB of cold archive capacity is a very different design than 180 TB of low-latency, high-IOPS workloads. You will find that storage performance directly correlates with CPU choice and NUMA arrangement. You need to choose servers and hardware that fit your application need.

HA requirements: What is the tolerance for downtime? N+1 at the node level? N+2? Do you need to survive a full rack failure?
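
To make the N+1 versus N+2 question concrete, here is a minimal sketch of how much memory each node must carry so that the surviving nodes can still host everything. The 9-node count is an assumption borrowed from the earlier reply, not a recommendation.

```python
def per_node_requirement(total: float, nodes: int, spare_nodes: int) -> float:
    """Capacity each node needs so the workload fits after losing spare_nodes."""
    surviving = nodes - spare_nodes
    if surviving <= 0:
        raise ValueError("spare_nodes must be smaller than nodes")
    return total / surviving


for spare in (0, 1, 2):
    ram = per_node_requirement(total=3500, nodes=9, spare_nodes=spare)
    print(f"N+{spare}: {ram:.1f} GB RAM per node")
```

The same function works for vCPUs or any other pooled resource. Surviving a full rack failure would mean setting `spare_nodes` to the number of nodes in the largest rack.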

Support requirements: Is your team comfortable relying on community support and in-house expertise? Or do you prefer assistance from outsiders? This has a real impact on storage and platform choices, as some solutions carry enterprise support options and others do not.

Growth trajectory: Is 1500 vCPUs the ceiling or a starting point? Planning for 2x growth in 2 years changes how you size node counts and network uplinks today.

Once you share those details, the community might be able to give you more concrete guidance. That said, driving a coherent design for a system at that scale will likely take more than a forum post!

Good Luck!!!


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox