Advice on first attempt at Proxmox & Ceph hybrid cluster

premmy

New Member
Apr 18, 2025
Hi All,


So I've recently decided to try Proxmox. I'll be honest: it was primarily the Ceph integration that initially sold it to me, for the following reasons:


In production we use (and pay licence fees for) the Hitachi VSP, and to be honest it has been very stable over the last two to three years. However, with our data-processing requirements changing rapidly, I thought Proxmox could be a potential winner.


At the moment, it's early days in terms of infrastructure planning, etc. I would like to test a couple of ideas and see if it could be a viable option.


Initially I was looking at VMware/vSAN for a similar approach but have been drawn towards Proxmox/Ceph.


Side note: Should the testing look promising, we would be purchasing the Proxmox enterprise subscriptions, but at the moment it's early days and doesn't warrant jumping straight into a subscription.


Production: (The difference between the test environment I've put together and our production setup will primarily be newer hardware, a faster network and double the number of nodes; the concept stays the same.)



Test Environment:

Connectivity: 2 x 10 Gbps independent uplinks

Network: 2 x Cisco Nexus N9k Switches (32 x 40 Gbps ports)

Servers: 3 x Dell R730xd (all three have the following identical configuration)


RAM: 128 GB DDR4 2133 MHz

CPU: 2 x Xeon E5-2690 v4 @ 2.60 GHz (28 cores / 56 threads total)

DISKS: 20 x Seagate Exos 1.8 TB 12 Gbps SAS HDD (ST1800MM0129)

FLASH: 4 x 1 TB PCIe NVMe



The network is 40 GbE via QSFP+ and fibre, from Mellanox ConnectX Pro cards to the N9K switches.

All disks are presented to Proxmox without any RAID configuration; the servers still have the embedded PERC H730P Mini controllers running in HBA mode, and these can be swapped for dedicated HBAs if needed.

So each server has roughly 40 TB of raw storage (20 x 1.8 TB SAS plus 4 x 1 TB NVMe).
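
In case it's useful context, this is roughly how I was planning to bring the disks into Ceph using Proxmox's pveceph tooling, with the NVMe drives carrying the RocksDB/WAL for the spinners (the subnet and device names below are just placeholders):

# once per cluster, pointing Ceph at the 40 GbE network:
pveceph init --network 10.10.10.0/24

# on each node, one OSD per SAS disk, with DB/WAL offloaded to a shared NVMe
# (device names are illustrative; check with lsblk first):
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1
pveceph osd create /dev/sdc --db_dev /dev/nvme0n1
# ...and so on for the remaining disks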


My thoughts/goal:


Most online posts talk about high availability and replication; however, I would like to stay away from replication as much as possible, primarily because our use case looks like this:


The cluster will essentially be a dumping ground: a handful of small containers will ingest data into the Ceph cluster throughout the day, and this data will be handed off at specific intervals to more stable, long-term storage (for now, the Hitachi VSP).


So the primary purpose of the Proxmox/Ceph cluster will be to ingest data, run some basic compute tasks, and then have the results backed up to long-term storage.


The point I'm making is that replication is of no importance here; in reality it becomes a hindrance, wasting significant resources on data that will be short-lived.


Worst case, we lose a couple of hours of data that we could recover with a few hours of manual work. Accepting the risk of spending a few hours manually re-pulling data if something fails is, for us, a far better trade than spending a large chunk of resources on a resilient "highly available" setup.


We will benefit far more from the capacity and performance gained by not having any RAID or replication in place.


So my question to the seasoned pros here is...


What would you recommend for such a use case? Can one simply create a CRUSH rule/pool that keeps no extra replicas (a single copy)? What options are available to maximise performance by trading away replication?


Or is there a minimal approach that gives the best of both worlds (for example, tolerating one drive failure per pool) without dedicating too many resources to replication?
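
From what I've read so far, Ceph doesn't actually have a "0 replicas" option; the closest thing seems to be a replicated pool with size 1 (a single copy, no redundancy), which newer releases make you opt into explicitly. Something along these lines is what I had in mind, with "ingest" as a placeholder pool name:

ceph config set global mon_allow_pool_size_one true
ceph osd pool create ingest 128 128 replicated
ceph osd pool set ingest size 1 --yes-i-really-mean-it
ceph osd pool set ingest min_size 1
ceph osd pool application enable ingest rbd

The obvious caveat being that any single disk or node failure loses whatever data lived on it, which is the trade-off I described above.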
 
Servers: 3 x

Yes, three nodes is the minimum for Ceph, and your hardware is great. But I would never start a serious business setup with only the absolute minimum of anything.

Ceph requires some more resources to be reliable; I've written down some things I've noticed: https://forum.proxmox.com/threads/fabu-can-i-use-ceph-in-a-_very_-small-cluster.159671/

Disclaimer: I dropped my six-node Ceph cluster with 12 OSDs and went back to ZFS only, after 15 months of productive use in a homelab with a slow (2.5 Gbit/s) network.
 
You need 3 nodes.

My 3-node cluster is made up of Intel NUCs, with one NVMe for Ceph in each.
Networking is ~26 Gb/s (Thunderbolt mesh) and Ceph uses so little of that... I know of folks who ran small environments on 2.5 Gb/s.

All that matters is the real-time load, boot-storm load and the amount of data that will be read and written.

I use a Ceph replicated pool for both my RBD and CephFS; if you don't want 3 copies of everything you could use an erasure-coded pool.

This is all to say: you need to test your workload and meet your needs; at the end of the day Ceph works fine on even minimal systems.
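
As a rough sketch of the erasure-coded pool idea on a 3-node cluster (pool and profile names are just placeholders): a 2+1 profile stores ~1.5x the data instead of 3x and still survives one host/OSD failure.

ceph osd erasure-code-profile set ec-2-1 k=2 m=1 crush-failure-domain=host
ceph osd pool create ecdata 64 64 erasure ec-2-1
ceph osd pool set ecdata allow_ec_overwrites true
ceph osd pool application enable ecdata rbd
# RBD keeps image metadata in a small replicated pool and puts the data here, e.g.:
# rbd create --size 100G --data-pool ecdata rbd/test-image

But as above: test it against your actual workload first.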