For Best Performance - Proxmox Cluster with CEPH or ZFS?

iadityaharsh
After months of planning, I've decided to assemble 3 Proxmox nodes and cluster them together.
I'm mostly interested in Mini-PCs (NUC style) with dual 2.5GbE LAN, but after building a 32-core Epyc Proxmox node, I'm aware of the performance boost that actual server hardware brings. Anyway, I will be building 3 nodes, and one question keeps haunting me: should I use ZFS with mirrored disks on each node and replicate data to the other nodes to achieve HA, or install Ceph on all nodes and combine the 6 M.2 NVMe drives into one large Ceph pool?
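To put rough numbers on the capacity side of that choice, here is a back-of-the-envelope sketch in Python. The 2 TB per NVMe drive is just an assumed figure for illustration (drive sizes aren't fixed yet), and the 80% fill headroom is a common rule of thumb rather than a hard limit:

```python
# Rough usable-capacity comparison for 3 nodes with 2 NVMe drives each.
NODES = 3
NVME_PER_NODE = 2
DRIVE_TB = 2.0  # assumption: 2 TB per M.2 NVMe drive

# Option 1: ZFS mirror per node + storage replication to the other nodes.
# Each node mirrors its two drives (capacity of one drive), and if every
# guest is replicated to every node, the unique data the cluster can hold
# is bounded by a single node's mirror.
zfs_usable_tb = DRIVE_TB

# Option 2: one Ceph pool over all 6 OSDs with 3-way replication (size=3).
raw_tb = NODES * NVME_PER_NODE * DRIVE_TB
ceph_usable_tb = raw_tb / 3               # three copies of every object
ceph_practical_tb = ceph_usable_tb * 0.8  # keep headroom below the full ratios

print(f"ZFS mirror + full replication: ~{zfs_usable_tb:.1f} TB of unique data")
print(f"Ceph size=3:                   ~{ceph_usable_tb:.1f} TB usable "
      f"(~{ceph_practical_tb:.1f} TB with fill headroom)")
```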

I've heard some amazing things on both sides and some nasty drawbacks.
I would like to hear some fresh opinions on this topic, especially after the Proxmox VE 8.0 release.

Some specs that I have in mind for each node:
1. SP A55 512GB SSD (For Proxmox Boot Environment)
2. Intel i5 13th Gen (1340P if going with NUC or 13500 if building new altogether)
3. (IF APPLICABLE) B-Series Intel Motherboard with 2.5GbE and dual M.2 Gen-4 NVMe
4. 64GB DDR4/DDR5 RAM (as per system)

Application for Cluster (Primarily with HA in mind):
1. 2 docker servers (1 for home use, 1 for business)
2. MySQL Server for business
3. 1-2 containers for small applications like Omada Controller, Pi-Hole, etc.
4. Jellyfin Server for small Media Collection (stored in a separate TrueNAS Scale Build)
[Attachment: PVE-Cluster Large.jpeg]
 
For me this is the relevant point: with Ceph you get a whole new bunch of unwelcome complexity. You will encounter failure domains you didn't know existed. (And one would probably want something like 5 or 7 nodes with more than 2 OSDs on each one.)

That's why I walk the ZFS road. This is fine for me because I can tolerate data loss between replication intervals!

Good luck
 
2.5Gb/s is okay for a start, for smaller implementations, but Ceph uses a lot of RAM; ZFS is more suitable for smaller nodes.
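To put a rough number on the RAM point: BlueStore OSDs target about 4 GiB each by default (osd_memory_target), and the monitor/manager daemons add a few more GiB. A quick sketch for one of the 64GB nodes with 2 OSDs, using default/ballpark figures rather than measured values:

```python
# Ballpark Ceph memory overhead for one node (defaults / rough allowances).
OSD_MEMORY_TARGET_GIB = 4   # BlueStore default osd_memory_target
OSDS_PER_NODE = 2
MON_GIB = 2                 # rough allowance for a monitor daemon
MGR_GIB = 1                 # rough allowance for a manager daemon

ceph_overhead_gib = OSDS_PER_NODE * OSD_MEMORY_TARGET_GIB + MON_GIB + MGR_GIB
node_ram_gib = 64
print(f"~{ceph_overhead_gib} GiB of {node_ram_gib} GiB goes to Ceph daemons "
      f"before any VM or container memory")
```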
 
I would not recommend deploying a cluster with 2.5Gb connectivity for Ceph in a production environment.
This goes against Ceph's best practices.

Additionally, having such a low number of OSDs increases the likelihood of storage loss. Just think, with a 1Gbps network, it takes approximately 3 hours to replicate 1TB of data.
It would take around 9 hours for 3TB. With a 10Gbps network, it would take 20 minutes for 1TB and 1 hour for 3TB.
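If you want to sanity-check those figures yourself, here is a small Python helper; the 75% effective-throughput factor is my own rough allowance for protocol overhead, which is roughly what turns the theoretical ~2.2 h into the ~3 h quoted above:

```python
# Estimate how long moving a given amount of data takes over a given link.
def replication_hours(data_tb: float, link_gbps: float, efficiency: float = 0.75) -> float:
    bits = data_tb * 1e12 * 8                        # decimal TB -> bits
    seconds = bits / (link_gbps * 1e9 * efficiency)  # effective line rate
    return seconds / 3600

for data_tb in (1, 3):
    for link_gbps in (1, 2.5, 10):
        print(f"{data_tb} TB over {link_gbps} Gbps: "
              f"~{replication_hours(data_tb, link_gbps):.1f} h")
```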

Keep in mind that when an OSD fails, the data from that OSD is replicated to the remaining OSDs in the same pool. Therefore, if an entire node goes down, it could lead to bandwidth saturation if not properly designed.

Ensure that in the event of a node failure, the network bandwidth (of the Cluster Network) is correctly sized to restore the state of Ceph within an acceptable timeframe, and that all clients can continue working without performance issues.
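To put the node-failure case in numbers for this specific 3-node layout: with size=3 and the host as the failure domain, each node holds one copy of everything in the pool, so while a node is down there is no third host to re-replicate to and the cluster simply runs degraded; once the node returns, up to a full copy has to be backfilled over the cluster network (all of it, if its OSDs come back empty). The stored-data figure and the 75% link efficiency below are assumptions for illustration:

```python
# Rough backfill estimate when a failed node rejoins a 3-node, size=3 cluster.
STORED_TB = 2.0        # assumption: unique data currently stored in the pool
LINK_GBPS = 2.5        # cluster network link speed
EFFICIENCY = 0.75      # rough allowance for overhead and shared client traffic

# The returning node must receive (at most) one full copy of the stored data.
bits_to_move = STORED_TB * 1e12 * 8
hours = bits_to_move / (LINK_GBPS * 1e9 * EFFICIENCY) / 3600
print(f"~{hours:.1f} h of backfill traffic on the {LINK_GBPS} Gbps cluster link")
```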
 
