For Best Performance - Proxmox Cluster with CEPH or ZFS?

iadityaharsh · Jun 27, 2023

After months of planning, I came to a conclusion to assemble 3 Proxmox Nodes and cluster them together.
I'm mostly interested in Mini-PC (NUC Style) with dual 2.5GbE LANs but after building 32 Core Epyc Proxmox Node, I'm known to the performance boost with actual server hardware. Anyway, I will be building 3 nodes and one thing is haunting me. Should I use ZFS with mirror disks on each node and replicate data across all other nodes to achieve HA or Install CEPH on all nodes and combine 6 M.2 NVMe drives to 1 large CEPH pool?

I've heard some amazing things on both sides and some nasty drawbacks.
I would like have some fresh opinions on this topic, especially after Proxmox VE 8.0 Release.

Some Specs that I have in mind for each nodes:
1. SP A55 512GB SSD (For Proxmox Boot Environment)
2. Intel i5 13th Gen (1340P if going with NUC or 13500 if building new altogether)
3. (IF APPLICABLE) B-Series Intel Motherboard with 2.5GbE and dual M.2 Gen-4 NVMe
4. 64GB DDR4/DDR5 RAM (as per system)

Application for Cluster (Primarily with HA in mind):
1. 2 docker servers (1 for home use, 1 for business)
2. MySQL Server for business
3. 1-2 containers for small applications like Omaha Controller, Pi-Hole, etc.
4. Jellyfin Server for small Media Collection (stored in a separate TrueNAS Scale Build)

ness1602 · Jun 27, 2023

For this amount of ram,i would recommend zfs.

UdoB · Jun 27, 2023

iadityaharsh said:
I've heard some amazing things on both sides and some nasty drawbacks.
I would like have some fresh opinions on this topic,

For me this is relevant: with Ceph you get a whole new bunch of unwelcome complexity. You will encounter failure domains you didn't know they exist. (And one would probably want something like 5 or 7 nodes with more than 2 OSDs on each one.)

That's why I walk the ZFS road. This is fine for me because I can tolerate data-loss between replication intervals!

Good luck

iadityaharsh · Jun 27, 2023

ness1602 said:
For this amount of ram, i would recommend zfs.

So Option B would be better, adding a second switch solely dedicated to replication of ZFS vm data.

iadityaharsh · Jun 27, 2023

UdoB said:
with Ceph you get a whole new bunch of unwelcome complexity.

Exactly what I'm fearing of.

UdoB said:
data-loss between replication intervals!

In my case that would just be statistics in most applications and only mysql server would have to be configured differently.

alexskysilk · Jun 27, 2023

iadityaharsh said:
Should I use ZFS with mirror disks on each node and replicate data across all other nodes to achieve HA or Install CEPH on all nodes and combine 6 M.2 NVMe drives to 1 large CEPH pool?

Not a good use case for ceph- small number of osds; also your interconnect is 2.5gb which severely limits the performance of the storage vs local (zfs.)

ness1602 · Jun 27, 2023

2.5gb/s is okay for start, for smaller implementations, but ceph uses a lot of ram, zfs is more suitable for smaller nodes.

supermicro_server · Jun 27, 2023

I would not recommend deploying a cluster with 2.5Gb connectivity for Ceph in a production environment.
This goes against Ceph's best practices.

Additionally, having such a low number of OSDs increases the likelihood of storage loss. Just think, with a 1Gbps network, it takes approximately 3 hours to replicate 1TB of data.
It would take around 9 hours for 3TB. With a 10Gbps network, it would take 20 minutes for 1TB and 1 hour for 3TB.

Keep in mind that when an OSD fails, the data from that OSD is replicated to the remaining OSDs in the same pool. Therefore, if an entire node goes down, it could lead to bandwidth saturation if not properly designed.

Ensure that in the event of a node failure, the network bandwidth (of the Cluster Network) is correctly sized to restore the state of Ceph within an acceptable timeframe, and that all clients can continue working without performance issues.

guioday83 · Thursday at 17:18

I think there are distinct use cases for both. As the colleague said above, ceph is way more complex and rely on the “network performance” based IO, while ZFS relies on “storage performance” based IO. That means as more NICs and Network bandwidth better ceph cluster performance. Not your case.

On other hand ZFS has compression support that wisely used zeroing the vm disk free space can save tons of storage local space and also guarantee that your not replicating garbage.

alexskysilk · Thursday at 18:21

guioday83 said:
I think there are distinct use cases for both.

Most definitely. ZFS is a solution for single hosts; ceph is a solution for clusters.

guioday83 said:
On other hand ZFS has compression support that wisely used zeroing the vm disk free space can save tons of storage local space and also guarantee that your not replicating garbage.

you can do the same with ceph.

aabraham · Friday at 10:16

iadityaharsh said:
After months of planning, I came to a conclusion to assemble 3 Proxmox Nodes and cluster them together.
I'm mostly interested in Mini-PC (NUC Style) with dual 2.5GbE LANs but after building 32 Core Epyc Proxmox Node, I'm known to the performance boost with actual server hardware. Anyway, I will be building 3 nodes and one thing is haunting me. Should I use ZFS with mirror disks on each node and replicate data across all other nodes to achieve HA or Install CEPH on all nodes and combine 6 M.2 NVMe drives to 1 large CEPH pool?

I've heard some amazing things on both sides and some nasty drawbacks.
I would like have some fresh opinions on this topic, especially after Proxmox VE 8.0 Release.

Some Specs that I have in mind for each nodes:
1. SP A55 512GB SSD (For Proxmox Boot Environment)
2. Intel i5 13th Gen (1340P if going with NUC or 13500 if building new altogether)
3. (IF APPLICABLE) B-Series Intel Motherboard with 2.5GbE and dual M.2 Gen-4 NVMe
4. 64GB DDR4/DDR5 RAM (as per system)

Application for Cluster (Primarily with HA in mind):
1. 2 docker servers (1 for home use, 1 for business)
2. MySQL Server for business
3. 1-2 containers for small applications like Omaha Controller, Pi-Hole, etc.
4. Jellyfin Server for small Media Collection (stored in a separate TrueNAS Scale Build)
View attachment 52223

I would also advise using ZFS. Ceph could add a mountain of complexity that could make things more complicated for you than they need to be. A good rule of thumb is to keep things as simple and minimalist as possible.

Search

Search

For Best Performance - Proxmox Cluster with CEPH or ZFS?

iadityaharsh

Member

ness1602

Renowned Member

UdoB

Distinguished Member

iadityaharsh

Member

iadityaharsh

Member

alexskysilk

Distinguished Member

ness1602

Renowned Member

supermicro_server

Well-Known Member

guioday83

New Member

alexskysilk

Distinguished Member

aabraham

New Member