Production PVE Cluster Decisions

ScottDavis

New Member
May 23, 2024
I have been testing a three-node cluster with HA, using both ZFS and Ceph (full mesh), and I'm seeing stark differences in disk speed tests between the two. ZFS is more than twice as fast. In fairness, the test environment only has 1 GbE NICs, which I know limits Ceph.

As a result, I'm leaning toward ZFS with HA and replication for 3-4 production nodes running web and SQL servers.

Can Ceph meet or exceed ZFS performance with a 10 GbE full mesh?
 
ScottDavis said: "I have been testing a three node cluster with HA and both ZFS and CEPH (full mesh) and I'm seeing some stark differences in disk speed testing between CEPH and ZFS. ZFS is over twice as fast,"
By "HA with ZFS", do you mean ZFS replication? If so, note that, unlike Ceph, it is asynchronous: you are essentially writing only to a local disk, and the changes are sent to the other nodes afterwards. With Ceph, each write goes over the network to every replica before it is acknowledged, so you are gated by the network, disks, and CPU of each node involved.
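To put rough numbers on the network gate, here is a back-of-the-envelope sketch. It assumes a synchronously replicated write cannot finish faster than the slower of the link and the disk, and it ignores protocol overhead, latency, and CPU cost. The function names, the `copies_per_link` parameter, and the 500 MB/s disk figure are made up for illustration, not measurements from this thread.

```python
def line_rate_mb_s(link_gbit: float) -> float:
    """Raw line rate in MB/s (decimal units), ignoring protocol overhead."""
    return link_gbit * 1000 / 8


def sync_write_ceiling_mb_s(link_gbit: float, local_disk_mb_s: float,
                            copies_per_link: int = 1) -> float:
    """Rough upper bound for a synchronously replicated write stream.

    Assumption: every replica copy must cross a link before the write is
    acknowledged, so the slower of network and disk sets the ceiling.
    copies_per_link=1 models a three-node full mesh where each peer has
    its own direct link.
    """
    network = line_rate_mb_s(link_gbit) / copies_per_link
    return min(network, local_disk_mb_s)


if __name__ == "__main__":
    disk = 500.0  # hypothetical sequential write speed of one SSD, MB/s
    for gbit in (1, 10):
        ceiling = sync_write_ceiling_mb_s(gbit, disk)
        print(f"{gbit} GbE: at most about {ceiling:.0f} MB/s")
```

Under these assumptions the 1 GbE link (about 125 MB/s) is the bottleneck, while at 10 GbE the hypothetical disk becomes the limit. An asynchronous local write only ever sees the disk side of that `min`.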

Much of your testing will depend on the type of disks used (HDD, SSD, NVMe). The network is important, and so is the CPU. How you write data is critical as well: dd if=/dev/zero is not a good way to test, because a stream of zeros is trivially compressible and does not resemble a real workload.
Performance testing/comparison is very nuanced.
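As a small illustration of why zero-filled test data can mislead, the sketch below compares how well zeros and random bytes compress. It uses Python's zlib purely as a stand-in for whatever compression the storage layer might apply (an assumption; the actual algorithm and settings on a given ZFS dataset will differ). `compression_ratio` is a hypothetical helper name.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


if __name__ == "__main__":
    size = 1024 * 1024  # 1 MiB sample
    zeros = bytes(size)             # what dd if=/dev/zero would write
    random_data = os.urandom(size)  # incompressible data
    print(f"zeros:  {compression_ratio(zeros):.1f}x")
    print(f"random: {compression_ratio(random_data):.2f}x")
```

If compression is enabled, a benchmark that writes zeros may barely touch the disks, so the result says little about real workloads. A purpose-built tool such as fio, run with realistic block sizes and non-zero data, tends to give a fairer comparison.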

Good luck


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 