Ceph Performance.

rborg

New Member
Jun 26, 2024
Hi, we are currently evaluating Proxmox. We have decided to use Ceph as storage; however, our design uses an external Ceph cluster.

The Ceph cluster is currently set up with 3 nodes, each node with 4 HDDs of 500 GB.

Connectivity between all nodes is properly configured and can reach close to 10 Gbit.

That being said, performance of a VM writing to Ceph is slower than a VM writing to local HDDs on the PVE node.

PVE version is 8.4.1 and Ceph is Reef.

Are we missing something obvious?
 
Hi, I'm not an expert on this, but with this setup you won't get good performance with Ceph.

Quote:
That being said performance of a vm writing to ceph is slower than a vm writing to local hdd disks on the pve node.

That's to be expected: every write from the VM gets written to one or more local hard disks and to at least one other hard disk on another node in your Ceph cluster.
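To put that write amplification into rough numbers, here is a back-of-envelope sketch. All the figures (replication size 3, ~150 MB/s sustained write per HDD, the 10 Gbit link) are illustrative assumptions, not measurements from your cluster:

```python
# Rough ceilings on aggregate Ceph client write throughput for a
# 3-node, 4-HDD-per-node cluster. All figures are assumptions.

replication_size = 3          # Ceph's default pool size
nodes = 3
osds_per_node = 4             # one OSD per HDD
hdd_mb_s = 150                # typical sustained sequential HDD write
net_mb_s = 10 * 1000 / 8      # 10 Gbit/s link is roughly 1250 MB/s

# Every client write lands on replication_size OSDs, so the disks can
# absorb at most (total raw disk bandwidth / replication_size) of client data.
cluster_disk_mb_s = nodes * osds_per_node * hdd_mb_s
max_from_disks = cluster_disk_mb_s / replication_size

# The primary OSD forwards (replication_size - 1) copies over the network,
# so its NIC's outbound side caps client bandwidth at net / (size - 1).
max_from_net = net_mb_s / (replication_size - 1)

print(f"disk-limited ceiling:    {max_from_disks:.0f} MB/s")  # 600 MB/s
print(f"network-limited ceiling: {max_from_net:.0f} MB/s")    # 625 MB/s
```

Real VM write performance will sit far below either ceiling, because small synchronous writes are dominated by HDD seek latency rather than bandwidth, which is why SSDs with PLP help so much.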

Some issues in your setup:
- 3 nodes are the bare minimum and should only be used for testing / evaluation / home labs
- HDDs are slow; you should really use SSDs nowadays (with PLP, and don't use QLC SSDs)
- 10 Gbit is the bare minimum; every write to disk has to be sent over the network to the other nodes, so this could be your main bottleneck (assuming SSD use)

If you want to use Ceph for production you should:
- use at least 5 nodes, with multiple disks per node
- use enterprise SSDs with PLP
- use fast networking (25 Gbit or more)

Check the official recommendations before going that route.

For more information on small Ceph setups and their issues you can read this post.
 
Thanks Markus for your input. We are converting the cluster to SSDs and will let you know the outcome.