Hi community,
We have a server cluster consisting of 3 nodes, each with an EPYC 7402P 24-core CPU, 6 Intel enterprise SSDs (4620), and 256 GB of RAM. Each node also has a 10 Gbit NIC dedicated to Ceph.
The SSDs perform well on their own, jumbo frames are enabled, and iperf shows reasonable throughput on the 10 Gbit link; a rough sketch of these checks is below.
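For reference, this is roughly how we verified the raw SSD and network performance before setting up Ceph. The device name and target IP are placeholders, and the fio parameters are from memory (following the test described in the Proxmox benchmark paper), so treat this as a sketch rather than our exact commands:
Code:
# raw 4k sync write test on a single SSD (destructive - only run against an empty/spare disk)
fio --name=ssdtest --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based

# throughput over the Ceph link (with iperf3 -s running on the target node)
iperf3 -c 192.168.10.11 -t 30

# jumbo frames: 8972 bytes of payload + 28 bytes of headers = 9000 byte MTU, must not fragment
ping -M do -s 8972 -c 4 192.168.10.11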
We have read the Proxmox Ceph benchmark document (https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark) and compared its results to our cluster.
Our SSD performance is better, and our CPUs are more powerful than the ones used there.
However, when we set up the Ceph cluster and run the benchmark, we only get around 180 MB/s on average instead of the 800-1000 MB/s mentioned in the benchmark PDF, and we don't know why. The hardware is quite powerful, so I suspect a configuration problem.
Our Ceph config:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 192.168.10.0/24
fsid = d05ccd07-d328-47a1-b39b-fa3c440aa859
mon_allow_pool_delete = true
mon_host = 91.199.162.40 91.199.162.41 91.199.162.42
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 91.199.162.0/24
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
Crush map:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host pve1 {
	id -3		# do not change unnecessarily
	id -4 class ssd		# do not change unnecessarily
	# weight 8.730
	alg straw2
	hash 0	# rjenkins1
	item osd.1 weight 1.746
	item osd.4 weight 1.746
	item osd.2 weight 1.746
	item osd.3 weight 1.746
	item osd.0 weight 1.746
}
host pve2 {
	id -5		# do not change unnecessarily
	id -6 class ssd		# do not change unnecessarily
	# weight 8.730
	alg straw2
	hash 0	# rjenkins1
	item osd.9 weight 1.746
	item osd.8 weight 1.746
	item osd.5 weight 1.746
	item osd.7 weight 1.746
	item osd.6 weight 1.746
}
host pve3 {
	id -7		# do not change unnecessarily
	id -8 class ssd		# do not change unnecessarily
	# weight 8.730
	alg straw2
	hash 0	# rjenkins1
	item osd.10 weight 1.746
	item osd.12 weight 1.746
	item osd.13 weight 1.746
	item osd.11 weight 1.746
	item osd.14 weight 1.746
}
root default {
	id -1		# do not change unnecessarily
	id -2 class ssd		# do not change unnecessarily
	# weight 26.191
	alg straw2
	hash 0	# rjenkins1
	item pve1 weight 8.730
	item pve2 weight 8.730
	item pve3 weight 8.730
}
# rules
rule replicated_rule {
	id 0
	type replicated
	min_size 1
	max_size 10
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
# end crush map
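(For reference, this is roughly how we exported the CRUSH map shown above; the file names are just placeholders:)
Code:
# dump the compiled CRUSH map and decompile it to plain text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt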
These are the default settings.
We are using BlueStore as the OSD type, which was also the default; a quick check of this is below.
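(In case it helps, this is roughly how the backend can be confirmed; we are pasting the commands rather than the full output:)
Code:
# object store backend of a single OSD (should report "bluestore")
ceph osd metadata 0 | grep osd_objectstore
# or summarised across all OSDs
ceph osd count-metadata osd_objectstore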
Any idea why the Ceph performance is so bad? This is what a rados bench run looks like on our cluster:
Code:
root@pve1:~# echo 3 | tee /proc/sys/vm/drop_caches && sync && rados -p bench bench 60 write --no-cleanup && rados -p bench bench 60 seq && rados -p bench bench 60 rand && rados -p bench cleanup
3
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve1_644283
[...]
Total time run: 60.5082
Total reads made: 2993
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 197.857
Average IOPS: 49
Stddev IOPS: 6.97015
Max IOPS: 67
Min IOPS: 38
Average Latency(s): 0.322536
Max latency(s): 3.38795
Min latency(s): 0.00226084
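(Just as a sanity check on the numbers above, no extra measurement: 4 MB objects at an average of 49 IOPS works out to roughly 196 MB/s, which matches the reported ~198 MB/s and is in the same ballpark as the ~180 MB/s average we mentioned earlier.)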
We have also disabled swap with swapoff -a on all nodes (rough sketch below).
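(For completeness, a sketch of the swap handling on each node; treat the fstab note as the general approach rather than our exact steps:)
Code:
# disable swap at runtime
swapoff -a
# verify that no swap is active anymore
swapon --show
free -h | grep -i swap
# to keep it disabled after a reboot, comment out the swap entry in /etc/fstab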
Any help would be highly appreciated.