Ceph Cluster performance

XMlabs

Hi All,

I have a Ceph cluster of 3 HPE nodes, each with 10x 1TB SAS and 2x 1TB NVMe; the config is below.
The replication/Ceph network is 10Gb, but performance is very low:
inside a VM I get (sequential) read 230MB/s and write 65MB/s.

What can I do or check to tune my storage environment?
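For reference, a sequential test of this kind inside the VM can be run with fio roughly as follows (device path and parameters are illustrative, not necessarily the exact tool/settings I used):

# sequential write, 4M blocks, direct I/O, against a dedicated test disk inside the VM
fio --name=seq-write --filename=/dev/vdb --rw=write --bs=4M --iodepth=16 --ioengine=libaio --direct=1 --runtime=60 --time_based
# sequential read against the same disk
fio --name=seq-read --filename=/dev/vdb --rw=read --bs=4M --iodepth=16 --ioengine=libaio --direct=1 --runtime=60 --time_based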

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class hdd
device 25 osd.25 class hdd
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class ssd
device 31 osd.31 class ssd
device 32 osd.32 class ssd
device 33 osd.33 class ssd
device 34 osd.34 class ssd
device 35 osd.35 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host HCI11 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -9 class ssd # do not change unnecessarily
# weight 10.832
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.895
item osd.1 weight 0.895
item osd.2 weight 0.895
item osd.3 weight 0.895
item osd.4 weight 0.895
item osd.5 weight 0.910
item osd.6 weight 0.895
item osd.7 weight 0.895
item osd.8 weight 0.895
item osd.9 weight 0.895
item osd.30 weight 0.931
item osd.31 weight 0.931
}
host HCI12 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
id -10 class ssd # do not change unnecessarily
# weight 10.818
alg straw2
hash 0 # rjenkins1
item osd.10 weight 0.895
item osd.11 weight 0.895
item osd.12 weight 0.895
item osd.13 weight 0.895
item osd.14 weight 0.895
item osd.15 weight 0.895
item osd.16 weight 0.895
item osd.17 weight 0.895
item osd.18 weight 0.895
item osd.19 weight 0.895
item osd.32 weight 0.931
item osd.33 weight 0.931
}
host HCI13 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
id -11 class ssd # do not change unnecessarily
# weight 10.818
alg straw2
hash 0 # rjenkins1
item osd.20 weight 0.895
item osd.21 weight 0.895
item osd.22 weight 0.895
item osd.23 weight 0.895
item osd.24 weight 0.895
item osd.25 weight 0.895
item osd.26 weight 0.895
item osd.27 weight 0.895
item osd.28 weight 0.895
item osd.29 weight 0.895
item osd.34 weight 0.931
item osd.35 weight 0.931
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
id -12 class ssd # do not change unnecessarily
# weight 32.468
alg straw2
hash 0 # rjenkins1
item HCI11 weight 10.832
item HCI12 weight 10.818
item HCI13 weight 10.818
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
 
Hi, you have mixed your SSDs and HDDs.

What do you want to do with your NVMe drives?

1) If you want to split SSD and HDD, you need to create two different replicated rules and two different pools (a sketch of the commands follows below).

2) If you want to speed up your HDD writes, you can use your NVMe as a journal/DB device for your HDDs (when you create the HDD OSDs, choose the NVMe as the journal device).
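Roughly, for option 1 (rule names, pool names and PG counts below are only examples, adjust them to your cluster):

# one CRUSH rule per device class
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd
# one pool per rule (example PG counts)
ceph osd pool create vm_hdd 512 512 replicated replicated_hdd
ceph osd pool create vm_ssd 128 128 replicated replicated_ssd

And for option 2, when creating an HDD OSD you point its DB/WAL at the NVMe, something like this (device paths are examples):

# BlueStore OSD on the SAS disk, with its DB (and WAL) on the NVMe
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1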
 
Hi, thanks. By journal do you mean the DB and WAL allocation?

UPDATE: done. I have now configured the OSDs with DB and WAL on the NVMe (auto sizing), and I have also added two more 1TB SAS drives per node, but in my opinion the performance is still too low:

Each server is configured with:
- 2x CPU (24 cores total)
- 384GB of RAM
- 2x 10Gb NIC, jumbo frames (MTU 9000)
- 12x 1TB SAS HPE (disk cache enabled)
- 2x NVMe SB-ROCKET-1TB 1TB for DB and WAL (connected via PCIe, not on the same backplane)

I created a VM with 8 cores and 8GB RAM and measured this performance (see below for a way to benchmark the pool directly, outside the VM):
- [VM cache: write back] write 330MB/s, read 350MB/s
- [VM cache: none/disabled] write 115MB/s, read 180MB/s
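For comparison, the raw pool can also be benchmarked outside the VM with rados bench (the pool name below is a placeholder):

# 60s sequential-write test with 4M objects, keeping the data for the read tests
rados bench -p <poolname> 60 write -b 4M -t 16 --no-cleanup
# sequential and random read tests against the data written above
rados bench -p <poolname> 60 seq -t 16
rados bench -p <poolname> 60 rand -t 16
# remove the benchmark objects afterwards
rados -p <poolname> cleanup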

The disk controller is an HP P420i in HBA mode:

Smart Array P420i in Slot 0 (Embedded)
Bus Interface: PCI
Slot: 0
Serial Number: 0014380319DDAF0
Cache Serial Number: PBKUC0BRH6V743
RAID 6 (ADG) Status: Enabled
Controller Status: OK
Hardware Revision: B
Firmware Version: 8.32-0
Cache Board Present: True
Cache Status: Not Configured
Total Cache Size: 1.0
Total Cache Memory Available: 0.8
Cache Backup Power Source: Capacitors
Battery/Capacitor Count: 1
Battery/Capacitor Status: OK
Controller Temperature (C): 59
Cache Module Temperature (C): 28
Capacitor Temperature (C): 19
Number of Ports: 2 Internal only
Driver Name: hpsa
Driver Version: 3.4.20
HBA Mode Enabled: True
PCI Address (Domain:Bus:Device.Function): 0000:02:00.0
Port Max Phy Rate Limiting Supported: False
Host Serial Number: CZ3432AKLR
Sanitize Erase Supported: False
Primary Boot Volume: None
Secondary Boot Volume: None

Below the disk config (lsblk):

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 917G 0 disk
└─ceph--67515f7c--eacc--4039--bfc1--91bb8e487905-osd--block--5ca14715--036a--46f4--8c18--7315a9544f38 253:6 0 917G 0 lvm
sdb 8:16 0 917G 0 disk
└─ceph--7cbf0277--d045--4a43--90cd--a72d9af51216-osd--block--b8f8b4a2--080e--4fe0--a5bb--4cf6cd17d036 253:8 0 917G 0 lvm
sdc 8:32 0 917G 0 disk
└─ceph--5f043081--78b5--4d0d--8cca--13b6117dcd44-osd--block--8fd30025--f3e4--47b1--8fbe--b122ba53cd5e 253:10 0 917G 0 lvm
sdd 8:48 0 917G 0 disk
└─ceph--e428a600--b095--4df1--8966--26ca1f8218a2-osd--block--d711bf24--8de9--4b39--b744--39820c473c66 253:12 0 917G 0 lvm
sde 8:64 0 917G 0 disk
└─ceph--eb4bb278--01a1--4df3--8f8c--1de9b0ec9311-osd--block--5b479ac7--8bfc--489c--9893--60b0ff35c27d 253:14 0 917G 0 lvm
sdf 8:80 0 931.5G 0 disk
└─ceph--10f5450c--363d--4c78--83aa--451ba771bf1a-osd--block--0216461f--0813--496b--8416--0ce92bde0591 253:16 0 931.5G 0 lvm
sdg 8:96 0 917G 0 disk
└─ceph--83270ad8--cb26--4ecf--a5eb--f1efa03275ec-osd--block--688fe9c6--6844--404b--bb21--3a47152643ff 253:18 0 917G 0 lvm
sdh 8:112 0 917G 0 disk
└─ceph--835e8826--8784--4f01--b884--86687dcd5df9-osd--block--3831888b--e3d7--4b3a--a35e--82b226cf2f7a 253:20 0 917G 0 lvm
sdi 8:128 0 917G 0 disk
└─ceph--db7f7668--ee16--49cc--bceb--eb2d3232c9b5-osd--block--79e7e8d0--3c77--4a7b--b04d--4ca052cb2276 253:22 0 917G 0 lvm
sdj 8:144 0 917G 0 disk
└─ceph--928018d5--8a77--4249--9eee--bb9ddc13ddcf-osd--block--9e31e957--1059--4226--83a1--0a0e331ab1db 253:24 0 917G 0 lvm
sdk 8:160 0 917G 0 disk
└─ceph--2a45bc88--27b4--4c8e--8202--194bcb318eaa-osd--block--87c3b4f4--ef59--431c--a2f9--3e6273b5a2c7 253:26 0 917G 0 lvm
sdl 8:176 0 917G 0 disk
└─ceph--7ac794dd--17cf--404e--8a10--39a9fabc3f1f-osd--block--9787ded6--074c--4f9e--a8de--7e30b130d83f 253:28 0 917G 0 lvm
nvme1n1 259:0 0 953.9G 0 disk
├─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--0941ff65--a228--4b6d--abf5--6961b883b001 253:17 0 91.7G 0 lvm
├─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--7d82d9f5--9694--41a9--beda--178d3b56d2b9 253:19 0 91.7G 0 lvm
├─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--095a6a49--2781--4197--9629--98890abbbdb9 253:21 0 91.7G 0 lvm
├─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--c9fe9ea4--db48--48e5--b792--6fc336ac2e0f 253:23 0 91.7G 0 lvm
├─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--0dc990f3--6a4b--46ac--b9d2--431b2b04968e 253:25 0 91.7G 0 lvm
└─ceph--d08e9f68--0003--4e07--aa88--44f53d1a9d32-osd--db--8bea6195--acad--48fa--b547--ff41e185e213 253:27 0 91.7G 0 lvm
nvme0n1 259:1 0 953.9G 0 disk
├─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--cd041df9--8e4e--4538--9398--e7432b505bb0 253:5 0 91.7G 0 lvm
├─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--a9cab0ac--0cc4--4f06--b6f2--2a09c289f6dd 253:7 0 91.7G 0 lvm
├─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--57966d9f--81ae--4f26--af0d--d0e6b7cd973f 253:9 0 91.7G 0 lvm
├─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--7613ddec--e10e--417d--bfa6--6c401d858c9f 253:11 0 91.7G 0 lvm
├─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--36ee751f--9597--479f--baf4--19106cd15c91 253:13 0 91.7G 0 lvm
└─ceph--118d0719--f390--47c5--b261--ef60a6778387-osd--db--cbfd4a91--20a9--4896--a5e3--0aef84139626 253:15 0 93.2G 0 lvm

ceph config:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = xxx.yyy.zzz.26/24
fsid = f40ea693-2069-4b58-b7cb-fdd09af33046
mon_allow_pool_delete = true
mon_host = xxx.yyy.zzz.26 xxx.yyy.zzz.27 xxx.yyy.zzz.28
osd_pool_default_min_size = 2
osd_pool_default_size = 2
public_network = xxx.yyy.zzz.26/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.HCI11]
public_addr = xxx.yyy.zzz.26

[mon.HCI12]
public_addr = xxx.yyy.zzz.27

[mon.HCI13]
public_addr = xxx.yyy.zzz.28

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd
device 24 osd.24 class hdd
device 25 osd.25 class hdd
device 26 osd.26 class hdd
device 27 osd.27 class hdd
device 28 osd.28 class hdd
device 29 osd.29 class hdd
device 30 osd.30 class hdd
device 31 osd.31 class hdd
device 32 osd.32 class hdd
device 33 osd.33 class hdd
device 34 osd.34 class hdd
device 35 osd.35 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host HCI11 {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
# weight 11.837
alg straw2
hash 0 # rjenkins1
item osd.0 weight 0.985
item osd.1 weight 0.985
item osd.2 weight 0.985
item osd.3 weight 0.985
item osd.4 weight 0.985
item osd.5 weight 1.001
item osd.6 weight 0.985
item osd.7 weight 0.985
item osd.8 weight 0.985
item osd.31 weight 0.985
item osd.34 weight 0.985
item osd.35 weight 0.985
}
host HCI12 {
id -5 # do not change unnecessarily
id -6 class hdd # do not change unnecessarily
# weight 11.821
alg straw2
hash 0 # rjenkins1
item osd.10 weight 0.985
item osd.11 weight 0.985
item osd.12 weight 0.985
item osd.13 weight 0.985
item osd.14 weight 0.985
item osd.15 weight 0.985
item osd.16 weight 0.985
item osd.17 weight 0.985
item osd.18 weight 0.985
item osd.19 weight 0.985
item osd.32 weight 0.985
item osd.33 weight 0.985
}
host HCI13 {
id -7 # do not change unnecessarily
id -8 class hdd # do not change unnecessarily
# weight 11.837
alg straw2
hash 0 # rjenkins1
item osd.9 weight 0.985
item osd.20 weight 0.985
item osd.21 weight 0.985
item osd.22 weight 0.985
item osd.23 weight 0.985
item osd.24 weight 0.985
item osd.25 weight 0.985
item osd.26 weight 0.985
item osd.27 weight 0.985
item osd.28 weight 0.985
item osd.29 weight 0.985
item osd.30 weight 1.001
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
# weight 35.494
alg straw2
hash 0 # rjenkins1
item HCI11 weight 11.837
item HCI12 weight 11.821
item HCI13 weight 11.837
}

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}

# end crush map
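To double-check that each OSD's RocksDB really ended up on the NVMe, the OSD metadata can be queried, for example for osd.0:

# bluefs_db_rotational should be 0 and the DB device should be one of the NVMe LVs
ceph osd metadata 0 | grep bluefs_db
# or list the LVs per OSD on the node
ceph-volume lvm list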
 
