Advice Regarding 5-Node Proxmox Ceph Setup

hadizeid

Renowned Member
Feb 7, 2012
Hi everyone,
I just created a 5-node Proxmox Ceph cluster and would like some advice on whether it is OK and how to tweak performance.

My Setup is:
3 of the Proxmox nodes are:
Dell PowerEdge R730:
12 x Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz (2 sockets)
RAM: 128 GiB
2x 600 GB Dell SAS 15k rpm in RAID1 for the OS
3x 4TB WD4000FYYZ for OSDs

2 of the Proxmox nodes are:
Dell PowerEdge R730:
6 x Intel(R) Xeon(R) CPU E5-2609 v3 @ 1.90GHz (1 socket)
RAM: 32 GiB
2x 600 GB Dell SAS 15k rpm in RAID1 for the OS
6x 6TB ST6000NM0024 for OSDs


All servers have 4x 1 GbE ports + 2x 10 GbE ports.
One of the 10 GbE cards is used for the Ceph cluster network.
3 MONs are active on the first 3 Proxmox nodes.
The total number of OSDs is 21, and the journal is on the same OSD drive for all of them.
osd journal size = 5120
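That journal size comes from the [osd] section of the Ceph config; a minimal excerpt of what that looks like, assuming the Proxmox-managed file at /etc/pve/ceph.conf (adjust the path if your install differs):

# excerpt from /etc/pve/ceph.conf (assumed layout)
[osd]
    osd journal size = 5120    # filestore journal size in MB, colocated on each OSD disk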


proxmox-ve: 4.4-79 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-12 (running version: 4.4-12/e71b7a74)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.35-2-pve: 4.4.35-79
pve-kernel-4.4.15-1-pve: 4.4.15-60
ceph: 0.94.9-1~bpo80+1

Pool Config:
1 pool with size/min_size = 2/1, pg_num = 1050, ruleset = 0
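(For anyone reading along, these values can be read back from a running cluster with the standard pool commands; the pool is called "main" in the benchmarks later in this thread, so substitute your own pool name:)

ceph df
ceph osd pool get main size
ceph osd pool get main min_size
ceph osd pool get main pg_num
ceph osd pool get main pgp_num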


Crush Map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7
device 8 osd.8
device 9 osd.9
device 10 osd.10
device 11 osd.11
device 12 osd.12
device 13 osd.13
device 14 osd.14
device 15 osd.15
device 16 osd.16
device 17 osd.17
device 18 osd.18
device 19 osd.19
device 20 osd.20

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pxmx4 {
    id -2    # do not change unnecessarily
    # weight 32.700
    alg straw
    hash 0    # rjenkins1
    item osd.0 weight 5.450
    item osd.1 weight 5.450
    item osd.2 weight 5.450
    item osd.3 weight 5.450
    item osd.4 weight 5.450
    item osd.5 weight 5.450
}
host pxmx5 {
    id -3    # do not change unnecessarily
    # weight 32.700
    alg straw
    hash 0    # rjenkins1
    item osd.6 weight 5.450
    item osd.7 weight 5.450
    item osd.8 weight 5.450
    item osd.9 weight 5.450
    item osd.10 weight 5.450
    item osd.11 weight 5.450
}
host pxmx1 {
    id -4    # do not change unnecessarily
    # weight 10.890
    alg straw
    hash 0    # rjenkins1
    item osd.12 weight 3.630
    item osd.13 weight 3.630
    item osd.14 weight 3.630
}
host pxmx2 {
    id -5    # do not change unnecessarily
    # weight 10.890
    alg straw
    hash 0    # rjenkins1
    item osd.15 weight 3.630
    item osd.16 weight 3.630
    item osd.17 weight 3.630
}
host pxmx3 {
    id -6    # do not change unnecessarily
    # weight 10.890
    alg straw
    hash 0    # rjenkins1
    item osd.18 weight 3.630
    item osd.19 weight 3.630
    item osd.20 weight 3.630
}
root default {
    id -1    # do not change unnecessarily
    # weight 98.070
    alg straw
    hash 0    # rjenkins1
    item pxmx4 weight 32.700
    item pxmx5 weight 32.700
    item pxmx1 weight 10.890
    item pxmx2 weight 10.890
    item pxmx3 weight 10.890
}

# rules
rule replicated_ruleset {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map.
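(For reference, a decompiled map like the one above can be exported, edited and re-injected with the standard crushtool workflow; nothing below is specific to this cluster:

ceph osd getcrushmap -o crushmap.bin         # export the compiled map
crushtool -d crushmap.bin -o crushmap.txt    # decompile to the text form shown above
# ... edit crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new    # recompile
ceph osd setcrushmap -i crushmap.new         # inject the edited map
)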


Any recommendation or advice is most welcome, as performance is below average.

Thanks
 
1/ I would set your PG number to 2048 (this shouldn't affect performance too much, but it is a better value than the 1050 you have set); see the rough calculation at the end of this post.

2/ What tests are you doing to check performance and what results are you getting?

As you're colocating the journal on the same disks, you will only get a maximum of 1/2 the performance of a single disk, since you have a write overhead of 100%.
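For the PG sizing, the usual rule of thumb is: total PGs ≈ (number of OSDs × 100) / replica size, rounded up to the next power of two. With your numbers that is 21 × 100 / 2 = 1050, and the next power of two is 2048. A rough sketch of the change on your pool "main" (note that pg_num can only ever be increased, and raising it will trigger some data movement):

ceph osd pool set main pg_num 2048
ceph osd pool set main pgp_num 2048    # keep pgp_num in step with pg_num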
 
Hi Ashley, and thanks for your reply.
Now, for PG numbers, is 1024 better or 2048?
My rados -p main bench 60 write --no-cleanup:
Total time run: 80.255149
Total writes made: 1237
Write size: 4194304
Bandwidth (MB/sec): 61.6534
Stddev Bandwidth: 44.5561
Max bandwidth (MB/sec): 136
Min bandwidth (MB/sec): 0
Average IOPS: 15
Average Latency(s): 0.944963
Stddev Latency(s): 2.37736
Max latency(s): 24.7503
Min latency(s): 0.0487258



And my rados -p main bench 60 seq:

Total time run: 12.882075
Total reads made: 1237
Read size: 4194304
Bandwidth (MB/sec): 384.1
Average IOPS: 96
Average Latency(s): 0.165585
Max latency(s): 12.6926
Min latency(s): 0.00433715

A. What do you think of having the OSD journal on the RAID1 instead of the OSD drive itself?
B. Does having 2 different sizes of HDD matter?
C. Should my pool perform better if size/min is 3/2?
 

A) It could help; it depends on how good the RAID1 disks are.
B) No, as long as they are set up correctly.
C) With a size of 3 you can expect slightly worse performance than with 2 (a command sketch follows below if you want to try it).

The rados benchmark is about what I'd expect, as you're hitting around 50% of the drives' write performance.
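If you do want to try C) later, the pool can be changed live; a sketch using your pool name (expect re-replication/backfill traffic while the third copy is created, and roughly a third less usable capacity):

ceph osd pool set main size 3
ceph osd pool set main min_size 2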
 
Thanks Ashley,
now:
A. In the first 3 servers I have space for adding 3 more drives each. Will that help performance?
B. How can I change the OSD journaling to the RAID1? Can you give me a hint on how to start? (A rough outline is sketched below.)
C. What other options are there for better performance? And what should be expected from my hardware?

Thanks
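(For reference, the usual filestore journal-move procedure looks roughly like this; treat it as a sketch only, assuming OSD 0 and a new journal partition on the RAID1 volume, with device paths and service names that will differ per install:

ceph osd set noout                     # don't let the cluster mark the OSD out while it is down
/etc/init.d/ceph stop osd.0            # or the equivalent systemd unit on your install
ceph-osd -i 0 --flush-journal          # flush the old journal to the data disk
# point the journal symlink at the new partition (hypothetical example path):
ln -sf /dev/disk/by-partuuid/<your-partition-uuid> /var/lib/ceph/osd/ceph-0/journal
ceph-osd -i 0 --mkjournal              # initialise the new journal
/etc/init.d/ceph start osd.0
ceph osd unset noout

Repeat per OSD, one at a time, and check that ceph -s reports HEALTH_OK before moving to the next one. Keep in mind that a single pair of 15k disks in RAID1 would then carry the journals of every OSD on that host, which can itself become the bottleneck.)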
 
Hi @hadizeid, just curious to ask: what are your random read and write speeds like inside the VMs? How is the I/O, and is the performance generally good?

I am asking because I was thinking of a similar setup, but I'm wondering whether performance will be an issue because of the HDDs (no SSDs).
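In case it helps with comparing notes, this is the kind of in-VM random I/O test I would run with fio (the test file path, size and queue depth here are just example values):

fio --name=randrw-test --filename=/root/fio-test --size=2G \
    --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=32 --numjobs=1 --runtime=60 --time_based --group_reporting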
 
@KHosting performance was very bad.
Basically I was running 8 VMs on my setup:
- Microsoft Exchange 2016
- document management software
- 6 Active Directory domain controllers

I switched them all to one server with 8x 2TB HDD in RAID10 and 64 GB RAM,
and the VMs are all running smoothly now on that one machine.

I need to do some more research and testing on the Ceph cluster before putting it back into production,
hoping the gurus here can point us in the right direction.
I will keep you posted on any progress if you're interested.
 
Hi,

I wouldn't call myself a guru, but what I've figured out is that 10G is a must for an I/O-intensive setup like this, which you already have.
So what issues did you run into before giving up?
 
