Is this the best speed I can get?

mada

Member
Aug 16, 2017
Hello,

I'm running Ceph and I'm not sure if this is the best speed I can get with my configuration. I'm using 3 OSDs per node on 5TB enterprise hard drives, with an NVMe P3700 as the BlueStore journal/DB disk. My concern is that I need a lot of space along with a lot of speed, so if I add more of the 5TB drives, will the speed go up? Or should I add more journal disks?

Dual E5-2660
75 GB RAM
SM863 for the OS/host
Dual-port Mellanox 56Gb/s
3 x 5TB hard drive OSDs per server (9 OSDs total)
1 x P3700 journal per node (3 total)
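
As a rough sanity check on what this hardware can give, here is a back-of-envelope ceiling for 3x replication, assuming roughly 150 MB/s sustained sequential throughput per 7200rpm drive (an assumed figure, not a measured one); real Ceph write throughput lands well below this because of metadata and replication latency.

Code:
# hypothetical ceiling, not a measurement: 9 HDD OSDs at an assumed
# ~150 MB/s each, and with size=3 every client write is stored 3 times
echo "scale=1; 9 * 150 / 3" | bc    # ~450 MB/s aggregate write ceiling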

Test result of rados bench -p test 60 write --no-cleanup
Code:
Total time run:         60.319902
Total writes made:      3802
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     252.122
Stddev Bandwidth:       19.7516
Max bandwidth (MB/sec): 284
Min bandwidth (MB/sec): 212
Average IOPS:           63
Stddev IOPS:            4
Max IOPS:               71
Min IOPS:               53
Average Latency(s):     0.253834
Stddev Latency(s):      0.131711
Max latency(s):         1.10938
Min latency(s):         0.0352605


"ceph osd perf" during write bench

Code:
osd commit_latency(ms) apply_latency(ms)
  8                 65                65
  7                 74                74
  6                 52                52
  3                  0                 0
  5                214               214
  0                 70                70
  1                 85                85
  2                 76                76
  4                196               196
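
osd.4 and osd.5 stand out with much higher latency in the table above. If you want to compare the drives individually, a per-OSD bench is one way to check (this writes test data through the OSD, so it is best run while the cluster is otherwise idle; output field names can vary slightly between releases):

Code:
# per-OSD write benchmark (defaults to 1 GiB written in 4 MiB blocks)
ceph tell osd.4 bench
ceph tell osd.5 bench
# compare bytes_per_sec against one of the faster OSDs, e.g. osd.6
ceph tell osd.6 bench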

rados bench -p rbd -t 16 60 seq

Code:
Total time run:       14.515122
Total reads made:     3802
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1047.73
Average IOPS:         261
Stddev IOPS:          10
Max IOPS:             277
Min IOPS:             239
Average Latency(s):   0.0603855
Max latency(s):       0.374936
Min latency(s):       0.0161585

rados bench -p rbd -t 16 60 rand

Code:
Total time run:       60.076015
Total reads made:     19447
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1294.83
Average IOPS:         323
Stddev IOPS:          20
Max IOPS:             364
Min IOPS:             259
Average Latency(s):   0.0488371
Max latency(s):       0.468844
Min latency(s):       0.00179505


Dual port 56Gb/s iperf test without bonding

Code:
 iperf -c 10.1.1.17
------------------------------------------------------------
Client connecting to 10.1.1.17, TCP port 5001
TCP window size: 2.50 MByte (default)
------------------------------------------------------------
[  3] local 10.1.1.16 port 54442 connected with 10.1.1.17 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  24.5 GBytes  21.1 Gbits/sec
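
A single TCP stream often won't saturate a 56Gb/s IPoIB link; running several parallel streams gives a better idea of whether the link or the single stream is the limit (a quick sketch using standard iperf options):

Code:
# four parallel streams, 30 second run
iperf -c 10.1.1.17 -P 4 -t 30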


ceph.conf

Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.1.2.0/24
     fsid = 4cb23fa8-fab0-41d9-b334-02fe8dede3a8
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 10.1.1.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.c16]
     host = c16
     mon addr = 10.1.1.16:6789

[mon.c18]
     host = c18
     mon addr = 10.1.1.18:6789

[mon.c17]
     host = c17
     mon addr = 10.1.1.17:6789
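
To double-check that each OSD really keeps its BlueStore DB/WAL on the P3700 rather than on the spinner, the OSD metadata can be inspected (a sketch; the exact field names differ a little between Ceph releases):

Code:
# prints JSON including the devices backing the data and DB/WAL partitions
ceph osd metadata 0 | grep -i -e device -e path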

/etc/network/interfaces

Code:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet manual
    dns-nameservers 8.8.8.8
# dns-* options are implemented by the resolvconf package, if installed

auto eth1
iface eth1 inet manual

auto eth2
iface eth2 inet static
    address  10.xxxx
    netmask  255.255.255.0

iface eth3 inet manual

auto eth4
iface eth4 inet manual

iface eth5 inet manual

auto ib0
iface ib0 inet static
    address  10.1.1.16
    netmask  255.255.255.0
    mtu 65520
    pre-up echo connected > /sys/class/net/ib0/mode

auto ib1
iface ib1 inet static
    address  10.1.2.16
    netmask  255.255.255.0
    mtu 65520
    pre-up echo connected > /sys/class/net/ib1/mode

auto vmbr0

ping -M do -s 8700

Code:
PING 10.1.1.18 (10.1.1.18) 8700(8728) bytes of data.
8708 bytes from 10.1.1.18: icmp_seq=1 ttl=64 time=0.191 ms
8708 bytes from 10.1.1.18: icmp_seq=2 ttl=64 time=0.084 ms
8708 bytes from 10.1.1.18: icmp_seq=3 ttl=64 time=0.173 ms
8708 bytes from 10.1.1.18: icmp_seq=4 ttl=64 time=0.222 ms
8708 bytes from 10.1.1.18: icmp_seq=5 ttl=64 time=0.145 ms
8708 bytes from 10.1.1.18: icmp_seq=6 ttl=64 time=0.197 ms
8708 bytes from 10.1.1.18: icmp_seq=7 ttl=64 time=0.126 ms
8708 bytes from 10.1.1.18: icmp_seq=8 ttl=64 time=0.042 ms
8708 bytes from 10.1.1.18: icmp_seq=9 ttl=64 time=0.163 ms
8708 bytes from 10.1.1.18: icmp_seq=10 ttl=64 time=0.203 ms
8708 bytes from 10.1.1.18: icmp_seq=11 ttl=64 time=0.081 ms
8708 bytes from 10.1.1.18: icmp_seq=12 ttl=64 time=0.131 ms
8708 bytes from 10.1.1.18: icmp_seq=13 ttl=64 time=0.165 ms
8708 bytes from 10.1.1.18: icmp_seq=14 ttl=64 time=0.220 ms
^C
--- 10.1.1.18 ping statistics ---
14 packets transmitted, 14 received, 0% packet loss, time 13305ms
rtt min/avg/max/mdev = 0.042/0.153/0.222/0.053 ms
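
The 8700-byte ping only exercises part of the configured MTU of 65520. To verify that the full IPoIB MTU passes end to end, the ICMP payload needs to be the MTU minus 28 bytes of IP and ICMP headers (a sketch against the same hosts):

Code:
# 65520 - 20 (IP header) - 8 (ICMP header) = 65492 byte payload
ping -M do -s 65492 -c 5 10.1.1.18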

pveversion -v

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.17-3-pve)
pve-manager: 5.2-2 (running version: 5.2-2/b1d1c7f4)
pve-kernel-4.15: 5.2-3
pve-kernel-4.15.17-3-pve: 4.15.17-12
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-32
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-9
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 1.0.0-1
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-11
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-28
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
 
How much RAM do you have left after allocating to the VMs? How many cores do you reserve? Ceph is very CPU-intensive. What disks do you use exactly?

Allocated to VMs = 0; nothing else is running on the system.
What disks do you use exactly = Seagate 7200rpm, 256MB cache, 6Gb/s
 
Hmmm, you could consider doubling your MTU. What model of router/switch do you use to connect the servers to each other?

I'm using a Mellanox SX6025 non-blocking unmanaged 56Gb/s SDN switch; I'm not sure whether increasing the MTU would even work with it. I'm already using MTU 65520. Is there any way to increase it further, and if so, to how much?
 
That's not bad at all. With such a small number of such slow drives, you're not likely to do much better; there may be room to double the IOPS, but that's still molasses slow.

Your network isn't the only bottleneck to consider.
 
If I add more drives, will that give better performance? Also, I'm not sure whether I need to add more journal disks, or whether one P3700 per node will be enough.
 
It seems quite strange that I get better performance with only 2 x 6TB SATA drives while using Filestore instead of BlueStore?!

rados bench -p test 60 write --no-cleanup

Code:
Total time run:         62.803659
Total writes made:      1696
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     108.019
Stddev Bandwidth:       136.58
Max bandwidth (MB/sec): 568
Min bandwidth (MB/sec): 0
Average IOPS:           27
Stddev IOPS:            34
Max IOPS:               142
Min IOPS:               0
Average Latency(s):     0.591435
Stddev Latency(s):      0.925619
Max latency(s):         7.59932
Min latency(s):         0.0153661


rados bench -p rbd -t 16 60 seq

Code:
Total time run:       4.398247
Total reads made:     1696
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1542.43
Average IOPS:         385
Stddev IOPS:          29
Max IOPS:             417
Min IOPS:             355
Average Latency(s):   0.0394852
Max latency(s):       0.134471
Min latency(s):       0.00444684

rados bench -p rbd -t 16 60 rand

Code:
Total time run:       60.035321
Total reads made:     28940
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   1928.2
Average IOPS:         482
Stddev IOPS:          70
Max IOPS:             681
Min IOPS:             388
Average Latency(s):   0.0318467
Max latency(s):       0.128632
Min latency(s):       0.00369937


rados bench -p test1 60 write

Code:
Total time run:         60.785559
Total writes made:      2760
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     181.622
Stddev Bandwidth:       186.945
Max bandwidth (MB/sec): 1024
Min bandwidth (MB/sec): 0
Average IOPS:           45
Stddev IOPS:            46
Max IOPS:               256
Min IOPS:               0
Average Latency(s):     0.35141
Stddev Latency(s):      0.706639
Max latency(s):         4.23981
Min latency(s):         0.00781872
Cleaning up (deleting benchmark objects)
Removed 2760 objects
Clean up completed and total clean up time :0.409197
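
For a like-for-like comparison between the BlueStore and Filestore clusters, it may help to keep the pool, run time, and concurrency identical across the write and read runs (a sketch, assuming a pool named test exists on both clusters; -t 16 is also the rados bench default):

Code:
# same pool and concurrency for write and both read patterns
rados bench -p test 60 write -t 16 --no-cleanup
rados bench -p test 60 seq -t 16
rados bench -p test 60 rand -t 16
# remove the benchmark objects afterwards
rados -p test cleanup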
 
