Poor performance

Davidemvm

New Member
Apr 8, 2021
Hi, I am building a Ceph cluster with Proxmox 6.3 and I am seeing much lower performance than the Proxmox benchmark (https://www.proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark). I hope you can help me identify where my bottleneck is.

At the moment I am using 3 nodes, with 4 OSDs on each node (all SSD).

Specs per node:
DELL R730XD with 2x Xeon E5-2680 v4 2.40GHz
320 GB DDR4
4x Samsung S883 960 GB for Ceph
1x Intel S3700 for Proxmox
2x 1 Gb NIC (only 1 in use, for VM traffic and corosync)
2x 10 Gb NIC (for Ceph, in LACP) - MTU 9000
No journal
Ceph network switch: Cisco Catalyst WS-C6509-E (just for testing)
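
To rule out the disks themselves, I can also repeat the raw-disk write-latency test from the Proxmox benchmark paper with fio. A rough sketch of what I would run (/dev/sdX is only a placeholder for one of the Samsung SSDs, and the test is destructive, so it should only be run on a disk that does not yet hold an OSD):
Code:
# WARNING: destructive, overwrites /dev/sdX (placeholder device name)
fio --ioengine=libaio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=ssd-sync-write
If the per-disk sync-write results are already poor, the bottleneck would be the SSDs rather than the network.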


# rados bench -p ssd_pool 10 write
Code:
Object prefix: benchmark_data_pve02_620804
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       118       102   407.971       408   0.0841476    0.140718
    2      16       224       208   415.957       424    0.133007    0.146998
    3      16       333       317   422.611       436   0.0739292    0.147036
    4      16       437       421   420.941       416    0.157942    0.147613
    5      16       549       533   426.338       448    0.306489    0.147389
    6      16       659       643   428.602       440    0.150727    0.147659
    7      16       770       754   430.793       444    0.115754    0.146412
    8      16       878       862   430.936       432    0.158047    0.146082
    9      16       991       975   433.267       452    0.110943      0.1465
   10      16      1097      1081   432.335       424    0.208018     0.14674
Total time run:         10.1027
Total writes made:      1097
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     434.338
Stddev Bandwidth:       14.2922
Max bandwidth (MB/sec): 452
Min bandwidth (MB/sec): 408
Average IOPS:           108
Stddev IOPS:            3.57305
Max IOPS:               113
Min IOPS:               102
Average Latency(s):     0.146708
Stddev Latency(s):      0.0684433
Max latency(s):         0.543467
Min latency(s):         0.0493101
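
For comparison with the read numbers in the benchmark paper I could also run the sequential and random read tests on the same pool. A rough sketch (60 seconds is just an example duration; the write has to keep its objects with --no-cleanup so the read tests have data, and the benchmark objects should be removed afterwards):
Code:
# keep the written objects so seq/rand have something to read
rados bench -p ssd_pool 60 write --no-cleanup
rados bench -p ssd_pool 60 seq
rados bench -p ssd_pool 60 rand
# remove the benchmark objects afterwards
rados -p ssd_pool cleanup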



# pveversion -v
Code:
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.9-pve1
ceph-fuse: 15.2.9-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

# cat /etc/pve/ceph.conf
Code:
[global]
 debug asok = 0/0
 debug auth = 0/0
 debug buffer = 0/0
 debug client = 0/0
 debug context = 0/0
 debug crush = 0/0
 debug filer = 0/0
 debug filestore = 0/0
 debug finisher = 0/0
 debug heartbeatmap = 0/0
 debug journal = 0/0
 debug journaler = 0/0
 debug lockdep = 0/0
 debug mds = 0/0
 debug mds balancer = 0/0
 debug mds locker = 0/0
 debug mds log = 0/0
 debug mds log expire = 0/0
 debug mds migrator = 0/0
 debug mon = 0/0
 debug monc = 0/0
 debug ms = 0/0
 debug objclass = 0/0
 debug objectcacher = 0/0
 debug objecter = 0/0
 debug optracker = 0/0
 debug osd = 0/0
 debug paxos = 0/0
 debug perfcounter = 0/0
 debug rados = 0/0
 debug rbd = 0/0
 debug rgw = 0/0
 debug throttle = 0/0
 debug timer = 0/0
 debug tp = 0/0
 auth_client_required = cephx
 auth_cluster_required = cephx
 auth_service_required = cephx
 cluster_network = 10.10.10.151/24
 fsid = 3c727e0a-14f4-40d6-9346-6426a3c7d5fa
 mon_allow_pool_delete = true
 mon_host = 10.10.10.151 10.10.10.152 10.10.10.153
 osd_pool_default_min_size = 2
 osd_pool_default_size = 3
 public_network = 10.10.10.151/24

[client]
 keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.pve01]
 public_addr = 10.10.10.151

[mon.pve02]
 public_addr = 10.10.10.152

[mon.pve03]
 public_addr = 10.10.10.153
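
To make sure the OSDs really run with these values and not the defaults, I can query the effective configuration at runtime. A minimal sketch, assuming osd.0 is one of the OSDs on this node:
Code:
# osd.0 is an assumption; use an OSD id that exists on this host
ceph config show osd.0 | grep debug_osd
ceph daemon osd.0 config get debug_osd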


# cat /etc/network/interfaces
Code:
auto lo
iface lo inet loopback

iface eno4 inet manual

iface eno3 inet manual

auto eno1
iface eno1 inet manual
        mtu 9000

auto eno2
iface eno2 inet manual

auto bond0
iface bond0 inet static
        address 10.10.10.152/24
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        mtu 9000

auto vmbr0
iface vmbr0 inet static
        address 192.***.**.212/24
        gateway 192.***.**.3
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0
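
Since the bond uses MTU 9000 (and eno2 has no explicit mtu line of its own), I also want to verify that jumbo frames actually pass between the nodes end to end. A quick sketch with ping, forbidding fragmentation and using an 8972-byte payload (9000 minus the IP/ICMP headers):
Code:
# check the effective MTU on the slave interface
ip link show eno2 | grep mtu
# jumbo frame test towards another Ceph node
ping -M do -s 8972 -c 3 10.10.10.151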


# iperf -c 10.10.10.151
Code:
------------------------------------------------------------
Client connecting to 10.10.10.151, TCP port 5001
TCP window size:  715 KByte (default)
------------------------------------------------------------
[  3] local 10.10.10.152 port 50168 connected with 10.10.10.151 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  11.4 GBytes  9.82 Gbits/sec
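
As far as I know, a single TCP stream can only use one member of an LACP bond, so ~9.8 Gbit/s looks like the expected ceiling for this test. To see whether the 2x10 Gb aggregate is usable at all, I could try several parallel streams (they may still hash onto one link, depending on the bond's xmit hash policy and the switch):
Code:
# 4 parallel streams for 10 seconds; stream count is just an example
iperf -c 10.10.10.151 -P 4 -t 10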



# ceph status
Code:
  cluster:
    id:     3c727e0a-14f4-40d6-9346-6426a3c7d5fa
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve01,pve02,pve03 (age 17h)
    mgr: pve01(active, since 17h), standbys: pve02, pve03
    osd: 12 osds: 12 up (since 17h), 12 in (since 2d)

  data:
    pools:   2 pools, 33 pgs
    objects: 12 objects, 0 B
    usage:   12 GiB used, 10 TiB / 10 TiB avail
    pgs:     33 active+clean
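
One thing I am not sure about is the PG count: 33 PGs in 2 pools on 12 OSDs gives each OSD only a handful of PGs, which may limit parallelism. I can check how ssd_pool is sized and what the autoscaler would suggest with:
Code:
ceph osd pool get ssd_pool pg_num
ceph osd pool autoscale-status
ceph osd df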

Thank you very much.
 
I see no difference.

# rados bench -p ssd_pool 10 write -t 64
Code:
Object prefix: benchmark_data_pve02_680246
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      63       124        61   243.979       244    0.582006    0.563014
    2      63       231       168   335.962       428    0.576075    0.586621
    3      63       341       278   370.619       440    0.621125    0.579898
    4      63       452       389    388.95       444    0.504123    0.580806
    5      63       566       503   402.346       456    0.564579    0.579315
    6      63       672       609   405.943       424    0.556735    0.582073
    7      63       776       713    407.37       416    0.524292    0.582356
    8      63       888       825   412.439       448     0.58721      0.5853
    9      63       995       932   414.162       428    0.610616    0.586792
   10      63      1113      1050   419.938       472    0.515881    0.585366
Total time run:         10.363
Total writes made:      1114
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     429.99
Stddev Bandwidth:       64.0278
Max bandwidth (MB/sec): 472
Min bandwidth (MB/sec): 244
Average IOPS:           107
Stddev IOPS:            16.0069
Max IOPS:               118
Min IOPS:               61
Average Latency(s):     0.569549
Stddev Latency(s):      0.133475
Max latency(s):         1.63962
Min latency(s):         0.111882
 
Ok, here we are, thanks

# rados -p ssd_pool bench 10 write
Code:
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve01_127778
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16       221       205   819.905       820   0.0766385   0.0726859
    2      16       428       412   823.894       828    0.114741   0.0749207
    3      16       658       642   855.899       920   0.0498497   0.0739536
    4      16       868       852   851.888       840    0.078703   0.0746972
    5      16      1086      1070   855.885       872   0.0604758   0.0739914
    6      16      1307      1291   860.557       884    0.060381   0.0737078
    7      16      1517      1501   857.606       840   0.0630636   0.0740896
    8      16      1735      1719    859.39       872   0.0351353   0.0739321
    9      16      1958      1942   863.001       892   0.0887309   0.0737401
   10      16      2172      2156    862.29       856   0.0804083   0.0738416
Total time run:         10.0431
Total writes made:      2172
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     865.074
Stddev Bandwidth:       31.3943
Max bandwidth (MB/sec): 920
Min bandwidth (MB/sec): 820
Average IOPS:           216
Stddev IOPS:            7.84857
Max IOPS:               230
Min IOPS:               205
Average Latency(s):     0.0738051
Stddev Latency(s):      0.0331146
Max latency(s):         0.2367
Min latency(s):         0.0181323
 
