Proxmox VE Ceph Benchmark 2018/02

Discussion in 'Proxmox VE: Installation and configuration' started by martin, Feb 27, 2018.

  1. udo

    udo Well-Known Member
    Proxmox Subscriber

    Joined:
    Apr 22, 2009
    Messages:
    5,835
    Likes Received:
    159
    Hi,
    you read the data that you wrote before (from this node) to the pool - once all available data has been read, the benchmark stops.
    Because reading is faster than writing, the job is done in 32 seconds.
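
    A rough sketch of the arithmetic (the bandwidth figures below are only illustrative assumptions, not values from this thread):

    Code:
    # rados bench's seq/rand read phase can only read back the objects the
    # preceding write run left behind, so it may finish well before the
    # requested runtime.

    write_seconds = 60      # duration of the preceding 'rados bench ... write' run (assumed)
    write_bw_mb_s = 550     # average write bandwidth during that run (assumed)
    read_bw_mb_s = 1030     # average read bandwidth of the cluster (assumed)

    data_written_mb = write_seconds * write_bw_mb_s   # total data available to read
    read_seconds = data_written_mb / read_bw_mb_s     # time until everything has been read

    print(f"{data_written_mb / 1000:.0f} GB written, read back in ~{read_seconds:.0f} s")
    # With these assumed numbers the read benchmark stops after ~32 s instead of 60 s.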

    Udo
     
  2. victorhooi

    victorhooi Member

    Joined:
    Apr 3, 2018
    Messages:
    132
    Likes Received:
    6
    Got it.

    Is there any way to figure out what the bottleneck is in the above (e.g. network, storage drives, RAM, etc.)? Or whether we've hit some hard limitation in Ceph at this scale?
     
  3. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,346
    Likes Received:
    213
    You have reached your network limit; compare your results with those in our benchmark paper. To really get the IOPS out of your NVMe drives, you should consider upgrading to 40 GbE or even 100 GbE (with 3 nodes, no switch is needed).

    Possibly it is due to the read limitation of your LVM storage, but this is just a shot in the dark.
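
    As a back-of-envelope sketch of that network ceiling (assumed link speeds and a replication factor of 3, ignoring protocol overhead):

    Code:
    def client_write_ceiling_mb_s(link_gbit: float, replication: int = 3) -> float:
        """Rough upper bound on client write bandwidth per node.

        Each client write is replicated over the cluster network, so the
        usable client bandwidth is roughly the raw link speed divided by
        the replication factor.
        """
        raw_mb_s = link_gbit * 1000 / 8   # Gbit/s -> MB/s, ignoring overhead
        return raw_mb_s / replication

    for link in (10, 40, 100):
        print(f"{link:>3} GbE: ~{client_write_ceiling_mb_s(link):.0f} MB/s of replicated client writes")
    # A single NVMe drive can already exceed the roughly 400 MB/s a 10 GbE link
    # leaves for replicated writes, which is why the network saturates first.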
     
  4. Alexander Marek

    Alexander Marek New Member
    Proxmox Subscriber

    Joined:
    Apr 6, 2018
    Messages:
    8
    Likes Received:
    0
    Did anybody compare the SM883 with the SM863?
    It seems the SM863 is not available on the market anymore!

    I guess the performance is approximately the same, since it is just a newer model?

    Thank you in advance

    BR
     
  5. Ronny

    Ronny Member

    Joined:
    Sep 12, 2017
    Messages:
    37
    Likes Received:
    0
    And what about the Samsung PM883 - any experience with this one?

    regards
    Ronny
     
  6. fips

    fips Member

    Joined:
    May 5, 2014
    Messages:
    141
    Likes Received:
    5
    Here are the results of my last benchmarks:

    Code:
    Model             Size     TBW      BW           IOPS
    Intel DC S4500    480 GB   900 TB   62.4 MB/s    15.0k
    Samsung PM883     240 GB   341 TB   67.2 MB/s    17.2k
     
  7. Alibek

    Alibek Member

    Joined:
    Jan 13, 2017
    Messages:
    66
    Likes Received:
    5
    Is the following limitation of Ceph true:
    ~10k IOPS per OSD?
     
  8. Tacid

    Tacid New Member

    Joined:
    Aug 30, 2018
    Messages:
    3
    Likes Received:
    2
    I've got 15k-16k random write IOPS with 16 threads and 4k blocks per BlueStore OSD on a 40G IB network, and that is a good result; 10k per OSD is not bad. With io-thread=1 I can get only 1,400-1,600 random write IOPS.
    The problem here is the latency of the OSD code and the write amplification produced on every operation. The OSD itself can take 700 μs (0.7 ms) just to execute one IO operation, so even on a RAM disk, where kernel IO operations take <10 μs, you can barely reach 3k IOPS with the best high-frequency CPU.

    P.S. Random read is about 35-50k IOPS on the same system, but that is really just a measure of OSD performance (the data was read from the OSD cache, so no disk IO was done during the test).
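
    A minimal sketch of that latency arithmetic (using the ~700 μs OSD overhead and <10 μs RAM-disk latency quoted above; the thread count is just the 16 used in the test):

    Code:
    # With ~700 us spent in the OSD code path per write, latency alone sets
    # the per-thread IOPS ceiling, no matter how fast the device is.

    osd_latency_s = 700e-6     # OSD/BlueStore code path per IO (from the post)
    device_latency_s = 10e-6   # kernel IO on a RAM disk (from the post)

    per_thread_iops = 1 / (osd_latency_s + device_latency_s)
    print(f"single-threaded ceiling: ~{per_thread_iops:.0f} IOPS")   # ~1400 IOPS

    # Throughput only grows by adding parallelism, e.g. 16 client threads:
    threads = 16
    print(f"{threads} threads, ideal scaling: ~{threads * per_thread_iops / 1000:.0f}k IOPS")
    # In practice write amplification and contention keep the measured value
    # closer to the 15-16k IOPS per OSD mentioned above.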
     
  9. oversite

    oversite New Member

    Joined:
    Jul 13, 2011
    Messages:
    3
    Likes Received:
    0
    I got lower than expected results from the SM883 (I thought they would be at least as good as the SM863), so I ended up using the PM963 and, even more so, the PM983. I cannot really tell whether it is the NVMe interface or the SSDs themselves, and these are also considered read-intensive disks, but I get much better performance and lower latency compared to the SM883. I have not studied it in depth, but I suppose the SM883 and SM863 are MLC and the PM983 is TLC; nevertheless, they work much better for me with Ceph. I am using the 1TB models with one OSD on each, no separate DB or WAL.
    /Hans

     
  10. Alibek

    Alibek Member

    Joined:
    Jan 13, 2017
    Messages:
    66
    Likes Received:
    5
    Thanks, but no, this is not good... Currently the only way to maximize utilization of an NVMe device (for example an Optane with ~500k read/write IOPS at 4 KB blocks) is to split it into many OSDs, 10-50 per device. And of course we need to wait for the io_uring implementation: https://github.com/ceph/ceph/pull/27392
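
    An illustrative sketch of why splitting helps (the per-OSD ceiling is only an assumption based on the figures discussed earlier in this thread):

    Code:
    # If one OSD tops out at roughly 10-16k write IOPS, a drive rated for
    # ~500k IOPS stays mostly idle unless it is split into several OSDs.

    drive_iops = 500_000     # e.g. an Optane device at 4 KB blocks (from the post)
    per_osd_iops = 15_000    # rough per-OSD ceiling (assumed from earlier posts)

    for osds_per_drive in (1, 4, 10):
        utilisation = min(1.0, osds_per_drive * per_osd_iops / drive_iops)
        print(f"{osds_per_drive:>2} OSDs: ~{utilisation:.0%} of the drive's IOPS capability")
    # More OSDs per device raise utilisation at the cost of extra CPU and RAM;
    # io_uring (the linked PR) instead aims to cut the per-IO overhead.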
     
  11. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,346
    Likes Received:
    213
    Or using dpdk to access the NVMe devices directly.
    https://github.com/ceph/dpdk
     
    Alibek likes this.
  12. Alibek

    Alibek Member

    Joined:
    Jan 13, 2017
    Messages:
    66
    Likes Received:
    5
    MikeWebb likes this.
  13. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,346
    Likes Received:
    213
    Software built with dpdk in mind needs to be recompiled for the specific software version in use, so it is not an off-the-shelf solution.
     
  14. Alibek

    Alibek Member

    Joined:
    Jan 13, 2017
    Messages:
    66
    Likes Received:
    5