
Ceph - Bad performance in qemu-guests

Discussion in 'Proxmox VE: Installation and configuration' started by raoro, Mar 23, 2015.

  1. raoro

    raoro New Member

    Hello everyone,

    First of all I want to say thank you to each and every one in this community!
    I've been a long-time reader (and user of PVE) and have gotten so much valuable information from this forum!

    Right now the deployment of the Ceph cluster is giving me some trouble.
    We were using DRBD, but since we are expanding and there are more nodes in the pve-cluster, we decided to switch to Ceph.

    The 3 Ceph server nodes are connected via a 6x GbE LACP bond with jumbo frames over two stacked switches, and the Ceph traffic is on a separate VLAN.
    Currently there are 9 OSDs (3x 15K SAS with BBWC per host).
    The journal is 10GB per OSD and sits on LVM volumes of an SSD RAID1.
    pg_num and pgp_num are set to 512 for the pool.
    Replication is 3 and the CRUSH map is configured to distribute the requests over the 3 hosts.
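
    For reference, the benchmark pool used below was set up with the standard commands, roughly the equivalent of this (recreated from memory, so take it as a sketch):
    Code:
    ceph osd pool create test 512 512
    ceph osd pool set test size 3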

    The performance of the rados benchmarks is good:
    rados -p test bench 60 write -t 8 --no-cleanup
    Code:
    Total time run:         60.187142
    Total writes made:      1689
    Write size:             4194304
    Bandwidth (MB/sec):     112.250 
    
    Stddev Bandwidth:       48.3496
    Max bandwidth (MB/sec): 176
    Min bandwidth (MB/sec): 0
    Average Latency:        0.28505
    Stddev Latency:         0.236462
    Max latency:            1.91126
    Min latency:            0.053685
    
    rados -p test bench 60 seq -t 8
    Code:
    Total time run:        30.164931
    Total reads made:      1689
    Read size:             4194304
    Bandwidth (MB/sec):    223.969 
    
    Average Latency:       0.142613
    Max latency:           2.78286
    Min latency:           0.003772
    
    rados -p test bench 60 rand -t 8
    Code:
    Total time run:        60.287489
    Total reads made:      4524
    Read size:             4194304
    Bandwidth (MB/sec):    300.162 
    
    Average Latency:       0.106474
    Max latency:           0.768564
    Min latency:           0.003791
    
    What makes me wonder are the "Min bandwidth (MB/sec): 0" and "Max latency: 1.91126" in the write benchmark.

    I've modified the Linux autotuning TCP buffer limits and the rx/tx ring parameters of the network cards (all Intel), which increased the bandwidth but didn't help with the latency of small IO.
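
    For reference, this is the kind of tuning I mean - the interface name and values here are just examples, adjust them to your own NICs and network:
    Code:
    # increase the NIC ring buffers (example interface name)
    ethtool -G eth0 rx 4096 tx 4096
    
    # raise the Linux autotuning TCP buffer limits
    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"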

    For example, in a Wheezy KVM guest:
    Code:
    dd if=/dev/zero of=/tmp/test bs=512 count=1000 oflag=direct,dsync
    512000 bytes (512 kB) copied, 9.99445 s, 51.2 kB/s
    
    dd if=/dev/zero of=/tmp/test bs=4k count=1000 oflag=direct,dsync
    4096000 bytes (4.1 MB) copied, 10.0949 s, 406 kB/s
    I also put flashcache in front of the OSDs, but it didn't help much, and since there's 1GB of cache on the RAID controller in front of the OSDs, I wonder why this is so slow in the guests.
    Compared to the raw performance of the SSDs and the OSDs this is really bad...
    Code:
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/test bs=512 count=1000 oflag=direct,dsync
    512000 bytes (512 kB) copied, 0.120224 s, 4.3 MB/s
    
    dd if=/dev/zero of=/var/lib/ceph/osd/ceph-2/test bs=4k count=1000 oflag=direct,dsync
    4096000 bytes (4.1 MB) copied, 0.137924 s, 29.7 MB/s
    
    
    dd if=/dev/zero of=/mnt/ssd-test/test bs=512 count=1000 oflag=direct,dsync
    512000 bytes (512 kB) copied, 0.147097 s, 3.5 MB/s
    
    dd if=/dev/zero of=/mnt/ssd-test/test bs=4k count=1000 oflag=direct,dsync
    4096000 bytes (4.1 MB) copied, 0.235434 s, 17.4 MB/s
    
    Running fio from a node directly via rbd gives expected results, but also with some serious deviations:
    Code:
    rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
    fio-2.2.3-1-gaad9
    Starting 1 process
    rbd engine: RBD version: 0.1.8
    Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/13271KB/0KB /s] [0/3317/0 iops] [eta 00m:00s]
    rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=849098: Mon Mar 23 20:08:25 2015
      write: io=2048.0MB, bw=12955KB/s, iops=3238, runt=161874msec
        slat (usec): min=37, max=27268, avg=222.48, stdev=326.17
        clat (usec): min=13, max=544666, avg=7937.85, stdev=11891.77
         lat (msec): min=1, max=544, avg= 8.16, stdev=11.88
    
    Thanks for reading so far :)
    I know this is my first post, but I have really run out of options here and would really appreciate your help.

    My questions are:
    Why is the performance in the guests so much worse?
    What can we do to enhance this for Linux as well as Windows guests?

    Thanks for reading this big post, and I hope we can have a nice discussion with a good outcome for everyone, since this is, from my point of view, a common issue for quite a few users.
     
    #1 raoro, Mar 23, 2015
    Last edited: Mar 25, 2015
  2. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Re: Ceph - Bad performance with small IO

    Hi,
    latency is a problem with Ceph... but there are some things you can tune.

    Which version of Ceph do you use? Since Firefly, rbd_cache is enabled by default, and it should be, because rbd_cache speeds up small IOs where possible (it coalesces small IOs into fewer, bigger ones).

    Do you use a bigger read-ahead cache (4096) inside the VM? Very important!!

    Are your measurements (much) better if you disable scrubbing ("ceph osd set noscrub" + "ceph osd set nodeep-scrub")? If so, there are settings to minimize the scrubbing impact.
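
    For example (and don't forget to re-enable scrubbing afterwards):
    Code:
    # disable scrubbing for the duration of the benchmark
    ceph osd set noscrub
    ceph osd set nodeep-scrub
    
    # re-enable it when you are done
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub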

    BTW, I had bad experiences with file-based journaling on LVM! If you have Intel SSDs (DC S3700), you should try those for journaling.

    Udo

    EDIT: For my config I switched from XFS to ext4 on the OSDs and the latency dropped by approx. 50%.
     
  3. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Hi Udo,

    thanks for your fast reply!

    I use Ceph Giant - version 0.87.1.
    The journals are symlinked to LVM volumes on a Crucial M500 RAID1.
    Does that still count as file-based?

    The read ahead cache is already set to a higher value. If you mean:
    Code:
    blockdev --getra /dev/vda
    32768
    I tested the "noscrub" + "nodeep-scrub" settings, but performance is about the same.

    The rbd_cache seems to be active:
    Code:
    ceph --admin-daemon /var/run/ceph/ceph-osd.2.asok config show | grep rbd_cache
    
      "rbd_cache": "true",
      "rbd_cache_writethrough_until_flush": "true",
      "rbd_cache_size": "33554432",
      "rbd_cache_max_dirty": "25165824",
      "rbd_cache_target_dirty": "16777216",
      "rbd_cache_max_dirty_age": "1",
      "rbd_cache_max_dirty_object": "0",
      "rbd_cache_block_writes_upfront": "false",
    The OSDs are formatted with XFS because I read everywhere that this is recommended.
    I would like to go with btrfs, but not until it is considered stable... :-/
     
  4. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Re: Ceph - Bad performance with small IO

    OK.
    No, in this case it's on a block device - like partition-based journaling, but with the LVM layer between Ceph and the block device. Are you sure the Crucials will hold up well over a long time (you don't have TRIM)?
    Can you add another LV on the SSD and test the performance?
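    That is, roughly this kind of test (the post linked below describes it in detail) - the device path here is only an example, and careful: it overwrites the target:
    Code:
    # sequential 4k writes with dsync, the way a Ceph journal writes
    dd if=/dev/zero of=/dev/vg_ssd/journal-test bs=4k count=100000 oflag=direct,dsync
    
    # or with fio, several jobs in parallel
    fio --filename=/dev/vg_ssd/journal-test --direct=1 --sync=1 --rw=write --bs=4k \
        --numjobs=4 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
    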
    Code:
    http://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
    
    No, I mean:
    Code:
    echo 4096 > /sys/block/vda/queue/read_ahead_kb
    
    Give this a try - you will see a huge difference on reads.
    Looks good.
    XFS is the standard, but have you checked your fragmentation? (We had up to 20%.)
    With the right settings you shouldn't have trouble with fragmentation:
    Code:
    osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
    filestore_xfs_extsize = true
    
    filestore_xfs_extsize is a new parameter (it arrived after we switched to ext4 - it should work fine, but I don't have experience with it).

    Udo
     
  5. phildefer

    phildefer New Member

    Re: Ceph - Bad performance with small IO

    Hi

    Have you updated the client too? When I updated my Firefly client to the Giant client, the performance was better.
     
  6. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    The client is also Giant and there was some improvement, but the performance of small IO is still really bad.
    For example, an ATTO benchmark in a Windows guest:
    ceph-bench-windows-guest.png

    It seems to hit a limitation?

    I've been reading Sebastian Han's blog extensively - lots of good information there!
    Got lots of my Ceph knowledge and inspiration from him! :)

    Looking at the numbers for the M550 I can understand your concerns, but the M500 is different.
    I did that exact test before choosing them as journal SSDs (note: numjobs=4):
    Code:
    journal-test: (groupid=0, jobs=4): err= 0: pid=654178: Tue Mar 24 12:20:43 2015
      write: io=4419.8MB, bw=75404KB/s, iops=18850, runt= 60012msec
        clat (usec): min=68, max=138432, avg=209.26, stdev=1273.33
         lat (usec): min=68, max=138432, avg=209.59, stdev=1273.33
        clat percentiles (usec):
         |  1.00th=[   98],  5.00th=[  108], 10.00th=[  112], 20.00th=[  118],
         | 30.00th=[  124], 40.00th=[  133], 50.00th=[  141], 60.00th=[  147],
         | 70.00th=[  157], 80.00th=[  169], 90.00th=[  189], 95.00th=[  201],
         | 99.00th=[  262], 99.50th=[  588], 99.90th=[15168], 99.95th=[16320],
         | 99.99th=[59648]
        bw (KB  /s): min= 1009, max=32624, per=25.10%, avg=18922.50, stdev=9588.83
        lat (usec) : 100=1.37%, 250=97.53%, 500=0.57%, 750=0.07%, 1000=0.04%
        lat (msec) : 2=0.05%, 4=0.02%, 10=0.03%, 20=0.30%, 50=0.02%
        lat (msec) : 100=0.02%, 250=0.01%
      cpu          : usr=1.88%, sys=10.24%, ctx=2263458, majf=0, minf=109
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=0/w=1131284/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=1
    
    Run status group 0 (all jobs):
      WRITE: io=4419.8MB, aggrb=75403KB/s, minb=75403KB/s, maxb=75403KB/s, mint=60012msec, maxt=60012msec
    Increasing the read-ahead cache did not really improve things, and I would like to increase the performance for non-Linux guests as well:
    Code:
    dd if=/dev/vda of=/dev/null bs=512 count=1000 iflag=direct
    512000 bytes (512 kB) copied, 1.02483 s, 500 kB/s
    
    echo 4096 > /sys/block/vda/queue/read_ahead_kb
    
    dd if=/dev/vda of=/dev/null bs=512 count=1000 iflag=direct
    512000 bytes (512 kB) copied, 0.950503 s, 539 kB/s
    Obviously I didn't use the right settings for XFS -.-
    Code:
    xfs_db -c frag -r /dev/sdb1
    actual 177046, ideal 127871, fragmentation factor 27.78%
    This is just one OSD; the others are all around 25%.
    But does this affect performance that much?
    There is plenty of space available, and the writes should hit the SSD journals first and then the cache of the SAS controller.
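
    For the defragmentation itself I'm planning to run xfs_fsr over each OSD, one at a time - roughly like this (a sketch; the OSD path is just the example from above):
    Code:
    # optional: keep ceph from rebalancing while the OSD is busy
    ceph osd set noout
    
    xfs_fsr -v /var/lib/ceph/osd/ceph-2
    
    ceph osd unset noout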

    My current mount-options are:
    Code:
    osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,inode64"
    I'll try your proposed ones together with filestore_xfs_extsize and report back.

    Thanks so far for your input and suggestions udo and phildefer!
     
  7. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Just finished defragmenting the OSDs and remounting them with the new mount options.
    Now the fragmentation factor is at maximum 0.34% over all OSDs.
    Also injected the new setting filestore_xfs_extsize.

    But still no improvement :-(

    Just to make sure, below are the tunings I already made to Ceph.
    These are the changes over the default settings.
    Is there maybe something wrong?
    Code:
            osd recovery max active = 1
            osd max backfills = 1
            osd_disk_threads = 4
            osd_op_threads = 4
            osd target transaction size = 50
            osd mkfs options xfs = "-f -i size=2048"
            osd mount options xfs = "rw,noatime,nobarrier,logbsize=256k,logbufs=8,delaylog,inode64"
    
            filestore_xfs_extsize = true
            filestore max sync interval = 30
            filestore min sync interval = 29
            filestore xattr use omap = true
            filestore flusher = false
            filestore queue max ops = 10000
            filestore queue max bytes = 536870912
            filestore queue committing max ops = 2000
            filestore queue committing max bytes = 536870912
    
    What makes me wonder is the limit the ATTO benchmark seems to hit.
    Is there some way to tune the librbd access PVE uses?
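
    For example, would client-side options like these in the [client] section of ceph.conf on the pve nodes even be picked up by qemu/librbd? Just an idea, and the values are guesses:
    Code:
    [client]
            rbd cache = true
            rbd cache size = 67108864
            rbd cache max dirty = 50331648
            rbd cache writethrough until flush = true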
     
  8. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Re: Ceph - Bad performance with small IO

    Hi,
    "osd_disk_threads = 4" means 4 threads for housekeeping (scrubbing) - I would leave this value at 1 (I did the same a while ago).
    "filestore xattr use omap = true" is AFAIK only needed with ext4 (and CephFS?!).

    Are you sure that "filestore min sync interval = 29" is a good idea? I have the default min/max of 0.01/10.

    Most of the settings I leave at their defaults...
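
    You can also try such changes at runtime with injectargs before touching ceph.conf - e.g. something like this (from memory, the exact syntax can differ between versions):
    Code:
    ceph tell osd.* injectargs '--osd_disk_threads 1'
    ceph tell osd.* injectargs '--filestore_min_sync_interval 0.01 --filestore_max_sync_interval 10'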

    Udo
     
  9. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    I reverted the settings to default and there is no big difference.
    The min/max sync intervals were an experiment. I read about them on the ceph-users mailing list,
    but obviously they didn't help much - that was for a much bigger cluster.

    The strange thing really is that with fio benchmarks run directly with ioengine=rbd on one of the nodes I get about 1600 IOPS, while in one of the qemu guests it's just about 110.
    That's more than a factor of 10!
    Are we missing something here?

    Guest:
    Code:
    fio --filename=/tmp/test --direct=1 --sync=1 --rw=randwrite --bs=4k --numjobs=1 --iodepth=32 --runtime=60 --time_based --group_reporting --name=iotest
    
    iotest: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=32
    fio-2.2.3-1-gaad9
    Starting 1 process
    Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/480KB/0KB /s] [0/120/0 iops] [eta 00m:00s]
    iotest: (groupid=0, jobs=1): err= 0: pid=27389: Tue Mar 24 17:15:41 2015
      write: io=26876KB, bw=458668B/s, iops=111, runt= 60002msec
        clat (msec): min=4, max=2200, avg= 8.92, stdev=31.90
         lat (msec): min=4, max=2200, avg= 8.92, stdev=31.90
        clat percentiles (msec):
         |  1.00th=[    6],  5.00th=[    6], 10.00th=[    6], 20.00th=[    7],
         | 30.00th=[    7], 40.00th=[    8], 50.00th=[    8], 60.00th=[    8],
         | 70.00th=[    8], 80.00th=[    9], 90.00th=[   10], 95.00th=[   12],
         | 99.00th=[   31], 99.50th=[   56], 99.90th=[  249], 99.95th=[  523],
         | 99.99th=[ 2212]
        bw (KB  /s): min=    1, max=  718, per=100.00%, avg=470.16, stdev=162.94
        lat (msec) : 10=92.90%, 20=5.34%, 50=1.19%, 100=0.34%, 250=0.13%
        lat (msec) : 500=0.03%, 750=0.01%, 1000=0.03%, >=2000=0.01%
      cpu          : usr=0.12%, sys=0.63%, ctx=13454, majf=0, minf=26
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=0/w=6719/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=32
    
    Run status group 0 (all jobs):
      WRITE: io=26876KB, aggrb=447KB/s, minb=447KB/s, maxb=447KB/s, mint=60002msec, maxt=60002msec
    
    Disk stats (read/write):
      vda: ios=0/20185, merge=0/6763, ticks=0/59140, in_queue=59124, util=100.00%
    Host:
    Code:
    fio rbd.fio
    rbd_iodepth32: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
    fio-2.2.3-1-gaad9
    Starting 1 process
    rbd engine: RBD version: 0.1.8
    Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/5475KB/0KB /s] [0/1368/0 iops] [eta 00m:00s]
    rbd_iodepth32: (groupid=0, jobs=1): err= 0: pid=929589: Tue Mar 24 17:21:30 2015
      write: io=387752KB, bw=6437.3KB/s, iops=1609, runt= 60236msec
        slat (usec): min=39, max=12393, avg=196.80, stdev=259.87
        clat (usec): min=122, max=3107.3K, avg=18213.42, stdev=85664.74
         lat (msec): min=1, max=3107, avg=18.41, stdev=85.66
        clat percentiles (msec):
         |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
         | 30.00th=[    5], 40.00th=[    5], 50.00th=[    6], 60.00th=[    7],
         | 70.00th=[    8], 80.00th=[    9], 90.00th=[   15], 95.00th=[   42],
         | 99.00th=[  359], 99.50th=[  611], 99.90th=[ 1467], 99.95th=[ 1614],
         | 99.99th=[ 1795]
        bw (KB  /s): min=  242, max=17984, per=100.00%, avg=7601.46, stdev=5125.56
        lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
        lat (msec) : 2=0.59%, 4=23.04%, 10=60.30%, 20=8.63%, 50=2.79%
        lat (msec) : 100=1.71%, 250=1.70%, 500=0.56%, 750=0.43%, 1000=0.04%
        lat (msec) : 2000=0.19%, >=2000=0.01%
      cpu          : usr=12.10%, sys=1.64%, ctx=250972, majf=0, minf=3941
      IO depths    : 1=0.1%, 2=0.4%, 4=1.6%, 8=9.3%, 16=79.3%, 32=9.3%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=93.9%, 8=1.6%, 16=2.5%, 32=2.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=0/w=96938/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=32
    
    Run status group 0 (all jobs):
      WRITE: io=387752KB, aggrb=6437KB/s, minb=6437KB/s, maxb=6437KB/s, mint=60236msec, maxt=60236msec
    
    Disk stats (read/write):
        dm-0: ios=30/1304, merge=0/0, ticks=10/74, in_queue=84, util=2.85%, aggrios=428/259124, aggrmerge=0/774, aggrticks=64/6555, aggrin_queue=6563, aggrutil=100.00%
      sdd: ios=428/259124, merge=0/774, ticks=64/6555, in_queue=6563, util=100.00%
    Content of rbd.fio:
    Code:
    [global]
    ioengine=rbd
    clientname=admin
    pool=test
    rbdname=fio_test
    invalidate=0    # mandatory
    rw=randwrite
    bs=4k
    sync=1
    runtime=60
    
    [rbd_iodepth32]
    iodepth=32
     
  10. mir

    mir Well-Known Member
    Proxmox VE Subscriber

    Re: Ceph - Bad performance with small IO

    How is your guest configured?
    Please paste /etc/pve/qemu-server/[vmid].conf
     
  11. spirit

    spirit Well-Known Member

    Re: Ceph - Bad performance with small IO

    Here is my SSD config tuning:

    Code:
    [global]
    
            filestore_xattr_use_omap = true
    
            debug_lockdep = 0/0
            debug_context = 0/0
            debug_crush = 0/0
            debug_buffer = 0/0
            debug_timer = 0/0
            debug_filer = 0/0
            debug_objecter = 0/0
            debug_rados = 0/0
            debug_rbd = 0/0
            debug_journaler = 0/0
            debug_objectcacher = 0/0
            debug_client = 0/0
            debug_osd = 0/0
            debug_optracker = 0/0
            debug_objclass = 0/0
            debug_filestore = 0/0
            debug_journal = 0/0
            debug_ms = 0/0
            debug_monc = 0/0
            debug_tp = 0/0
            debug_auth = 0/0
            debug_finisher = 0/0
            debug_heartbeatmap = 0/0
            debug_perfcounter = 0/0
            debug_asok = 0/0
            debug_throttle = 0/0
            debug_mon = 0/0
            debug_paxos = 0/0
            debug_rgw = 0/0
            osd_op_threads = 5
            osd_op_num_threads_per_shard = 1
            osd_op_num_shards = 25
            #osd_op_num_sharded_pool_threads = 25
            filestore_op_threads = 4
    
            ms_nocrc = true
            filestore_fd_cache_size = 64
            filestore_fd_cache_shards = 32
            cephx sign messages = false
            cephx require signatures = false
    
            ms_dispatch_throttle_bytes = 0
            throttler_perf_counter = false
    
    
    [osd]
            osd_client_message_size_cap = 0
            osd_client_message_cap = 0
            osd_enable_op_tracker = false
    
    
    
    Disabling debug and cephx, and tuning the sharding, really helps.

    Also, please test your SSDs for journal use with O_DSYNC:
    http://www.sebastien-han.fr/blog/20...-if-your-ssd-is-suitable-as-a-journal-device/
    Consumer SSD drives are pretty bad at this.
     
  12. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Here's the config of the example Linux guest:
    Code:
    balloon: 1024
    boot: dcn
    bootdisk: virtio0
    cores: 4
    ide2: none,media=cdrom
    memory: 4096
    name: dios
    net0: virtio=82:65:63:AF:2E:CF,bridge=vmbr0
    onboot: 1
    ostype: l26
    sockets: 1
    tablet: 0
    vga: qxl
    virtio0: ceph_images:vm-100-disk-1,size=15G
    
    And for the example Windows guest:
    Code:
    bootdisk: virtio0
    cores: 2
    ide2: none,media=cdrom
    memory: 2048
    name: avmc
    net0: virtio=0F:0E:6E:EF:69:AD,bridge=vmbr11
    ostype: win7
    sockets: 1
    tablet: 0
    unused0: drbd-venus-kvm:vm-111-disk-1
    virtio0: ceph_images:vm-111-disk-1,size=32G
    
    Thanks for posting your config spirit. I'll have a look into that.

    Is "ms_nocrc = true" safe?

    The Crucial M500 actually performs quite well:
    Code:
    dd if=randfile of=/dev/vg_ssd/test bs=4k count=100000 oflag=direct,dsync
    409600000 bytes (410 MB) copied, 10.2335 s, 40.0 MB/s
    
     
  13. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Really some good insights in this thread: http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2014-August/042498.html

    But disabling cephx left me with an unusable cluster.
    Had to revert the settings to get back to a working state.
    Do I need to shut down the cluster completely to get the settings working?

    I changed the thread title to "Ceph - Bad performance in guests",
    because obviously the ceph performance isn't so bad when tested with fio,
    but in the qemu guests, even with virtio, it is worse by a factor of 10?!
     
  14. tom

    tom Proxmox Staff Member
    Staff Member

    Re: Ceph - Bad performance with small IO

     
  15. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Yeah well, that sure increases performance.
    The hosts are connected to UPSs, but how safe is it?
    Does this use the rbd cache or the RAM of the host?
    Probably the same thing, but just to clarify.
     
  16. phildefer

    phildefer New Member

    Re: Ceph - Bad performance with small IO

    If you want to use the rbd cache you need to use "cache=writeback" or "cache=writethrough".

    See http://ceph.com/docs/master/rbd/qemu-rbd/#qemu-cache-options

    Yes, this parameter uses the rbd cache, and yes, it uses RAM of the host (because the rbd cache lives in the host's RAM).
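
    In the PVE VM config that would look roughly like this (based on raoro's example config above; the cache option can also be set on the disk in the GUI or with "qm set"):
    Code:
    virtio0: ceph_images:vm-100-disk-1,cache=writeback,size=15G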
     
  17. raoro

    raoro New Member

    Re: Ceph - Bad performance with small IO

    Thanks phildefer for the link and clarifying the rbd-cache question.

    Changing the cache-setting to writeback and tuning debug and sharding really helped a lot:

    ceph-bench-windows-guest-writeback.png

    Thanks to everyone! :)

    @spirit or maybe someone else can answer:
    To further enhance performance I would also like to disable cephx as the cluster runs in a safe network, but the last time I tried to disable it, it left me with an unusable cluster.

    Do I need to shut down the ceph cluster completely in order for this to work?
     
  18. spirit

    spirit Well-Known Member

    Re: Ceph - Bad performance with small IO

    Yes - and restart the qemu guests too.
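
    Roughly: set something like this in the [global] section of ceph.conf on all nodes, then restart all mons and OSDs, and afterwards the qemu guests. This is just a sketch - double-check against the ceph docs before doing it on a production cluster:
    Code:
    [global]
            auth cluster required = none
            auth service required = none
            auth client required = none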
     
  19. cloudguy

    cloudguy New Member

    Re: Ceph - Bad performance with small IO

    I was having a similar issue until recently while running a PoC on Hammer. I suspect it's the SSDs not being able to keep up with the direct I/O requirements Ceph has. I haven't been able to figure out how to disable direct I/O in Ceph, but taking the SSDs out of the equation (i.e. no SSD journals) improved things quite a bit. Still not the performance level I'm after, but at least it's a step in the right direction.

    With Crucial M500 or Intel 530 OSD journals I was averaging about 3-5 MB/s.

    With the journals on the OSDs themselves (i.e. sda1 (10-20G) = journal; sda2 (about 4TB) = data) I'm getting 150-200 MB/s, and 250 MB/s during the ceph benchmark. This is using 12x Seagate 4TB SATA with a 2x replica pool.

    My hardware: a Supermicro 24-drive chassis with crappy AMD 2346 CPUs and 16GB RAM. I'm upgrading these to X5570 CPUs and much more RAM, and looking into Samsung 850 PRO or Intel S3700 SSDs for journals as well. It would be great to get some perspective on what folks out there are using.

    Thanks.


     
  20. udo

    udo Well-Known Member
    Proxmox VE Subscriber

    Re: Ceph - Bad performance with small IO

    Hi,
    I can recommend the Intel S3700 (I wouldn't use a Samsung SSD for journaling).

    See also this thread on the ceph mailing list https://www.mail-archive.com/ceph-users@lists.ceph.com/msg18084.html

    Udo
     
