RBD: which cache method to decrease iowait?

Florent

Hi all,

I run PVE & Ceph (Giant) as RBD storage.

I have high iowaits on my VMs.

All are configured with cache=writeback because I thought that was the best for performance.

Is it really true? Which cache method do you recommend for RBD?

Thank you.
Flo
 
Hi,
there are some tuning possibilities (Ceph mount options and so on).

For Linux guests, the most important tuning is a higher read-ahead cache (inside the VM):
Code:
echo 4096 > /sys/block/vda/queue/read_ahead_kb
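If you want that setting to survive reboots, a udev rule along these lines should work (the device name vda and the file name are just examples):
Code:
# /etc/udev/rules.d/99-readahead.rules
ACTION=="add|change", KERNEL=="vda", ATTR{queue/read_ahead_kb}="4096"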
Udo
 
Also, make sure you use virtio-scsi (not standard virtio). That made a huge difference for me.
 
Hi,
there are some tuning possibilities (Ceph mount options and so on).

For Linux guests, the most important tuning is a higher read-ahead cache (inside the VM):
Code:
echo 4096 > /sys/block/vda/queue/read_ahead_kb
Udo

I already customized mount options in Ceph.
I will see about read_ahead_kb. Thank you.

I saw that all my OSD XFS filesystems are fragmented (>15%), so I will start there. I found some information about it.

Also, make sure you use virtio-scsi (not standard virtio). That made a huge difference for me.

That's interesting :) Why is virtio-scsi better? Is it the "SCSI" option in the PVE GUI?
 
I'm not sure about the underlying implementation details of why it is faster, other than it is 'newer' ;). I primarily moved to virtio-scsi because virtio-blk did not work with 'discard' support, which is crucial to enable if you want to reclaim space within Ceph when blocks are deleted in the VM (I run fstrim weekly from cron in the VMs). Generally though, it appears virtio-blk is going to be deprecated because it is not easy to extend with new features. I just happened to do benchmarks before and after and noticed a significant difference in performance.

Regarding how to enable it, yes, you just choose SCSI as your disk type for your VM, and under the VM options before you start the VM, you need to set your SCSI controller to VirtIO (I think it still defaults to LSI for some reason).
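For completeness: discard also has to be enabled on the virtual disk itself (if I remember right, the discard=on option on the disk in the VM config), and the weekly trim can be a plain root cron entry; the schedule below is just an example:
Code:
# /etc/cron.d/fstrim -- trim all mounted filesystems that support discard, Sundays at 03:00
0 3 * * 0 root /sbin/fstrim -av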
 
I saw that all my OSD XFS filesystems are fragmented (>15%), so I will start there. I found some information about it.
Hi,
the second parameter is to avoid further fragmentation:
Code:
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
filestore_xfs_extsize = true
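Both settings go into ceph.conf on the OSD nodes (the [osd] section should be the right place) and take effect after the OSDs are restarted, roughly:
Code:
[osd]
    osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
    filestore_xfs_extsize = true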
For me, ext3 performs better than xfs...

Udo
 
I know, I read it. Will the XFS extsize parameter "defragment" the disk by itself, or do I need to run xfs_fsr? I'm currently running xfs_fsr on some OSDs after setting the parameter to true, but it takes a very long time...
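In case it helps others: fragmentation can be checked, and xfs_fsr runs kept bounded, roughly like this (device and mount point are just examples):
Code:
# report the fragmentation factor of an OSD filesystem
xfs_db -r -c frag /dev/sdb1
# defragment one OSD mount point for at most 2 hours
xfs_fsr -t 7200 /var/lib/ceph/osd/ceph-0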
 
Also, make sure you use virtio-scsi (not standard virtio). That made a huge difference for me.
Hi,
with fio inside the VM I can't see real differences between virtio and virtio-scsi.

I tested small and big block sizes (dropped all caches after the first run)
Code:
fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m

fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
On the VM I had two disks, both on the same Ceph storage, one as virtio and one as virtio-scsi. The results are similar.
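Dropping all caches between runs, as mentioned above, can be done inside the VM with the usual:
Code:
sync
echo 3 > /proc/sys/vm/drop_caches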


Udo
 
Using your tests on an iSCSI/ZFS storage, plus an extra test using an IOMeter access pattern, shows that virtio-blk performs marginally better than virtio-scsi, except for the IOMeter access pattern test, where it is the opposite.

Code:
virtio-scsi

root@nas-test:/media/vdc# fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m
4ktest: (groupid=0, jobs=16): err= 0: pid=3735
  read : io=1639.6MB, bw=45452KB/s, iops=11363 , runt= 36938msec
    slat (usec): min=3 , max=469256 , avg=510.82, stdev=4027.10
    clat (usec): min=125 , max=550760 , avg=17393.24, stdev=18784.41
     lat (usec): min=500 , max=550784 , avg=17905.02, stdev=19217.27
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    8],
     | 30.00th=[    9], 40.00th=[   11], 50.00th=[   13], 60.00th=[   15],
     | 70.00th=[   19], 80.00th=[   23], 90.00th=[   30], 95.00th=[   43],
     | 99.00th=[   90], 99.50th=[  115], 99.90th=[  202], 99.95th=[  281],
     | 99.99th=[  490]
    bw (KB/s)  : min=   45, max= 4691, per=6.25%, avg=2841.86, stdev=737.01
  write: io=419204KB, bw=11349KB/s, iops=2837 , runt= 36938msec
    slat (usec): min=4 , max=276688 , avg=528.55, stdev=3867.89
    clat (usec): min=173 , max=549929 , avg=17468.60, stdev=18477.62
     lat (usec): min=613 , max=549938 , avg=17998.21, stdev=18872.98
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    8],
     | 30.00th=[    9], 40.00th=[   11], 50.00th=[   13], 60.00th=[   15],
     | 70.00th=[   19], 80.00th=[   23], 90.00th=[   30], 95.00th=[   44],
     | 99.00th=[   91], 99.50th=[  119], 99.90th=[  198], 99.95th=[  262],
     | 99.99th=[  478]
    bw (KB/s)  : min=    7, max= 1254, per=6.26%, avg=710.29, stdev=192.73
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.12%
    lat (msec) : 2=0.04%, 4=0.39%, 10=35.65%, 20=38.31%, 50=21.74%
    lat (msec) : 100=2.98%, 250=0.70%, 500=0.05%, 750=0.01%
  cpu          : usr=0.58%, sys=1.68%, ctx=158576, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=419727/w=0/d=104801, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=1639.6MB, aggrb=45452KB/s, minb=45452KB/s, maxb=45452KB/s, mint=36938msec, maxt=36938msec
  WRITE: io=419204KB, aggrb=11348KB/s, minb=11348KB/s, maxb=11348KB/s, mint=36938msec, maxt=36938msec


Disk stats (read/write):
  sda: ios=416830/104588, merge=45/26, ticks=4695552/1200496, in_queue=5900452, util=99.84%

root@nas-test:/media/vdc# fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
fiojob: (g=0): rw=read, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
2.0.8
Starting 1 process
fiojob: Laying out IO file(s) (1 file(s) / 8192MB)
Jobs: 1 (f=1): [R] [100.0% done] [371.7M/0K /s] [92 /0  iops] [eta 00m:00s]
fiojob: (groupid=0, jobs=1): err= 0: pid=4009
  read : io=0 B, bw=380764KB/s, iops=92 , runt= 22031msec
    slat (usec): min=359 , max=2595 , avg=404.96, stdev=72.44
    clat (msec): min=7 , max=72 , avg=10.34, stdev= 3.24
     lat (msec): min=7 , max=72 , avg=10.75, stdev= 3.24
    clat percentiles (usec):
     |  1.00th=[ 7904],  5.00th=[ 8384], 10.00th=[ 8640], 20.00th=[ 9152],
     | 30.00th=[ 9536], 40.00th=[ 9664], 50.00th=[ 9920], 60.00th=[10048],
     | 70.00th=[10176], 80.00th=[10432], 90.00th=[11584], 95.00th=[14784],
     | 99.00th=[18048], 99.50th=[26752], 99.90th=[54528], 99.95th=[56576],
     | 99.99th=[72192]
    bw (KB/s)  : min=288563, max=416958, per=100.00%, avg=380912.14, stdev=29552.10
    lat (msec) : 10=59.18%, 20=40.14%, 50=0.44%, 100=0.24%
  cpu          : usr=0.07%, sys=4.07%, ctx=2074, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=2048/w=0/d=0, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=8192.0MB, aggrb=380763KB/s, minb=380763KB/s, maxb=380763KB/s, mint=22031msec, maxt=22031msec


Disk stats (read/write):
  sda: ios=18410/2, merge=0/1, ticks=173160/24, in_queue=173212, util=98.53%


IOMeter access pattern
root@nas-test:/media/vdc# fio --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=8
iometer: (g=0): rw=randrw, bs=512-64K/512-64K, ioengine=libaio, iodepth=8
2.0.8
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [12112K/3220K /s] [10.6K/2725  iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=4120
  read : io=3283.8MB, bw=27633KB/s, iops=9055 , runt=121685msec
    slat (usec): min=5 , max=10547 , avg=13.30, stdev=41.07
    clat (usec): min=63 , max=268592 , avg=680.19, stdev=1559.38
     lat (usec): min=206 , max=268603 , avg=694.36, stdev=1560.29
    clat percentiles (usec):
     |  1.00th=[  322],  5.00th=[  366], 10.00th=[  394], 20.00th=[  430],
     | 30.00th=[  458], 40.00th=[  482], 50.00th=[  506], 60.00th=[  532],
     | 70.00th=[  564], 80.00th=[  620], 90.00th=[  764], 95.00th=[ 1048],
     | 99.00th=[ 3952], 99.50th=[ 8160], 99.90th=[20096], 99.95th=[28288],
     | 99.99th=[51456]
    bw (KB/s)  : min= 4964, max=101421, per=100.00%, avg=27686.73, stdev=19148.19
  write: io=831733KB, bw=6835.2KB/s, iops=2255 , runt=121685msec
    slat (usec): min=5 , max=6910 , avg=15.30, stdev=38.84
    clat (usec): min=105 , max=186141 , avg=722.17, stdev=1709.25
     lat (usec): min=250 , max=186157 , avg=738.37, stdev=1710.98
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  390], 10.00th=[  418], 20.00th=[  458],
     | 30.00th=[  486], 40.00th=[  516], 50.00th=[  540], 60.00th=[  564],
     | 70.00th=[  604], 80.00th=[  660], 90.00th=[  820], 95.00th=[ 1128],
     | 99.00th=[ 4128], 99.50th=[ 8512], 99.90th=[20608], 99.95th=[30080],
     | 99.99th=[59136]
    bw (KB/s)  : min= 1258, max=23397, per=100.00%, avg=6848.21, stdev=4680.35
    lat (usec) : 100=0.01%, 250=0.02%, 500=44.70%, 750=44.34%, 1000=5.40%
    lat (msec) : 2=3.20%, 4=1.35%, 10=0.63%, 20=0.25%, 50=0.10%
    lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=9.12%, sys=28.87%, ctx=1036092, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=1101887/w=0/d=274445, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=3283.8MB, aggrb=27633KB/s, minb=27633KB/s, maxb=27633KB/s, mint=121685msec, maxt=121685msec
  WRITE: io=831733KB, aggrb=6835KB/s, minb=6835KB/s, maxb=6835KB/s, mint=121685msec, maxt=121685msec


Disk stats (read/write):
  sda: ios=1101681/274439, merge=0/24, ticks=724236/193900, in_queue=917680, util=99.73%



Code:
virtio-blk

root@nas-test:/media/vdb# fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m
4ktest: (groupid=0, jobs=16): err= 0: pid=4016
  read : io=1638.9MB, bw=47303KB/s, iops=11825 , runt= 35476msec
    slat (usec): min=3 , max=410409 , avg=511.31, stdev=4184.60
    clat (usec): min=136 , max=2022.3K, avg=16227.41, stdev=21338.45
     lat (usec): min=328 , max=2022.4K, avg=16739.69, stdev=21800.22
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    5], 10.00th=[    7], 20.00th=[    8],
     | 30.00th=[    9], 40.00th=[   10], 50.00th=[   11], 60.00th=[   13],
     | 70.00th=[   16], 80.00th=[   21], 90.00th=[   30], 95.00th=[   46],
     | 99.00th=[   91], 99.50th=[  122], 99.90th=[  306], 99.95th=[  363],
     | 99.99th=[  416]
    bw (KB/s)  : min=  503, max= 7032, per=6.38%, avg=3016.51, stdev=965.56
  write: io=419980KB, bw=11838KB/s, iops=2959 , runt= 35476msec
    slat (usec): min=4 , max=293149 , avg=531.90, stdev=3957.24
    clat (usec): min=280 , max=1619.2K, avg=17673.70, stdev=21646.43
     lat (usec): min=562 , max=1619.2K, avg=18206.74, stdev=22074.91
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    7], 10.00th=[    7], 20.00th=[    8],
     | 30.00th=[    9], 40.00th=[   11], 50.00th=[   12], 60.00th=[   15],
     | 70.00th=[   18], 80.00th=[   22], 90.00th=[   32], 95.00th=[   49],
     | 99.00th=[   94], 99.50th=[  127], 99.90th=[  302], 99.95th=[  367],
     | 99.99th=[  453]
    bw (KB/s)  : min=  108, max= 1896, per=6.37%, avg=754.59, stdev=251.86
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.23%, 4=2.31%, 10=42.73%, 20=33.90%, 50=16.47%
    lat (msec) : 100=3.50%, 250=0.67%, 500=0.15%, 750=0.01%, 1000=0.01%
    lat (msec) : 2000=0.01%, >=2000=0.01%
  cpu          : usr=0.53%, sys=1.93%, ctx=170116, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=419533/w=0/d=104995, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=1638.9MB, aggrb=47303KB/s, minb=47303KB/s, maxb=47303KB/s, mint=35476msec, maxt=35476msec
  WRITE: io=419980KB, aggrb=11838KB/s, minb=11838KB/s, maxb=11838KB/s, mint=35476msec, maxt=35476msec


Disk stats (read/write):
  vdb: ios=419098/104892, merge=43/16, ticks=4307728/1182740, in_queue=5493564, util=99.75%



root@nas-test:/media/vdb# fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
fiojob: (g=0): rw=read, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
2.0.8
Starting 1 process
fiojob: Laying out IO file(s) (1 file(s) / 8192MB)
Jobs: 1 (f=1): [R] [100.0% done] [506.5M/0K /s] [126 /0  iops] [eta 00m:00s]
fiojob: (groupid=0, jobs=1): err= 0: pid=4044
  read : io=0 B, bw=472066KB/s, iops=115 , runt= 17770msec
    slat (usec): min=324 , max=2712 , avg=363.30, stdev=78.88
    clat (msec): min=5 , max=127 , avg= 8.30, stdev= 4.97
     lat (msec): min=6 , max=127 , avg= 8.67, stdev= 4.98
    clat percentiles (msec):
     |  1.00th=[    6],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    8], 50.00th=[    8], 60.00th=[    8],
     | 70.00th=[    8], 80.00th=[    9], 90.00th=[   10], 95.00th=[   13],
     | 99.00th=[   25], 99.50th=[   44], 99.90th=[   69], 99.95th=[   71],
     | 99.99th=[  128]
    bw (KB/s)  : min=275770, max=552634, per=100.00%, avg=472386.83, stdev=62756.04
    lat (msec) : 10=90.62%, 20=8.01%, 50=0.98%, 100=0.34%, 250=0.05%
  cpu          : usr=0.25%, sys=4.32%, ctx=2072, majf=0, minf=0
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=2048/w=0/d=0, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=8192.0MB, aggrb=472065KB/s, minb=472065KB/s, maxb=472065KB/s, mint=17770msec, maxt=17770msec


Disk stats (read/write):
  vdb: ios=18306/2, merge=0/1, ticks=140544/12, in_queue=140548, util=97.97%


IOMeter access pattern
root@nas-test:/media/vdb# fio --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=8
iometer: (g=0): rw=randrw, bs=512-64K/512-64K, ioengine=libaio, iodepth=8
2.0.8
iometer: (groupid=0, jobs=1): err= 0: pid=4111
  read : io=3281.6MB, bw=27257KB/s, iops=8954 , runt=123264msec
    slat (usec): min=4 , max=40028 , avg=10.92, stdev=44.05
    clat (usec): min=67 , max=241405 , avg=685.93, stdev=1735.79
     lat (usec): min=221 , max=241419 , avg=697.71, stdev=1736.46
    clat percentiles (usec):
     |  1.00th=[  334],  5.00th=[  378], 10.00th=[  402], 20.00th=[  438],
     | 30.00th=[  466], 40.00th=[  490], 50.00th=[  516], 60.00th=[  540],
     | 70.00th=[  572], 80.00th=[  628], 90.00th=[  764], 95.00th=[ 1020],
     | 99.00th=[ 3792], 99.50th=[ 7200], 99.90th=[21376], 99.95th=[34560],
     | 99.99th=[60160]
    bw (KB/s)  : min= 5670, max=96638, per=100.00%, avg=27318.34, stdev=18635.17
  write: io=834503KB, bw=6770.5KB/s, iops=2236 , runt=123264msec
    slat (usec): min=5 , max=61178 , avg=13.92, stdev=199.87
    clat (usec): min=79 , max=85533 , avg=748.41, stdev=1524.40
     lat (usec): min=253 , max=109802 , avg=763.21, stdev=1555.85
    clat percentiles (usec):
     |  1.00th=[  366],  5.00th=[  414], 10.00th=[  446], 20.00th=[  482],
     | 30.00th=[  516], 40.00th=[  540], 50.00th=[  564], 60.00th=[  596],
     | 70.00th=[  636], 80.00th=[  700], 90.00th=[  860], 95.00th=[ 1160],
     | 99.00th=[ 4048], 99.50th=[ 7712], 99.90th=[21376], 99.95th=[35584],
     | 99.99th=[54528]
    bw (KB/s)  : min= 1341, max=22752, per=100.00%, avg=6785.46, stdev=4586.92
    lat (usec) : 100=0.01%, 250=0.01%, 500=40.58%, 750=47.90%, 1000=6.01%
    lat (msec) : 2=3.21%, 4=1.34%, 10=0.62%, 20=0.20%, 50=0.10%
    lat (msec) : 100=0.02%, 250=0.01%
  cpu          : usr=8.71%, sys=25.27%, ctx=1008481, majf=0, minf=0
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=1103725/w=0/d=275715, short=r=0/w=0/d=0


Run status group 0 (all jobs):
   READ: io=3281.6MB, aggrb=27256KB/s, minb=27256KB/s, maxb=27256KB/s, mint=123264msec, maxt=123264msec
  WRITE: io=834502KB, aggrb=6770KB/s, minb=6770KB/s, maxb=6770KB/s, mint=123264msec, maxt=123264msec


Disk stats (read/write):
  vdb: ios=1103021/275594, merge=0/24, ticks=723416/200924, in_queue=923700, util=99.34%
 
In fact, my problem of high IO wait does not seem to be related to RBD or the disk.

I really don't understand why the system is showing high iowait, because it does not write much to disk:

Code:
# iostat
Linux 3.2.0-4-amd64 (zabbix01-proxy1)     04/10/2015     _x86_64_    (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.39    0.00    0.52   76.68    0.02   22.40

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda              20.47       194.20       122.28      90346      56889
dm-0             22.66       185.93       122.29      86501      56892
dm-1              0.27         1.09         0.00        508          0

I also debugged the IOs with SystemTap (https://sourceware.org/systemtap/SystemTap_Beginners_Guide/iotimesect.html) and I don't see high "times" there!

If I use dd to write to the disk, I get more than 15 MB/s, which is enough for me, and reads at 20 MB/s.
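For what it's worth, a dd test of that kind looks roughly like this (file path and sizes are just examples; oflag=direct bypasses the guest page cache):
Code:
# direct 1 MiB writes, flushed at the end
dd if=/dev/zero of=/var/tmp/ddtest bs=1M count=512 oflag=direct conv=fsync
# direct reads of the same file
dd if=/var/tmp/ddtest of=/dev/null bs=1M iflag=direct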

What could cause high iowait that is not related to the disk?

(I rebooted the host node to get the latest 3.10 kernel; nothing changed.)

Switching between virtio-blk and virtio-scsi does not change anything either.
 
Hi,
the problem here is that you don't see the RBD IO on the host!

Because qemu-kvm talks to Ceph (librbd) directly.
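The RBD client traffic can instead be watched on the cluster side, e.g.:
Code:
# one-shot cluster status, includes the "client io" line
ceph -s
# or follow it continuously
ceph -w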

Udo
 
The iostat I posted is from the guest.

The host has no iowait, has free CPU and free memory, and Ceph is not overloaded (client io 284 kB/s rd, 9584 kB/s wr, 721 op/s).
 
My problem occurs only on guests running MariaDB.

When I start the guest with cache=writeback and mount its ext4 with nobarrier: no iowait.
When I start the guest with cache=none and mount its ext4 with nobarrier: high iowait.

Can we deduce from this that Ceph is the problem?
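One way to narrow this down might be a small fio run that fsyncs after every write (roughly what MariaDB does for its redo log), repeated with cache=none and cache=writeback; the parameters below are just an example:
Code:
fio --name=synctest --rw=randwrite --bs=4k --size=256m --ioengine=libaio --direct=1 --fsync=1 --runtime=60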
 