Hi all,
I run PVE & Ceph (Giant) as RBD storage.
I have high iowait on my VMs.
All are configured with cache=writeback because I thought that was best for performance.
Is that really true? Which cache mode do you recommend for RBD?
Thank you.
Flo
Hi,
there are some tuning possibilities (Ceph mount options and so on).
For Linux guests the most important tuning is a higher read-ahead cache (inside the VM):
Code:
echo 4096 > /sys/block/vda/queue/read_ahead_kb
Udo
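The echo above takes effect immediately but is lost on reboot. A minimal sketch of making it persistent with a udev rule; the rule file name is arbitrary and the KERNEL match assumes virtio-blk (vdX) devices, so adjust both for your setup:

```shell
# Persist 4 MB read-ahead for all virtio disks across reboots
# (hypothetical rule file name; match pattern assumes vdX devices).
cat > /etc/udev/rules.d/99-read-ahead.rules <<'EOF'
ACTION=="add|change", KERNEL=="vd[a-z]", ATTR{queue/read_ahead_kb}="4096"
EOF
# Reload rules so the setting applies to newly added devices.
udevadm control --reload-rules
```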
Also, make sure you use virtio-scsi (not standard virtio). That made a huge difference for me.
That's interesting. Why is virtio-scsi better? Is it the "SCSI" option in the PVE GUI?
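For what it's worth: in PVE the disk bus "SCSI" combined with the "VirtIO SCSI" controller type gives you virtio-scsi. One way to check from inside the guest which driver a disk actually uses (device names here are only examples): virtio-blk disks appear as /dev/vdX with no SCSI address, virtio-scsi disks as /dev/sdX behind a SCSI host.

```shell
# Inside the VM: list block devices with their transport type.
# virtio-blk shows a vdX name and an empty HCTL column;
# virtio-scsi shows an sdX name with a host:channel:target:lun tuple.
lsblk -d -o NAME,HCTL,TRAN,SIZE,MODEL
```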
Hi,
I saw that all my OSD XFS filesystems are fragmented (>15%), so I will start with that. I found some information:
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
filestore_xfs_extsize = true
Hi,
the second parameter is there to avoid further fragmentation:
Code:
osd_mount_options_xfs = "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M"
filestore_xfs_extsize = true
For me, ext3 performs better than XFS...
Udo
Hi,
> Also, make sure you use virtio-scsi (not standard virtio). That made a huge difference for me.
I benchmarked both with fio:
fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m
fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
Hi,
Will the XFS extsize parameter "defragment" the disk by itself, or do I need to run xfs_fsr?
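As far as I know, the extsize hint only influences how newly written data is allocated; files that are already fragmented stay fragmented until you run xfs_fsr. A sketch of both steps (the device and OSD mount point are placeholders for your own paths):

```shell
# Check the current fragmentation factor, read-only
# (assumes the OSD sits on /dev/sdb1 - adjust to your layout).
xfs_db -r -c frag /dev/sdb1

# Defragment already-written files on the mounted filesystem.
# -v is verbose; -t limits the run to 7200 seconds so it can be
# scheduled off-peak instead of hammering the OSD indefinitely.
xfs_fsr -v -t 7200 /var/lib/ceph/osd/ceph-0
```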
virtio-scsi
root@nas-test:/media/vdc# fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m
4ktest: (groupid=0, jobs=16): err= 0: pid=3735
read : io=1639.6MB, bw=45452KB/s, iops=11363 , runt= 36938msec
slat (usec): min=3 , max=469256 , avg=510.82, stdev=4027.10
clat (usec): min=125 , max=550760 , avg=17393.24, stdev=18784.41
lat (usec): min=500 , max=550784 , avg=17905.02, stdev=19217.27
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 9], 40.00th=[ 11], 50.00th=[ 13], 60.00th=[ 15],
| 70.00th=[ 19], 80.00th=[ 23], 90.00th=[ 30], 95.00th=[ 43],
| 99.00th=[ 90], 99.50th=[ 115], 99.90th=[ 202], 99.95th=[ 281],
| 99.99th=[ 490]
bw (KB/s) : min= 45, max= 4691, per=6.25%, avg=2841.86, stdev=737.01
write: io=419204KB, bw=11349KB/s, iops=2837 , runt= 36938msec
slat (usec): min=4 , max=276688 , avg=528.55, stdev=3867.89
clat (usec): min=173 , max=549929 , avg=17468.60, stdev=18477.62
lat (usec): min=613 , max=549938 , avg=17998.21, stdev=18872.98
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 9], 40.00th=[ 11], 50.00th=[ 13], 60.00th=[ 15],
| 70.00th=[ 19], 80.00th=[ 23], 90.00th=[ 30], 95.00th=[ 44],
| 99.00th=[ 91], 99.50th=[ 119], 99.90th=[ 198], 99.95th=[ 262],
| 99.99th=[ 478]
bw (KB/s) : min= 7, max= 1254, per=6.26%, avg=710.29, stdev=192.73
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.12%
lat (msec) : 2=0.04%, 4=0.39%, 10=35.65%, 20=38.31%, 50=21.74%
lat (msec) : 100=2.98%, 250=0.70%, 500=0.05%, 750=0.01%
cpu : usr=0.58%, sys=1.68%, ctx=158576, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=419727/w=0/d=104801, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1639.6MB, aggrb=45452KB/s, minb=45452KB/s, maxb=45452KB/s, mint=36938msec, maxt=36938msec
WRITE: io=419204KB, aggrb=11348KB/s, minb=11348KB/s, maxb=11348KB/s, mint=36938msec, maxt=36938msec
Disk stats (read/write):
sda: ios=416830/104588, merge=45/26, ticks=4695552/1200496, in_queue=5900452, util=99.84%
root@nas-test:/media/vdc# fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
fiojob: (g=0): rw=read, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
2.0.8
Starting 1 process
fiojob: Laying out IO file(s) (1 file(s) / 8192MB)
Jobs: 1 (f=1): [R] [100.0% done] [371.7M/0K /s] [92 /0 iops] [eta 00m:00s]
fiojob: (groupid=0, jobs=1): err= 0: pid=4009
read : io=0 B, bw=380764KB/s, iops=92 , runt= 22031msec
slat (usec): min=359 , max=2595 , avg=404.96, stdev=72.44
clat (msec): min=7 , max=72 , avg=10.34, stdev= 3.24
lat (msec): min=7 , max=72 , avg=10.75, stdev= 3.24
clat percentiles (usec):
| 1.00th=[ 7904], 5.00th=[ 8384], 10.00th=[ 8640], 20.00th=[ 9152],
| 30.00th=[ 9536], 40.00th=[ 9664], 50.00th=[ 9920], 60.00th=[10048],
| 70.00th=[10176], 80.00th=[10432], 90.00th=[11584], 95.00th=[14784],
| 99.00th=[18048], 99.50th=[26752], 99.90th=[54528], 99.95th=[56576],
| 99.99th=[72192]
bw (KB/s) : min=288563, max=416958, per=100.00%, avg=380912.14, stdev=29552.10
lat (msec) : 10=59.18%, 20=40.14%, 50=0.44%, 100=0.24%
cpu : usr=0.07%, sys=4.07%, ctx=2074, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=2048/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=8192.0MB, aggrb=380763KB/s, minb=380763KB/s, maxb=380763KB/s, mint=22031msec, maxt=22031msec
Disk stats (read/write):
sda: ios=18410/2, merge=0/1, ticks=173160/24, in_queue=173212, util=98.53%
IOMeter access pattern
root@nas-test:/media/vdc# fio --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=8
iometer: (g=0): rw=randrw, bs=512-64K/512-64K, ioengine=libaio, iodepth=8
2.0.8
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [12112K/3220K /s] [10.6K/2725 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=4120
read : io=3283.8MB, bw=27633KB/s, iops=9055 , runt=121685msec
slat (usec): min=5 , max=10547 , avg=13.30, stdev=41.07
clat (usec): min=63 , max=268592 , avg=680.19, stdev=1559.38
lat (usec): min=206 , max=268603 , avg=694.36, stdev=1560.29
clat percentiles (usec):
| 1.00th=[ 322], 5.00th=[ 366], 10.00th=[ 394], 20.00th=[ 430],
| 30.00th=[ 458], 40.00th=[ 482], 50.00th=[ 506], 60.00th=[ 532],
| 70.00th=[ 564], 80.00th=[ 620], 90.00th=[ 764], 95.00th=[ 1048],
| 99.00th=[ 3952], 99.50th=[ 8160], 99.90th=[20096], 99.95th=[28288],
| 99.99th=[51456]
bw (KB/s) : min= 4964, max=101421, per=100.00%, avg=27686.73, stdev=19148.19
write: io=831733KB, bw=6835.2KB/s, iops=2255 , runt=121685msec
slat (usec): min=5 , max=6910 , avg=15.30, stdev=38.84
clat (usec): min=105 , max=186141 , avg=722.17, stdev=1709.25
lat (usec): min=250 , max=186157 , avg=738.37, stdev=1710.98
clat percentiles (usec):
| 1.00th=[ 346], 5.00th=[ 390], 10.00th=[ 418], 20.00th=[ 458],
| 30.00th=[ 486], 40.00th=[ 516], 50.00th=[ 540], 60.00th=[ 564],
| 70.00th=[ 604], 80.00th=[ 660], 90.00th=[ 820], 95.00th=[ 1128],
| 99.00th=[ 4128], 99.50th=[ 8512], 99.90th=[20608], 99.95th=[30080],
| 99.99th=[59136]
bw (KB/s) : min= 1258, max=23397, per=100.00%, avg=6848.21, stdev=4680.35
lat (usec) : 100=0.01%, 250=0.02%, 500=44.70%, 750=44.34%, 1000=5.40%
lat (msec) : 2=3.20%, 4=1.35%, 10=0.63%, 20=0.25%, 50=0.10%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=9.12%, sys=28.87%, ctx=1036092, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1101887/w=0/d=274445, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=3283.8MB, aggrb=27633KB/s, minb=27633KB/s, maxb=27633KB/s, mint=121685msec, maxt=121685msec
WRITE: io=831733KB, aggrb=6835KB/s, minb=6835KB/s, maxb=6835KB/s, mint=121685msec, maxt=121685msec
Disk stats (read/write):
sda: ios=1101681/274439, merge=0/24, ticks=724236/193900, in_queue=917680, util=99.73%
virtio-blk
root@nas-test:/media/vdb# fio --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=80 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest --size=128m
4ktest: (groupid=0, jobs=16): err= 0: pid=4016
read : io=1638.9MB, bw=47303KB/s, iops=11825 , runt= 35476msec
slat (usec): min=3 , max=410409 , avg=511.31, stdev=4184.60
clat (usec): min=136 , max=2022.3K, avg=16227.41, stdev=21338.45
lat (usec): min=328 , max=2022.4K, avg=16739.69, stdev=21800.22
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 5], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 9], 40.00th=[ 10], 50.00th=[ 11], 60.00th=[ 13],
| 70.00th=[ 16], 80.00th=[ 21], 90.00th=[ 30], 95.00th=[ 46],
| 99.00th=[ 91], 99.50th=[ 122], 99.90th=[ 306], 99.95th=[ 363],
| 99.99th=[ 416]
bw (KB/s) : min= 503, max= 7032, per=6.38%, avg=3016.51, stdev=965.56
write: io=419980KB, bw=11838KB/s, iops=2959 , runt= 35476msec
slat (usec): min=4 , max=293149 , avg=531.90, stdev=3957.24
clat (usec): min=280 , max=1619.2K, avg=17673.70, stdev=21646.43
lat (usec): min=562 , max=1619.2K, avg=18206.74, stdev=22074.91
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 8],
| 30.00th=[ 9], 40.00th=[ 11], 50.00th=[ 12], 60.00th=[ 15],
| 70.00th=[ 18], 80.00th=[ 22], 90.00th=[ 32], 95.00th=[ 49],
| 99.00th=[ 94], 99.50th=[ 127], 99.90th=[ 302], 99.95th=[ 367],
| 99.99th=[ 453]
bw (KB/s) : min= 108, max= 1896, per=6.37%, avg=754.59, stdev=251.86
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.23%, 4=2.31%, 10=42.73%, 20=33.90%, 50=16.47%
lat (msec) : 100=3.50%, 250=0.67%, 500=0.15%, 750=0.01%, 1000=0.01%
lat (msec) : 2000=0.01%, >=2000=0.01%
cpu : usr=0.53%, sys=1.93%, ctx=170116, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=419533/w=0/d=104995, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=1638.9MB, aggrb=47303KB/s, minb=47303KB/s, maxb=47303KB/s, mint=35476msec, maxt=35476msec
WRITE: io=419980KB, aggrb=11838KB/s, minb=11838KB/s, maxb=11838KB/s, mint=35476msec, maxt=35476msec
Disk stats (read/write):
vdb: ios=419098/104892, merge=43/16, ticks=4307728/1182740, in_queue=5493564, util=99.75%
root@nas-test:/media/vdb# fio --numjobs=1 --readwrite=read --blocksize=4M --size=8G --ioengine=libaio --direct=1 --name=fiojob
fiojob: (g=0): rw=read, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
2.0.8
Starting 1 process
fiojob: Laying out IO file(s) (1 file(s) / 8192MB)
Jobs: 1 (f=1): [R] [100.0% done] [506.5M/0K /s] [126 /0 iops] [eta 00m:00s]
fiojob: (groupid=0, jobs=1): err= 0: pid=4044
read : io=0 B, bw=472066KB/s, iops=115 , runt= 17770msec
slat (usec): min=324 , max=2712 , avg=363.30, stdev=78.88
clat (msec): min=5 , max=127 , avg= 8.30, stdev= 4.97
lat (msec): min=6 , max=127 , avg= 8.67, stdev= 4.98
clat percentiles (msec):
| 1.00th=[ 6], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 7],
| 30.00th=[ 8], 40.00th=[ 8], 50.00th=[ 8], 60.00th=[ 8],
| 70.00th=[ 8], 80.00th=[ 9], 90.00th=[ 10], 95.00th=[ 13],
| 99.00th=[ 25], 99.50th=[ 44], 99.90th=[ 69], 99.95th=[ 71],
| 99.99th=[ 128]
bw (KB/s) : min=275770, max=552634, per=100.00%, avg=472386.83, stdev=62756.04
lat (msec) : 10=90.62%, 20=8.01%, 50=0.98%, 100=0.34%, 250=0.05%
cpu : usr=0.25%, sys=4.32%, ctx=2072, majf=0, minf=0
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=2048/w=0/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=8192.0MB, aggrb=472065KB/s, minb=472065KB/s, maxb=472065KB/s, mint=17770msec, maxt=17770msec
Disk stats (read/write):
vdb: ios=18306/2, merge=0/1, ticks=140544/12, in_queue=140548, util=97.97%
IOMeter access pattern
root@nas-test:/media/vdb# fio --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=8
iometer: (g=0): rw=randrw, bs=512-64K/512-64K, ioengine=libaio, iodepth=8
2.0.8
iometer: (groupid=0, jobs=1): err= 0: pid=4111
read : io=3281.6MB, bw=27257KB/s, iops=8954 , runt=123264msec
slat (usec): min=4 , max=40028 , avg=10.92, stdev=44.05
clat (usec): min=67 , max=241405 , avg=685.93, stdev=1735.79
lat (usec): min=221 , max=241419 , avg=697.71, stdev=1736.46
clat percentiles (usec):
| 1.00th=[ 334], 5.00th=[ 378], 10.00th=[ 402], 20.00th=[ 438],
| 30.00th=[ 466], 40.00th=[ 490], 50.00th=[ 516], 60.00th=[ 540],
| 70.00th=[ 572], 80.00th=[ 628], 90.00th=[ 764], 95.00th=[ 1020],
| 99.00th=[ 3792], 99.50th=[ 7200], 99.90th=[21376], 99.95th=[34560],
| 99.99th=[60160]
bw (KB/s) : min= 5670, max=96638, per=100.00%, avg=27318.34, stdev=18635.17
write: io=834503KB, bw=6770.5KB/s, iops=2236 , runt=123264msec
slat (usec): min=5 , max=61178 , avg=13.92, stdev=199.87
clat (usec): min=79 , max=85533 , avg=748.41, stdev=1524.40
lat (usec): min=253 , max=109802 , avg=763.21, stdev=1555.85
clat percentiles (usec):
| 1.00th=[ 366], 5.00th=[ 414], 10.00th=[ 446], 20.00th=[ 482],
| 30.00th=[ 516], 40.00th=[ 540], 50.00th=[ 564], 60.00th=[ 596],
| 70.00th=[ 636], 80.00th=[ 700], 90.00th=[ 860], 95.00th=[ 1160],
| 99.00th=[ 4048], 99.50th=[ 7712], 99.90th=[21376], 99.95th=[35584],
| 99.99th=[54528]
bw (KB/s) : min= 1341, max=22752, per=100.00%, avg=6785.46, stdev=4586.92
lat (usec) : 100=0.01%, 250=0.01%, 500=40.58%, 750=47.90%, 1000=6.01%
lat (msec) : 2=3.21%, 4=1.34%, 10=0.62%, 20=0.20%, 50=0.10%
lat (msec) : 100=0.02%, 250=0.01%
cpu : usr=8.71%, sys=25.27%, ctx=1008481, majf=0, minf=0
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=1103725/w=0/d=275715, short=r=0/w=0/d=0
Run status group 0 (all jobs):
READ: io=3281.6MB, aggrb=27256KB/s, minb=27256KB/s, maxb=27256KB/s, mint=123264msec, maxt=123264msec
WRITE: io=834502KB, aggrb=6770KB/s, minb=6770KB/s, maxb=6770KB/s, mint=123264msec, maxt=123264msec
Disk stats (read/write):
vdb: ios=1103021/275594, merge=0/24, ticks=723416/200924, in_queue=923700, util=99.34%
# iostat
Linux 3.2.0-4-amd64 (zabbix01-proxy1) 04/10/2015 _x86_64_ (1 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
0.39 0.00 0.52 76.68 0.02 22.40
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
vda 20.47 194.20 122.28 90346 56889
dm-0 22.66 185.93 122.29 86501 56892
dm-1 0.27 1.09 0.00 508 0
Hi,
In fact my problem of high iowait does not seem related to RBD or the disk.
I really don't understand why the system shows such high iowait, because it does not write much to disk (see the iostat output above).
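One caveat with plain iostat: it prints averages since boot and hides per-request latency, which is what actually drives %iowait. The extended view sampled over short intervals is more telling:

```shell
# Extended device stats every 2 seconds, 5 samples; the first sample is
# the since-boot average, so look at the later ones. High await (ms per
# request) combined with low kB/s points at slow or stalled storage
# rather than a large volume of I/O.
iostat -x 2 5
```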