[SOLVED] KVM + LVM + DRBD + SSD low performance

cbx

Hi

I know there are already a lot of posts about this on the forum, but I have tested many things and had no success on my servers.
I use eight small servers (Atom C2750 with 8 GB DDR3 RAM and one 256 GB Crucial MX100 SSD) in a Proxmox cluster (version 3.4).
The servers are paired so that each pair shares one LVM storage (all data is replicated between the two servers of a pair), and each server runs KVM guests.

On each master server (the host itself) I get a correct result:

dd if=/dev/zero of=/var/testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 2.45458 s, 427 MB/s

But not inside the VMs:

dd if=/dev/zero of=/var/testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 5.55088 s, 149 MB/s

All VMs run CentOS 7 with ext4. I have tried setting barrier=0 in fstab and rebooting, and I have also tried enabling x-data-plane in the VM config file (http://forum.proxmox.com/threads/18502-KVM-kills-SSD-performance)... but I don't get any better result.
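
For reference, the barrier=0 change is just an extra mount option on the guest's root filesystem in /etc/fstab, something like this (the device and mount point are only an example of my layout):

/dev/vda2   /   ext4   defaults,barrier=0   1 1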

I am not sure, but I don't think the problem is DRBD. I need KVM because OpenVZ is not compatible with CloudLinux, which I use in some cases, and is not very compatible with DRBD either.
Is there any solution to this problem? (I'm desperate :()

Thanks, and sorry for my very poor English...
 
How are the disks presented to the VM (IDE, SATA, VIRTIO, SCSI)?

For best performance choose VIRTIO.
For cache on DRBD you must use 'Direct Sync' or 'Write Through'
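
For example, something along these lines in the VM config (/etc/pve/qemu-server/<vmid>.conf); the storage name and VM ID are only placeholders, adjust them to your setup:

virtio0: yourstorage:vm-100-disk-1,cache=writethrough

or via the CLI: qm set 100 --virtio0 yourstorage:vm-100-disk-1,cache=writethrough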
 
spirit is most certainly right. I think you're "only" using a GBit interconnect.

DRBD reports a successful write IO only when it has completed on both sides, so latency is normally not a big problem with spinning disks, yet it can be with SSDs, where everything is as fast as or faster than your network connection. Throughput is also a problem for sequential writes, which are always limited by the network. The best improvement would be InfiniBand, yet I doubt those small boxes have any slots left, have they?

You can measure the IO latency of your local and virtualized IO path by using ioping.

You will also have a problem with TRIM on DRBD, which is not supported in the current version of the PVE kernel (at least as of my inspection a few days ago: the DRBD kernel driver is 8.4.3, while 8.4.4 includes TRIM/DISCARD). Without TRIM your SSDs will most probably get very slow and will fail sooner than expected.
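
You can quickly check from inside the guest whether discard reaches the virtual disk at all (assuming it shows up as /dev/vda):

lsblk --discard /dev/vda
fstrim -v /

If DISC-GRAN/DISC-MAX are zero, or fstrim reports that the discard operation is not supported, TRIM is not getting through.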
 
Hi, firstly, thanks a lot for reviewing our problem!

How are the disks presented to the VM (IDE, SATA, VIRTIO, SCSI)?

For best performance choose VIRTIO.
For cache on DRBD you must use 'Direct Sync' or 'Write Through'

We already have VIRTIO and 'Write Through' in the configuration. I have tried 'Direct Sync' but got no better result with it.

I think the speed difference is network latency.

you shouldn't use dd as a benchmark; try fio for example, with a bigger queue depth and parallel writes.

spirit is most certainly right. I think you're "only" using a GBit interconnect.

DRBD reports a successful write IO only when it has completed on both sides, so latency is normally not a big problem with spinning disks, yet it can be with SSDs, where everything is as fast as or faster than your network connection. Throughput is also a problem for sequential writes, which are always limited by the network. The best improvement would be InfiniBand, yet I doubt those small boxes have any slots left, have they?

You can measure the IO latency of your local and virtualized IO path by using ioping.

You will also have a problem with TRIM on DRBD, which is not supported in the current version of the PVE kernel (at least as of my inspection a few days ago: the DRBD kernel driver is 8.4.3, while 8.4.4 includes TRIM/DISCARD). Without TRIM your SSDs will most probably get very slow and will fail sooner than expected.

All servers are connected through a 2.5G switch, so it is not very likely that that is the problem. I have deployed a VM on another similar server, without DRBD, on a local SSD disk, and the results seem about the same:

With DRBD
# ./fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=/var/test --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.0.9
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 1024MB)
Jobs: 1 (f=1): [m] [99.2% done] [4515K/1546K /s] [1128 /386 iops] [eta 00m:02s]
test: (groupid=0, jobs=1): err= 0: pid=1904: Thu Aug 6 10:40:03 2015
read : io=785444KB, bw=3201.9KB/s, iops=800 , runt=245314msec
write: io=263132KB, bw=1072.7KB/s, iops=268 , runt=245314msec
cpu : usr=0.72%, sys=3.08%, ctx=26168, majf=0, minf=17
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=196361/w=65783/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
READ: io=785444KB, aggrb=3201KB/s, minb=3201KB/s, maxb=3201KB/s, mint=245314msec, maxt=245314msec
WRITE: io=263132KB, aggrb=1072KB/s, minb=1072KB/s, maxb=1072KB/s, mint=245314msec, maxt=245314msec

Disk stats (read/write):
dm-0: ios=195601/65815, merge=0/0, ticks=181830/15278247, in_queue=15460477, util=100.00%, aggrios=196611/66083, aggrmerge=0/9, aggrticks=189055/15290618, aggrin_queue=15479556, aggrutil=100.00%
vda: ios=196611/66083, merge=0/9, ticks=189055/15290618, in_queue=15479556, util=100.00%
# ioping . -c 10 -C
4.0 KiB from . (ext4 /dev/vda2): request=1 time=12 us
4.0 KiB from . (ext4 /dev/vda2): request=2 time=13 us
4.0 KiB from . (ext4 /dev/vda2): request=3 time=13 us
4.0 KiB from . (ext4 /dev/vda2): request=4 time=13 us
4.0 KiB from . (ext4 /dev/vda2): request=5 time=13 us
4.0 KiB from . (ext4 /dev/vda2): request=6 time=13 us
4.0 KiB from . (ext4 /dev/vda2): request=7 time=12 us
4.0 KiB from . (ext4 /dev/vda2): request=8 time=22 us
4.0 KiB from . (ext4 /dev/vda2): request=9 time=12 us
4.0 KiB from . (ext4 /dev/vda2): request=10 time=13 us

--- . (ext4 /dev/vda2) ioping statistics ---
10 requests completed in 9.0 s, 73.5 k iops, 287.2 MiB/s
min/avg/max/mdev = 12 us / 13 us / 22 us / 2 us

Without DRBD

# ioping . -c 10 -C
4.0 KiB from . (ext4 /dev/vda2): request=1 time=9 us
4.0 KiB from . (ext4 /dev/vda2): request=2 time=31 us
4.0 KiB from . (ext4 /dev/vda2): request=3 time=11 us
4.0 KiB from . (ext4 /dev/vda2): request=4 time=11 us
4.0 KiB from . (ext4 /dev/vda2): request=5 time=10 us
4.0 KiB from . (ext4 /dev/vda2): request=6 time=10 us
4.0 KiB from . (ext4 /dev/vda2): request=7 time=12 us
4.0 KiB from . (ext4 /dev/vda2): request=8 time=10 us
4.0 KiB from . (ext4 /dev/vda2): request=9 time=10 us
4.0 KiB from . (ext4 /dev/vda2): request=10 time=11 us

--- . (ext4 /dev/vda2) ioping statistics ---
10 requests completed in 9.0 s, 80.0 k iops, 312.5 MiB/s
min/avg/max/mdev = 9 us / 12 us / 31 us / 6 us

So it does not appear that the issue is with DRBD. Moreover, LnxBil, you are right: I have searched for a solution to TRIM the DRBD disk and have not found one for now :(, but that seems to be a separate problem.

I hope you can help me with this information...
Once again, thanks very much for reviewing this case.
 
I think the speed difference is network latency.

you shouldn't use dd as a benchmark; try fio for example, with a bigger queue depth and parallel writes.

spirit, I have seen a post about iothreads, which are now included in Proxmox 4, and I have read that they are similar to x-data-plane. Can this help to solve our SSD performance issue?
 
Normally, yes. The single I/O thread performance of an SSD is not as good as it is with multiple concurrent threads. In our tests we achieve 2.5 GB/s using 16 I/O threads, whereas single-thread performance is "only" about 800 MB/s. Our array consists of 6 enterprise SSDs in RAID10 on a MegaRAID controller.
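
If you want to measure that yourself, a simple sketch of a multi-threaded fio run (file name and sizes are only examples) is:

fio --name=multi --filename=/var/fiotest --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --size=1G --iodepth=32 --numjobs=16 --group_reporting

Run it once with --numjobs=1 and once with --numjobs=16 and compare the aggregate bandwidth.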
 
Normally, yes. The single I/O thread performance of an SSD is not as good as it is with multiple concurrent threads. In our tests we achieve 2.5 GB/s using 16 I/O threads, whereas single-thread performance is "only" about 800 MB/s. Our array consists of 6 enterprise SSDs in RAID10 on a MegaRAID controller.

2.5 GB/s... what a dream!!! :D
Well, I have no budget for such hardware, but I am trying the same servers with Proxmox 4. I tried to create a DRBD9 volume on 3 cluster nodes but had no success (when I try to connect another node I get "Error: I/O error while accessing persistent configuration storage"). That is not a big problem, since it is still a beta and I have not seen any solution for now, so I just tried a KVM guest on a local disk with no cache (direct), and the results are very good for our hardware:

[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 1,75009 s, 599 MB/s
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 1,86791 s, 561 MB/s
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1,0 GB) copied, 1,69789 s, 618 MB/s

With writethrough cache the results are also good (around 450 MB/s).
The discard and iothread options do not seem to increase or decrease speed in our case. So the performance issue seems to be related to the DRBD setup that I have in Proxmox 3 :(.
 
dd is single-threaded and is useless for performance testing. Apart from this, your test only shows your cache performance. For dd to give any meaningful result, the test file should be at least twice the size of your RAM and you should use direct I/O.
 
dd is single-threaded and is useless for performance testing. Apart from this, your test only shows your cache performance. For dd to give any meaningful result, the test file should be at least twice the size of your RAM and you should use direct I/O.

Hi mir, sorry, I have very little experience with tests like dd or fio. Here is what you asked for (VM with 2 GB of RAM => 4 GB file), dd with oflag=direct:

Without DRBD

# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 13,7974 s, 304 MB/s


With DRBD
# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 50,5698 s, 82,9 MB/s

The difference seems clear...
I can run some tests with fio if you need them...
 
I have used the fio command that you suggested in another post:

Without DRBD
# fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [5163KB/1115KB/0KB /s] [1157/249/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=2381: Wed Aug 12 22:53:00 2015
Description : [Emulation of Intel IOmeter File Server Access Pattern]
read : io=3274.5MB, bw=6014.5KB/s, iops=986, runt=557430msec
slat (usec): min=12, max=2549, avg=21.28, stdev=12.35
clat (msec): min=4, max=118, avg=50.11, stdev=12.96
lat (msec): min=4, max=118, avg=50.13, stdev=12.96
clat percentiles (msec):
| 1.00th=[ 22], 5.00th=[ 30], 10.00th=[ 34], 20.00th=[ 39],
| 30.00th=[ 43], 40.00th=[ 47], 50.00th=[ 50], 60.00th=[ 53],
| 70.00th=[ 57], 80.00th=[ 62], 90.00th=[ 68], 95.00th=[ 73],
| 99.00th=[ 82], 99.50th=[ 86], 99.90th=[ 94], 99.95th=[ 98],
| 99.99th=[ 105]
bw (KB /s): min= 2949, max=13217, per=100.00%, avg=6020.03, stdev=1809.44
write: io=841688KB, bw=1509.1KB/s, iops=246, runt=557430msec
slat (usec): min=13, max=17362, avg=3928.65, stdev=1985.19
clat (msec): min=6, max=119, avg=54.86, stdev=13.03
lat (msec): min=6, max=123, avg=58.79, stdev=13.06
clat percentiles (msec):
| 1.00th=[ 27], 5.00th=[ 35], 10.00th=[ 39], 20.00th=[ 44],
| 30.00th=[ 48], 40.00th=[ 51], 50.00th=[ 55], 60.00th=[ 58],
| 70.00th=[ 62], 80.00th=[ 67], 90.00th=[ 73], 95.00th=[ 78],
| 99.00th=[ 87], 99.50th=[ 91], 99.90th=[ 100], 99.95th=[ 103],
| 99.99th=[ 112]
bw (KB /s): min= 789, max= 3508, per=100.00%, avg=1511.11, stdev=466.41
lat (msec) : 10=0.01%, 20=0.46%, 50=47.19%, 100=52.29%, 250=0.04%
cpu : usr=1.02%, sys=3.17%, ctx=120140, majf=0, minf=28
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=550156/w=137644/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=3274.5MB, aggrb=6014KB/s, minb=6014KB/s, maxb=6014KB/s, mint=557430msec, maxt=557430msec
WRITE: io=841688KB, aggrb=1509KB/s, minb=1509KB/s, maxb=1509KB/s, mint=557430msec, maxt=557430msec

Disk stats (read/write):
dm-1: ios=549947/137631, merge=0/0, ticks=92079/632346, in_queue=724432, util=99.47%, aggrios=550165/137685, aggrmerge=0/2, aggrticks=91025/632284, aggrin_queue=722882, aggrutil=99.40%
vda: ios=550165/137685, merge=0/2, ticks=91025/632284, in_queue=722882, util=99.40%

With DRBD
fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [99.8% done] [15748KB/3715KB/0KB /s] [3586/864/0 iops] [eta 00m:02s]
iometer: (groupid=0, jobs=1): err= 0: pid=29970: Wed Aug 12 23:15:26 2015
Description : [Emulation of Intel IOmeter File Server Access Pattern]
read : io=3274.5MB, bw=3260.5KB/s, iops=535, runt=1028285msec
slat (usec): min=3, max=1265.5K, avg=51.74, stdev=2323.58
clat (usec): min=1, max=1302.8K, avg=7390.36, stdev=23131.50
lat (usec): min=88, max=1302.9K, avg=7444.23, stdev=23243.98
clat percentiles (usec):
| 1.00th=[ 95], 5.00th=[ 135], 10.00th=[ 183], 20.00th=[ 378],
| 30.00th=[ 812], 40.00th=[ 1864], 50.00th=[ 3152], 60.00th=[ 5024],
| 70.00th=[ 7776], 80.00th=[ 9792], 90.00th=[16320], 95.00th=[22144],
| 99.00th=[61184], 99.50th=[98816], 99.90th=[284672], 99.95th=[350208],
| 99.99th=[1028096]
bw (KB /s): min= 5, max=17431, per=100.00%, avg=3715.96, stdev=2220.55
write: io=841688KB, bw=838180B/s, iops=133, runt=1028285msec
slat (usec): min=18, max=1186.3K, avg=108.84, stdev=5706.02
clat (msec): min=1, max=2379, avg=448.21, stdev=282.07
lat (msec): min=1, max=2379, avg=448.32, stdev=282.13
clat percentiles (msec):
| 1.00th=[ 55], 5.00th=[ 192], 10.00th=[ 277], 20.00th=[ 326],
| 30.00th=[ 347], 40.00th=[ 359], 50.00th=[ 371], 60.00th=[ 383],
| 70.00th=[ 404], 80.00th=[ 457], 90.00th=[ 783], 95.00th=[ 1188],
| 99.00th=[ 1450], 99.50th=[ 1582], 99.90th=[ 2180], 99.95th=[ 2311],
| 99.99th=[ 2343]
bw (KB /s): min= 0, max= 4476, per=100.00%, avg=939.69, stdev=597.63
lat (usec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (usec) : 100=0.99%, 250=10.49%, 500=7.53%, 750=4.21%, 1000=2.66%
lat (msec) : 2=6.97%, 4=11.01%, 10=20.96%, 20=10.35%, 50=3.88%
lat (msec) : 100=0.96%, 250=1.46%, 500=15.11%, 750=1.30%, 1000=0.54%
lat (msec) : 2000=1.51%, >=2000=0.04%
cpu : usr=0.98%, sys=2.77%, ctx=116139, majf=0, minf=28
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=550156/w=137644/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=3274.5MB, aggrb=3260KB/s, minb=3260KB/s, maxb=3260KB/s, mint=1028285msec, maxt=1028285msec
WRITE: io=841688KB, aggrb=818KB/s, minb=818KB/s, maxb=818KB/s, mint=1028285msec, maxt=1028285msec

Disk stats (read/write):
dm-0: ios=550906/140063, merge=0/0, ticks=3632603/62112127, in_queue=65793669, util=100.00%, aggrios=552737/140043, aggrmerge=0/35, aggrticks=3707276/62091650, aggrin_queue=65818254, aggrutil=100.00%
vda: ios=552737/140043, merge=0/35, ticks=3707276/62091650, in_queue=65818254, util=100.00%
 
As fio shows, your performance is reduced by about 50% when using DRBD, which is to be expected for writes, since every write has to be done twice. Why the same pattern shows up for reads is strange. I am no DRBD guru, so someone else will have to step in here.
 
As fio shows, your performance is reduced by about 50% when using DRBD, which is to be expected for writes, since every write has to be done twice. Why the same pattern shows up for reads is strange. I am no DRBD guru, so someone else will have to step in here.

I'm still testing:
It seems the problem is directly with the LVM volume (now I don't know whether the problem is also with DRBD)... I have tried VMs on a new host without any DRBD: one VM on a local disk, and another identical VM on the same node on an LVM volume configured in PVE.
The first gives correct numbers... the second gives poor numbers.
Do you know what the problem could be?
 

I have read them now, but I don't see the direct relation to the SSD/LVM problem... Is there any particular point in those links that could help me solve the LVM/SSD problem?
To configure DRBD I used the Proxmox documentation, but for now the problem appears before DRBD...

mir, thanks for still answering... I know this is not your problem, and that is why I really appreciate that you keep trying to help me! :)
 
Continuing to search for information, I found http://www.theirek.com/blog/2014/02/16/patch-dlia-raboty-s-enierghoniezavisimym-keshiem-ssd-diskov (via the post http://forum.proxmox.com/archive/index.php/t-18002.html)...

I just tried

echo temporary write through > /sys/class/scsi_disk/0\:0\:0\:0/cache_type

on my Proxmox node, and the results in the VM are much better:

ON LOCAL DISK WITHOUT LVM
Before change
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 13,5832 s, 309 MB/s

After change
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 4,5915 s, 913 MB/s

ON LOCAL DISK WITH LVM
Before change
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 36,954 s, 114 MB/s

After change
[root@localhost ~]# dd if=/dev/zero of=/var/testfile bs=1M count=4000 oflag=direct
4000+0 records in
4000+0 records out
4194304000 bytes (4,2 GB) copied, 19,2069 s, 218 MB/s

There is still a big, big difference with and without LVM, but even with LVM the result is twice the speed!

fio shows simply incredible speed with this change:

WITH LVM
# fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [34807KB/8372KB/0KB /s] [8169/1977/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=2351: Fri Aug 14 14:11:25 2015
Description : [Emulation of Intel IOmeter File Server Access Pattern]
read : io=3274.5MB, bw=41868KB/s, iops=6870, runt= 80075msec
slat (usec): min=13, max=4510, avg=21.37, stdev=13.97
clat (usec): min=276, max=26468, avg=7259.57, stdev=2023.38
lat (usec): min=334, max=26490, avg=7281.79, stdev=2023.80
clat percentiles (usec):
| 1.00th=[ 3600], 5.00th=[ 4384], 10.00th=[ 4896], 20.00th=[ 5536],
| 30.00th=[ 6048], 40.00th=[ 6560], 50.00th=[ 7008], 60.00th=[ 7520],
| 70.00th=[ 8096], 80.00th=[ 8896], 90.00th=[ 9920], 95.00th=[10944],
| 99.00th=[12992], 99.50th=[13888], 99.90th=[16512], 99.95th=[17536],
| 99.99th=[21376]
bw (KB /s): min=29248, max=65901, per=100.00%, avg=41889.37, stdev=7796.44
write: io=841688KB, bw=10511KB/s, iops=1718, runt= 80075msec
slat (usec): min=14, max=10538, avg=459.66, stdev=343.80
clat (msec): min=1, max=26, avg= 7.64, stdev= 2.07
lat (msec): min=1, max=26, avg= 8.10, stdev= 2.12
clat percentiles (usec):
| 1.00th=[ 3888], 5.00th=[ 4768], 10.00th=[ 5216], 20.00th=[ 5920],
| 30.00th=[ 6432], 40.00th=[ 6880], 50.00th=[ 7392], 60.00th=[ 7904],
| 70.00th=[ 8512], 80.00th=[ 9280], 90.00th=[10304], 95.00th=[11328],
| 99.00th=[13504], 99.50th=[14400], 99.90th=[17024], 99.95th=[18304],
| 99.99th=[21888]
bw (KB /s): min= 7755, max=16193, per=100.00%, avg=10516.44, stdev=1988.93
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=2.15%, 10=87.75%, 20=10.07%, 50=0.02%
cpu : usr=7.49%, sys=21.26%, ctx=120128, majf=0, minf=27
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=550156/w=137644/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=3274.5MB, aggrb=41868KB/s, minb=41868KB/s, maxb=41868KB/s, mint=80075msec, maxt=80075msec
WRITE: io=841688KB, aggrb=10511KB/s, minb=10511KB/s, maxb=10511KB/s, mint=80075msec, maxt=80075msec

Disk stats (read/write):
dm-1: ios=548202/137233, merge=0/0, ticks=179913/68409, in_queue=248324, util=95.60%, aggrios=550157/137654, aggrmerge=0/3, aggrticks=179315/68276, aggrin_queue=247208, aggrutil=95.14%
vda: ios=550157/137654, merge=0/3, ticks=179315/68276, in_queue=247208, util=95.14%

Without LVM
# fio --description="Emulation of Intel IOmeter File Server Access Pattern" --name=iometer --bssplit=512/10:1k/5:2k/5:4k/60:8k/2:16k/4:32k/4:64k/10 --rw=randrw --rwmixread=80 --direct=1 --size=4g --ioengine=libaio --iodepth=64
iometer: (g=0): rw=randrw, bs=512-64K/512-64K/512-64K, ioengine=libaio, iodepth=64
fio-2.1.10
Starting 1 process
iometer: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m] [100.0% done] [34060KB/8049KB/0KB /s] [7950/1897/0 iops] [eta 00m:00s]
iometer: (groupid=0, jobs=1): err= 0: pid=10742: Fri Aug 14 14:09:19 2015
Description : [Emulation of Intel IOmeter File Server Access Pattern]
read : io=3274.5MB, bw=41677KB/s, iops=6839, runt= 80442msec
slat (usec): min=13, max=2844, avg=21.94, stdev=12.48
clat (usec): min=820, max=21324, avg=7289.84, stdev=1989.05
lat (usec): min=858, max=21344, avg=7312.61, stdev=1989.43
clat percentiles (usec):
| 1.00th=[ 3664], 5.00th=[ 4448], 10.00th=[ 4960], 20.00th=[ 5600],
| 30.00th=[ 6112], 40.00th=[ 6560], 50.00th=[ 7072], 60.00th=[ 7584],
| 70.00th=[ 8160], 80.00th=[ 8896], 90.00th=[ 9920], 95.00th=[10944],
| 99.00th=[12864], 99.50th=[13760], 99.90th=[15680], 99.95th=[16512],
| 99.99th=[19328]
bw (KB /s): min=30715, max=65877, per=100.00%, avg=41703.42, stdev=8163.01
write: io=841688KB, bw=10463KB/s, iops=1711, runt= 80442msec
slat (usec): min=14, max=10042, avg=459.75, stdev=342.08
clat (msec): min=1, max=21, avg= 7.69, stdev= 2.03
lat (msec): min=1, max=21, avg= 8.15, stdev= 2.08
clat percentiles (usec):
| 1.00th=[ 3952], 5.00th=[ 4768], 10.00th=[ 5280], 20.00th=[ 5984],
| 30.00th=[ 6496], 40.00th=[ 6944], 50.00th=[ 7456], 60.00th=[ 7968],
| 70.00th=[ 8512], 80.00th=[ 9280], 90.00th=[10432], 95.00th=[11328],
| 99.00th=[13376], 99.50th=[14144], 99.90th=[16320], 99.95th=[17280],
| 99.99th=[19840]
bw (KB /s): min= 7571, max=16055, per=100.00%, avg=10474.52, stdev=2071.75
lat (usec) : 1000=0.01%
lat (msec) : 2=0.01%, 4=1.95%, 10=87.76%, 20=10.28%, 50=0.01%
cpu : usr=7.59%, sys=21.66%, ctx=120126, majf=0, minf=27
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=550156/w=137644/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=3274.5MB, aggrb=41677KB/s, minb=41677KB/s, maxb=41677KB/s, mint=80442msec, maxt=80442msec
WRITE: io=841688KB, aggrb=10463KB/s, minb=10463KB/s, maxb=10463KB/s, mint=80442msec, maxt=80442msec

Disk stats (read/write):
dm-1: ios=548998/137406, merge=0/0, ticks=176733/70066, in_queue=246812, util=95.41%, aggrios=550169/137651, aggrmerge=0/0, aggrticks=175726/69855, aggrin_queue=245096, aggrutil=94.97%
vda: ios=550169/137651, merge=0/0, ticks=175726/69855, in_queue=245096, util=94.97%

This only works in Proxmox 4; the same change in Proxmox 3.4 has no effect on performance.
Is this the solution? What has changed in Proxmox 4 to give such better performance in my case?

I hope someone from the Proxmox team can review this.
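
In case it helps someone else, this is only a sketch of how I would make the cache_type change survive a reboot (the scsi_disk entries depend on your controller), for example from /etc/rc.local on the node:

for f in /sys/class/scsi_disk/*/cache_type; do
    echo "temporary write through" > "$f"
done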
 
Are you testing with "fully-functioning" DRBD? So, are your drbd devices up-to-date on all nodes? Could you please try to disconnect the peer and see if the write penalty is still there? If so, it's on the additional storage layer and the problem is really to analyse in drbd, if not, your problem is the network and not drbd-itself related.
 
Are you testing with "fully-functioning" DRBD? So, are your drbd devices up-to-date on all nodes? Could you please try to disconnect the peer and see if the write penalty is still there? If so, it's on the additional storage layer and the problem is really to analyse in drbd, if not, your problem is the network and not drbd-itself related.

Thanks LnxBil for your answer.
Well, for now it does not seem to be a network problem or a DRBD issue, but some problem with Proxmox 3.
In my tests yesterday I installed two standalone servers (same hardware, without cluster, without DRBD), one with Proxmox 3 and the other with the Proxmox 4 beta. On both I installed two VMs: one on a local disk without LVM, the other on a local disk with LVM.

The VM configuration is the same on both servers (writethrough cache, virtio driver, etc.). At first I ran some tests and saw slightly better results with Proxmox 4. But when I change "/sys/class/scsi_disk/0\:0\:0\:0/cache_type" from "write back" to "write through"... the speed increases incredibly on the Proxmox 4 beta (without a reboot), but does not change on Proxmox 3...

I don't know how such a difference is possible (it seems strange that performance in our case is "so bad" with Proxmox 3 and "so good" with Proxmox 4)...
 
Re: KVM + LVM + DRBD + SSD low performance : SOLVED!!!

Hi

Firstly, thanks so much to everyone who spent time reviewing this post (mir, spirit and LnxBil)!
In the end, as I did not find the solution to the problem myself (bad performance on Proxmox 3.4 but good on Proxmox 4) and did not want to upgrade everything to the Proxmox 4 beta for now, I contacted support (I had to pay for a support licence, but they solved my problem).

The solution was to update the kernel from the standard one to 3.10.0-11, and this simple change solved the problem (it is possible for us because we only work with KVM!).
I have also upgraded DRBD to version 8.4.5 (I failed when upgrading to the latest 8.4.6) and now have good fio results with DRBD:

READ: io=3071.7MB, aggrb=116010KB/s, minb=116010KB/s, maxb=116010KB/s, mint=27113msec, maxt=27113msec
WRITE: io=1024.4MB, aggrb=38686KB/s, minb=38686KB/s, maxb=38686KB/s, mint=27113msec, maxt=27113msec
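
For anyone who wants to try the same, the kernel switch itself was roughly the following (the package name assumes the usual pve-kernel naming on Proxmox 3.4, so please verify it against pveversion -v and the repository first):

apt-get update
apt-get install pve-kernel-3.10.0-11-pve
reboot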

The last thing I have to do is apply TRIM/discard in our configuration; do you know how to do that?

Thanks
 
Re: KVM + LVM + DRBD + SSD low performance : SOLVED!!!

Alright, at my last job we had a similar problem with storage performance inside VMs. Outside the VMs, at the hypervisor level, we saw huge performance in I/O tests, but inside the VMs it was miserable.

Mind you, this was not a Proxmox environment but an Ubuntu OpenStack environment using KVM for virtualization.

My colleague at the time worked very hard to find a solution, and he finally managed to solve it by enabling the vhost_net kernel module. I believe this is standard in 3.x kernels but not in the 2.6.x kernels used by Proxmox.

After implementing the vhost_net kernel module we saw an increase of 100-300 MB/sec write and read speeds in the VMs (I am not joking the difference was this huge).

Unfortunately I don't know exactly how my colleague fixed the issue, but I suspect you might have had the same problem, since upgrading to a 3.x kernel solved it for you.

Here are some links for some extra reading. I suggest you really look into this, as I highly suspect the lack of vhost_net might have been the cause of your performance issues.

https://blog.codecentric.de/en/2014/09/openstack-crime-story-solved-tcpdump-sysdig-iostat-episode-3/
http://docs.openstack.org/kilo/config-reference/content/kvm.html (Go to bottom "KVM performance tweaks")
http://www.linux-kvm.org/page/UsingVhost
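
A quick, generic way to check on the host whether the module is loaded and actually in use (not Proxmox-specific):

lsmod | grep vhost_net
modprobe vhost_net
ps aux | grep vhost

The first command shows whether vhost_net is loaded, the second loads it if it is missing, and each running VM with a virtio NIC should show a vhost kernel thread in the process list.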
 
