Is 42% less disk performance in VM normal?

jasminj

Member
Sep 27, 2014
43
0
6
Vienna 19
jasmin.anw.at
Hi!

I am benchmarking my setup and found that the disk write performance in a VM is approx. 42% lower than on the host.

I am using dd for testing. I know it is not ideal, but it should give a rough
estimate, and a lot of others use it too.
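For anyone repeating the test on a file system (rather than a raw LV), a variant with conv=fdatasync is often suggested, since it forces the data to stable storage before dd reports and so the page cache cannot inflate the number. This is only a sketch; the size and target path are placeholders:

```shell
# Sequential write test to a scratch file; conv=fdatasync makes dd flush
# the data before reporting, so the page cache cannot inflate the result.
# bs/count (here 128 MB total) and the path are placeholder values.
TARGET=$(mktemp /tmp/ddtest.XXXXXX)
dd if=/dev/zero of="$TARGET" bs=16M count=8 conv=fdatasync 2>&1 | tail -n 1
rm -f "$TARGET"
```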

I have an LSI RAID1 with a BBU, configured in "Writeback with BBU" mode, with two
drives. One is 100 GB (sda1: LVM VG PVE), used for the Proxmox installation; the
other is 1.8 TB (sdb2: LVM VG VMs), used for the VM disks.
There is also another 4 TB Hitachi disk with an ext4 file system, used for backups.

For testing, I created two 2 GB LVs, one on each of the VGs.
There is only ONE VM running on the system. The root disk of this VM is on
a DRBD device, but the second server is disconnected (no influence from DRBD).
The VM test disk is on a RAID-backed LVM LV.

On the host RAID (LVM LV 2G on VMs(/dev/sdb1)) I get:
Direct:
dd if=/dev/zero of=/dev/vg_vms/vm-100-disk-2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 6,41299 s, 83,7 MB/s
536870912 bytes (537 MB) copied, 6,36552 s, 84,3 MB/s
536870912 bytes (537 MB) copied, 6,39561 s, 83,9 MB/s

No direct:
dd if=/dev/zero of=/dev/vg_vms/vm-100-disk-2 bs=512M count=1
536870912 bytes (537 MB) copied, 1,14586 s, 469 MB/s
536870912 bytes (537 MB) copied, 1,12022 s, 479 MB/s
536870912 bytes (537 MB) copied, 1,09654 s, 490 MB/s

On the host RAID (LVM LV 2G on PVE(/dev/sda1)) I get:
Direct:
dd if=/dev/zero of=/dev/pve/test bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 4,09055 s, 131 MB/s
536870912 bytes (537 MB) copied, 4,20133 s, 128 MB/s
536870912 bytes (537 MB) copied, 4,25934 s, 126 MB/s

No direct:
dd if=/dev/zero of=/dev/pve/test bs=512M count=1
536870912 bytes (537 MB) copied, 5,03491 s, 107 MB/s
536870912 bytes (537 MB) copied, 5,16682 s, 104 MB/s
536870912 bytes (537 MB) copied, 6,31498 s, 85,0 MB/s

On the 4T (mounted partition with ext4) disk I get:
Direct:
dd if=/dev/zero of=/mnt/scratch/xx/ff bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 4,20657 s, 128 MB/s
536870912 bytes (537 MB) copied, 4,18068 s, 128 MB/s
536870912 bytes (537 MB) copied, 4,09591 s, 131 MB/s

No direct:
dd if=/dev/zero of=/mnt/scratch/xx/ff bs=512M count=1
536870912 bytes (537 MB) copied, 1,37137 s, 391 MB/s
536870912 bytes (537 MB) copied, 4,44111 s, 121 MB/s
536870912 bytes (537 MB) copied, 4,27835 s, 125 MB/s

On the VM the 2G(vm-100-disk-2) [cache=directsync] I get:
Direct:
dd if=/dev/zero of=/dev/vdd bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 11.6293 s, 46.2 MB/s
536870912 bytes (537 MB) copied, 10.9080 s, 49.2 MB/s
536870912 bytes (537 MB) copied, 11.0009 s, 48.8 MB/s

No direct:
dd if=/dev/zero of=/dev/vdd bs=512M count=1
536870912 bytes (537 MB) copied, 11.1444 s, 48.2 MB/s
536870912 bytes (537 MB) copied, 11.4851 s, 46.7 MB/s
536870912 bytes (537 MB) copied, 11.0797 s, 48.5 MB/s

On the 4T (mounted partition with ext4):
pveperf /mnt/scratch/xx
CPU BOGOMIPS: 28727.28
REGEX/SECOND: 951930
HD SIZE: 1932.59 GB (/dev/sdc2)
BUFFERED READS: 135.83 MB/sec
AVERAGE SEEK TIME: 16.00 ms
FSYNCS/SECOND: 57.79
DNS EXT: 27.09 ms
DNS INT: 1007.16 ms (anw.at)

On the host RAID (LVM LV 2G on PVE(/dev/sda1)[ ext4 ]):
pveperf /mnt/test
CPU BOGOMIPS: 28727.28
REGEX/SECOND: 931980
HD SIZE: 1.91 GB (/dev/mapper/pve-test)
BUFFERED READS: 108.39 MB/sec
AVERAGE SEEK TIME: 7.22 ms
FSYNCS/SECOND: 50.52
DNS EXT: 27.38 ms
DNS INT: 1006.04 ms (anw.at)

On the host RAID (LVM LV 2G on PVE(/dev/sda1)[ ext3 ]):
pveperf /mnt/test
CPU BOGOMIPS: 28727.28
REGEX/SECOND: 955866
HD SIZE: 1.91 GB (/dev/mapper/pve-test)
BUFFERED READS: 147.09 MB/sec
AVERAGE SEEK TIME: 6.91 ms
FSYNCS/SECOND: 35.23
DNS EXT: 28.51 ms
DNS INT: 1006.71 ms (anw.at)

On the host RAID (LVM LV 2G on PVE(/dev/sda1)[ XFS ]):
pveperf /mnt/test
CPU BOGOMIPS: 28727.28
REGEX/SECOND: 933939
HD SIZE: 1.99 GB (/dev/mapper/pve-test)
BUFFERED READS: 140.27 MB/sec
AVERAGE SEEK TIME: 6.17 ms
FSYNCS/SECOND: 86.79
DNS EXT: 26.49 ms
DNS INT: 1006.47 ms (anw.at)

So it makes a big difference which file system is used!


On the host RAID (LVM LV 2G on VMs(/dev/sdb1))[ ext4 ]:
pveperf /mnt/test
CPU BOGOMIPS: 28727.28
REGEX/SECOND: 924504
HD SIZE: 1.91 GB (/dev/mapper/vg_vms-vm--100--disk--2)
BUFFERED READS: 81.18 MB/sec
AVERAGE SEEK TIME: 7.47 ms
FSYNCS/SECOND: 26.23
DNS EXT: 27.70 ms
DNS INT: 1006.62 ms (anw.at)

The LV on the bigger disk also performs badly here compared to the
LV on the smaller one.



a) I can't understand why I have such a big difference on the host between
the two disks provided by the RAID controller.
b) Why does the smaller RAID disk perform worse in NON-direct mode!?
c) What could be the reason that the VM disk performance is so bad
(host: 84 MB/s vs. VM: 48 MB/s = -42%)?



I am using the latest PVE 4.4 with a slightly older Kernel.
proxmox-ve: 4.4-76 (running kernel: 4.4.24-1-pve)
pve-manager: 4.4-1 (running version: 4.4-1/eb2d6f1e)
pve-kernel-4.4.35-1-pve: 4.4.35-76
pve-kernel-4.4.24-1-pve: 4.4.24-72
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-101
pve-firmware: 1.1-10
libpve-common-perl: 4.0-83
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-70
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-9
pve-container: 1.0-88
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.6-2
lxcfs: 2.0.5-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80


Edit: I just discovered that my BBU is not in an optimal state. But this should
have no impact on the questions, especially on why the VM write
performance is so bad.

BR,
Jasmin
 

Ashley

Member
Jun 28, 2016
267
14
18
29
You haven't listed what disks are in the RAID1. However, given that you're getting 469 MB/s, you're either running RAID1 on SSDs, or, when running the test on the host OS, you're using the HOST page cache.

When you run the test within the VM, if you're not setting the cache to Write Back mode within Proxmox, you will not get the same results, as you'll be forcing the write directly to the disk. If you're running standard spinning disks in RAID1, the VM result you're getting is closer to reality.
 

jasminj

Member
Sep 27, 2014
43
0
6
Vienna 19
jasmin.anw.at
You haven't listed what disks are in the Raid1
2x Hitachi HUA723020ALA641/ 1.818 TB

or when running the test within the OS your using the HOST page cache.
Yes, that was a test without the "oflag=direct" option.

...as your be forcing the write directly to the disk, which if your running standard spinning disks in Raid1 the VM result your getting is closer to reality.
I read in several threads that it is best to use "cache=directsync" when the RAID controller uses a cache.

My test showed 84 MB/s on the host, but 48 MB/s from within the VM, and I don't understand why these aren't nearly the same (I am using a virtio disk). The disk is an LVM disk without any file system; LVM maps directly to the block device in the kernel. OK, there is some overhead from KVM, but 42%?
 

Ashley

Member
Jun 28, 2016
267
14
18
29
2x Hitachi HUA723020ALA641/ 1.818 TB

Yes, was a test without the "oflag=direct" option.

I read in several threads, that it is best to use "cache=directsync" when the RAID controller uses a cache.

My test showed 84 MB/s on the host, but 48 MB/s from within the VM. And I don't understand why this isn't nearly the same (I am using virtio disk)? And the disk is a LVM disk without any file system. LVM maps directly to the block device in the Kernel. Ok, there is an overhead from kvm, but 42%?
Hello,

I fully understand that directsync may be suggested; however, you're comparing the host, where you're making use of the page cache, with the VM, where you're not. There will always be some overhead.

However, I run hundreds of VMs with "Write Back with BBU" with no issues. As long as you're running redundant PSUs and are happy with your power supply, Write Back should be no problem. If you wish to run directsync, you will skip the page cache, which is what is benefiting you when running the tests at node level.

Have you tried switching to Write Back and re-running your tests to see if you get closer to the host performance level? At least that would answer your question.

With 2x HUA723020ALA641 in RAID1 you're looking at a maximum real-world drive performance of about 157 MB/s, which is obviously only achievable in the best case, with no other I/O hitting the disks and causing seek time.
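For completeness, switching the cache mode of an existing VM disk can also be done from the host shell with `qm set` instead of the GUI. This is only an illustration; the VM ID, the virtio bus index and the storage/volume name below are assumptions based on this thread, not verified values:

```shell
# Hypothetical example: switch the test disk of VM 100 (assumed to be
# virtio3, i.e. /dev/vdd, on an LVM storage named "vg_vms") to writeback.
qm set 100 --virtio3 vg_vms:vm-100-disk-2,cache=writeback
# ...run the benchmark inside the VM, then revert:
qm set 100 --virtio3 vg_vms:vm-100-disk-2,cache=directsync
```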
 

jasminj

Member
Sep 27, 2014
43
0
6
Vienna 19
jasmin.anw.at
however your comparing the host where your making use of Page Cache and the VM where your not, there will always be some overhead.
No, I am not; at least I don't see how the page cache would be used in my test:
dd if=/dev/zero of=/dev/vg_vms/vm-100-disk-2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 6,41299 s, 83,7 MB/s
As far as I understand, "oflag=direct" means: do not use the page cache.
"cache=directsync" in the VM disk mapping says the same, and there I only get:
dd if=/dev/zero of=/dev/vdd bs=512M count=1
536870912 bytes (537 MB) copied, 11.1444 s, 48.2 MB/s
So the performance should be nearly the same, but it is 42% lower within the VM, and I don't understand this.
Where does the overhead come from? KVM can't be the reason, otherwise nobody would use it.
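Just to make the arithmetic behind the 42% explicit, it follows directly from the two dd figures quoted above (a trivial sketch):

```shell
# Relative throughput drop from host (oflag=direct) to VM (cache=directsync),
# using the dd figures quoted in this thread.
host=83.7   # MB/s on the host LV
vm=48.2     # MB/s inside the VM
awk -v h="$host" -v v="$vm" 'BEGIN { printf "drop: %.0f%%\n", (h - v) / h * 100 }'
# prints "drop: 42%"
```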

however if you wish to run directsync, then you will skip the Page Cache which is benefiting you when running the tests at node level.
The default in Proxmox is "No cache", which means "the host does no caching; the guest disk cache is writeback". As long as barriers are used in the guest, this should work as well.

Have you tried swapping to Write Back and re-running your tests to see if you reach closer to the host performance level, at least this will answer your question.
Will do that.
 

mailinglists

Active Member
Mar 14, 2012
423
36
28
If your host is using normal LVM and your guests are using thin LVM, it will be slower because it needs to allocate disk space on demand.
Try using fio for testing and see if the results match better. In my tests, writing with dd without cache showed QCOW2 on XFS to be MUCH faster (5 to 10 times) than thin LVM on the same machine. But then I checked random read/write as well as plain sequential write with fio (which first creates a file) and found that thin LVM is much (much!) faster in terms of IOPS and comparable in sequential write speed. Also check the VM caching options; Write back (safe) seems to work fastest for us.
 

jasminj

Member
Sep 27, 2014
43
0
6
Vienna 19
jasmin.anw.at
Will do that.
I switched the RAID controller to WriteBack and tested again. The cache size is 512 MB, and it is a Fujitsu/Siemens LSI MegaRAID in a Primergy RX600 S4.

On the host RAID (LVM LV 2G on VMs(/dev/sdb1)) I get:
Direct:
dd if=/dev/zero of=/dev/vg_vms/vm-100-disk-2 bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 2,83498 s, 189 MB/s
536870912 bytes (537 MB) copied, 5,73234 s, 93,7 MB/s
536870912 bytes (537 MB) copied, 5,86097 s, 91,6 MB/s
The second two show the value you get once the cache is full.

On the VM the 2G(vm-100-disk-2) [cache=directsync] I get:
Direct:
dd if=/dev/zero of=/dev/vdd bs=512M count=1 oflag=direct
536870912 bytes (537 MB) copied, 3.9326 s, 137 MB/s
536870912 bytes (537 MB) copied, 5.86664 s, 91.5 MB/s
536870912 bytes (537 MB) copied, 5.86964 s, 91.5 MB/s
The second two show the value you get once the cache is full.

So we get host: 189 MB/s vs. VM: 137 MB/s.
The VM still has 27% less performance.
Q) Is this normal?

But why do we get the SAME values when the cache is full?
Q) Can someone explain this?

If your host is using normal LVM and you guests are using thin LVM
Yes, my host uses LVM, but the VM doesn't use anything; I wrote directly to the virtual disk, without a file system or another LVM layer.

Try using fio for testing and see if it's matching results better.
Here it is:
$ fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test \
    --filename=/mnt/test/xxx --bs=4k --iodepth=64 --size=512M --readwrite=randrw \
    --rwmixread=75

test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [3112KB/1088KB/0KB /s] [778/272/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=29625: Tue Dec 20 00:55:13 2016
read : io=392888KB, bw=1788.1KB/s, iops=447, runt=219624msec
write: io=131400KB, bw=612654B/s, iops=149, runt=219624msec
cpu : usr=0.19%, sys=0.72%, ctx=127956, majf=0, minf=5
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued : total=r=98222/w=32850/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
READ: io=392888KB, aggrb=1788KB/s, minb=1788KB/s, maxb=1788KB/s, mint=219624msec, maxt=219624msec
WRITE: io=131400KB, aggrb=598KB/s, minb=598KB/s, maxb=598KB/s, mint=219624msec, maxt=219624msec

I don't know if this is good/bad/normal. Can someone please shed some light on this?

Edit: Two more measurements:
pveperf /var/lib/vz/
CPU BOGOMIPS: 28725.72
REGEX/SECOND: 923854
HD SIZE: 19.56 GB (/dev/mapper/pve-data)
BUFFERED READS: 120.94 MB/sec
AVERAGE SEEK TIME: 7.93 ms
FSYNCS/SECOND: 3020.54
DNS EXT: 29.54 ms
DNS INT: 1007.12 ms (anw.at)

pveperf /mnt/test
CPU BOGOMIPS: 28725.72
REGEX/SECOND: 951621
HD SIZE: 1.91 GB (/dev/mapper/vg_vms-vm--100--disk--2)
BUFFERED READS: 100.17 MB/sec
AVERAGE SEEK TIME: 6.11 ms
FSYNCS/SECOND: 3171.67
DNS EXT: 35.10 ms
DNS INT: 1006.40 ms (anw.at)

BR,
Jasmin
 

mailinglists

Active Member
Mar 14, 2012
423
36
28
If you have a battery-backed RAID controller, leave it at the write back setting.
If you want to get better speeds in the VM until the RAID controller cache is full, set Write back (safe) under the VM disk options and test the other safe options.
Lower I/O performance in a VM is normal; in KVM or in LXC, performance will always be less than on the host.
You get the same values when the cache is full because your storage becomes the bottleneck, not the communication between host and guest.
Your speed tests look OK for your hardware.
If you wish, you can also do a full (sequential) write test with fio, not just random read/write.
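A plain sequential-write fio job of the kind suggested here could look like the following. The file name and size are placeholders, and the run is skipped gracefully where fio is not installed or O_DIRECT is unsupported:

```shell
# Sequential write, bypassing the page cache (direct=1), single job.
# File name and size are example values, not from this thread.
command -v fio >/dev/null 2>&1 || { echo "fio not installed, skipping"; exit 0; }
fio --name=seqwrite --filename=/tmp/fio-seqwrite.tmp --rw=write \
    --bs=1M --size=128M --direct=1 --ioengine=libaio --numjobs=1 \
  | grep 'WRITE:' \
  || echo "fio run failed (O_DIRECT may be unsupported on this file system)"
rm -f /tmp/fio-seqwrite.tmp
```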
 

jasminj

Member
Sep 27, 2014
43
0
6
Vienna 19
jasmin.anw.at
If you have battery backed raid controller leave it at write back setting.
I did this now, even though my BBU is reported as bad. I changed the batteries and it is still the same, but I know the battery is full (it reports the voltage). I hope the RAID controller will use the voltage provided by the BBU even if it thinks it is bad.

If you want to get better speeds in VM until raid controller cache is full, set write back (safe) under vm disk options and test other safe options.
I'll keep "directsync" because of this description.

You speed tests look OK for your hardware.
Thank you very much for looking at this!

My final speed inside the VM with a DRBD8-backed virtual disk is now,
on the Primergy RX600 S4:
dd if=/dev/zero of=/mnt/var_nobackup/xx bs=512M count=1
536870912 bytes (537 MB) copied, 6.70416 s, 80.1 MB/s
536870912 bytes (537 MB) copied, 5.88364 s, 91.2 MB/s
536870912 bytes (537 MB) copied, 5.92139 s, 90.7 MB/s
536870912 bytes (537 MB) copied, 5.65515 s, 94.9 MB/s
536870912 bytes (537 MB) copied, 5.58821 s, 96.1 MB/s
536870912 bytes (537 MB) copied, 5.63779 s, 95.2 MB/s
536870912 bytes (537 MB) copied, 5.76106 s, 93.2 MB/s
536870912 bytes (537 MB) copied, 5.78751 s, 92.8 MB/s
536870912 bytes (537 MB) copied, 5.70511 s, 94.1 MB/s

on the Primergy RX300 S5:
dd if=/dev/zero of=/mnt/var_nobackup/xx bs=512M count=1
536870912 bytes (537 MB) copied, 5.78925 s, 92.7 MB/s
536870912 bytes (537 MB) copied, 5.38628 s, 99.7 MB/s
536870912 bytes (537 MB) copied, 5.00846 s, 107 MB/s
536870912 bytes (537 MB) copied, 4.79641 s, 112 MB/s
536870912 bytes (537 MB) copied, 4.84843 s, 111 MB/s
536870912 bytes (537 MB) copied, 4.77465 s, 112 MB/s
536870912 bytes (537 MB) copied, 5.00373 s, 107 MB/s

BR and many thanks,
Jasmin
 

Ashley

Member
Jun 28, 2016
267
14
18
29
I did this now, even my BBU is reported as bad. I changed the batteries and it is still the same, but I know it is full (it reports the voltage). I hope the RAID controller will use the voltage provided by the BBU, even if it things it is bad.
If your BBU is reporting bad and you wish to use the server in production, you definitely want to swap the BBU out; if you have already tried the battery, you may need to swap out the whole controller.

Most cards will disable Write Back if they have a failed BBU, unless you force it in the settings.
 
