IO performance loss with ZFS and KVM

manoaratefy

Member
Apr 4, 2019
Good morning,

It seems that I have IO performance loss with ZFS and KVM virtualization on Proxmox.

With the default configuration, I get a 1.1 GB/s transfer rate when mounting the ZFS volume directly on the host, but only 430 MB/s inside the VM with "no cache" and "zfs set sync=standard". With "zfs set sync=disabled", I get 518 MB/s.
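For reference, the sync property was toggled roughly like this (assuming the dataset behind /zfs0/vz is called zfs0/vz; adjust to the real name):

zfs set sync=standard zfs0/vz
zfs set sync=disabled zfs0/vz
zfs get sync zfs0/vz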

Are there any settings I should tune to improve my VM's IO performance?

Thanks in advance.
 
Mounting ZFS directly on the host is already a different setup, as it uses a different dataset, so I already question the comparison at that point.
Also there are additional layers involved when running IO from a VM.

What virtual hardware configuration did you use? This can also have a significant impact on system performance and behaviour.

Could you also explain what your test looks like?
 
I was using dd if=/dev/zero of=/zfs0/vz/test.img to test the speed. After asking on some forums, people told me that dd may be inaccurate due to I/O buffering, so I was advised to use "fio" to get more accurate results.
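For completeness, a dd run that avoids the buffering issue would look something like this (block size and count are only an example, not what I originally ran):

dd if=/dev/zero of=/zfs0/vz/test.img bs=1M count=10240 conv=fdatasync

conv=fdatasync makes dd flush the data before reporting the rate, so the page cache does not inflate the result.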

With fio, I get this result (random read bench):
fio --filename=test --sync=1 --rw=randread --bs=4k --numjobs=1 --iodepth=4 --group_reporting --name=test --filesize=10G --runtime=300

Inside the VM:
READ: bw=27.4MiB/s (28.7MB/s), 27.4MiB/s-27.4MiB/s (28.7MB/s-28.7MB/s), io=8224MiB (8624MB), run=300001-300001msec

Outside the VM:
READ: io=10240MB, aggrb=159649KB/s, minb=159649KB/s, maxb=159649KB/s, mint=65680msec, maxt=65680msec

Note that the VM is running on CloudLinux 7.

As for hardware, I use 6x Crucial CT1050MX300SSD1 SSDs, 2x E5-2620 (32 cores in total) and 188 GB of RAM (150 GB allocated to the VM).

I understand that inside the VM there may be some performance loss due to the virtualization layer, but the current figures seem too low for a production environment (the server is not in use yet).
 
OK, /dev/zero does not make much sense: it is a purely sequential workload that does not match realistic workloads.
fio is much better, but the difference we see between host and VM is odd, because it is very large.

So what is the VM configuration?
150 GB RAM, OK. But what about the rest? VirtIO SCSI? SCSI disk? Tell us ;)
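Posting the disk-related lines of "qm config <vmid>" from the host would help; they look roughly like this (the values below are made up):

qm config <vmid>
scsihw: virtio-scsi-pci
scsi0: local-zfs:vm-100-disk-0,cache=none,size=500G
memory: 153600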
 
I was using VirtIO SCSI. After trying VirtIO Block, I got 200 MB/s, which is a huge improvement, but when I tried again a few minutes later I got 40 MB/s.

Currently, the cache mode on the VirtIO Block disk is "no cache". As far as I know, I should keep it at "no cache", since ZFS already has its own caching and anything else would just copy from RAM to RAM. Is that right?
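If double caching is the concern, would it also make sense to cap the ARC on the host so it does not compete with the VM's 150 GB? Something like this is what I had in mind (the 16 GiB value is only an example):

echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf
update-initramfs -u

followed by a reboot.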

IO thread is also disabled.
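If enabling it is worth trying, I assume it would be something like this on the host (VM ID, storage and disk names are placeholders for my setup):

qm set <vmid> --virtio0 local-zfs:vm-<vmid>-disk-0,cache=none,iothread=1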
 
That sounds to me like something (perhaps inside the SSDs) is running in the background.
These SSDs are TLC and probably not a very good fit for your workload. You likely filled their "cache" in the first run and are seeing "steady state" performance on the second test.

SSDs differ a lot in quality and performance characteristics. There is a reason why datacenter SSDs are often 10x or more the price of consumer SSDs (which yours clearly are).
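If you want to verify that yourself, a sustained sequential write will usually show the drop once the drive's SLC cache is exhausted; something along these lines (path, size and runtime are only examples):

fio --name=steadystate --filename=/zfs0/vz/fio-steady.img --rw=write --bs=1M --size=50G --time_based --runtime=600 --group_reporting

Watch the bandwidth over time: it typically starts high and then settles at the drive's steady-state rate.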
 
