Poor ZFS storage performance in VM compared to host

ufear

Active Member
Aug 19, 2017
Hi,

This has been bothering me for a while, and after spending countless hours trying to figure it out I thought I'd ask here and see if anybody has helpful pointers.

Situation:
- 1 server, currently running Proxmox 4.3.1
-- Dual Xeon E5-2683 v4 on a Supermicro X10DRD-iNT with 128GB ECC registered DDR4-2400
-- 1x Supermicro SuperDOM 16GB containing the Proxmox install
-- 1x 512GB Samsung SM961 SSD
-- 3x 4TB 7200rpm disks
- Using ZFS configured within proxmox
-- 1 pool, compression off, dedup off, currently 39% fragmentation
-- Using 100GB of the SM961 as ZIL
-- Using the remaining 377GB of the SM961 as L2ARC
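
The layout above corresponds roughly to the commands below - the device names, the partitioning of the SM961, and the raidz level are illustrative assumptions, not my exact setup:

Code:
# Pool of the 3x 4TB disks (raidz level assumed; device names illustrative)
zpool create poolz1 raidz1 /dev/sda /dev/sdb /dev/sdc
zpool add poolz1 log /dev/nvme0n1p1     # ~100GB SM961 partition as SLOG/ZIL
zpool add poolz1 cache /dev/nvme0n1p2   # remaining ~377GB as L2ARC
zfs set compression=off poolz1
zfs set dedup=off poolz1

# Verify layout and fragmentation
zpool status -v poolz1
zpool list -o name,size,allocated,fragmentation poolz1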

Problem:
Disk I/O within VMs feels rather slow/sluggish - for example, installing a bunch of packages through apt-get takes rather long. To investigate, I turned to fio. This is the output on the host:

Code:
root@proxmox:/poolz1/media# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [m(1)] [88.9% done] [492.3MB/163.8MB/0KB /s] [126K/41.1K/0 iops] [eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=26096: Sat Aug 19 15:38:50 2017
  read : io=3071.7MB, bw=404864KB/s, iops=101215, runt=  7769msec
  write: io=1024.4MB, bw=135013KB/s, iops=33753, runt=  7769msec
  cpu          : usr=6.28%, sys=92.17%, ctx=1686, majf=0, minf=349
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=786347/w=262229/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3071.7MB, aggrb=404863KB/s, minb=404863KB/s, maxb=404863KB/s, mint=7769msec, maxt=7769msec
  WRITE: io=1024.4MB, aggrb=135013KB/s, minb=135013KB/s, maxb=135013KB/s, mint=7769msec, maxt=7769msec

Now, within a fresh VM:

Code:
root@perf4:~# fio --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [53144KB/18136KB/0KB /s] [13.3K/4534/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1562: Sat Aug 19 15:25:27 2017
  read : io=3071.7MB, bw=59448KB/s, iops=14861, runt= 52910msec
  write: io=1024.4MB, bw=19825KB/s, iops=4956, runt= 52910msec
  cpu          : usr=3.60%, sys=24.51%, ctx=786374, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=786347/w=262229/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3071.7MB, aggrb=59447KB/s, minb=59447KB/s, maxb=59447KB/s, mint=52910msec, maxt=52910msec
  WRITE: io=1024.4MB, aggrb=19824KB/s, minb=19824KB/s, maxb=19824KB/s, mint=52910msec, maxt=52910msec

Disk stats (read/write):
  sda: ios=784380/168548, merge=11/142, ticks=38564/112752, in_queue=151468, util=74.22%

Performance seems to be roughly 1/6th of the host's - however, if I run fio with --direct=1, then:

Code:
root@perf4:~# fio --direct=1 --randrepeat=1 --ioengine=libaio --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75
test: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio-2.2.10
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [466.1MB/155.5MB/0KB /s] [120K/39.8K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1611: Sat Aug 19 15:40:43 2017
  read : io=3071.7MB, bw=444704KB/s, iops=111175, runt=  7073msec
  write: io=1024.4MB, bw=148299KB/s, iops=37074, runt=  7073msec
  cpu          : usr=9.39%, sys=89.88%, ctx=768, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued    : total=r=786347/w=262229/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3071.7MB, aggrb=444703KB/s, minb=444703KB/s, maxb=444703KB/s, mint=7073msec, maxt=7073msec
  WRITE: io=1024.4MB, aggrb=148298KB/s, minb=148298KB/s, maxb=148298KB/s, mint=7073msec, maxt=7073msec

Disk stats (read/write):
  sda: ios=766145/255529, merge=0/1, ticks=33420/11544, in_queue=46396, util=93.90%

Everything seems fine - but, unless I'm mistaken, there is no way to force the guest to always access the disk using O_DIRECT.
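
As far as I understand it, direct=1 just makes fio open its test file with O_DIRECT, and applications have to opt into that per file - for example dd can do the same with oflag=direct. The only host-side knob I see is the per-disk cache mode in Proxmox; the VM ID, storage name and volume name below are placeholders, not my actual config:

Code:
# Inside the guest: a one-off O_DIRECT write, analogous to fio's direct=1
dd if=/dev/zero of=test.direct bs=4k count=10000 oflag=direct

# On the host: set the per-disk cache mode in Proxmox (VM ID 100, storage
# "local-zfs" and the volume name are placeholders)
qm set 100 --virtio0 local-zfs:vm-100-disk-1,cache=none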

I have the feeling that something is sitting in between my ZFS setup and the VM.

Solutions I've attempted:
- Played around with all emulation modes (IDE/SCSI/SATA/VirtIO) and caching modes
- Played around with the controller emulation (VirtIO SCSI, VirtIO SCSI single, LSI)
- Looked at the ZFS volblocksize and increased it from 8k to 128k, to match the recordsize of the dataset where I ran the host test (see the check below) - this did not change anything
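
For completeness, the two block sizes can be compared like this - the zvol name is a placeholder, and note that volblocksize is fixed at zvol creation, so changing it means recreating the VM disk:

Code:
# Recordsize of the dataset used for the host test
zfs get recordsize poolz1/media
# volblocksize of the VM's zvol (name is a placeholder)
zfs get volblocksize poolz1/vm-100-disk-1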

While changing these settings has some minor effect, either positive or negative, the IOPS remain around 15k read / 5k write in a VM - while on the host they are consistently around 100k/35k.

Does anybody have a pointer on where to look next - other than throwing the entire setup in the garbage and setting up a separate NAS/SAN attached through 10GbE?

Thanks already!

ufear
 
Hi,
ZFS is hard to benchmark with tools like fio. Some questions:
1. What does zpool status -v show (command line)?
2. If I understand correctly, you tested fio with bs=4k?
3. Does your VM use the raw format or not?
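
For example (replace 100 with your VM ID):

Code:
zpool status -v
zpool list
# VM disk config (controller, cache mode, format); 100 is a placeholder VM ID
qm config 100
# zvol-backed VM disks are listed here
zfs list -t volume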