Performance: zvol vs. raw files on a dataset

Toxik

Hi,

While reading up on bhyve I stumbled upon this article: https://klarasystems.com/articles/virtualization-showdown-freebsd-bhyve-linux-kvm/

It's mainly about comparing bhyve to KVM, but there's another interesting part: they compare the I/O performance of VMs installed on a zvol versus on a raw file in a dataset.

From this site:
Unlike OpenZFS blocksize, there’s usually a single, clear answer as to what storage type performs best under a given hypervisor. Under Linux’s KVM, there are three primary options—QCOW2 on datasets, RAW files on datasets, and direct access to ZVOLs as block devices.

QCOW2 is a QEMU-specific storage format, and it therefore doesn’t make much sense to try to use it under FreeBSD. Under Linux KVM, QCOW2 can be worth using despite sometimes lower performance than RAW files, because it enables QEMU-specific features, including VM hibernation.

This leaves us with RAW files on OpenZFS datasets, vs OpenZFS ZVOLs passed directly down to the VM as block devices (on Linux) or character devices (on FreeBSD). On paper, ZVOLs seem like the ideal answer to VM storage needs—but we’ve found them terribly underperforming under Linux for many years, so we didn’t want to blindly assume they would be performance winners under FreeBSD either.

And:
We know most people expect zvols to be the highest-performing storage option for virtual machines using ZFS-backed storage—after all, providing the guest with a simple character device seems much more efficient than forcing it to use a raw file as a sort of “fake” device. But the numbers don’t lie—the raw file outperforms the zvol handily here, with more than twice the 1MiB throughput and six times the 4KiB throughput.

Although I suspect this will surprise many readers, it didn’t surprise me personally—I’ve been testing guest storage performance for OpenZFS and Linux KVM for more than a decade, and zvols have performed poorly by comparison each time I’ve tested them.

This was tested on FreeBSD with bhyve. To be honest, I've never really worried about the performance of my VMs, but I'm still curious whether anybody has tested this on Linux KVM and has some benchmark results available.
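
In case anyone wants to reproduce the comparison on their own box, the two backing stores can be set up roughly like this (pool/dataset names, size and volblocksize are only examples, adjust to your setup):

Code:
# zvol-backed disk (shows up as a block device under /dev/zvol/...)
zfs create -V 32G -o volblocksize=16k tank/vm-test-zvol

# raw file on an ordinary dataset
zfs create tank/vm-test-raw
truncate -s 32G /tank/vm-test-raw/disk0.raw

Then attach one or the other to the VM (on Proxmox that roughly corresponds to a disk on a ZFS storage vs. a raw image on a directory storage) and run the same fio job inside the guest.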

Thanks!
 
OK, did a short test myself.

FreeBSD 14 VMs:

On a zvol:
Code:
# fio --ioengine=psync --filename=test --size=9G --time_based --name=fio-vm --group_reporting --runtime=60 --direct=1 --sync=1 --iodepth=1 --rw=readwrite --bs=4K --numjobs=4
fio-vm: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.37
Starting 4 processes
fio-vm: Laying out IO file (1 file / 9216MiB)
Jobs: 4 (f=4): [M(4)][100.0%][r=24.2MiB/s,w=24.2MiB/s][r=6186,w=6205 IOPS][eta 00m:00s]
fio-vm: (groupid=0, jobs=4): err= 0: pid=22401: Fri May  3 16:13:22 2024
  read: IOPS=6308, BW=24.6MiB/s (25.8MB/s)(1478MiB/60001msec)
    clat (nsec): min=991, max=166011k, avg=332403.68, stdev=861985.99
     lat (nsec): min=1092, max=166012k, avg=332551.09, stdev=861991.87
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[   39], 20.00th=[   52],
     | 30.00th=[   66], 40.00th=[  105], 50.00th=[  163], 60.00th=[  210],
     | 70.00th=[  293], 80.00th=[  457], 90.00th=[  766], 95.00th=[ 1139],
     | 99.00th=[ 2606], 99.50th=[ 3818], 99.90th=[ 7898], 99.95th=[10814],
     | 99.99th=[21365]
   bw (  KiB/s): min=  928, max=33040, per=100.00%, avg=25250.71, stdev=879.93, samples=476
   iops        : min=  232, max= 8260, avg=6312.67, stdev=219.98, samples=476
  write: IOPS=6309, BW=24.6MiB/s (25.8MB/s)(1479MiB/60001msec); 0 zone resets
    clat (usec): min=81, max=115596, avg=299.27, stdev=633.33
     lat (usec): min=81, max=115597, avg=299.45, stdev=633.34
    clat percentiles (usec):
     |  1.00th=[   95],  5.00th=[  108], 10.00th=[  133], 20.00th=[  157],
     | 30.00th=[  194], 40.00th=[  227], 50.00th=[  249], 60.00th=[  277],
     | 70.00th=[  302], 80.00th=[  347], 90.00th=[  420], 95.00th=[  510],
     | 99.00th=[ 1369], 99.50th=[ 2474], 99.90th=[ 5932], 99.95th=[ 8094],
     | 99.99th=[20317]
   bw (  KiB/s): min=  920, max=33848, per=100.00%, avg=25257.83, stdev=914.94, samples=476
   iops        : min=  230, max= 8462, avg=6314.45, stdev=228.73, samples=476
  lat (nsec)   : 1000=0.01%
  lat (usec)   : 2=0.35%, 4=2.92%, 10=0.90%, 20=0.64%, 50=3.90%
  lat (usec)   : 100=12.11%, 250=37.70%, 500=29.87%, 750=5.41%, 1000=2.29%
  lat (msec)   : 2=2.76%, 4=0.81%, 10=0.29%, 20=0.04%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.45%, sys=8.12%, ctx=4788946, majf=1, minf=7
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=378492,378599,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=24.6MiB/s (25.8MB/s), 24.6MiB/s-24.6MiB/s (25.8MB/s-25.8MB/s), io=1478MiB (1550MB), run=60001-60001msec
  WRITE: bw=24.6MiB/s (25.8MB/s), 24.6MiB/s-24.6MiB/s (25.8MB/s-25.8MB/s), io=1479MiB (1551MB), run=60001-60001msec
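
For reference, a quick breakdown of what the fio options used here do (the raw-file test below uses the identical job):

Code:
# --ioengine=psync    plain synchronous pread()/pwrite() calls
# --direct=1          O_DIRECT, bypass the page cache inside the guest
# --sync=1            O_SYNC, every write has to be committed before it returns
# --iodepth=1         one outstanding I/O per job (psync cannot queue more anyway)
# --rw=readwrite      mixed sequential reads and writes, 50/50 split
# --bs=4K             4 KiB blocks
# --numjobs=4         four processes working on the same 9 GiB test file
# --time_based --runtime=60   run for 60 seconds, looping over the file if needed
# --group_reporting   report the four jobs as one aggregated result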

On a raw file in a dataset:
Code:
# fio --ioengine=psync --filename=test --size=9G --time_based --name=fio-vm --group_reporting --runtime=60 --direct=1 --sync=1 --iodepth=1 --rw=readwrite --bs=4K --numjobs=4
fio-vm: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
...
fio-3.37
Starting 4 processes
fio-vm: Laying out IO file (1 file / 9216MiB)
Jobs: 4 (f=4): [M(4)][100.0%][r=32.0MiB/s,w=31.9MiB/s][r=8182,w=8154 IOPS][eta 00m:00s]
fio-vm: (groupid=0, jobs=4): err= 0: pid=46651: Fri May  3 16:22:13 2024
  read: IOPS=7980, BW=31.2MiB/s (32.7MB/s)(1871MiB/60002msec)
    clat (nsec): min=1052, max=230791k, avg=259772.38, stdev=664620.48
     lat (nsec): min=1172, max=230791k, avg=259900.66, stdev=664623.99
    clat percentiles (nsec):
     |  1.00th=[    1832],  5.00th=[    2640], 10.00th=[   34560],
     | 20.00th=[   48896], 30.00th=[   83456], 40.00th=[  117248],
     | 50.00th=[  150528], 60.00th=[  187392], 70.00th=[  257024],
     | 80.00th=[  382976], 90.00th=[  602112], 95.00th=[  847872],
     | 99.00th=[ 1548288], 99.50th=[ 1941504], 99.90th=[ 3784704],
     | 99.95th=[ 5341184], 99.99th=[19267584]
   bw (  KiB/s): min= 8194, max=38160, per=100.00%, avg=31930.09, stdev=975.82, samples=476
   iops        : min= 2047, max= 9540, avg=7982.42, stdev=243.98, samples=476
  write: IOPS=7993, BW=31.2MiB/s (32.7MB/s)(1874MiB/60002msec); 0 zone resets
    clat (usec): min=68, max=230723, avg=239.15, stdev=835.59
     lat (usec): min=68, max=230723, avg=239.29, stdev=835.59
    clat percentiles (usec):
     |  1.00th=[   81],  5.00th=[  105], 10.00th=[  121], 20.00th=[  151],
     | 30.00th=[  178], 40.00th=[  194], 50.00th=[  215], 60.00th=[  229],
     | 70.00th=[  255], 80.00th=[  289], 90.00th=[  338], 95.00th=[  396],
     | 99.00th=[  603], 99.50th=[  963], 99.90th=[ 2900], 99.95th=[ 4228],
     | 99.99th=[15664]
   bw (  KiB/s): min= 8118, max=38896, per=100.00%, avg=31973.53, stdev=979.12, samples=476
   iops        : min= 2028, max= 9724, avg=7993.29, stdev=244.81, samples=476
  lat (usec)   : 2=0.92%, 4=2.55%, 10=0.64%, 20=0.44%, 50=5.74%
  lat (usec)   : 100=8.96%, 250=49.39%, 500=23.64%, 750=4.13%, 1000=1.70%
  lat (msec)   : 2=1.58%, 4=0.25%, 10=0.05%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=0.45%, sys=9.24%, ctx=5729567, majf=1, minf=7
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=478857,479625,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=31.2MiB/s (32.7MB/s), 31.2MiB/s-31.2MiB/s (32.7MB/s-32.7MB/s), io=1871MiB (1961MB), run=60002-60002msec
  WRITE: bw=31.2MiB/s (32.7MB/s), 31.2MiB/s-31.2MiB/s (32.7MB/s-32.7MB/s), io=1874MiB (1965MB), run=60002-60002msec

Indeed the raw file VM is about 25% ahead :oops:
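
A quick sanity check of that figure from the two READ lines:

Code:
# raw file vs. zvol, 4K mixed read/write inside the guest
echo "scale=2; 31.2/24.6" | bc    # -> 1.26, i.e. roughly 25-27% more throughput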
 
zvol:
Code:
# fio --rw=readwrite --name=test --size=2000M --direct=1 --bs=4k
...
Run status group 0 (all jobs):
   READ: bw=34.8MiB/s (36.5MB/s), 34.8MiB/s-34.8MiB/s (36.5MB/s-36.5MB/s), io=999MiB (1048MB), run=28701-28701msec
  WRITE: bw=34.9MiB/s (36.6MB/s), 34.9MiB/s-34.9MiB/s (36.6MB/s-36.6MB/s), io=1001MiB (1049MB), run=28701-28701msec

raw file in a dataset:
Code:
# fio --rw=readwrite --name=test --size=2000M --direct=1 --bs=4k
...
Run status group 0 (all jobs):
   READ: bw=49.4MiB/s (51.8MB/s), 49.4MiB/s-49.4MiB/s (51.8MB/s-51.8MB/s), io=999MiB (1048MB), run=20224-20224msec
  WRITE: bw=49.5MiB/s (51.9MB/s), 49.5MiB/s-49.5MiB/s (51.9MB/s-51.9MB/s), io=1001MiB (1049MB), run=20224-20224msec
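
For comparison with the first test: this shorter job leaves most options at fio's defaults, so (if I read the fio defaults correctly) it should be roughly equivalent to:

Code:
# assumed-equivalent command with the defaults written out explicitly
fio --name=test --rw=readwrite --bs=4k --size=2000M --direct=1 \
    --ioengine=psync --numjobs=1 --iodepth=1
# i.e. a single job, no --sync=1, and it stops once 2000M of I/O is done
# instead of running for a fixed 60 seconds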
 
