No QCOW2 on ZFS?

On SSD, without compression on the zpool/datasets, I see comparable write performance with qcow2 and zvol:

scsi1: roland-qcow2:110/vm-110-disk-0.qcow2,aio=threads,iothread=1,detect_zeroes=off,size=50G
scsi2: roland-zvols:vm-110-disk-0,aio=threads,iothread=1,detect_zeroes=off,size=51G


[attached screenshot: dd write benchmark results comparing qcow2 and zvol at different block sizes]
 
From my experience, dd is quite OK for basic performance testing when using direct I/O. The fio results do not differ much from what I posted, but they are much more difficult to read.
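For illustration, a minimal sketch of such a dd direct-I/O test against the disks from the config above (block sizes and counts here are just example values, not the exact commands behind the screenshot):

Code:
# sequential zero writes with O_DIRECT, bypassing the guest page cache
# small blocks (1k) and large blocks (1024k); counts are placeholders
dd if=/dev/zero of=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 bs=1k count=100000 oflag=direct
dd if=/dev/zero of=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 bs=1024k count=1024 oflag=direct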

I just wanted to show that qcow2 write performance on top of ZFS is not as bad as it's always said to be (in comparison to zvol).

Maybe someone wants to show why qcow2 should not be used on ZFS, though. Show me that pathological worst-case scenario.

I have read several times that you should not use qcow2 on ZFS because it's copy-on-write on top of copy-on-write, or because it's "nonsense".

So, if performance basically looks OK, I'm not sure what the reason is for not using it.
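If the concern is the stacked copy-on-write, I assume it mostly comes down to block size alignment: qcow2 uses 64k clusters by default while a ZFS dataset defaults to a 128k recordsize, so a small guest write can turn into read-modify-write on both layers. A minimal sketch of how one could align the two (the dataset and image names are placeholders):

Code:
# match the dataset recordsize to the qcow2 default cluster size of 64k
zfs set recordsize=64k tank/qcow2-images
# or create the image with a larger cluster size to match recordsize=128k
qemu-img create -f qcow2 -o cluster_size=128k vm-110-disk-1.qcow2 50G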
 
I just wanted to show that qcow2 write performance on top of ZFS is not as bad as it's always said to be (in comparison to zvol).
That's not quite what your benchmarks show. All you've shown is that writing sequential 1K blocks of zeros appears much faster on qcow2 files than on zvols, which probably says more about what the caching algorithm is doing in both scenarios.

Benchmarks are instructive for producing a "mark" for a particular workload. I'm not certain writing large blocks of zeros is a useful metric; it certainly doesn't match any workload I've encountered.
 
>All you've shown is that writing sequential 1K blocks of zeros appears much faster on qcow2 files than on zvols

Where do you see that? At 1K it's 6.6 MB/s on qcow2 and 7.1 MB/s on zvol; the faster/higher values are at 1024k block size.

It was just meant as a very basic performance comparison, and I switched off all special "zero data" handling in QEMU and ZFS.
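Concretely, "switching off" means something like the following; the dataset names are placeholders, and the detect_zeroes=off part is already visible in the scsi1/scsi2 config lines above:

Code:
# ZFS side: no compression, so zero blocks really get written out
zfs set compression=off tank/roland-qcow2
zfs set compression=off tank/roland-zvols

# QEMU/Proxmox side: keep zero writes as plain writes instead of detecting/unmapping them
qm set 110 --scsi1 roland-qcow2:110/vm-110-disk-0.qcow2,aio=threads,iothread=1,detect_zeroes=off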

What exactly would anybody like to see for a better comparison?
 
I'll also throw some benchmarks into the room. I did them on my ITX box, which isn't the greatest hardware (Atom CPU + DDR3), but I already have a directory storage, a ZFS storage, and an LVM-Thin storage on the same mirrored disks. Disks are Intel S3710 200GB (eMLC NAND + PLP) in a striped mirror. ARC data caching disabled, LZ4 compression enabled but with random (badly compressible) data. VM with cache mode=none. 4K random sync writes to an ext4-formatted volume using VirtIO SCSI single with SSD emulation and IO threads. Qcow2 and zvol use ZFS native encryption, while LVM-Thin uses LUKS encryption.
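For reference, the host-side settings behind that description look roughly like this; the pool/dataset names and the VMID are placeholders, and the encryption setup is omitted:

Code:
# disable ARC data caching (metadata is still cached)
zfs set primarycache=metadata tank/vmdata
# LZ4 compression stays enabled
zfs set compression=lz4 tank/vmdata

# VM disk: cache=none, SSD emulation and IO thread on a VirtIO SCSI single controller
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-zfs:vm-100-disk-0,cache=none,ssd=1,iothread=1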

Same fio test, run inside the same VM on the same ZFS pool with the same settings: once as qcow2 on top of a dataset and once as a zvol. And LVM-Thin on an mdadm RAID1 on the same disks as a bonus, to see non-CoW performance.

zvol (ZFS): 177 IOPS
Code:
root@Benchmark:~# fio --directory=/var/tmp --name=sync_random_write_4K --rw=randwrite --bs=4K --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --runtime=180
sync_random_write_4K: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=496KiB/s][w=124 IOPS][eta 00m:00s]
sync_random_write_4K: (groupid=0, jobs=1): err= 0: pid=459: Wed Dec 21 02:18:06 2022
  write: IOPS=177, BW=709KiB/s (726kB/s)(125MiB/180001msec); 0 zone resets
    clat (usec): min=1074, max=404677, avg=5618.72, stdev=6586.68
     lat (usec): min=1076, max=404679, avg=5620.38, stdev=6586.70
    clat percentiles (usec):
     |  1.00th=[  1795],  5.00th=[  3589], 10.00th=[  3916], 20.00th=[  4228],
     | 30.00th=[  4490], 40.00th=[  4686], 50.00th=[  4948], 60.00th=[  5145],
     | 70.00th=[  5538], 80.00th=[  6063], 90.00th=[  7308], 95.00th=[  8979],
     | 99.00th=[ 14615], 99.50th=[ 20055], 99.90th=[ 95945], 99.95th=[152044],
     | 99.99th=[256902]
   bw (  KiB/s): min=  207, max= 1176, per=99.99%, avg=709.94, stdev=141.82, samples=358
   iops        : min=   51, max=  294, avg=177.30, stdev=35.50, samples=358
  lat (msec)   : 2=1.73%, 4=10.67%, 10=84.15%, 20=2.94%, 50=0.27%
  lat (msec)   : 100=0.15%, 250=0.08%, 500=0.01%
  cpu          : usr=1.09%, sys=6.64%, ctx=65179, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,31909,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=709KiB/s (726kB/s), 709KiB/s-709KiB/s (726kB/s-726kB/s), io=125MiB (131MB), run=180001-180001msec

Disk stats (read/write):
  sda: ios=149/94841, merge=0/52519, ticks=93/159005, in_queue=253532, util=98.75%

qcow2 (on top of ZFS dataset): 64 IOPS
Code:
root@Benchmark:~# fio --directory=/var/tmp --name=sync_random_write_4K --rw=randwrite --bs=4K --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --runtime=180
sync_random_write_4K: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=179KiB/s][w=44 IOPS][eta 00m:00s]
sync_random_write_4K: (groupid=0, jobs=1): err= 0: pid=429: Wed Dec 21 02:11:08 2022
  write: IOPS=64, BW=259KiB/s (265kB/s)(45.5MiB/180159msec); 0 zone resets
    clat (usec): min=1147, max=504376, avg=15441.66, stdev=13758.83
     lat (usec): min=1149, max=504378, avg=15443.43, stdev=13758.84
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    5], 20.00th=[    9],
     | 30.00th=[   14], 40.00th=[   14], 50.00th=[   15], 60.00th=[   16],
     | 70.00th=[   17], 80.00th=[   19], 90.00th=[   24], 95.00th=[   29],
     | 99.00th=[   41], 99.50th=[   52], 99.90th=[  251], 99.95th=[  313],
     | 99.99th=[  443]
   bw (  KiB/s): min=    8, max=  376, per=99.74%, avg=258.87, stdev=58.06, samples=359
   iops        : min=    2, max=   94, avg=64.45, stdev=14.53, samples=359
  lat (msec)   : 2=0.02%, 4=4.01%, 10=16.90%, 20=62.30%, 50=16.21%
  lat (msec)   : 100=0.32%, 250=0.14%, 500=0.09%, 750=0.01%
  cpu          : usr=0.42%, sys=3.00%, ctx=23579, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,11650,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=259KiB/s (265kB/s), 259KiB/s-259KiB/s (265kB/s-265kB/s), io=45.5MiB (47.7MB), run=180159-180159msec

Disk stats (read/write):
  sda: ios=98/35238, merge=0/16362, ticks=605/220828, in_queue=285678, util=99.37%

LVM-Thin (on top of mdadm raid1): 306 IOPS
Code:
root@Benchmark:~# fio --directory=/var/tmp --name=sync_random_write_4K --rw=randwrite --bs=4K --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --refill_buffers --size=1G --runtime=180
sync_random_write_4K: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.25
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=1025KiB/s][w=256 IOPS][eta 00m:00s]
sync_random_write_4K: (groupid=0, jobs=1): err= 0: pid=461: Wed Dec 21 02:25:42 2022
  write: IOPS=306, BW=1224KiB/s (1254kB/s)(215MiB/180001msec); 0 zone resets
    clat (usec): min=464, max=75747, avg=3247.96, stdev=2127.19
     lat (usec): min=464, max=75749, avg=3249.54, stdev=2127.35
    clat percentiles (usec):
     |  1.00th=[  619],  5.00th=[  799], 10.00th=[  963], 20.00th=[ 1434],
     | 30.00th=[ 2376], 40.00th=[ 2769], 50.00th=[ 3064], 60.00th=[ 3392],
     | 70.00th=[ 3785], 80.00th=[ 4359], 90.00th=[ 5276], 95.00th=[ 6456],
     | 99.00th=[ 9896], 99.50th=[11994], 99.90th=[19530], 99.95th=[27132],
     | 99.99th=[53216]
   bw (  KiB/s): min=  702, max= 3480, per=100.00%, avg=1225.98, stdev=365.96, samples=359
   iops        : min=  175, max=  870, avg=306.41, stdev=91.50, samples=359
  lat (usec)   : 500=0.08%, 750=3.52%, 1000=7.54%
  lat (msec)   : 2=12.36%, 4=50.77%, 10=24.76%, 20=0.86%, 50=0.08%
  lat (msec)   : 100=0.01%
  cpu          : usr=1.63%, sys=8.75%, ctx=123322, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,55091,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=1224KiB/s (1254kB/s), 1224KiB/s-1224KiB/s (1254kB/s-1254kB/s), io=215MiB (226MB), run=180001-180001msec

Disk stats (read/write):
  sda: ios=314/152918, merge=0/67307, ticks=224/153877, in_queue=212930, util=99.79%

At least here on this machine, 64 vs 177 vs 306 IOPS isn't a small difference.

And yes, I get very similar results when running the benchmarks multiple times. Performance in general is really bad; I guess that is because of the slow machine. The same benchmark with the same disk models runs way faster on the big servers, which are shut down right now to save some electricity.
 
Yes, that's quite some difference, but I think disabling the ARC cache in ZFS and using cache mode=none is a rather exotic use case.

Here is a 4K randwrite comparison with my setup. Not drastically different results, but noticeable:

Code:
scsi1: roland-qcow2:110/vm-110-disk-0.qcow2,aio=threads,iothread=1,detect_zeroes=off,size=50G

# fio --filename=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 --name=test --refill_buffers --rw=randwrite --bs=4k --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --size=1g
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [w(1)][99.5%][w=6742KiB/s][w=1685 IOPS][eta 00m:01s]
test: (groupid=0, jobs=1): err= 0: pid=7279: Wed Dec 21 12:45:23 2022
  write: IOPS=1412, BW=5650KiB/s (5785kB/s)(1024MiB/185604msec); 0 zone resets
    clat (usec): min=394, max=68373, avg=701.22, stdev=898.04
     lat (usec): min=394, max=68374, avg=701.65, stdev=898.32
    clat percentiles (usec):
     |  1.00th=[  416],  5.00th=[  437], 10.00th=[  453], 20.00th=[  469],
     | 30.00th=[  486], 40.00th=[  498], 50.00th=[  515], 60.00th=[  529],
     | 70.00th=[  553], 80.00th=[  603], 90.00th=[ 1500], 95.00th=[ 1762],
     | 99.00th=[ 2245], 99.50th=[ 3195], 99.90th=[ 7046], 99.95th=[12780],
     | 99.99th=[42730]
   bw (  KiB/s): min= 1016, max= 8360, per=99.94%, avg=5645.43, stdev=2070.82, samples=371
   iops        : min=  254, max= 2090, avg=1411.33, stdev=517.72, samples=371
  lat (usec)   : 500=40.60%, 750=44.74%, 1000=1.00%
  lat (msec)   : 2=11.73%, 4=1.63%, 10=0.24%, 20=0.03%, 50=0.03%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.09%, sys=6.49%, ctx=785503, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,262144,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=5650KiB/s (5785kB/s), 5650KiB/s-5650KiB/s (5785kB/s-5785kB/s), io=1024MiB (1074MB), run=185604-185604msec

Disk stats (read/write):
  sdb: ios=52/523390, merge=0/0, ticks=9/172832, in_queue=299993, util=99.99%

Code:
scsi2: roland-zvols:vm-110-disk-0,aio=threads,iothread=1,detect_zeroes=off,size=51G

root@lubuntu:~# fio --filename=/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 --name=test --refill_buffers --rw=randwrite --bs=4k --direct=1 --sync=1 --numjobs=1 --ioengine=psync --iodepth=1 --size=512m
test: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.16
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=7519KiB/s][w=1879 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=7290: Wed Dec 21 12:50:01 2022
  write: IOPS=1809, BW=7239KiB/s (7413kB/s)(512MiB/72426msec); 0 zone resets
    clat (usec): min=408, max=59914, avg=547.00, stdev=562.33
     lat (usec): min=408, max=59915, avg=547.35, stdev=562.52
    clat percentiles (usec):
     |  1.00th=[  437],  5.00th=[  453], 10.00th=[  461], 20.00th=[  474],
     | 30.00th=[  486], 40.00th=[  494], 50.00th=[  506], 60.00th=[  519],
     | 70.00th=[  529], 80.00th=[  553], 90.00th=[  586], 95.00th=[  627],
     | 99.00th=[ 1221], 99.50th=[ 2376], 99.90th=[ 5407], 99.95th=[ 8029],
     | 99.99th=[23987]
   bw (  KiB/s): min= 4664, max= 7888, per=100.00%, avg=7252.04, stdev=773.54, samples=144
   iops        : min= 1166, max= 1972, avg=1812.99, stdev=193.40, samples=144
  lat (usec)   : 500=44.34%, 750=53.89%, 1000=0.58%
  lat (msec)   : 2=0.53%, 4=0.45%, 10=0.17%, 20=0.02%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=2.15%, sys=8.23%, ctx=392868, majf=0, minf=10
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,131072,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=7239KiB/s (7413kB/s), 7239KiB/s-7239KiB/s (7413kB/s-7413kB/s), io=512MiB (537MB), run=72426-72426msec

Disk stats (read/write):
  sdc: ios=48/261952, merge=0/0, ticks=9/66505, in_queue=112632, util=99.89%

That's IOPS=1412 / 5785kB/s for QCOW2 vs IOPS=1809 / 7413kB/s for ZVOL.


Hardware is a Fujitsu Futro s740 thin client with a Celeron(R) J4105 CPU @ 1.50GHz and an Intel D3-S4510 SSD (< 5 W power consumption).
 
Yes, that's quite some difference, but I think disabling the ARC cache in ZFS and using cache mode=none is a rather exotic use case.
Yup, that's my ultimate "hit that storage as hard as possible" setup to see the worst possible disk performance. Enabling cache modes would mean I'm also benchmarking the RAM and not just the disks.
 
Yup, that's my ultimate "hit that storage as hard as possible" setup to see the worst possible disk performance. Enabling cache modes would mean I'm also benchmarking the RAM and not just the disks.
True, but that isn't necessarily a useful metric, since storage performance is a consequence of how you'd actually be using it, which means looking at the entire storage subsystem holistically. It is certainly likely that the behavior would be different for the same "disks" with a different/faster/better host, but that's still relevant.
 
