ZFS QCOW2 vs ZVOL benchmarks

nwm

New Member
Dec 26, 2021
So I follow ZFS development quite closely and understand that the ZVOL code in ZFS isn't optimal and needs quite a bit of reworking for performance (no one is sponsoring this work currently), which made me question why Proxmox chose ZVOLs over QCOW2. (Note: QCOW2 isn't COW on COW here; the file format just has the ability to do COW given a template.) The current Proxmox code for creating QCOW2 files isn't optimal either, so I had to edit a few files to add `extended_l2=on` and `cluster_size=128k`, and finally `l2-cache-size=64M`, since extended_l2 doubles the L2 cache RAM requirements (l2-cache-size shouldn't actually matter here given the disk size).
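For context, the effect of those edits is roughly equivalent to creating and attaching the image like this (the image name, size and the rest of the -drive line are just placeholders, not my exact setup):

Code:
# qcow2 with subcluster allocation (extended_l2, QEMU >= 5.2) and 128k clusters
qemu-img create -f qcow2 -o extended_l2=on,cluster_size=128k vm-100-disk-0.qcow2 100G
# l2-cache-size is a runtime option of the qcow2 driver, passed on the -drive line:
#   -drive file=vm-100-disk-0.qcow2,format=qcow2,l2-cache-size=64M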

The VM with QCOW2-backed storage:
Code:
randrw: (g=0): rw=randrw, bs=(R) 4096B-128KiB, (W) 4096B-128KiB, (T) 4096B-128KiB, ioengine=psync, iodepth=1
...
fio-3.25
Starting 4 processes

randrw: (groupid=0, jobs=4): err= 0: pid=1736: Sat Dec 25 23:13:16 2021
  read: IOPS=5101, BW=242MiB/s (254MB/s)(85.2GiB/360006msec)
    clat (nsec): min=661, max=108429k, avg=598680.35, stdev=1344388.05
     lat (nsec): min=681, max=108429k, avg=598946.47, stdev=1344899.03
    clat percentiles (usec):
     |  1.00th=[   15],  5.00th=[   77], 10.00th=[   92], 20.00th=[  114],
     | 30.00th=[  133], 40.00th=[  155], 50.00th=[  182], 60.00th=[  221],
     | 70.00th=[  314], 80.00th=[  824], 90.00th=[ 1385], 95.00th=[ 2311],
     | 99.00th=[ 6194], 99.50th=[ 8848], 99.90th=[15795], 99.95th=[19006],
     | 99.99th=[27132]
   bw (  KiB/s): min=19352, max=568085, per=100.00%, avg=249212.36, stdev=15444.33, samples=2852
   iops        : min=  296, max= 9405, avg=5119.07, stdev=307.22, samples=2852
  write: IOPS=5101, BW=242MiB/s (254MB/s)(85.1GiB/360006msec); 0 zone resets
    clat (nsec): min=972, max=107342k, avg=168551.40, stdev=842880.09
     lat (nsec): min=1032, max=107819k, avg=170752.52, stdev=848276.64
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    5], 10.00th=[    7], 20.00th=[   10],
     | 30.00th=[   14], 40.00th=[   19], 50.00th=[   25], 60.00th=[   33],
     | 70.00th=[   45], 80.00th=[   73], 90.00th=[  169], 95.00th=[  578],
     | 99.00th=[ 3458], 99.50th=[ 5014], 99.90th=[10159], 99.95th=[13435],
     | 99.99th=[25822]
   bw (  KiB/s): min=18432, max=600231, per=100.00%, avg=248895.02, stdev=15653.44, samples=2852
   iops        : min=  282, max= 9488, avg=5118.89, stdev=310.84, samples=2852
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.09%, 4=2.02%, 10=8.82%, 20=10.89%, 50=15.41%
  lat (usec)   : 100=12.01%, 250=29.20%, 500=6.71%, 750=2.19%, 1000=2.23%
  lat (msec)   : 2=6.37%, 4=2.58%, 10=1.23%, 20=0.21%, 50=0.03%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=2.91%, sys=42.12%, ctx=1864334, majf=0, minf=84
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1836467,1836399,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=242MiB/s (254MB/s), 242MiB/s-242MiB/s (254MB/s-254MB/s), io=85.2GiB (91.5GB), run=360006-360006msec
  WRITE: bw=242MiB/s (254MB/s), 242MiB/s-242MiB/s (254MB/s-254MB/s), io=85.1GiB (91.4GB), run=360006-360006msec

Disk stats (read/write):
  sda: ios=1829214/1803227, merge=0/20385739, ticks=667640/3863084, in_queue=4530725, util=98.23%

The VM with ZVOL-backed storage:

Code:
randrw: (g=0): rw=randrw, bs=(R) 4096B-128KiB, (W) 4096B-128KiB, (T) 4096B-128KiB, ioengine=psync, iodepth=1
...
fio-3.25
Starting 4 processes

randrw: (groupid=0, jobs=4): err= 0: pid=1737: Sat Dec 25 22:58:57 2021
  read: IOPS=2216, BW=115MiB/s (121MB/s)(40.4GiB/360001msec)
    clat (nsec): min=1283, max=57840k, avg=1349180.85, stdev=1969173.27
     lat (nsec): min=1343, max=57840k, avg=1349616.59, stdev=1969523.11
    clat percentiles (usec):
     |  1.00th=[   63],  5.00th=[  190], 10.00th=[  225], 20.00th=[  289],
     | 30.00th=[  388], 40.00th=[  537], 50.00th=[  709], 60.00th=[  930],
     | 70.00th=[ 1254], 80.00th=[ 1827], 90.00th=[ 3163], 95.00th=[ 4752],
     | 99.00th=[ 9503], 99.50th=[12256], 99.90th=[20055], 99.95th=[24249],
     | 99.99th=[33817]
   bw (  KiB/s): min=48881, max=434584, per=100.00%, avg=117885.82, stdev=9920.76, samples=2860
   iops        : min= 1084, max= 6574, avg=2216.60, stdev=131.79, samples=2860
  write: IOPS=2221, BW=115MiB/s (121MB/s)(40.4GiB/360001msec); 0 zone resets
    clat (nsec): min=1453, max=44148k, avg=382103.17, stdev=1064365.83
     lat (nsec): min=1493, max=44148k, avg=391463.47, stdev=1077514.67
    clat percentiles (usec):
     |  1.00th=[    6],  5.00th=[   19], 10.00th=[   29], 20.00th=[   48],
     | 30.00th=[   67], 40.00th=[   88], 50.00th=[  114], 60.00th=[  147],
     | 70.00th=[  192], 80.00th=[  273], 90.00th=[  668], 95.00th=[ 1876],
     | 99.00th=[ 5276], 99.50th=[ 6783], 99.90th=[11994], 99.95th=[14615],
     | 99.99th=[22152]
   bw (  KiB/s): min=41281, max=453336, per=100.00%, avg=117862.22, stdev=10154.48, samples=2860
   iops        : min=  929, max= 6846, avg=2221.45, stdev=137.61, samples=2860
  lat (usec)   : 2=0.01%, 4=0.27%, 10=0.77%, 20=1.96%, 50=8.16%
  lat (usec)   : 100=12.01%, 250=22.88%, 500=16.98%, 750=8.27%, 1000=6.01%
  lat (msec)   : 2=11.29%, 4=7.12%, 10=3.74%, 20=0.47%, 50=0.06%
  lat (msec)   : 100=0.01%
  cpu          : usr=7.48%, sys=44.27%, ctx=837696, majf=0, minf=78
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=797856,799628,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=40.4GiB (43.4GB), run=360001-360001msec
  WRITE: bw=115MiB/s (121MB/s), 115MiB/s-115MiB/s (121MB/s-121MB/s), io=40.4GiB (43.4GB), run=360001-360001msec

Disk stats (read/write):
  sda: ios=796181/792213, merge=0/9732409, ticks=563273/2183634, in_queue=2746908, util=98.76%

Let me know if more info is required or if something is obviously wrong. The setup was defaults except for the q35 machine type and 4 CPUs for both VMs.
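For reference, the fio job that produced the output above was roughly the following, reconstructed from the parameters shown in the results; the test file name and size are not visible in the output, so they are placeholders here:

Code:
fio --name=randrw --rw=randrw --bsrange=4k-128k --ioengine=psync --iodepth=1 \
    --numjobs=4 --runtime=360 --time_based --group_reporting \
    --filename=/root/fio-test.bin --size=32G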
 
The current Proxmox code for creating QCOW2 files isn't optimal either, so I had to edit a few files to add `extended_l2=on` and `cluster_size=128k`, and finally `l2-cache-size=64M`, since extended_l2 doubles the L2 cache RAM requirements (l2-cache-size shouldn't actually matter here given the disk size).
Please share exactly which files you changed. Or, even better, make a patch and submit it as an issue. Maybe it would be a good idea to add this code to the core...
 
You can easily do both things. If you add your ZFS as a "Directory" type storage, you can use the "qcow2" or "raw" format. So this is up to you.
But a more-than-double performance improvement, with Proxmox defaulting to the slower option, doesn't inspire confidence. Of course my tests could be flawed, and I would be interested to know if they are.
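For anyone who wants to reproduce the qcow2 side without patching anything, the Directory route mentioned above would look roughly like this (the dataset name and storage ID are made up):

Code:
# dataset to hold the qcow2 files
zfs create rpool/qcow2store
# register it as a Directory storage so the qcow2 format becomes selectable
pvesm add dir zfs-qcow2 --path /rpool/qcow2store --content images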
 
cluster_size=128k
I'm unfamiliar with the internal QCOW2 stuff, so is this the equivalent of ZFS's volblocksize? Have you compared qcow2 cluster_size=128k against the default ZVOL volblocksize of 8K? If so, could you please redo your test with volblocksize=128k? Something along these lines, for example (pool and dataset names are made up):
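Code:
# standalone test zvol with a 128k volblocksize
zfs create -s -V 100G -o volblocksize=128k rpool/data/vm-100-disk-1
# or change the default for newly created disks on the zfspool storage
pvesm set local-zfs --blocksize 128k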
 
