ZFS read performance bottleneck?

phs

Renowned Member
Dec 3, 2015
I'm trying to find out why ZFS is pretty slow when it comes to read performance.
I have been testing with different systems, disks and settings:
Code:
zpool create -f -o ashift=12 iotest /dev/nvme0n1
zfs create -V 8g iotest/iotest
zfs set compression=zstd-fast iotest/iotest
zfs set primarycache=metadata iotest/iotest
zfs set secondarycache=none iotest/iotest

Code:
fio --ioengine=libaio --filename=/dev/zvol/iotest/iotest --size=8G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --iodepth=1 --ramp_time=10 --rw=randread --bs=8k --numjobs=256
Testing directly on the disk I'm able to achieve reasonable numbers, not far from the spec sheet: 400-650k IOPS (P4510 and some Samsung-based HPE).
Testing on a zvol is almost always limited to about 150-170k IOPS, no matter which CPU or disks I'm using.
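
For reference, the raw-disk baseline used the same fio parameters, just pointed at the block device, roughly like this (a sketch; the exact device name is assumed):

Code:
# same workload as above, run against the raw NVMe device; --readonly guards against accidental writes
fio --ioengine=libaio --filename=/dev/nvme0n1 --size=8G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --iodepth=1 --ramp_time=10 --rw=randread --bs=8k --numjobs=256 --readonly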

Is there a bottleneck in ZFS … or am I doing something wrong?

Any thoughts?
Cheers
 
Please ignore the different pool/zvol names; it is a different system.
I have just created the benchmark zvol, written random data to it with fio, and then run the fio randread benchmark for 10 minutes.
The box is not yet in use.

Code:
zfs create -V 100gb zfs-local/benchmark
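
The random prefill looked roughly like this (a sketch; the exact flags used are assumed):

Code:
# hypothetical prefill: fill the zvol once with random data so the randread test hits real blocks
fio --ioengine=libaio --filename=/dev/zvol/zfs-local/benchmark --name=prefill --rw=write --bs=1M --direct=1 --refill_buffers --size=100G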

Code:
# zpool status
  pool: zfs-local
 state: ONLINE
  scan: scrub repaired 0B in 00:03:45 with 0 errors on Sun May  9 00:27:46 2021
config:

        NAME         STATE     READ WRITE CKSUM
        zfs-local    ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            nvme0n1  ONLINE       0     0     0
            nvme1n1  ONLINE       0     0     0

errors: No known data errors

Code:
NAME                 PROPERTY              VALUE                  SOURCE
zfs-local            type                  filesystem             -
zfs-local            creation              Fri May  7 15:15 2021  -
zfs-local            used                  103G                   -
zfs-local            available             5.53T                  -
zfs-local            referenced            96K                    -
zfs-local            compressratio         1.00x                  -
zfs-local            mounted               yes                    -
zfs-local            quota                 none                   default
zfs-local            reservation           none                   default
zfs-local            recordsize            128K                   default
zfs-local            mountpoint            /zfs-local             default
zfs-local            sharenfs              off                    default
zfs-local            checksum              on                     default
zfs-local            compression           zstd-fast              local
zfs-local            atime                 on                     default
zfs-local            devices               on                     default
zfs-local            exec                  on                     default
zfs-local            setuid                on                     default
zfs-local            readonly              off                    default
zfs-local            zoned                 off                    default
zfs-local            snapdir               hidden                 default
zfs-local            aclmode               discard                default
zfs-local            aclinherit            restricted             default
zfs-local            createtxg             1                      -
zfs-local            canmount              on                     default
zfs-local            xattr                 on                     default
zfs-local            copies                1                      default
zfs-local            version               5                      -
zfs-local            utf8only              off                    -
zfs-local            normalization         none                   -
zfs-local            casesensitivity       sensitive              -
zfs-local            vscan                 off                    default
zfs-local            nbmand                off                    default
zfs-local            sharesmb              off                    default
zfs-local            refquota              none                   default
zfs-local            refreservation        none                   default
zfs-local            guid                  16235484721916876615   -
zfs-local            primarycache          metadata               local
zfs-local            secondarycache        none                   local
zfs-local            usedbysnapshots       0B                     -
zfs-local            usedbydataset         96K                    -
zfs-local            usedbychildren        103G                   -
zfs-local            usedbyrefreservation  0B                     -
zfs-local            logbias               latency                default
zfs-local            objsetid              54                     -
zfs-local            dedup                 off                    default
zfs-local            mlslabel              none                   default
zfs-local            sync                  standard               default
zfs-local            dnodesize             legacy                 default
zfs-local            refcompressratio      1.00x                  -
zfs-local            written               96K                    -
zfs-local            logicalused           100G                   -
zfs-local            logicalreferenced     42K                    -
zfs-local            volmode               default                default
zfs-local            filesystem_limit      none                   default
zfs-local            snapshot_limit        none                   default
zfs-local            filesystem_count      none                   default
zfs-local            snapshot_count        none                   default
zfs-local            snapdev               hidden                 default
zfs-local            acltype               off                    default
zfs-local            context               none                   default
zfs-local            fscontext             none                   default
zfs-local            defcontext            none                   default
zfs-local            rootcontext           none                   default
zfs-local            relatime              off                    default
zfs-local            redundant_metadata    all                    default
zfs-local            overlay               on                     default
zfs-local            encryption            off                    default
zfs-local            keylocation           none                   default
zfs-local            keyformat             none                   default
zfs-local            pbkdf2iters           0                      default
zfs-local            special_small_blocks  0                      default
zfs-local/benchmark  type                  volume                 -
zfs-local/benchmark  creation              Fri May 14 16:09 2021  -
zfs-local/benchmark  used                  103G                   -
zfs-local/benchmark  available             5.53T                  -
zfs-local/benchmark  referenced            101G                   -
zfs-local/benchmark  compressratio         1.00x                  -
zfs-local/benchmark  reservation           none                   default
zfs-local/benchmark  volsize               100G                   local
zfs-local/benchmark  volblocksize          8K                     default
zfs-local/benchmark  checksum              on                     default
zfs-local/benchmark  compression           zstd-fast              inherited from zfs-local
zfs-local/benchmark  readonly              off                    default
zfs-local/benchmark  createtxg             123857                 -
zfs-local/benchmark  copies                1                      default
zfs-local/benchmark  refreservation        103G                   local
zfs-local/benchmark  guid                  11286252411344982988   -
zfs-local/benchmark  primarycache          metadata               inherited from zfs-local
zfs-local/benchmark  secondarycache        none                   inherited from zfs-local
zfs-local/benchmark  usedbysnapshots       0B                     -
zfs-local/benchmark  usedbydataset         101G                   -
zfs-local/benchmark  usedbychildren        0B                     -
zfs-local/benchmark  usedbyrefreservation  2.15G                  -
zfs-local/benchmark  logbias               latency                default
zfs-local/benchmark  objsetid              2952                   -
zfs-local/benchmark  dedup                 off                    default
zfs-local/benchmark  mlslabel              none                   default
zfs-local/benchmark  sync                  standard               default
zfs-local/benchmark  refcompressratio      1.00x                  -
zfs-local/benchmark  written               101G                   -
zfs-local/benchmark  logicalused           100G                   -
zfs-local/benchmark  logicalreferenced     100G                   -
zfs-local/benchmark  volmode               default                default
zfs-local/benchmark  snapshot_limit        none                   default
zfs-local/benchmark  snapshot_count        none                   default
zfs-local/benchmark  snapdev               hidden                 default
zfs-local/benchmark  context               none                   default
zfs-local/benchmark  fscontext             none                   default
zfs-local/benchmark  defcontext            none                   default
zfs-local/benchmark  rootcontext           none                   default
zfs-local/benchmark  redundant_metadata    all                    default
zfs-local/benchmark  encryption            off                    default
zfs-local/benchmark  keylocation           none                   default
zfs-local/benchmark  keyformat             none                   default
zfs-local/benchmark  pbkdf2iters           0                      default
 

So just to be clear: the tests/fio command and setup between the 400k IOPS and the 170k IOPS runs are exactly the same, except one is on ext4 and the other on ZFS?

Is there any specific reason why you've configured --numjobs=256? Seems rather high to me, though not wrong :)

There is also an official benchmark - https://www.proxmox.com/en/downloads/item/proxmox-ve-zfs-benchmark-2020
Maybe start with this fio command to have some comparison?

I also see you're on ZFS 2.0 - hmm, I have no experience with this yet.
 
By running fio against the disk directly I do manage to achieve about 600k IOPS, and in a mirror with mdraid about 1.1M IOPS. With ZFS it doesn't matter whether it's a single disk or a mirror; it seems to be capped at about 150-170k IOPS, which actually means about an 80% performance loss with ZFS.

With 256 I just want to see what the system can handle; from about 32 jobs upwards there isn't much of a difference.
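
A quick sweep makes that plateau visible, e.g. like this (a sketch; the job counts are assumed):

Code:
# hypothetical sweep: same randread workload with increasing parallelism
for jobs in 1 4 8 16 32 64 128 256; do
    fio --ioengine=libaio --filename=/dev/zvol/zfs-local/benchmark --name=sweep-$jobs --rw=randread --bs=8k --direct=1 --iodepth=1 --runtime=30 --time_based --group_reporting --numjobs=$jobs
done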
The Proxmox folks seem to get even worse results; sadly the test doesn't provide much insight.
 
compression=zstd-fast
Did you try it without compression, or at least with LZ4? If your CPU needs to decompress every block first, it's no wonder that IOPS are low.
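
For example (a sketch; note that changing compression only affects newly written blocks, so the zvol would need to be refilled and the benchmark re-run after each change):

Code:
# rule compression out entirely, then try the cheaper LZ4
zfs set compression=off zfs-local/benchmark
zfs set compression=lz4 zfs-local/benchmark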
 
That was one of the first things I tested: disabling compression and checksumming.
The CPU should not be a bottleneck either; I tested with a 6242R, a 5950X and an 8700K.
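
For reference, the checksumming part of that test was along these lines (a sketch; dataset name as in the output above):

Code:
# disable checksumming on the benchmark zvol (test-only; not recommended in production)
zfs set checksum=off zfs-local/benchmark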
 
