[SOLVED] Advice for ZFS over iSCSI

Discussion in 'Proxmox VE: Networking and Firewall' started by jacklayne, Oct 3, 2018.

  1. jacklayne

    jacklayne New Member

    Oct 3, 2018
    Hello All,

    I would like some advice on my Proxmox and ZFS setup in my homelab.

    My hardware:
    CPU: i5-7400
    RAM: 32GB
    NIC: 2x1Gb Intel

    Disks in use for testing: 2x 160GB in a mirror, with one Intel SSD as SLOG. I attached them as raw disks.

    proxmox ver 5.2-9

    Trying to switch to ZFS, I've tried several ZFS solutions ( ZFS on Proxmox, FreeNAS with NFS, and OmniOS ).
    My original idea was to use FreeNAS as a VM and share a dataset over NFS using the qcow2 format, but I didn't get good results ( poor write performance ), and the same with SMB.

    Then I created a Linux bridge without a physical interface, named vmbr2, and assigned it an IP.
    I deployed an OmniOS VM and put its virtio NIC on vmbr2. On this interface I enabled MTU 9000 and assigned an IP.
    How can I set MTU 9000 on vmbr2? Do Linux bridges support this configuration, or do I have to switch to OVS?

    Also, OmniOS recognizes the virtio NIC as a 1Gb link, and I don't know how to set it to 10Gbps.
    I ran an iperf test and got these results:

    Using Proxmox as the iperf server I can reach the 10Gbps link, but using OmniOS as the server I only reach 300Mbps, so Proxmox can only write at 300Mbps instead of full speed. Any advice?

    I set up the ZFS datastore with blocksize=8k, and the dataset on OmniOS to match. Please feel free to give other advice :)
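    For reference, the ZFS-over-iSCSI datastore ( including the 8k blocksize ) is defined in /etc/pve/storage.cfg. A minimal sketch; the storage name, pool, portal IP, and target IQN below are placeholders, not taken from this setup:

    zfs: omnios-zfs
            blocksize 8k
            iscsiprovider comstar
            pool tank
            portal 192.168.2.10
            target iqn.2010-08.org.illumos:02:example-target
            content images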

    Jack Layne
    #1 jacklayne, Oct 3, 2018
    Last edited: Oct 3, 2018
  2. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Oct 1, 2014
    Yes, Linux bridges do support MTU 9000.
    To set it, edit /etc/network/interfaces and set the MTU on the device.
    The reported link speed is only a driver value and does not limit the real throughput, which can exceed 10Gbit depending on your CPU.

    If you use iperf, it is very important to use the same version on both ends; with different versions, results like this can happen.
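    A quick way to run a matched-version check; the VM's IP here is only an example:

    # iperf3 --version              # run on both ends; versions should match
    # iperf3 -s                     # on the OmniOS VM
    # iperf3 -c 192.168.2.10        # on the Proxmox host: host -> VM
    # iperf3 -c 192.168.2.10 -R     # -R reverses the direction: VM -> host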

    I'm not sure I understand correctly, but are you running ZFS on top of ZFS?
    If so, that is not recommended, because you lose a massive amount of speed.
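    A minimal sketch of what the MTU setting looks like in /etc/network/interfaces for a bridge with no physical port; the address is an example:

    auto vmbr2
    iface vmbr2 inet static
            address 192.168.2.1
            netmask 255.255.255.0
            bridge_ports none
            bridge_stp off
            bridge_fd 0
            mtu 9000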
  3. jacklayne

    jacklayne New Member

    Oct 3, 2018
    OK, but what if I just create a vmbr without a device? If I set the MTU on the physical device, will it also change on a vmbr that has no device?

    OK, that was my feeling, but I wasn't sure.

    I have been using iperf3 for all tests.

    No, I just attached the disks as raw devices to the OmniOS VM ( and created the pool there ), and then I configured ZFS over iSCSI on Proxmox.

    I'm testing performance with the ATTO benchmark in a Windows machine, trying to figure out how to tune ZFS, iSCSI, and the blocksize.
    First I ran the test on a local pool ( native on Proxmox ), and then the same test using ZFS over iSCSI from OmniOS, trying to get the same performance; so far the results are very different. Any idea?


    Another question: in the past days I tested FreeNAS with qcow2 over NFS. If I understood other posts correctly, this isn't recommended, right?

    Attached Files:

    #3 jacklayne, Oct 4, 2018
    Last edited: Oct 4, 2018
  4. wolfgang

    wolfgang Proxmox Staff Member
    Staff Member

    Oct 1, 2014
    Then there is no need to change it. The bridge is virtual, so you would not get more speed that way.
    I'm talking about minor versions too.
    It is very hard to say without full details, but I would guess it has something to do with caches.
    That is the same problem as ZFS on ZFS.
    To keep it general: never use a CoW filesystem on top of another CoW filesystem.
  5. jacklayne

    jacklayne New Member

    Oct 3, 2018
    OK, so I can use the default settings.

    OK, got it :)

    Which details should I give?

    What do you mean by ZFS on ZFS?

    I have the same feeling about the cache, but I don't know where I'm going wrong. I set up the datastore on Proxmox with write cache enabled, and I set the vdisk cache to writeback.
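    For the record, the vdisk cache mode can also be set from the CLI; a sketch assuming VM ID 104 with a virtio disk on a storage named omnios-iscsi ( both placeholders ):

    # qm set 104 --virtio0 omnios-iscsi:vm-104-disk-0,cache=writeback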
  6. jacklayne

    jacklayne New Member

    Oct 3, 2018
    This is the behavior I'm seeing and have been trying to fix.

    During a write, the throughput drops to zero ( or almost ) and then the write continues.
    It looks like a timeout during the write. Could it be the old disks I'm using for testing? With ext4 as the filesystem they work well.

    I used an XPEnology VM ( which I normally use every day ) to copy a file over SMB.

    I set up the ZFS dataset with atime=off, sync enabled, and compression and dedup off.
    I'm using 2 disks ( 160GB SATA + an Intel SSD 520 as log ) with ZFS on Proxmox.
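    Those dataset properties as zfs commands; the dataset name is an example:

    # zfs set atime=off tank/nas
    # zfs set sync=standard tank/nas    # valid values: standard | always | disabled
    # zfs set compression=off tank/nas
    # zfs set dedup=off tank/nas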

    Attached Files:

  7. jacklayne

    jacklayne New Member

    Oct 3, 2018
    OK, I have done other tests; using ZFS on Proxmox I'm getting high IO delay. Just to recap, here is my hardware config:

    CPU: Intel I5-7400
    RAM: 32GB
    Network: 2x Intel Gigabit

    The config I have been using for the tests is not the final one. In the final setup I'll have 2 or 3 WD RED 4TB disks in RAID1 or RAIDZ, plus a Samsung EVO 840 256GB for SLOG/L2ARC.

    For testing, I'm using these disks:

    Hard disk: 1TB SATA 7200RPM
    SSD: Intel SSD 520 180GB

    I created 2x zpool:

    Name: proxmox
    Disk: 1TB SATA 7200RPM
    Options: compression=lz4, sync=disabled, atime=off, ashift=12
    Dataset: 8k

    Name: ssd
    Disk: Intel SSD 520 180GB
    Options: compression=lz4, sync=disabled, atime=off
    Dataset: 8k
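    A sketch of how the first pool could be created with those options; the device name is an example, and the ssd pool would be created the same way without the ashift option:

    # zpool create -o ashift=12 proxmox /dev/sdb
    # zfs set compression=lz4 proxmox
    # zfs set sync=disabled proxmox
    # zfs set atime=off proxmox
    # zfs create -o recordsize=8k proxmox/8k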

    I created 2x ZVOLs ( one per zpool ) and attached them to a VM. The VM is an XPEnology ( Synology ), which uses btrfs on top of the zvol.
    Then I created two shared folders over SMB, one per zvol/zpool.

    For the test I used 14 files ( movie files ) totalling 64GB, copying them from another XPEnology VM on the same server ( no ZFS ) over SMB.

    I copied the files to the SSD and the HDD, as you can see from the following images:

    Screen Shot 2018-10-09 at 12.01.50.png

    Red, SSD copy: no IO delay
    Yellow, HDD copy: high IO delay

    Then I added the SSD as log device to the first pool and enabled sync: same behavior, as you can see from the blue test.

    Basically, the server doesn't seem to be out of resources... so what can it be?

    Here is the ZFS ARC info:

    # arc_summary
    ZFS Subsystem Report                            Tue Oct 09 10:25:34 2018
    ARC Summary: (HEALTHY)
            Memory Throttle Count:                  0
    ARC Misc:
            Deleted:                                96.38M
            Mutex Misses:                           28.56k
            Evict Skips:                            957.97k
    ARC Size:                               10.09%  1.58    GiB
            Target Size: (Adaptive)         10.69%  1.67    GiB
            Min Size (Hard Limit):          6.25%   1001.74 MiB
            Max Size (High Water):          16:1    15.65   GiB
    ARC Size Breakdown:
            Recently Used Cache Size:       99.81%  1.14    GiB
            Frequently Used Cache Size:     0.19%   2.27    MiB
    ARC Hash Breakdown:
            Elements Max:                           2.06M
            Elements Current:               9.60%   197.96k
            Collisions:                             20.88M
            Chain Max:                              7
            Chains:                                 8.56k
    ARC Total accesses:                                     71.39M
            Cache Hit Ratio:                92.45%  66.00M
            Cache Miss Ratio:               7.55%   5.39M
            Actual Hit Ratio:               92.34%  65.92M
            Data Demand Efficiency:         63.38%  13.40M
            Data Prefetch Efficiency:       26.91%  343.35k
            CACHE HITS BY CACHE LIST:
              Most Recently Used:           24.29%  16.03M
              Most Frequently Used:         75.59%  49.90M
              Most Recently Used Ghost:     0.16%   106.13k
              Most Frequently Used Ghost:   0.10%   67.05k
            CACHE HITS BY DATA TYPE:
              Demand Data:                  12.87%  8.49M
              Prefetch Data:                0.14%   92.39k
              Demand Metadata:              86.97%  57.41M
              Prefetch Metadata:            0.02%   13.30k
            CACHE MISSES BY DATA TYPE:
              Demand Data:                  91.01%  4.91M
              Prefetch Data:                4.65%   250.96k
              Demand Metadata:              4.09%   220.55k
              Prefetch Metadata:            0.25%   13.38k
    DMU Prefetch Efficiency:                                        85.74M
            Hit Ratio:                      7.03%   6.03M
            Miss Ratio:                     92.97%  79.71M
    ZFS Tunables:
            dbuf_cache_hiwater_pct                            10
            dbuf_cache_lowater_pct                            10
            dbuf_cache_max_bytes                              104857600
            dbuf_cache_max_shift                              5
            dmu_object_alloc_chunk_shift                      7
            ignore_hole_birth                                 1
            l2arc_feed_again                                  1
            l2arc_feed_min_ms                                 200
            l2arc_feed_secs                                   1
            l2arc_headroom                                    2
            l2arc_headroom_boost                              200
            l2arc_noprefetch                                  1
            l2arc_norw                                        0
            l2arc_write_boost                                 8388608
            l2arc_write_max                                   8388608
            metaslab_aliquot                                  524288
            metaslab_bias_enabled                             1
            metaslab_debug_load                               0
            metaslab_debug_unload                             0
            metaslab_fragmentation_factor_enabled             1
            metaslab_lba_weighting_enabled                    1
            metaslab_preload_enabled                          1
            metaslabs_per_vdev                                200
            send_holes_without_birth_time                     1
            spa_asize_inflation                               24
            spa_config_path                                   /etc/zfs/zpool.cache
            spa_load_verify_data                              1
            spa_load_verify_maxinflight                       10000
            spa_load_verify_metadata                          1
            spa_slop_shift                                    5
            zfetch_array_rd_sz                                1048576
            zfetch_max_distance                               8388608
            zfetch_max_streams                                8
            zfetch_min_sec_reap                               2
            zfs_abd_scatter_enabled                           1
            zfs_abd_scatter_max_order                         10
            zfs_admin_snapshot                                1
            zfs_arc_average_blocksize                         8192
            zfs_arc_dnode_limit                               0
            zfs_arc_dnode_limit_percent                       10
            zfs_arc_dnode_reduce_percent                      10
            zfs_arc_grow_retry                                0
            zfs_arc_lotsfree_percent                          10
            zfs_arc_max                                       0
            zfs_arc_meta_adjust_restarts                      4096
            zfs_arc_meta_limit                                0
            zfs_arc_meta_limit_percent                        75
            zfs_arc_meta_min                                  0
            zfs_arc_meta_prune                                10000
            zfs_arc_meta_strategy                             1
            zfs_arc_min                                       0
            zfs_arc_min_prefetch_lifespan                     0
            zfs_arc_p_dampener_disable                        1
            zfs_arc_p_min_shift                               0
            zfs_arc_pc_percent                                0
            zfs_arc_shrink_shift                              0
            zfs_arc_sys_free                                  0
            zfs_autoimport_disable                            1
            zfs_checksums_per_second                          20
            zfs_compressed_arc_enabled                        1
            zfs_dbgmsg_enable                                 0
            zfs_dbgmsg_maxsize                                4194304
            zfs_dbuf_state_index                              0
            zfs_deadman_checktime_ms                          5000
            zfs_deadman_enabled                               1
            zfs_deadman_synctime_ms                           1000000
            zfs_dedup_prefetch                                0
            zfs_delay_min_dirty_percent                       60
            zfs_delay_scale                                   500000
            zfs_delays_per_second                             20
            zfs_delete_blocks                                 20480
            zfs_dirty_data_max                                3361280819
            zfs_dirty_data_max_max                            4294967296
            zfs_dirty_data_max_max_percent                    25
            zfs_dirty_data_max_percent                        10
            zfs_dirty_data_sync                               67108864
            zfs_dmu_offset_next_sync                          0
            zfs_expire_snapshot                               300
            zfs_flags                                         0
            zfs_free_bpobj_enabled                            1
            zfs_free_leak_on_eio                              0
            zfs_free_max_blocks                               100000
            zfs_free_min_time_ms                              1000
            zfs_immediate_write_sz                            32768
            zfs_max_recordsize                                1048576
            zfs_mdcomp_disable                                0
            zfs_metaslab_fragmentation_threshold              70
            zfs_metaslab_segment_weight_enabled               1
            zfs_metaslab_switch_threshold                     2
            zfs_mg_fragmentation_threshold                    85
            zfs_mg_noalloc_threshold                          0
            zfs_multihost_fail_intervals                      5
            zfs_multihost_history                             0
            zfs_multihost_import_intervals                    10
            zfs_multihost_interval                            1000
            zfs_multilist_num_sublists                        0
            zfs_no_scrub_io                                   0
            zfs_no_scrub_prefetch                             0
            zfs_nocacheflush                                  0
            zfs_nopwrite_enabled                              1
            zfs_object_mutex_size                             64
            zfs_pd_bytes_max                                  52428800
            zfs_per_txg_dirty_frees_percent                   30
            zfs_prefetch_disable                              0
            zfs_read_chunk_size                               1048576
            zfs_read_history                                  0
            zfs_read_history_hits                             0
            zfs_recover                                       0
            zfs_recv_queue_length                             16777216
            zfs_resilver_delay                                2
            zfs_resilver_min_time_ms                          3000
            zfs_scan_idle                                     50
            zfs_scan_ignore_errors                            0
            zfs_scan_min_time_ms                              1000
            zfs_scrub_delay                                   4
            zfs_send_corrupt_data                             0
            zfs_send_queue_length                             16777216
            zfs_sync_pass_deferred_free                       2
            zfs_sync_pass_dont_compress                       5
            zfs_sync_pass_rewrite                             2
            zfs_sync_taskq_batch_pct                          75
            zfs_top_maxinflight                               32
            zfs_txg_history                                   0
            zfs_txg_timeout                                   5
            zfs_vdev_aggregation_limit                        131072
            zfs_vdev_async_read_max_active                    3
            zfs_vdev_async_read_min_active                    1
            zfs_vdev_async_write_active_max_dirty_percent     60
            zfs_vdev_async_write_active_min_dirty_percent     30
            zfs_vdev_async_write_max_active                   10
            zfs_vdev_async_write_min_active                   2
            zfs_vdev_cache_bshift                             16
            zfs_vdev_cache_max                                16384
            zfs_vdev_cache_size                               0
            zfs_vdev_max_active                               1000
            zfs_vdev_mirror_non_rotating_inc                  0
            zfs_vdev_mirror_non_rotating_seek_inc             1
            zfs_vdev_mirror_rotating_inc                      0
            zfs_vdev_mirror_rotating_seek_inc                 5
            zfs_vdev_mirror_rotating_seek_offset              1048576
            zfs_vdev_queue_depth_pct                          1000
            zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2
            zfs_vdev_read_gap_limit                           32768
            zfs_vdev_scheduler                                noop
            zfs_vdev_scrub_max_active                         2
            zfs_vdev_scrub_min_active                         1
            zfs_vdev_sync_read_max_active                     10
            zfs_vdev_sync_read_min_active                     10
            zfs_vdev_sync_write_max_active                    10
            zfs_vdev_sync_write_min_active                    10
            zfs_vdev_write_gap_limit                          4096
            zfs_zevent_cols                                   80
            zfs_zevent_console                                0
            zfs_zevent_len_max                                64
            zil_replay_disable                                0
            zil_slog_bulk                                     786432
            zio_delay_max                                     30000
            zio_dva_throttle_enabled                          1
            zio_requeue_io_start_cut_in_line                  1
            zio_taskq_batch_pct                               75
            zvol_inhibit_dev                                  0
            zvol_major                                        230
            zvol_max_discard_blocks                           16384
            zvol_prefetch_bytes                               131072
            zvol_request_sync                                 0
            zvol_threads                                      32
            zvol_volmode                                      1
    # arcstat
        time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
    10:26:57   128    44     34    44   34     0    0     0    0   3.1G  3.1G


    Attached Files:

  8. guletz

    guletz Active Member

    Apr 19, 2017
    ... your setup is the problem ;) If you use a CoW filesystem ( btrfs ) on top of another CoW filesystem ( ZFS ), performance is very low, especially on rotational disks! So instead of btrfs, use a non-CoW filesystem like xfs or ext4!
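    In practice that means formatting the guest-visible zvol with a non-CoW filesystem inside the VM; the device name below is an example:

    # mkfs.xfs /dev/sdb     # or: mkfs.ext4 /dev/sdb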
  9. jacklayne

    jacklayne New Member

    Oct 3, 2018
    Thanks, I got an improvement, but I think I still have some issues.

    I have done the same test using a FreeNAS VM, and then using a zvol with ext4 and NTFS.

    Copy on the FreeNAS VM ( 1 CPU, 4 cores, 8GB RAM, disks attached raw ).
    I created an 8k dataset shared over SMB:


    Then I added the pool back into Proxmox and attached 2 zvols to those VMs ( as in the first test ):

    # zpool create -f -o ashift=12 proxmox /dev/sdc log /dev/sde
    # zfs set compression=lz4 proxmox
    # zfs create proxmox/8k
    # zfs set recordsize=8k proxmox/8k
    # zfs get sync
    proxmox     sync      standard  default
    proxmox/8k  sync      standard  default
    # zpool status
      pool: proxmox
     state: ONLINE
      scan: none requested
            NAME        STATE     READ WRITE CKSUM
            proxmox     ONLINE       0     0     0
              sdc       ONLINE       0     0     0
              sde       ONLINE       0     0     0
    errors: No known data errors
    # zfs list
    NAME                       USED  AVAIL  REFER  MOUNTPOINT
    proxmox                   68.3G   831G    96K  /proxmox
    proxmox/8k                68.3G   831G    96K  /proxmox/8k
    proxmox/8k/vm-104-disk-0  51.7G   831G  51.7G  -
    proxmox/8k/vm-302-disk-0  16.7G   831G  16.7G  -
    Copy on the XPEnology VM ( machine 104 ) with an ext4 filesystem:


    Copy on the Windows 10 VM ( machine 302 ) with an NTFS filesystem:


    Now I finally get low IO delay on XPEnology and Windows 10, but why are the writes so "unstable"? FreeNAS seems to be more linear.


    Attached Files:

  10. guletz

    guletz Active Member

    Apr 19, 2017
    So, if I understood correctly, the copy is done over a Samba share? If your normal activity will be copying large files, then you can do it like this:

    - instead of creating the zvol with an 8K block size, use a larger one, like 32K
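    Since volblocksize is fixed at zvol creation time, the disk has to be recreated to change it; a sketch with example names and sizes:

    # zfs create -V 100G -o volblocksize=32k proxmox/8k/vm-104-disk-1

    In Proxmox, the blocksize setting of the storage in storage.cfg is what newly created zvols use, so changing it there applies to new disks.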
  11. jacklayne

    jacklayne New Member

    Oct 3, 2018
    I'll try it and report back. I still have to figure out how to choose the record/block size. I've gathered that it depends on the workload type, but it isn't straightforward.
  12. jacklayne

    jacklayne New Member

    Oct 3, 2018
    With the new disks I'm getting good performance, so the correct recordsize gives good results.
    guletz likes this.
  13. guletz

    guletz Active Member

    Apr 19, 2017

    So you learned something new about ZFS ;) With the correct recordsize, you can get good results! Another trick, like I said:
    if you mostly copy big files, and no other clients copy the same files again, then it makes sense to use the ZFS cache
    only for metadata and not for both ( data and metadata ), like this:

    zfs set primarycache=metadata rpool/data/vm-xxxxxxxx-disk-1
  14. jacklayne

    jacklayne New Member

    Oct 3, 2018
    OK, thanks for the suggestion!
    The pool will be used mainly for the NAS VM and the other containers/VMs.
    In the NAS VM the files are mostly read/streamed.
    What do you mean by "copy the same files again"?
    Can I have an example?
