[SOLVED] Advice for ZFS over iSCSI

jacklayne

Hello All,

I would like some advice on my Proxmox and ZFS setup in my homelab.

My hardware:
CPU: i5-7400
RAM: 32GB
NIC: 2x1Gb Intel

Disks in use for testing: 2x 160GB in a mirror, with one Intel SSD for SLOG. I attached them as raw disks.

Proxmox version: 5.2-9

While trying to switch to ZFS, I have tried several ZFS solutions (ZFS on Proxmox, FreeNAS with NFS, and OmniOS).
My original idea was to use FreeNAS as a VM, share a dataset over NFS, and use the qcow2 format, but I didn't get good results (poor write performance), and the same happened with SMB.

Then I created a Linux bridge without a physical interface, named vmbr2, and assigned it the IP 192.168.5.1.
I deployed an OmniOS VM and put its virtio NIC on vmbr2. On this interface I enabled MTU 9000 and assigned the IP 192.168.5.2.
How can I set MTU 9000 on vmbr2? Does a Linux bridge support this configuration, or do I have to switch to OVS?

Then, OmniOS recognized the virtio NIC as a 1Gb link, and I don't know how to set it to 10Gbps.
I ran an iperf test and got these results:

Using Proxmox as the iperf server I can reach the 10Gbps link speed, but if I use OmniOS as the server I can only reach 300Mbps, so Proxmox can only write at 300Mbps instead of the maximum speed. Any advice?

I set up the ZFS datastore with blocksize=8k, and the dataset on OmniOS as well. Please feel free to give any other advice :)

Thanks,
Jack Layne
 
How can I set MTU 9000 on vmbr2? Does a Linux bridge support this configuration, or do I have to switch to OVS?
Yes, a Linux bridge does support MTU 9000.
To set it, you must edit /etc/network/interfaces and set the MTU on the device.
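Something like this in /etc/network/interfaces should do it (a minimal sketch; the bridge name and IP are taken from this thread, and depending on the ifupdown version a port-less bridge may need the post-up line shown in the comment). The OmniOS guest interface has to run MTU 9000 as well, otherwise jumbo frames are not used end to end.

Code:
auto vmbr2
iface vmbr2 inet static
        address 192.168.5.1
        netmask 255.255.255.0
        bridge_ports none
        bridge_stp off
        bridge_fd 0
        mtu 9000
        # if the mtu option is not honoured for a port-less bridge, force it:
        # post-up ip link set dev vmbr2 mtu 9000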
Then, OmniOS recognized the virtio NIC as a 1Gb link, and I don't know how to set it to 10Gbps.
That is only what the driver reports; it does not limit the real speed, which can be well over 10Gbit, but that depends on your CPU.

Using Proxmox as the iperf server I can reach the 10Gbps link speed, but if I use OmniOS as the server I can only reach 300Mbps, so Proxmox can only write at 300Mbps instead of the maximum speed. Any advice?
If you use iperf, it is very important to use the same version on both ends, because with different versions such things can happen.
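To rule out a version mismatch, check the exact release on both ends and run the test in both directions, roughly like this (a sketch; the IP is the OmniOS address from this thread, -P adds parallel streams, -R reverses the direction):

Code:
# check the exact release on both the Proxmox host and the OmniOS guest
iperf3 --version

# on the OmniOS guest (192.168.5.2)
iperf3 -s

# on the Proxmox host, once in each direction
iperf3 -c 192.168.5.2 -t 30 -P 4
iperf3 -c 192.168.5.2 -t 30 -P 4 -R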

I set up the ZFS datastore with blocksize=8k, and the dataset on OmniOS as well. Please feel free to give any other advice :)
I'm not sure if I understand correctly, but are you using ZFS on ZFS?
If so, this is not recommended because you lose a massive amount of speed.
 
Yes, a Linux bridge does support MTU 9000.
To set it, you must edit /etc/network/interfaces and set the MTU on the device.

OK, but what if I just create a vmbr without a device? If I set the MTU on the physical device, will it also change on the vmbr without a device?

That is only what the driver reports; it does not limit the real speed, which can be well over 10Gbit, but that depends on your CPU.

OK, I had that feeling, but I wasn't sure.


If you use iperf, it is very important to use the same version on both ends, because with different versions such things can happen.

I have been using iperf3 for all tests.

I'm not sure if I understand correctly, but are you using ZFS on ZFS?
If so, this is not recommended because you lose a massive amount of speed.

No, I just attached the disks as raw devices to the OmniOS VM (and created the pool on it), and then I configured ZFS over iSCSI on Proxmox.
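For reference, the corresponding ZFS over iSCSI entry in /etc/pve/storage.cfg looks roughly like this (a sketch only: the storage ID, pool name and target IQN below are placeholders, only the portal IP and blocksize match my setup):

Code:
zfs: omnios-iscsi
        iscsiprovider comstar
        portal 192.168.5.2
        target iqn.2010-09.org.example:target0
        pool tank
        blocksize 8k
        sparse 1
        content images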

I'm testing the performance with ATTO Disk Benchmark in a Windows machine, trying to figure out how to tune ZFS, iSCSI, and the blocksize.
First I ran the test using a local pool (native on Proxmox), and then I ran the same test using ZFS over iSCSI from OmniOS, trying to get the same performance, but so far the results are very different. Any idea?




Another question: in the past few days I tested FreeNAS using qcow2 over NFS. If I understood correctly from other posts, this isn't recommended, right?
 

Attachments

  • Screen Shot 2018-10-04 at 12.21.02.png
  • Screen Shot 2018-10-04 at 12.20.47.png
OK, but what if I just create a vmbr without a device? If I set the MTU on the physical device, will it also change on the vmbr without a device?
Then there is no need to change it. The bridge is virtual, so you would not get more speed.
I have been using iperf3 for all tests.
I'm talking about minor versions too.
First I ran the test using a local pool (native on Proxmox), and then I ran the same test using ZFS over iSCSI from OmniOS, trying to get the same performance, but so far the results are very different. Any idea?
It is very hard to say without full details, but I would guess it has something to do with caches.
Another question: in the past few days I tested FreeNAS using qcow2 over NFS. If I understood correctly from other posts, this isn't recommended, right?
That is the same problem as ZFS on ZFS.
As a general rule, never use a CoW filesystem on top of a CoW filesystem.
 
Then there is no need to change it. The bridge is virtual, so you would not get more speed.

OK, so I can use the default settings.

I'm talking about minor versions too.

Ok got it :)

It is very hard to say without full details, but I would guess it has something to do with caches.

Which details should I give?

That is the same problem as ZFS on ZFS.
As a general rule, never use a CoW filesystem on top of a CoW filesystem.

What do you mean by ZFS on ZFS?

I had the same feeling about the cache, but I don't know where I'm going wrong. I set up the datastore on Proxmox with write cache enabled, and I set up the vdisk with cache=writeback.
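For clarity, the disk line in the VM config (/etc/pve/qemu-server/<vmid>.conf) looks roughly like this (a sketch; the storage ID, disk name and size are placeholders):

Code:
scsi0: omnios-iscsi:vm-104-disk-0,cache=writeback,size=50G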
 
This is the behavior that I get and have been trying to fix:

During the write, throughput drops to zero (or almost) and then it continues writing.
It looks like a timeout during the write. Could it be the old disks I'm using for the test? With ext4 as the filesystem they work fine.

I used an XPEnology VM (the one I normally use every day) to copy a file over SMB.

I set up the ZFS dataset with atime=off, sync enabled, and compression and dedup off.
I'm using two disks (160GB SATA + Intel SSD 520 as log) with ZFS on Proxmox.
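For reference, I applied those dataset properties roughly like this (the dataset name tank/data is a placeholder; note that there is no literal sync=enabled value in ZFS, the choices are standard, always and disabled):

Code:
zfs set atime=off tank/data
zfs set sync=standard tank/data    # or sync=always to force synchronous writes
zfs set compression=off tank/data
zfs set dedup=off tank/data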
 

Attachments

  • Screen Shot 2018-10-05 at 10.00.59.png
OK, I have done some more tests; using ZFS on Proxmox I'm getting a high IO delay. Just to recap, here is my hardware config:

CPU: Intel I5-7400
RAM: 32GB
Network: 2x Intel Gigabit

The config I have been using for the tests is not the final one. In the final setup I'll have 2 or 3 WD RED 4TB disks in RAID1 or RAIDZ, and I'll use a Samsung EVO 840 256GB for SLOG/L2ARC.

For testing, I'm using these disks:

Hard disk: 1TB SATA 7200RPM
SSD: Intel SSD 520 180GB

I created two zpools (a quick way to verify their settings is sketched after the list):

Name: proxmox
Disk: 1TB SATA 7200RPM
Options: compression=lz4, sync=disabled, atime=off, ashift=12
Dataset: 8k

Name: ssd
Disk: Intel SSD 520 180GB
Options: compression=lz4, sync=disabled, atime=off
Dataset: 8k
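To double-check that both pools and datasets really carry these settings, something like this can be used (a sketch assuming the 8k datasets are named proxmox/8k and ssd/8k):

Code:
zpool get ashift proxmox ssd
zfs get compression,sync,atime,recordsize proxmox/8k ssd/8k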

I created two zvols (one per zpool) and attached them to a VM. The VM is an XPEnology (Synology) VM that uses btrfs on top of the zvol.
Then I created two shared folders over SMB, one for each zvol/zpool.

For the test, I used 14 movie files totalling 64GB, copying them from another XPEnology VM on the same server (no ZFS) over SMB.

I copied the files to both the SSD and the HDD, as you can see from the following images:

Screen Shot 2018-10-09 at 12.01.50.png

Red, SSD copy: no IO delay
Yellow, HDD copy: high IO delay

Then I added the SSD as a log device to the first pool and enabled sync: same behavior, as you can see from the blue test.

Basically, the server doesn't seem to be out of resources, so what can it be?

Here is the ZFS ARC info:

Code:
# arc_summary

------------------------------------------------------------------------
ZFS Subsystem Report                            Tue Oct 09 10:25:34 2018
ARC Summary: (HEALTHY)
        Memory Throttle Count:                  0

ARC Misc:
        Deleted:                                96.38M
        Mutex Misses:                           28.56k
        Evict Skips:                            957.97k

ARC Size:                               10.09%  1.58    GiB
        Target Size: (Adaptive)         10.69%  1.67    GiB
        Min Size (Hard Limit):          6.25%   1001.74 MiB
        Max Size (High Water):          16:1    15.65   GiB

ARC Size Breakdown:
        Recently Used Cache Size:       99.81%  1.14    GiB
        Frequently Used Cache Size:     0.19%   2.27    MiB

ARC Hash Breakdown:
        Elements Max:                           2.06M
        Elements Current:               9.60%   197.96k
        Collisions:                             20.88M
        Chain Max:                              7
        Chains:                                 8.56k

ARC Total accesses:                                     71.39M
        Cache Hit Ratio:                92.45%  66.00M
        Cache Miss Ratio:               7.55%   5.39M
        Actual Hit Ratio:               92.34%  65.92M

        Data Demand Efficiency:         63.38%  13.40M
        Data Prefetch Efficiency:       26.91%  343.35k

        CACHE HITS BY CACHE LIST:
          Most Recently Used:           24.29%  16.03M
          Most Frequently Used:         75.59%  49.90M
          Most Recently Used Ghost:     0.16%   106.13k
          Most Frequently Used Ghost:   0.10%   67.05k

        CACHE HITS BY DATA TYPE:
          Demand Data:                  12.87%  8.49M
          Prefetch Data:                0.14%   92.39k
          Demand Metadata:              86.97%  57.41M
          Prefetch Metadata:            0.02%   13.30k

        CACHE MISSES BY DATA TYPE:
          Demand Data:                  91.01%  4.91M
          Prefetch Data:                4.65%   250.96k
          Demand Metadata:              4.09%   220.55k
          Prefetch Metadata:            0.25%   13.38k


DMU Prefetch Efficiency:                                        85.74M
        Hit Ratio:                      7.03%   6.03M
        Miss Ratio:                     92.97%  79.71M



ZFS Tunables:
        dbuf_cache_hiwater_pct                            10
        dbuf_cache_lowater_pct                            10
        dbuf_cache_max_bytes                              104857600
        dbuf_cache_max_shift                              5
        dmu_object_alloc_chunk_shift                      7
        ignore_hole_birth                                 1
        l2arc_feed_again                                  1
        l2arc_feed_min_ms                                 200
        l2arc_feed_secs                                   1
        l2arc_headroom                                    2
        l2arc_headroom_boost                              200
        l2arc_noprefetch                                  1
        l2arc_norw                                        0
        l2arc_write_boost                                 8388608
        l2arc_write_max                                   8388608
        metaslab_aliquot                                  524288
        metaslab_bias_enabled                             1
        metaslab_debug_load                               0
        metaslab_debug_unload                             0
        metaslab_fragmentation_factor_enabled             1
        metaslab_lba_weighting_enabled                    1
        metaslab_preload_enabled                          1
        metaslabs_per_vdev                                200
        send_holes_without_birth_time                     1
        spa_asize_inflation                               24
        spa_config_path                                   /etc/zfs/zpool.cache
        spa_load_verify_data                              1
        spa_load_verify_maxinflight                       10000
        spa_load_verify_metadata                          1
        spa_slop_shift                                    5
        zfetch_array_rd_sz                                1048576
        zfetch_max_distance                               8388608
        zfetch_max_streams                                8
        zfetch_min_sec_reap                               2
        zfs_abd_scatter_enabled                           1
        zfs_abd_scatter_max_order                         10
        zfs_admin_snapshot                                1
        zfs_arc_average_blocksize                         8192
        zfs_arc_dnode_limit                               0
        zfs_arc_dnode_limit_percent                       10
        zfs_arc_dnode_reduce_percent                      10
        zfs_arc_grow_retry                                0
        zfs_arc_lotsfree_percent                          10
        zfs_arc_max                                       0
        zfs_arc_meta_adjust_restarts                      4096
        zfs_arc_meta_limit                                0
        zfs_arc_meta_limit_percent                        75
        zfs_arc_meta_min                                  0
        zfs_arc_meta_prune                                10000
        zfs_arc_meta_strategy                             1
        zfs_arc_min                                       0
        zfs_arc_min_prefetch_lifespan                     0
        zfs_arc_p_dampener_disable                        1
        zfs_arc_p_min_shift                               0
        zfs_arc_pc_percent                                0
        zfs_arc_shrink_shift                              0
        zfs_arc_sys_free                                  0
        zfs_autoimport_disable                            1
        zfs_checksums_per_second                          20
        zfs_compressed_arc_enabled                        1
        zfs_dbgmsg_enable                                 0
        zfs_dbgmsg_maxsize                                4194304
        zfs_dbuf_state_index                              0
        zfs_deadman_checktime_ms                          5000
        zfs_deadman_enabled                               1
        zfs_deadman_synctime_ms                           1000000
        zfs_dedup_prefetch                                0
        zfs_delay_min_dirty_percent                       60
        zfs_delay_scale                                   500000
        zfs_delays_per_second                             20
        zfs_delete_blocks                                 20480
        zfs_dirty_data_max                                3361280819
        zfs_dirty_data_max_max                            4294967296
        zfs_dirty_data_max_max_percent                    25
        zfs_dirty_data_max_percent                        10
        zfs_dirty_data_sync                               67108864
        zfs_dmu_offset_next_sync                          0
        zfs_expire_snapshot                               300
        zfs_flags                                         0
        zfs_free_bpobj_enabled                            1
        zfs_free_leak_on_eio                              0
        zfs_free_max_blocks                               100000
        zfs_free_min_time_ms                              1000
        zfs_immediate_write_sz                            32768
        zfs_max_recordsize                                1048576
        zfs_mdcomp_disable                                0
        zfs_metaslab_fragmentation_threshold              70
        zfs_metaslab_segment_weight_enabled               1
        zfs_metaslab_switch_threshold                     2
        zfs_mg_fragmentation_threshold                    85
        zfs_mg_noalloc_threshold                          0
        zfs_multihost_fail_intervals                      5
        zfs_multihost_history                             0
        zfs_multihost_import_intervals                    10
        zfs_multihost_interval                            1000
        zfs_multilist_num_sublists                        0
        zfs_no_scrub_io                                   0
        zfs_no_scrub_prefetch                             0
        zfs_nocacheflush                                  0
        zfs_nopwrite_enabled                              1
        zfs_object_mutex_size                             64
        zfs_pd_bytes_max                                  52428800
        zfs_per_txg_dirty_frees_percent                   30
        zfs_prefetch_disable                              0
        zfs_read_chunk_size                               1048576
        zfs_read_history                                  0
        zfs_read_history_hits                             0
        zfs_recover                                       0
        zfs_recv_queue_length                             16777216
        zfs_resilver_delay                                2
        zfs_resilver_min_time_ms                          3000
        zfs_scan_idle                                     50
        zfs_scan_ignore_errors                            0
        zfs_scan_min_time_ms                              1000
        zfs_scrub_delay                                   4
        zfs_send_corrupt_data                             0
        zfs_send_queue_length                             16777216
        zfs_sync_pass_deferred_free                       2
        zfs_sync_pass_dont_compress                       5
        zfs_sync_pass_rewrite                             2
        zfs_sync_taskq_batch_pct                          75
        zfs_top_maxinflight                               32
        zfs_txg_history                                   0
        zfs_txg_timeout                                   5
        zfs_vdev_aggregation_limit                        131072
        zfs_vdev_async_read_max_active                    3
        zfs_vdev_async_read_min_active                    1
        zfs_vdev_async_write_active_max_dirty_percent     60
        zfs_vdev_async_write_active_min_dirty_percent     30
        zfs_vdev_async_write_max_active                   10
        zfs_vdev_async_write_min_active                   2
        zfs_vdev_cache_bshift                             16
        zfs_vdev_cache_max                                16384
        zfs_vdev_cache_size                               0
        zfs_vdev_max_active                               1000
        zfs_vdev_mirror_non_rotating_inc                  0
        zfs_vdev_mirror_non_rotating_seek_inc             1
        zfs_vdev_mirror_rotating_inc                      0
        zfs_vdev_mirror_rotating_seek_inc                 5
        zfs_vdev_mirror_rotating_seek_offset              1048576
        zfs_vdev_queue_depth_pct                          1000
        zfs_vdev_raidz_impl                               [fastest] original scalar sse2 ssse3 avx2
        zfs_vdev_read_gap_limit                           32768
        zfs_vdev_scheduler                                noop
        zfs_vdev_scrub_max_active                         2
        zfs_vdev_scrub_min_active                         1
        zfs_vdev_sync_read_max_active                     10
        zfs_vdev_sync_read_min_active                     10
        zfs_vdev_sync_write_max_active                    10
        zfs_vdev_sync_write_min_active                    10
        zfs_vdev_write_gap_limit                          4096
        zfs_zevent_cols                                   80
        zfs_zevent_console                                0
        zfs_zevent_len_max                                64
        zil_replay_disable                                0
        zil_slog_bulk                                     786432
        zio_delay_max                                     30000
        zio_dva_throttle_enabled                          1
        zio_requeue_io_start_cut_in_line                  1
        zio_taskq_batch_pct                               75
        zvol_inhibit_dev                                  0
        zvol_major                                        230
        zvol_max_discard_blocks                           16384
        zvol_prefetch_bytes                               131072
        zvol_request_sync                                 0
        zvol_threads                                      32
        zvol_volmode                                      1

# arcstat
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz     c
10:26:57   128    44     34    44   34     0    0     0    0   3.1G  3.1G

Thanks,
Jack
 

Attachments

  • Screen Shot 2018-10-09 at 12.01.50.png
Basically, the server doesn't seem to be out of resources, so what can it be?
... your setup is the problem ;) If you use a CoW FS (btrfs) on top of any other CoW FS (ZFS), then the performance is very low, especially on rotational disks! So instead of btrfs, use a non-CoW FS like XFS or ext4!
 
... your setup is the problem ;) If you use a CoW FS (btrfs) on top of any other CoW FS (ZFS), then the performance is very low, especially on rotational disks! So instead of btrfs, use a non-CoW FS like XFS or ext4!

Thanks, I got an improvement, but I still have some issues (I guess).

I have done the same test using a FreeNAS VM, and then using zvols with ext4 and NTFS.

Copy to the FreeNAS VM (1 CPU, 4 cores, 8GB RAM, disks attached as raw).
I created an 8k dataset shared over SMB (screenshot: freenas_copy.png):



Then I added the pool back into Proxmox and attached two zvols to these VMs (as in the first test):

Code:
# zpool create -f -o ashift=12 proxmox /dev/sdc log /dev/sde
# zfs set compression=lz4 proxmox
# zfs create proxmox/8k
# zfs set recordsize=8k proxmox/8k
# zfs get sync
NAME        PROPERTY  VALUE     SOURCE
proxmox     sync      standard  default
proxmox/8k  sync      standard  default

# zpool status
  pool: proxmox
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        proxmox     ONLINE       0     0     0
          sdc       ONLINE       0     0     0
        logs
          sde       ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
proxmox                   68.3G   831G    96K  /proxmox
proxmox/8k                68.3G   831G    96K  /proxmox/8k
proxmox/8k/vm-104-disk-0  51.7G   831G  51.7G  -
proxmox/8k/vm-302-disk-0  16.7G   831G  16.7G  -

Copy on the XPEnology VM (machine 104) with an ext4 filesystem (screenshot: ext4_copy.png):



Copy on the Windows 10 VM (machine 302) with an NTFS filesystem (screenshot: windows_copy.png):



Now I finally get a low IO delay on XPEnology and Windows 10, but why are the writes so "unstable"? FreeNAS seems to be more linear.

Thanks!
 

Attachments

  • freenas_copy.png
  • ext4_copy.png
  • windows_copy.png
So if I understood correctly, the copy is done using a Samba share? If your normal activity/usage will be copying large files, then you can do it like this:

- instead of creating this VM's zvol with an 8K block size, use a larger zvol block size, like 32K
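Keep in mind that volblocksize is fixed when a zvol is created, so an existing 8K zvol cannot be changed; you have to create a new zvol with the larger block size and move the data (with Proxmox-managed storage this is what the blocksize option in storage.cfg controls). A rough sketch, with size and names as placeholders:

Code:
zfs create -s -V 100G -o volblocksize=32k proxmox/8k/vm-104-disk-1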
 
So if I understood correctly, the copy is done using a Samba share? If your normal activity/usage will be copying large files, then you can do it like this:

- instead of creating this VM's zvol with an 8K block size, use a larger zvol block size, like 32K

I'll try it and report back. I still have to understand how to decide on the record/block size. I figured out that it depends on the workload type, but it isn't straightforward.
 
With the new disks I got good performance, so the correct recordsize gets me a good result.
 
Hi,

So you learned something new with ZFS ;) With the correct recordsize, you can get good results! Another trick, like I said:
if you mostly copy big files, and no other clients copy the same files again, then it makes sense to use the ZFS cache
only for metadata and not for both (data and metadata), like this:

zfs set primarycache=metadata rpool/data/vm-xxxxxxxx-disk-1
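Afterwards you can verify that the property is active with:

Code:
zfs get primarycache rpool/data/vm-xxxxxxxx-disk-1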
 
OK, thanks for the suggestion!
The pool will be used mainly for the NAS VM and the other containers/VMs.
In the NAS VM the files are mainly read/streamed.
What do you mean by "no other clients copy the same files again"?
Can I have an example?

Thanks
 
