ZFS over iSCSI backend recommendations?

aarcane

Active Member
Jul 28, 2015
32
1
28
So I'm looking into using ZFS over iSCSI, and I'm trying to find a good target. I've tried both TrueNAS Core and TrueNAS Scale, but both fail on a missing CLI utility. Since they are appliances, I don't trust them to keep working through an update if I forcibly install those CLI utilities, and from the searching I have done, neither is well supported as a target, despite being the most popular options for large-scale SMB and homelab storage.

So, round 3; I'm not giving up yet. I installed bog-standard Debian 11 on the SAN system, and after struggling through some MOK stuff for Secure Boot (can you say maintenance nightmare?), I've got a functional targetcli and ZFS installed. I've got a proof of concept up, and it's working. However, this doesn't feel very 'production ready'. Are we really expected to install, configure, and harden a general-purpose OS on what should otherwise be an enterprise SAN appliance?

So what are the recommended SAN Storage Appliances to install for running ZFS over iSCSI for Proxmox?
 

UdoB

Well-Known Member
Nov 1, 2016
286
72
48
Germany
Hello,

While I am NOT using ZFS over iSCSI, I just gave it a try, simply because I'm curious :)

However, this doesn't feel very 'production ready'.

That's why I took an existing PVE (test) node as a target. This way I get the same stability as for the initiator, directly from Proxmox!


You didn't disclose which documentation you followed. On a quick search I found a German blogpost:
It was easy to follow - although it is not just copy-n-paste. And because of the good results I want to share my experience:

The performance was overwhelming (for me). This is just plain 1 Gbit/s Ethernet. My initiator is an old(!) Xeon, and the target runs on an Intel Core i5 with mirrored enterprise Intel SSDs. Both are members of my PVE cluster. While I have separate physical NICs for "SAN" and the other networks, all traffic to the target goes through one single cable, just using VLANs. This is for testing purposes only; a physically separate network is highly recommended, of course.

In a VM with 8 GiB RAM and four cores I got this result:

Code:
root@core /opt/fio# fio --name=randrw  --ioengine=libaio  --direct=1 --bs=4k --iodepth=64 --size=8G --rw=randrw --rwmixread=75 --gtod_reduce=1
randrw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.25
Starting 1 process
randrw: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [m(1)][98.4%][r=106MiB/s,w=35.3MiB/s][r=27.1k,w=9040 IOPS][eta 00m:01s]
randrw: (groupid=0, jobs=1): err= 0: pid=5203: Sun May  8 11:00:52 2022
  read: IOPS=25.8k, BW=101MiB/s (106MB/s)(6141MiB/60997msec)
   bw (  KiB/s): min=73768, max=108864, per=100.00%, avg=103137.39, stdev=8346.57, samples=121
   iops        : min=18442, max=27216, avg=25784.35, stdev=2086.64, samples=121
  write: IOPS=8607, BW=33.6MiB/s (35.3MB/s)(2051MiB/60997msec); 0 zone resets
   bw (  KiB/s): min=24984, max=38192, per=100.00%, avg=34440.36, stdev=2749.44, samples=121
   iops        : min= 6246, max= 9546, avg=8610.09, stdev=687.34, samples=121
  cpu          : usr=8.60%, sys=23.47%, ctx=340937, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued rwts: total=1572145,525007,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
   READ: bw=101MiB/s (106MB/s), 101MiB/s-101MiB/s (106MB/s-106MB/s), io=6141MiB (6440MB), run=60997-60997msec
  WRITE: bw=33.6MiB/s (35.3MB/s), 33.6MiB/s-33.6MiB/s (35.3MB/s-35.3MB/s), io=2051MiB (2150MB), run=60997-60997msec

Maybe you would expect good results; for me it was a really positive surprise.

Worth mentioning: I did not optimize anything and I did not disable any caches. This quick test just uses the default configuration.
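
For anyone who wants to sanity-check numbers like these, here is a minimal sketch that recomputes bandwidth from the reported IOPS and the 4 KiB block size, then compares each direction with the raw line rate of a 1 Gbit/s link. It assumes that all I/O actually crossed the wire and ignores Ethernet/TCP/iSCSI overhead, so treat the percentages as rough upper bounds on link utilisation, not as a measurement.

```python
# Sanity-check the fio summary: bandwidth should equal IOPS x block size,
# and each direction can be compared with the raw 1 Gbit/s line rate.

BLOCK_SIZE = 4096               # bytes, from --bs=4k
READ_IOPS = 25_800              # "read: IOPS=25.8k"
WRITE_IOPS = 8_607              # "write: IOPS=8607"
LINE_RATE = 1_000_000_000 / 8   # bytes/s for 1 Gbit/s, protocol overhead ignored


def bandwidth(iops: int, block_size: int = BLOCK_SIZE) -> float:
    """Return throughput in bytes per second."""
    return iops * block_size


read_bw = bandwidth(READ_IOPS)
write_bw = bandwidth(WRITE_IOPS)

print(f"read : {read_bw / 2**20:.1f} MiB/s ({read_bw / 1e6:.1f} MB/s)")
print(f"write: {write_bw / 2**20:.1f} MiB/s ({write_bw / 1e6:.1f} MB/s)")

# Reads and writes travel in opposite directions on a full-duplex link,
# so each one is compared with the line rate separately.
print(f"read share of line rate : {read_bw / LINE_RATE:.0%}")
print(f"write share of line rate: {write_bw / LINE_RATE:.0%}")
```

The recomputed values agree with the fio summary (about 101 MiB/s read, 33.6 MiB/s write), and the read stream alone comes to roughly 85% of the raw line rate, which suggests the single 1 Gbit/s cable, not the SSDs, was close to being the limit in this test.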


Best regards


Edit: added an excerpt from my actual storage.cfg:
Code:
zfs: frompveg
        blocksize 4k
        iscsiprovider LIO
        pool rpool/iscsitarget
        portal 10.3.16.7
        target iqn.2003-01.org.linux-iscsi.pveg.x8664:sn.4f1bcbb4d551
        content images
        lio_tpg tpg1
        nodes pvef
        nowritecache 1
        sparse 1
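
If you want to inspect or validate such an entry from a script, a minimal sketch like the following may help. It assumes the layout visible in the excerpt above (a "type: id" header line followed by indented "key value" lines); it is not Proxmox's own parser, and `parse_stanzas` is a made-up helper name.

```python
# Minimal sketch: parse storage.cfg-style stanzas into dictionaries.
# Assumption: a stanza is a "<type>: <id>" header followed by indented
# "<key> <value>" lines. This is NOT the parser Proxmox itself uses.

SAMPLE = """\
zfs: frompveg
        blocksize 4k
        iscsiprovider LIO
        pool rpool/iscsitarget
        portal 10.3.16.7
        target iqn.2003-01.org.linux-iscsi.pveg.x8664:sn.4f1bcbb4d551
        content images
        lio_tpg tpg1
        nodes pvef
        nowritecache 1
        sparse 1
"""


def parse_stanzas(text: str) -> dict[str, dict[str, str]]:
    """Map each storage id to its options, including its storage type."""
    storages: dict[str, dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank separator lines
        if not line[0].isspace():
            # Header line, e.g. "zfs: frompveg"
            storage_type, _, storage_id = line.partition(":")
            current = {"type": storage_type.strip()}
            storages[storage_id.strip()] = current
        elif current is not None:
            # Option line, e.g. "        portal 10.3.16.7"
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip()
    return storages


cfg = parse_stanzas(SAMPLE)["frompveg"]
print(cfg["type"], cfg["iscsiprovider"], cfg["pool"], cfg["portal"])
```

Only header lines are split on the colon, so the colon inside the target IQN is left alone.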
 

aarcane

You didn't disclose which documentation you followed.
I didn't follow any documentation or guide. I read the included manual to make sure I was pasting the correct values into the correct boxes, but that's it. Targetcli is pretty basic really, once you've used it for a few years.
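
For readers who have not used targetcli for a few years, here is a rough sketch of the kind of target-side steps involved in exporting one VM disk from a Debian box with ZFS and LIO. It only builds the command lines and does not run anything. The pool, target IQN and TPG are copied from the storage.cfg excerpt earlier in the thread; the disk name and size are hypothetical, and this is neither what I actually typed nor necessarily the exact sequence the Proxmox plugin issues. Initiator ACLs and authentication are left out.

```python
# Sketch of target-side commands for one VM disk (built, never executed).
# Assumptions: POOL, TARGET and TPG come from the storage.cfg excerpt in this
# thread; the disk name and size passed in below are hypothetical.

POOL = "rpool/iscsitarget"
TARGET = "iqn.2003-01.org.linux-iscsi.pveg.x8664:sn.4f1bcbb4d551"
TPG = "tpg1"


def provision_commands(disk: str, size: str, blocksize: str = "4k") -> list[list[str]]:
    """Return the argv lists that would create and export one zvol."""
    zvol = f"{POOL}/{disk}"
    return [
        # Sparse (-s) zvol with the given size (-V) and volblocksize (-b).
        ["zfs", "create", "-s", "-b", blocksize, "-V", size, zvol],
        # Register the zvol's device node as a LIO block backstore.
        ["targetcli", "/backstores/block", "create",
         f"name={disk}", f"dev=/dev/zvol/{zvol}"],
        # Map the backstore as a LUN in the target portal group.
        ["targetcli", f"/iscsi/{TARGET}/{TPG}/luns", "create",
         f"/backstores/block/{disk}"],
        # Persist the LIO configuration across reboots.
        ["targetcli", "saveconfig"],
    ]


for argv in provision_commands("vm-100-disk-0", "32G"):
    print(" ".join(argv))
```

Keeping the commands as data makes it easy to review them before running anything on a box that holds real storage.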
 

mir

Famous Member
Apr 14, 2012
3,559
120
83
Copenhagen, Denmark
So what are the recommended SAN Storage Appliances to install for running ZFS over iSCSI for Proxmox?
I can recommend using OmniOS CE. I have been running it here for more than 10 years without any problems or issues. Just remember to put iSCSI traffic on a different network than the one used for the Proxmox cluster, preferably 10 Gbit/s or better if you are using it in an enterprise setup.
 

aarcane

I can recommend using OmniOS CE. I have been running it here for more than 10 years without any problems or issues. Just remember to put iSCSI traffic on a different network than the one used for the Proxmox cluster, preferably 10 Gbit/s or better if you are using it in an enterprise setup.
Meaning no offense, but replacing a common general-purpose server OS with an obscure general-purpose OS, while probably very stable and suitable at the enterprise level, is hardly a step up for the SMB/SOHO environment that's looking for a set-it-and-forget-it appliance. It's the sort of thing I'd love a good excuse to play with, and I just might, but hardly the sort of thing I want to trust my lab or office storage to for daily rapid deployment and easy maintenance. Same problems as Debian, less documentation.
 

LnxBil

Famous Member
Feb 21, 2015
6,279
773
163
Saarland, Germany
Went to try it out today. Project looks dead. D. E. D. Dead. Repo is down, project owner not responding to tickets or inquiry as to when it'll be back.
Just go the Debian way; it works, and it is a supported OS with a lot of documentation.

I also adapted it for use with Fibre Channel.
 
Mar 25, 2022
77
20
8
Went to try it out today. Project looks dead. D. E. D. Dead. Repo is down, project owner not responding to tickets or inquiry as to when it'll be back.
The last commit to the GitHub repo was 26 days ago. Dead? Perhaps, but the whole project is a set of 3 patches (for ZFSPlugin.pm, apidoc.js, pvemanagerlib.js) and a plugin (FreeNAS.pm). You can install them manually.
 

aarcane

Just go the Debian way; it works, and it is a supported OS with a lot of documentation.

I also adapted it for use with Fibre Channel.
Dude, Fibre Channel is awesome! I wish they would extend ZFS over iSCSI to ZFS over FC with NPIV portability. I don't care if the NPIV is attached to the VM SR-IOV style or to the host QEMU. Either way, it would be awesome and would enable truly enterprise-level SAN support in Proxmox. For now I can only really use it bare metal on servers and desktops :(
 

LnxBil

Dude, Fibre Channel is awesome! I wish they would extend ZFS over iSCSI to ZFS over FC with NPIV portability. I don't care if the NPIV is attached to the VM SR-IOV style or to the host QEMU. Either way, it would be awesome and would enable truly enterprise-level SAN support in Proxmox. For now I can only really use it bare metal on servers and desktops :(
Yes, I tried to look at this, but it is unfortunately not that easy to integrate. I would really like to use my FC infrastructure as it is.
 

bbgeek17

Active Member
Nov 20, 2020
729
137
43
www.blockbridge.com
I wish they would extend ZFS over iSCSI to ZFS over FC with NPIV portability.
The "they" would really need to be committed to a semi-dead technology and have a lot of spare time on their hands.

Is it possible to script around "symcli/navicli/hpcli/API/etc" to automatically create RAID groups and LUNs, programmatically assign them to the right ports/processors, expose them to the proper WWNs, and then program the FC switches so that you can have a LUN per VM disk as with ZFS over iSCSI? Yes, it's possible in theory. I know big financials that have done it, investing hundreds of thousands into development by their internal teams. They had multiple people with six-figure salaries dedicated to nothing but maintenance of these custom pieces of code for rapidly aging infrastructure.

Think of how a large-scale setup might look: 100 VMs, each VM with a boot disk, a data disk, and a cloud-init disk. That's 300 FC LUNs - an absolute nightmare to manage in a legacy Dell/HP/IBM FC environment. Just the zoning updates to add a new LUN could potentially take minutes.

Those SAN systems were not designed for dynamic programmatic provisioning and management. The API is often an afterthought bolted on for a marketing checkmark. Neither the PVE folks nor the big SAN vendors will ever invest in this, simply because the Venn diagram of common customers is almost two separate circles.
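
To put the scaling argument above into numbers, here is a trivial sketch of the LUN count under a one-LUN-per-VM-disk model. The three disk roles and the VM count come from the example; the function name is made up for illustration.

```python
# Count the LUNs needed when every VM disk is exported as its own LUN.
DISKS_PER_VM = ("boot", "data", "cloud-init")  # disk roles from the example
VM_COUNT = 100


def lun_count(vms: int, disks_per_vm: int) -> int:
    """One LUN per VM disk, so the total grows linearly with both inputs."""
    return vms * disks_per_vm


total = lun_count(VM_COUNT, len(DISKS_PER_VM))
print(f"{VM_COUNT} VMs x {len(DISKS_PER_VM)} disks = {total} LUNs")
```

Every one of those LUNs would also need its own masking and zoning entry, which is where the manual effort piles up.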

For now I can only really use it bare metal on servers and desktops
Enterprise SAN support does exist, but not from the storage cartel members.


 

LnxBil

Is it possible to script around "symcli/navicli/hpcli/API/etc" to automatically create RAID groups and LUNs, programmatically assign them to the right ports/processors, expose them to the proper WWNs, and then program the FC switches so that you can have a LUN per VM disk as with ZFS over iSCSI? Yes, it's possible in theory. I know big financials that have done it, investing hundreds of thousands into development by their internal teams. They had multiple people with six-figure salaries dedicated to nothing but maintenance of these custom pieces of code for rapidly aging infrastructure.

Think of how a large-scale setup might look: 100 VMs, each VM with a boot disk, a data disk, and a cloud-init disk. That's 300 FC LUNs - an absolute nightmare to manage in a legacy Dell/HP/IBM FC environment. Just the zoning updates to add a new LUN could potentially take minutes.

Those SAN systems were not designed for dynamic programmatic provisioning and management. The API is often an afterthought bolted on for a marketing checkmark. Neither the PVE folks nor the big SAN vendors will ever invest in this, simply because the Venn diagram of common customers is almost two separate circles.
Sadly true.
 
