PVE4 & SRP storage from another ZFS pool via SRPT

elurex

Active Member
Oct 28, 2015
Taiwan
Hi All

I am new to PVE4 (less than 24 hours), but I am very familiar with Ubuntu, InfiniBand, and ZFS.

My setup

PVE4 node: Mellanox ConnectX-3 VPI dual-port card (SR-IOV can be initialized, but there is currently an issue with IOMMU grouping)
SAN node: Mellanox ConnectX-3 VPI dual-port card, running a ZFS pool and serving virtual block devices over RDMA (Ubuntu 14.04.3, but it could be another PVE4 node as well)

in /etc/modules
Code:
mlx4_en
mlx4_ib
ib_ipoib
ib_umad
ib_srp
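
To bring the modules up without a reboot, loading them manually should also work (a minimal sketch using the same module list as above):
Code:
for m in mlx4_en mlx4_ib ib_ipoib ib_umad ib_srp; do modprobe $m; done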

and I have a startup script that executes the following
Code:
echo "id_ext=xxxxxx,ioc_guid=xxxxxx,dgid=xxxxx,pkey=ffff,service_id=xxxxx" > /sys/class/infiniband_srp/srp-mlx4_0-1/add_target

The disks are found (controller SCST_BIO)
Code:
: ~# lsscsi
[0:0:1:0]    cd/dvd  MATSHITA DVD-ROM SR-8178  PZ16  /dev/sr0
[4:2:0:0]    disk    LSI      MegaRAID SAS RMB 1.40  /dev/sda
[5:0:0:0]    disk    SCST_BIO kvm-node0         310  /dev/sdb
[5:0:0:1]    disk    SCST_BIO kvm-node1         310  /dev/sdc
[5:0:0:2]    disk    SCST_BIO kvm-node2         310  /dev/sdd
[5:0:0:3]    disk    SCST_BIO kvm-node3         310  /dev/sde
[5:0:0:4]    disk    SCST_BIO kvm-node4         310  /dev/sdf
[5:0:0:5]    disk    SCST_BIO kvm-node5         310  /dev/sdg

I am assuming that adding them to a VM should follow the PVE wiki article Physical_disk_to_kvm
(I can't even include a URL in my post yet), roughly as sketched below.
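
If I read that wiki page correctly, it boils down to passing the block device to the guest with qm set; a minimal sketch (the VM ID and the by-id entry are just placeholders for my setup, and a stable /dev/disk/by-id name is preferable to /dev/sdb, which can change between boots):
Code:
ls -l /dev/disk/by-id/ | grep sdb        # find the stable name of the SRP LUN
qm set 100 -virtio1 /dev/disk/by-id/<id-of-the-srp-lun>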

Is there a more native way for PVE to support these virtual block devices, other than ZFS over iSCSI? Since the disks are not actually local (and are multipathed), a PVE cluster should be able to live-migrate VMs without moving any data (it is all handled by the SAN).
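
(For the multipath part I am assuming the standard multipath-tools setup works on top of the SRP LUNs; a rough sketch, where the device name is just an example:)
Code:
apt-get install multipath-tools
/lib/udev/scsi_id -g -u -d /dev/sdb    # get the WWID of the SRP LUN
multipath -ll                          # verify both paths show up under /dev/mapper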

Or do I need to use the Ceph storage cluster method described in the PVE wiki article
Storage:_Ceph
(I can run Ceph on top of ZFS) together with
Mellanox docs/DOC-2141
? The downside is that it is not RDMA over an InfiniBand network but RDMA over an Ethernet network (a huge performance drop).

I am trying to move away from VMware to PVE4 because it supports LXC and ZFS natively (FS on root! However, many people run their zpools in stripe-mirror mode, while the installer is currently limited to RAID1 or RAIDZx). Eventually I plan to run four PVE4 nodes, with two of them acting as SANs (backups can be done using PVE's pve-zsync, which is basically a zfs send | zfs recv script, roughly as in the sketch below), all interconnected with Mellanox ConnectX-3 VPI.
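
A minimal pve-zsync example of what I have in mind (the VM ID, target IP, and pool name are just placeholders):
Code:
# replicate VM 100's disks to the second SAN node on a schedule, keeping 7 snapshots
pve-zsync create --source 100 --dest 10.0.0.2:tank/backup --verbose --maxsnap 7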

PVE4 supports both InfiniBand and ZFS, but somehow when both are used together, some of the HPC features are missing.
 