Shared Remote ZFS Storage

Hello,

Thanks everyone for mentioning StarWind. I wanted to clarify a couple of things. StarWind VSAN has had a Linux version available for years.


You can get it from our website by requesting the version for Proxmox. https://www.starwindsoftware.com/starwind-virtual-san#download
You will receive the download link and key after submitting the form.

There is no 10 TB limit for the free version. In addition, the free version for KVM doesn't have limitations. https://www.starwindsoftware.com/vsan-free-vs-paid

As for I/O overhead, of course there is some due to the longer data path. However, if you want to squeeze out high I/O, we recommend using PCIe passthrough of the storage devices to the StarWind CVM. Here is an example of the performance we achieved in our lab. https://www.starwindsoftware.com/bl...wind-vsan-proxmox-hci-performance-comparison/
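The passthrough setup mentioned above can be sketched roughly as follows. This is an illustrative example only: the VMID (100) and the PCI address (0000:01:00.0) are placeholders, and IOMMU must already be enabled on the host.

```shell
# Find the PCI address of the storage controller to hand to the CVM
lspci -nn | grep -i nvme

# Attach it to the (assumed) StarWind CVM guest with VMID 100;
# pcie=1 requires the guest to use the q35 machine type
qm set 100 --hostpci0 0000:01:00.0,pcie=1
```

With the controller passed through, the CVM talks to the disks directly instead of going through the host's virtual disk layer, which is where most of the overhead comes from.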

FYI, we regularly test our solution against Proxmox in our QA cycle.

Feel free to reach out to me if you have any questions.

Best regards,
Alex
I don't see a bare-metal installation... so I'm supposed to tax the PVE host with not only sharing its I/O with the VMs it's hosting, but also sharing its CPU time with storage management and replication? Can you say Ceph? While a PITA, it's free and included...
 
At least with Ceph the storage runs in parallel with the hypervisor. Here your hypervisor depends on a VM that it's supposed to run... You get what you pay for.
 
Hello,

Yes, you will need to share CPU time with storage management and replication, but Ceph also needs CPU time, even when running on bare metal.
https://ceph.io/en/news/blog/2022/ceph-osd-cpu-scaling/
We currently do not support running our software directly on Proxmox; it is on our roadmap and will be added in the future.
As I've mentioned before, VSAN free is available for different KVM flavors (Proxmox included).

BTW, I love Ceph and run it in my lab.

Best regards,
Alex
 
In light of the discussion here, I am looking at using Syncthing to get two TrueNAS machines to stay very close to, if not immediately, in sync...
Syncthing is a bust for iSCSI. TrueNAS presents the iSCSI LUN only to its UI. So even as a DR option, it's only viable with NFS/SMB/CIFS shares.
 
I would like to make a product request: could Proxmox modify the PBS code to deliver a ZFS-over-iSCSI pool? Call it simply Proxmox Storage Server. It doesn't have to be HA, as it could use the ZFS replication from the TrueNAS Scale code - maybe even a merge of the TrueNAS Scale code and this https://marcelliot.net/zfs-over-iscsi-for-proxmox-and-freenas/

It doesn't have to be feature-rich - no need to share SMB or CIFS or even NFS to end users... just a rock-solid implementation of ZFS over iSCSI.
@dcsapak thoughts? I've been using https://github.com/TheGrandWazoo/freenas-proxmox as a product and a guide to set up a storage target; it seems a little overkill with all of the TrueNAS Scale features.
 
@jt_telrite - I think you are misunderstanding: iSCSI does not implement target partitioning and provisioning. You attach an iSCSI target to your virtual machine, and then your OS can create a ZFS pool (or any disk system) on top of it. If you wanted a GUI that does that before you connect the target to your VM, then your VM would have to import the pool from a 'foreign' source, which is not recommended to do blindly (as you don't know whether another client still has access to it). And if you attach your ZFS pool to an OS that does not support ZFS, it may ask you to 'initialize' the disk and wipe it.

If you mean iSCSI on top of ZFS: TrueNAS does that, as do Illumos and its derivatives and Houston (a GUI plugin from 45Drives on top of the native RHEL and Ubuntu options), and Ceph (the thing that is already in Proxmox) also has iSCSI support.
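The "attach a target, then build ZFS on top in the guest" flow described above looks roughly like this. A sketch only: the portal address and IQN are placeholders, and the resulting device name (/dev/sdb here) depends on your system.

```shell
# Log in to an (assumed) iSCSI target; the LUN appears as a plain block device
iscsiadm -m node -T iqn.2005-10.org.example:target0 \
         -p 192.168.1.50:3260 --login

# The guest sees only a raw disk (e.g. /dev/sdb); ZFS is layered on top by the OS
zpool create tank /dev/sdb
```

This is why importing such a pool from two clients at once is dangerous: the target itself has no idea a ZFS pool lives on the blocks it exports.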
 
I think you are misinterpreting the need/ask, because the OP was overcomplicated (my bad). Yes, TrueNAS presents iSCSI targets. The first issue is that Proxmox doesn't avail itself of what's presented without a plugin, and that plugin is a community addition that could end at any time. The second issue is that Proxmox doesn't support the iSCSI target implementations presented by TrueNAS and others without CLI modifications.

https://forums.truenas.com/t/add-li...roxmox-zfs-over-iscsi-can-work-natively/25374

The ask should have been less complicated, i.e. "Please support an official TrueNAS 'ZFS over iSCSI' connection", or "Please expand PBS to export a ZFS-over-iSCSI connection for shared storage, or fork PBS as PSS (Proxmox Storage Server)".

 
I think you mean: please improve iSCSI target scanning. The iSCSI protocol exposes a block device (a disk), not a partition (which is what ZFS is), hence "ZFS over iSCSI" is moot. It's iSCSI; whatever partition scheme your guest uses (GPT, MBR) is irrelevant and can't be "discovered" until you have authenticated.

Regarding discovery, you would have to specify the type your iSCSI endpoint supports (e.g. iSNS), and that's not always trivial to find out; you would still need the authentication piece, etc. I don't know whether TrueNAS exposes discovery of targets, but you can use iscsiadm on the command line to figure out what your particular flavor of NAS does or doesn't do and how it works. In the end, having a GUI for iscsiadm might be nice, but there are loads of knobs; it's a lot of work for something that's being replaced by NVMeoF.
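The iscsiadm probing suggested above can look like this. The portal and iSNS server addresses are placeholders; which discovery modes actually answer depends entirely on the NAS.

```shell
# Classic SendTargets discovery against an (assumed) portal
iscsiadm -m discovery -t sendtargets -p 192.168.1.50

# iSNS discovery, if the fabric runs an iSNS server
iscsiadm -m discovery -t isns -p 192.168.1.60

# List the node records collected by the discoveries above
iscsiadm -m node
```

If SendTargets returns nothing while the target is reachable, the NAS likely requires authentication even for discovery, which is one of the "knobs" a GUI would have to handle.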

There is nothing preventing you from writing a better plugin; I'm personally working on an NVMeoF one, although I only have one vendor to test with.
 
I think you mean, please improve iSCSI target scanning.
I think there is a disconnect here.

What you have said about iSCSI is all correct. However, the OP is looking at a particular combination of multiple technologies.
In PVE speak, what the OP is after is "ZFS over iSCSI". It has an unfortunate and confusing name; it's a historical legacy now and won't change.

What this specific storage scheme is doing at a high level:
a) the ability of the client (PVE) to SSH into the NAS and create an internal slice/volume. The native PVE plugin only uses SSH; some 3rd-party plugins may use an API.
b) exposing the newly created slice via iSCSI. In this case the slice is a ZFS slice; on the front end, however, it is presented as a raw block device, since the iSCSI protocol obviously can't expose anything else.
This exposure requires PVE to run CLI commands over SSH inside the NAS, and PVE has only been "taught" how to address specific iSCSI daemons. Apparently, none of those iSCSI daemons is used by the NAS in question.
c) Assuming all of the above was successful, PVE uses the basic iSCSI toolset to establish an iSCSI session.
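The a/b/c steps above can be sketched as the commands PVE effectively drives. This is an illustration under assumptions: the pool/volume names, IQN, and the choice of LIO/targetcli as the target daemon are all placeholders (PVE supports only a handful of specific daemons).

```shell
# a) create a slice (here a zvol) on the NAS over SSH
ssh root@nas zfs create -V 32G tank/vm-100-disk-0

# b) expose that zvol via iSCSI by driving the NAS's target daemon
#    (LIO via targetcli in this sketch)
ssh root@nas targetcli /backstores/block create vm-100-disk-0 /dev/zvol/tank/vm-100-disk-0
ssh root@nas targetcli /iscsi/iqn.2005-10.org.example:target0/tpg1/luns create /backstores/block/vm-100-disk-0

# c) log in from PVE with the ordinary iSCSI initiator toolset
iscsiadm -m node -T iqn.2005-10.org.example:target0 -p nas --login
```

Step b is exactly where unsupported NAS appliances fall over: if the box runs a target daemon PVE hasn't been "taught", the SSH commands in the plugin simply don't exist on the other end.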

Expanding this scheme to the point of providing all of the components as an appliance (regardless of how it's packaged) will depend on Proxmox GmbH's appetite for becoming a storage company on the side.
The storage veterans here at Blockbridge can attest that storage is never as easy as it seems.

Cheers.


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
@bbgeek17: Language matters - but for what you describe there is a plugin, referenced above, that calls the TrueNAS API and does all the work.
Yes, I am aware of it. That's exactly what I referenced as the 3rd-party API plugin. It essentially re-implements the built-in PVE ZFS/iSCSI plugin, which is not vendor-specific.

Cheers.


 
@floh8:
- TrueNAS Enterprise
- PetaSAN
- Lustre
- Blockbridge
- LinStor
- StarWind
- Nexenta
- HoustonUI
Not to mention the hundreds of other shared-storage options over NFS, iSCSI, or NVMeoF. I don't think you realize what you're asking for when you want synchronously replicated, shared storage; you can think it through quite simply:
- set up two iSCSI targets on two different systems and RAID1 them in your OS;
- now ask what happens when one of your nodes goes down, comes back, and the other goes down before the resync is finished.
That is what you're asking for. Replication is always snapshot-based and asynchronous, and you can never automatically promote an asynchronous replica without someone guaranteeing that your original storage is dead and that you are ready to accept the data loss.
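The thought experiment above, spelled out as commands. Device names are placeholders: /dev/sdb and /dev/sdc stand for LUNs logged in from two different iSCSI targets.

```shell
# Mirror two iSCSI-backed block devices in the client OS
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Watch the mirror state; after a target drops and returns,
# the array resyncs, and until that finishes one half is stale
mdadm --detail /dev/md0
```

If the surviving leg dies mid-resync, nothing in the stack can safely decide which copy is current - which is the whole problem with DIY synchronous replication.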

If you want shared storage over iSCSI, you can do that with Proxmox today: set up an LXC container or VM (e.g. TrueNAS) that shares out iSCSI over a shared fabric (SAS, Fibre Channel, Ethernet, etc.), set up HA, and whenever a node goes down, the system starts itself on another node and imports the pool. This is exactly how TrueNAS Enterprise, Nexenta, etc. work.

The market really is rife with options, and many of the above are free or have community editions. Not to mention that the tools to build this are built into every major Linux distro.
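The failover step in the scheme above amounts to something like the following on the node that takes over. A sketch under assumptions: the pool name "tank" is a placeholder, and the target service name varies by distro (target.service is the targetcli-fb/LIO one).

```shell
# The dead node never exported the pool cleanly, so the takeover
# node must force the import (this is what HA tooling automates)
zpool import -f tank

# Re-expose the LUNs once the pool is back
systemctl start target
```

The force-import is precisely why proper HA needs fencing/STONITH: you must be certain the old node is really dead before importing, or two hosts end up writing to the same pool.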
 
No, you are wrong. In my enhancement ticket I forgot to mention my requirements: thin provisioning, compression, deduplication, self-healing, failover clustering, a GUI, ZFS over iSCSI or similar, NFS, shared JBOD, NVMe support, ZFS over NVMe/TCP, and, most importantly, an acceptable price tag. So which of your vendors offers this? None... your turn.
 
Hey, I have a budget for a Yugo, and I want someone to build me a Porsche 911 equivalent out of parts from AliExpress. Can you help?
 
"An acceptable price tag" and many features (especially those missing in other products) tend to conflict with each other. How should such a thing be funded when Proxmox Server Solutions GmbH already has enough on its TODO list for its existing products? Their newest product, Datacenter Manager, is still in alpha, so I assume their management prefers their developers to work on that (which is quite important for the many companies migrating from VMware, and thus a safe bet in terms of return on investment).
 
Your requirements are in places contradictory and make no sense (fabric-on-ZFS or ZFS-over-fabric); you can't have synchronous data over asynchronous systems. But the requirements that aren't contradictory are covered by Ceph and TrueNAS, both of which are plenty performant and scale to many PBs. If Ceph isn't performant on your hardware, then layering on more software isn't going to help.

TrueNAS Enterprise is free with the hardware that runs it; if you want to run TrueNAS Scale and put in your own HA, locking, and STONITH based on your hardware, you can do that today. And therein lies the problem: any shared-hardware storage solution requires tight integration with the hardware. That is why ZFS is not a cluster system and Ceph can be rather complicated - the layer you don't get from the hardware needs to be reimplemented in software. In the end, any solution you come up with will be Ceph-adjacent and have the same problems.

As I said, there are plenty of options. Lustre, PetaSAN, LinStor, and various others have a nice GUI on top of cluster software.

The future of storage seems to be NVMeoF with a containerized compute layer (Kubernetes) for data distribution, legacy protocols, and file sharing. That's a project for which a GUI would be worth investigating, but the development cost would be significant, and I'm pretty sure the BlueField-DPU-style cards you need will take some time before reaching the average homelab.
 
Man, guys, slow down. First read the ticket carefully and then react. If such a product were impossible or contradictory, I wouldn't suggest it. The one vendor with a solution close to it, and with an acceptable price tag, is Open-E, but they lack some important features like ZFS over iSCSI and the NVMe stuff. And for the Ceph fanboys: it's a really great solution, but as I wrote, it is slow by design compared with other solutions on the same hardware. They could have looked at the DRBD implementation to get faster, but they didn't. You need expensive storage devices and networking to be fast. CephFS is still a lot slower than RBD. There is no data-cache function or auto-tiering. They took 10 years to implement deduplication; in the beginning their developers said it was too hard to implement because of the distributed design.
Most vendors have one of two problems: either they choose a restricted technology or they have no clever product manager. I think Proxmox has good product managers.
The thing is that all the features and protocols I mentioned already exist as open source. I already built such a solution by hand, but I'd prefer to use a GUI and have better monitoring. Proxmox has already integrated these features in their existing products, except for the NFS server, iSCSI target, and NVMe stuff. The only thing they have to do now is combine them in a new product. Of course that's a lot of work and they'd need more manpower - no question. But that is solvable: simply hire people. And I think we'd all agree that even if Proxmox raised the prices of their products, they would still be attractive.
 