Shared Remote ZFS Storage

To me it looks like a Ceph cluster (whether hyperconverged on PVE or separate) causes trouble far more often, and far heavier trouble - new threads about it appear here every week - than any NAS server, while an HA NAS like a NetApp/Isilon really only has a problem when power is lost, which can be covered with a metro (second, coupled) installation as long as the switches run on emergency power. Over a span of 15 years I've seen 5 HA systems crash at one customer, and checking, servicing and repairing the data of such a system once it's in chaos takes a lot of time. Sometimes it would simply be better to stop the service, check, and then resume, rather than running into ever-increasing data trouble while the HA layer thinks it can still do its job when it can't.
 
In these cases, waltar, people usually don't read enough, don't test before pushing to production, and then come to the forums crying for help. Ceph is genuinely battle-tested software, and I've worked with numerous companies that would never leave it. But on the other side, I've also worked with companies that had a really bad Ceph experience (usually the network part), and they typically moved to single-node PVE.
As always, get a good engineer (hypervisor + network) and you should alleviate most of those problems.
 
We actually have more trouble with our Isilon (now PowerScale) than with Ceph. Both have had 100% cluster availability over the past few years, but we regularly run into bugs on individual PowerScale nodes during provisioning. We're currently two weeks into a back-and-forth with Dell over their remote support feature not working properly, and we're talking to HPE about hardware because of these support issues. We've also had issues with non-standard NFS and SMB behaviour, performance problems that are unclear and undocumented, and of course hardware failures.

The problem with proprietary stuff is that *you* can't fix it. With Ceph, I can get any issue fixed in hours; I can read the source code, understand the problem, and fix it myself if necessary. The Isilon is FreeBSD underneath, and a lot of it is readable Python or other code, but it doesn't run that BSD in any standard way and most of it is undocumented.

As far as hardware failures go, they do happen. I've seen VMware clusters go under because a single proprietary SAN controller went down, and then the blame game begins between several vendors. Luckily, most VMware shops are now moving to KVM+Ceph in some fashion.
 
Reactions: Johannes S
Yeah, the Isilon/PowerScale OS is a little funny: you can change NFS server options that show as active and confirmed, and that work on the client ... but they reset back to defaults when you restart the NFS service (all in the web UI) - who programmed that, and what do they tell the customer? You're even expected to manipulate the customer's DNS service to integrate it with an Isilon, when really the Isilon should integrate into the customer's network and not the other way around. Isilon file-service reads tend to be slow, and it isn't a common platform for virtualization. It needs a special switch backend, and even growing it by adding nodes is special.
HA NFS works really well on Isilon, but in summary it's a specialist system with pros and cons to deal with.
 
Reactions: Johannes S
In these cases, waltar, people usually don't read enough, don't test before pushing to production, and then come to the forums crying for help. Ceph is genuinely battle-tested software, and I've worked with numerous companies that would never leave it. But on the other side, I've also worked with companies that had a really bad Ceph experience (usually the network part), and they typically moved to single-node PVE.
As always, get a good engineer (hypervisor + network) and you should alleviate most of those problems.
They also have empty wallets.....
 
Yeah, the Isilon/PowerScale OS is a little funny: you can change NFS server options that show as active and confirmed, and that work on the client ... but they reset back to defaults when you restart the NFS service (all in the web UI) - who programmed that, and what do they tell the customer? You're even expected to manipulate the customer's DNS service to integrate it with an Isilon, when really the Isilon should integrate into the customer's network and not the other way around. Isilon file-service reads tend to be slow, and it isn't a common platform for virtualization. It needs a special switch backend, and even growing it by adding nodes is special.
HA NFS works really well on Isilon, but in summary it's a specialist system with pros and cons to deal with.
I ran NetApp for years and never had an issue. You get what you pay for; with Ceph it feels more like you get half... you spend twice as much on network, compute and memory just to get it working correctly.
 
https://github.com/xrobau/zfs

That builds an NFS server from a standard Ubuntu machine. iSCSI makes it massively overcomplicated and makes backup and disaster recovery far harder.

You can also tie it in with https://github.com/xrobau/zfs-replicate, which sets up automatic replication jobs between servers.

This is not terribly complicated, which is why I suspect it's not a commercial product!
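For readers who haven't opened the repo: what a tool like zfs-replicate automates boils down to "snapshot, then zfs send piped into zfs recv on the other box". The sketch below is not taken from that repo; the dataset names and remote host are placeholders, passwordless SSH is assumed, and it assumes the previous replication snapshot is kept on the source so incrementals work.

```python
#!/usr/bin/env python3
"""Minimal ZFS replication sketch: snapshot the source dataset, then
`zfs send` it into `zfs recv` on a remote box over SSH.  Names are
placeholders, not anything from xrobau/zfs-replicate."""
import subprocess
from datetime import datetime, timezone

SRC_DATASET = "tank/vmstore"            # placeholder: local dataset shared over NFS
DST_HOST    = "backup.example.com"      # placeholder: remote ZFS host
DST_DATASET = "tank/vmstore-replica"    # placeholder: target dataset on that host

def run(cmd):
    """Run a command, raise on failure, return its stdout."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def latest_remote_snapshot():
    """Newest snapshot name already on the target, or None if there is none."""
    try:
        out = run(["ssh", DST_HOST, "zfs", "list", "-t", "snapshot",
                   "-H", "-o", "name", "-s", "creation", DST_DATASET])
    except subprocess.CalledProcessError:
        return None                      # target dataset does not exist yet
    names = out.split()
    return names[-1].split("@")[1] if names else None

def replicate():
    snap = datetime.now(timezone.utc).strftime("repl-%Y%m%d-%H%M%S")
    run(["zfs", "snapshot", f"{SRC_DATASET}@{snap}"])

    base = latest_remote_snapshot()
    if base:
        # Incremental stream from the last snapshot both sides share.
        send_cmd = ["zfs", "send", "-i", f"@{base}", f"{SRC_DATASET}@{snap}"]
    else:
        # First run: full stream.
        send_cmd = ["zfs", "send", f"{SRC_DATASET}@{snap}"]

    send = subprocess.Popen(send_cmd, stdout=subprocess.PIPE)
    recv = subprocess.run(["ssh", DST_HOST, "zfs", "recv", "-F", DST_DATASET],
                          stdin=send.stdout)
    send.stdout.close()
    if send.wait() != 0 or recv.returncode != 0:
        raise RuntimeError("replication failed")

if __name__ == "__main__":
    replicate()
```

Run from cron on the source box, something like this keeps the target one snapshot behind the source, which is the disaster-recovery (rather than HA) behaviour discussed in the next reply.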
 
That builds an NFS server from a standard Ubuntu machine.
That's never been an issue. Making an iSCSI host is relatively trivial, and using something like TrueNAS is quite beginner-friendly.

You can also tie it in with https://github.com/xrobau/zfs-replicate, which sets up automatic replication jobs between servers.
That's good for disaster recovery, not so much for HA (high availability).

This is not terribly complicated, which is why I suspect it's not a commercial product!
Without guest quiescence and application management it's also not particularly dependable. Consider that anything sitting in the guest's RAM cache and not yet committed to disk will not be in the sent snapshot, leaving files in an open/partial/outdated state. It's not a commercial product because it's not of commercially required quality.
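For what it's worth, the filesystem-level part of quiescence can be scripted on a Proxmox host through the QEMU guest agent. The sketch below is a minimal illustration (not part of the repo above); the VMID and dataset name are placeholders, the guest agent must be installed and running inside the VM, and this only flushes filesystems: databases and other applications still need their own freeze/backup hooks.

```python
#!/usr/bin/env python3
"""Sketch of a quiesced ZFS snapshot on a Proxmox node: freeze the guest's
filesystems via the QEMU guest agent, snapshot the backing dataset, thaw."""
import subprocess
from datetime import datetime, timezone

VMID    = "101"                   # placeholder VM id
DATASET = "tank/vm-101-disk-0"    # placeholder dataset backing that VM's disk

def agent(cmd):
    """Issue a QEMU guest agent command via Proxmox's qm tool."""
    subprocess.run(["qm", "guest", "cmd", VMID, cmd], check=True)

def quiesced_snapshot():
    snap = datetime.now(timezone.utc).strftime("quiesced-%Y%m%d-%H%M%S")
    agent("fsfreeze-freeze")          # flush and freeze guest filesystems
    try:
        subprocess.run(["zfs", "snapshot", f"{DATASET}@{snap}"], check=True)
    finally:
        agent("fsfreeze-thaw")        # always thaw, even if the snapshot failed
    return snap

if __name__ == "__main__":
    print(quiesced_snapshot())
```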
 
Reactions: jtremblay
Without guest quiescence and application management it's also not particularly dependable. Consider that anything sitting in the guest's RAM cache and not yet committed to disk will not be in the sent snapshot, leaving files in an open/partial/outdated state.
As someone who *has* written and managed a real HA solution (I do VoIP), I assure you that you are 100% in the wrong market.

If you care about 'never losing a transaction regardless of what fails', you need to be looking at IBM Z-Series mainframes or the equivalent. What you're describing is (effectively) three VMs running in lockstep, writing the data to four different places, with every write call not returning until it's committed to at least three.
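As a toy illustration of that "commit to at least three of four before the write returns" idea (nothing to do with how Z-Series actually implements it), consider:

```python
"""Toy model of quorum writes: a write call only returns once the value is
committed on at least three of four replicas.  The 'replicas' are plain dicts,
so this shows the control flow, not real durability."""
from concurrent.futures import ThreadPoolExecutor, as_completed

REPLICAS = [dict() for _ in range(4)]   # four pretend storage replicas
WRITE_QUORUM = 3                        # acks required before write() returns

def write_to_replica(replica, key, value):
    replica[key] = value                # stand-in for a durable, fsynced write
    return True

def quorum_write(key, value):
    """Block until at least WRITE_QUORUM replicas have acknowledged the write."""
    acks = 0
    with ThreadPoolExecutor(max_workers=len(REPLICAS)) as pool:
        futures = [pool.submit(write_to_replica, r, key, value) for r in REPLICAS]
        for done in as_completed(futures):
            if done.result():
                acks += 1
            if acks >= WRITE_QUORUM:
                return                  # the caller's write call can return now
    raise RuntimeError("quorum not reached, write is not durable")

if __name__ == "__main__":
    quorum_write("invoice-42", b"payload")
    print("committed on a quorum of replicas")
```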

Luckily for us, most people in the real world are happy to get a 500 error and click 'refresh' to finish what they're doing. A great example of how this works in the real world is Netflix's "Chaos Monkey", which is more than happy to instantly kill machines and even whole clusters - without flushing anything in memory to disk.

The reason behind using a standard distro is so that nothing is locked behind a confusing UI, and you're going to be running exactly the same packages a million other machines are using.
 
If justification for the product request is what's missing, go to Reddit and search r/proxmox for "shared storage". A good portion of the posts are from people in the same situation, looking to "sandbox" post-VMware configurations for SMB-sized installations...
 
A Dell ME5024 (or any PowerVault system) stuffed with disks would cost in the neighborhood of $75k in HA mode with FC or iSCSI once you include all the licenses and management. Not much different from buying a few extra nodes with disks.

There is nothing preventing you from doing that today; Proxmox supports it, so I don't see why you would need Proxmox to build a special interface. Just go to Datacenter -> Storage -> Add -> iSCSI.
Just $75k, lololol
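For reference, the GUI path quoted above (Datacenter -> Storage -> Add -> iSCSI) should have a CLI equivalent on any node via pvesm; in the sketch below the storage id, portal address and target IQN are placeholders for whatever the array actually exposes.

```python
#!/usr/bin/env python3
"""CLI equivalent of Datacenter -> Storage -> Add -> iSCSI on a Proxmox node,
wrapped in Python for illustration.  All identifiers are placeholders."""
import subprocess

STORAGE_ID = "san-me5024"                            # arbitrary name for the storage entry
PORTAL     = "10.10.10.10"                           # placeholder iSCSI portal address
TARGET     = "iqn.2000-01.com.example:placeholder"   # placeholder target IQN

# Register the iSCSI target cluster-wide; its LUNs then appear as candidate disks.
subprocess.run(["pvesm", "add", "iscsi", STORAGE_ID,
                "--portal", PORTAL, "--target", TARGET], check=True)

# Sanity check: the new storage should show up in the list.
subprocess.run(["pvesm", "status"], check=True)
```

Most setups then put a shared LVM storage on top of the resulting LUN so every node in the cluster can use it.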
 
Just $75k, lololol
Not as an endorsement, but... $75k is only if you buy it full of the most expensive drives Dell will sell you, and they ARE expensive. HOWEVER... an enclosure with 2 controllers RETAILS around $15k and, depending on your relationship with Dell, can be had new for about half that. As for drives - you'd need to buy those no matter what. I'll leave this here:

https://www.dell.com/support/manual...350066-6d55-4b5c-bdb3-6f8b18c01549&lang=en-us

Here's the thing. When you buy a STORAGE product, it goes through a LOT more design, development, and testing than nearly all other components, because there is more liability attached to the failure of a storage device - any other device failing is an inconvenience by comparison. Consequently, having someone stand behind such a product is a lot more expensive for the provider - it stands to reason that that cost has to be borne by the end user. If you're OK providing your own engineering and support, you have that option too.
 
Yeah, for $75k you are in the ME5084 price class, getting roughly the first enclosure full with 84x 14 TB HDDs.
 
Reactions: Johannes S
In light of the discussion here, I am looking at using "Syncthing" to keep two TrueNAS machines very close to, if not immediately, in sync...
 
Hello,

Thanks, everyone, for mentioning StarWind. I wanted to clarify a couple of things. StarWind VSAN has had a Linux version available for years.

As far as I can see, it's primarily hosted on Windows, has a community limit of 10 TB, and isn't easy to get.
You can get it from our website by requesting the version for Proxmox. https://www.starwindsoftware.com/starwind-virtual-san#download
You will receive the download link and key after submitting the form.

There is no 10 TB limit for the free version. In addition, the free version for KVM doesn't have limitations. https://www.starwindsoftware.com/vsan-free-vs-paid

As for I/O overhead, of course there is some due to the longer data path. However, if you want to squeeze out high I/O, we recommend using PCIe passthrough of the storage devices for the StarWind CVM. Here is an example of the performance we achieved in our lab: https://www.starwindsoftware.com/bl...wind-vsan-proxmox-hci-performance-comparison/
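On Proxmox, passing a storage controller or NVMe device through to a storage VM comes down to a single qm call once IOMMU is enabled on the host; the sketch below is a generic illustration, not a StarWind procedure, and the VMID and PCI address are placeholders.

```python
#!/usr/bin/env python3
"""Sketch of passing a storage controller/NVMe device through to a storage VM
on a Proxmox node.  VMID and PCI address are placeholders (find the real one
with `lspci -nn`); a passed-through device is no longer usable by the host."""
import subprocess

VMID        = "120"             # placeholder id of the storage VM
PCI_ADDRESS = "0000:03:00.0"    # placeholder PCI address of the HBA/NVMe device

# Attach the device to the VM's first passthrough slot.
subprocess.run(["qm", "set", VMID, "--hostpci0", PCI_ADDRESS], check=True)
```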

FYI, we regularly check our solution with Proxmox in our QA cycle.

Feel free to reach out to me if you have any questions.

Best regards,
Alex
 
Reactions: Johannes S