Sorry to hear about your experience with ZFS, but I think you are underestimating it a bit. It sounds like you lost HDDs due to bad cables, which may have pushed you past the permissible number of HDD failures and cost you the array. The same thing would have happened with hardware RAID. ZFS is resilient enough to be considered an enterprise-grade, mission-critical storage system. Before we moved to Ceph we used ZFS for a long time and never had an issue.
Actually incorrect, at least for software RAID: the mount drops to read-only the moment you've lost one drive too many.
This gives you the opportunity to repair it without damage to the data; after getting the drives back online, a resync will happen.
ZFS does not do this; it will happily continue writing garbage to the array.
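To make that distinction concrete, here is a minimal sketch of how you could spot a degraded md array from /proc/mdstat. It assumes a Linux host running mdadm software RAID; the function name and output messages are just illustrative:

```python
#!/usr/bin/env python3
"""Minimal sketch: report md arrays that have lost a member.

/proc/mdstat shows each array's member status as e.g. [UU_],
where '_' marks a failed or missing disk.
"""
import re

def degraded_arrays(mdstat_path="/proc/mdstat"):
    """Return the names of md arrays with a failed/missing member."""
    degraded, current = [], None
    with open(mdstat_path) as f:
        for line in f:
            m = re.match(r"(md\d+)\s*:", line)
            if m:
                current = m.group(1)
                continue
            # Status lines look like: "104320 blocks [2/1] [U_]"
            s = re.search(r"\[([U_]+)\]", line)
            if current and s and "_" in s.group(1):
                degraded.append(current)
    return degraded

if __name__ == "__main__":
    bad = degraded_arrays()
    if bad:
        print("Degraded arrays:", ", ".join(bad))
    else:
        print("All md arrays have their full member count.")
```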
One network & DC operator CEO told me the same story, though for him it was just a testing node without anything important on it, and that was the end of the story. This actually happened close by: the neighbouring rooms were his DCs.
Yes, our target market is completely different from yours. For us, data safety and redundancy come above everything. Given the size of our cluster, the nature of our customer data, and the need to keep historical data, replica 3 is very much acceptable. We also have a third ZFS+Gluster setup for cold data storage which is completely offsite. As you can tell from my signature, we have a cloud business; anything we use goes through months of tests before we put it into production.
There are several ZFS experts on this forum who can give you even greater detail on ZFS mechanics. Mir is one of them that I know.
ZFS, unfortunately, is the end of the story for me. Even if the idiosyncrasies were fixed, the design is faulty (activating all disks for a single I/O), and for the bulk of my needs it is not suitable, only for backups. But because of the aforementioned issues, I think I'd be happier just doing plain old software RAID + ext4; I know I will sleep better at least. Too many nights ZFS has ruined my sleep. I remember very well a two-week sprint, 24/7, trying to recover from ZFS-caused issues ...
If data safety doesn't matter at all, then I think I should go with Gluster or ZFS+Gluster (see the sketch below): very low initial cost, and it just works. You already have experience with ZFS, so you already know.
Due to the performance issues mentioned, I would not put our customer data on ZFS, for that reason alone.
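For anyone curious what the Gluster side of that looks like, here is a minimal sketch of creating a replica-3 volume. It assumes three peers are already probed and each exports a brick; the host names, brick paths, and volume name are placeholders:

```python
#!/usr/bin/env python3
"""Minimal sketch: create and start a replica-3 GlusterFS volume."""
import subprocess

# Placeholder hosts and brick path -- adjust to your pool
bricks = [f"gluster{i}:/data/brick1" for i in (1, 2, 3)]

# Every file is stored on all three bricks
subprocess.run(
    ["gluster", "volume", "create", "vol0", "replica", "3", *bricks],
    check=True,
)
subprocess.run(["gluster", "volume", "start", "vol0"], check=True)
```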
OK, good, thank you for clarifying that; it has been a big question for me.

Yes, if the Ceph cluster goes down all at once (or within a few minutes across the nodes) and the nodes are rebooted, Ceph is able to do its own checks and bring the cluster back to a healthy status. That only applies in the case of a total power outage.
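A rough sketch of what waiting out that self-check can look like from a script; it assumes the standard ceph CLI is available with a working client keyring, and the polling interval and attempt count are arbitrary:

```python
#!/usr/bin/env python3
"""Minimal sketch: wait for a Ceph cluster to report healthy
after a full-outage restart."""
import subprocess
import time

def wait_for_health(interval=30, attempts=60):
    for _ in range(attempts):
        out = subprocess.run(
            ["ceph", "health"], capture_output=True, text=True
        ).stdout.strip()
        print(out)
        if out.startswith("HEALTH_OK"):
            return True
        time.sleep(interval)  # peering/recovery can take a while
    return False

if __name__ == "__main__":
    if wait_for_health():
        print("Cluster recovered on its own.")
    else:
        print("Still unhealthy -- check 'ceph -s' for stuck PGs.")
```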
About the UPS: the way you describe your customers, you can get away without any protection at all, including a UPS. If uptime is not important, just let all the nodes shut down. Of course, you will not be able to gracefully shut down your servers, which could be bad. You can also modify a cheap UPS and add some batteries to it, to give you just enough time to shut everything down properly.
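For illustration, a minimal sketch of the shutdown-on-battery idea using Network UPS Tools; in practice NUT's own upsmon daemon handles this, and the UPS name here is a placeholder:

```python
#!/usr/bin/env python3
"""Minimal sketch: shut a node down cleanly when the UPS goes
on battery. Assumes NUT is configured and 'upsc' can query the UPS."""
import subprocess
import time

UPS = "myups@localhost"  # placeholder name, adjust to your NUT config

def on_battery():
    out = subprocess.run(
        ["upsc", UPS, "ups.status"], capture_output=True, text=True
    ).stdout
    return "OB" in out  # "OB" = on battery, "OL" = on line power

while True:
    if on_battery():
        # Graceful shutdown while the battery still has charge
        subprocess.run(["shutdown", "-h", "+1", "UPS on battery"])
        break
    time.sleep(10)
```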
I have already been considering this, but then the maximum power output usually becomes an issue. In any case, when we invest, we will probably go for a rack-sized unit sourced from China. Surprisingly cheap, and it has been confirmed to meet high quality standards! Cheap enough for me not to consider a DIY solution.
With IPoIB you will never get full bandwidth. With enough tweaking you can push close to 20 Gbps; the gap is mainly due to IPoIB overhead. But that's 20 Gbps at much lower cost than 10 Gbps Ethernet.
We use 36-port Mellanox IB switches and dual-port Mellanox ConnectX-3 cards.
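For reference, the two IPoIB tweaks that tend to matter most are connected mode and a large MTU; a minimal sketch, assuming the interface is named ib0 and the standard Linux sysfs paths:

```python
#!/usr/bin/env python3
"""Minimal sketch: common IPoIB throughput tweaks (needs root)."""
import subprocess

IFACE = "ib0"  # adjust to your IPoIB interface name

# Datagram mode caps the MTU at 4092; connected mode allows up to 65520
with open(f"/sys/class/net/{IFACE}/mode", "w") as f:
    f.write("connected\n")

# A large MTU cuts per-packet overhead, which is where IPoIB loses
# bandwidth compared to native IB verbs
subprocess.run(["ip", "link", "set", IFACE, "mtu", "65520"], check=True)
```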
Oh, I would totally have expected it to be able to push around the 32 Gbps mark! Good to know!