Async writes are a function of the filesystem, not of RSF-1. RSF-1 brings load balancing and HA to ZFS; ZFS does the filesystem work. But you didn't comprehend the meaning of async writes from my perspective: async writes are never persistent, on any filesystem, and not on Ceph either. That's why ZFS offers a SLOG for sync writes, and of course the SLOG is HA compatible.

I haven't found any information on the RSF-1 webpage about how to ensure ALL writes are consistent on the JBOD drives. Synchronous writes work via multiple JBOD SLOG devices, but asynchronous ones are not immediately written down to disk, so acknowledged asynchronous writes can be lost if you unplug the power to the machine. This is the main problem with any ZFS HA implementation I've ever seen, and I'm curious how they solved this, if at all.
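To make the distinction both of you are using concrete, here is a minimal Python sketch (paths and data are made up) of an acknowledged asynchronous write versus a synchronous one; only the fsync'd variant is guaranteed to be on stable storage (the ZIL/SLOG on ZFS) when the call returns:

```python
import os

def write_async(path, data):
    """Buffered write: returns as soon as the data sits in the page cache.
    If power is lost before the kernel/ZFS flushes it, the data is gone,
    even though the application already got a success return."""
    with open(path, "wb") as f:
        f.write(data)          # acknowledged, but only in RAM so far

def write_sync(path, data):
    """Synchronous write: fsync() forces the data to stable storage.
    On ZFS this is what hits the ZIL/SLOG before the call returns."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())   # persisted once this returns

write_async("/tank/async.bin", b"may be lost on power failure")
write_sync("/tank/sync.bin", b"persisted before acknowledgement")
```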
The question of failover time was also of interest to me. RSF-1's own statement is a maximum of 16 seconds; my tests with a nothing-shared RSF-1 cluster show 10 seconds, but without load. So RSF-1 is slower than a hardware vendor solution, but the normal filesystem timeout stays at 30 seconds, so this is within the limit.

You will also have higher latencies in a failover case, as with any other storage solution I've ever seen, e.g. enterprise SANs, in which you also share the cache itself between controllers.
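For reference, this is roughly how I measured the failover gap without load; a simple sketch, with the mount point and probe interval as assumptions: keep appending fsync'd timestamps and look for the largest gap afterwards.

```python
import os, time

# Assumption: /mnt/cluster is the failover-managed mount; adjust as needed.
LOGFILE = "/mnt/cluster/failover_probe.log"
INTERVAL = 0.5  # seconds between probes

def probe_loop():
    """Append a timestamp every INTERVAL seconds, fsync'd so the write
    really has to reach the active head. During a failover the writes
    stall; the largest gap between timestamps approximates the failover time."""
    with open(LOGFILE, "a") as f:
        while True:
            f.write(f"{time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())
            time.sleep(INTERVAL)

def largest_gap():
    """Run after the test: report the biggest pause the probe saw."""
    with open(LOGFILE) as f:
        ts = [float(line) for line in f if line.strip()]
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    return max(gaps) if gaps else 0.0

if __name__ == "__main__":
    probe_loop()  # Ctrl-C after the failover, then evaluate with largest_gap()
```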
I read in your PDF that you rate ZFS-over-iSCSI as faster than Ceph (fast as in throughput, or as in IOPS?), which may be true with SAS/SATA but not with NVMe. By setting preferred reads from local disks on Ceph you will outperform the network, which is the bottleneck in any setup, especially with iSCSI and without multiple 400G links.
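As far as I know, the "preferred local reads" tuning corresponds, for librbd clients, to the read-from-replica policy. A hedged Python sketch using the rados/rbd bindings; the pool name, image name and crush_location value are assumptions:

```python
import rados, rbd

# Client-side override: serve reads from the nearest replica.
# rbd_read_from_replica_policy=localize needs a crush_location describing
# where this client sits, so Ceph can tell which replica is "local".
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.conf_set("rbd_read_from_replica_policy", "localize")
cluster.conf_set("crush_location", "host=this-node")  # assumption: host bucket name
cluster.connect()

ioctx = cluster.open_ioctx("rbd")            # pool name is an assumption
with rbd.Image(ioctx, "test-image") as img:  # image name is an assumption
    data = img.read(0, 4096)                 # read served by the local replica if possible
ioctx.close()
cluster.shutdown()
```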
With Ceph on NVMe and local read tuning you can be faster than the network, of course, but then you need expensive NVMe for the whole storage. Write performance is always slower because of the Ceph design: each write has to be acknowledged by every replica. You could configure a local persistent write cache, but to make it highly available you need technology that Ceph does not offer, or additional IT know-how to implement it, and with an HDD pool you get no read cache. With ZFS-over-iSCSI you can have both a write and a read cache even for an HDD pool, and that is fast over iSCSI and possibly with RDMA, if that works with libiscsi. Even if you compare NVMe Ceph with a highly available write cache and read tuning against ZFS-over-iSCSI with an NVMe read/write cache, both will perform about the same, but in functionality ZFS-over-iSCSI still beats Ceph; see block size and deduplication.
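To make the write-path argument concrete: with replication, the primary OSD only acknowledges the client after all replicas have committed, so the client-visible latency is bounded by the slowest replica plus the extra hop. A toy model, all numbers invented and return trips ignored; only the structure of the calculation matters:

```python
# Toy latency model: replicated Ceph write vs. ZFS sync write to a SLOG.

def ceph_write_latency(client_to_primary_ms, primary_to_replica_ms, osd_commit_ms):
    """Primary receives the write, fans it out to the replicas and only
    acknowledges the client after every replica has committed."""
    replica_path = max(primary_to_replica_ms) + max(osd_commit_ms[1:])
    local_path = osd_commit_ms[0]
    return client_to_primary_ms + max(local_path, replica_path)

def zfs_slog_write_latency(client_to_head_ms, slog_commit_ms):
    """ZFS-over-iSCSI: one hop to the active head, then a ZIL/SLOG commit."""
    return client_to_head_ms + slog_commit_ms

print(ceph_write_latency(0.2, [0.2, 0.2], [0.05, 0.05, 0.05]))  # ~0.45 ms
print(zfs_slog_write_latency(0.2, 0.05))                        # ~0.25 ms
```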
Yes, but expensive.

Are there multi-chassis/multi-port NVMe available yet?
NVMe is the norm for Ceph, but "Ceph on NVMe" is not the norm for storage.

NVMe Ceph is the norm nowadays.
This is stupid with 3 Ceph nodes, because if a node holding a replica fails you only have 1 data copy left.

You can also have 2/3 copies on Ceph if you use erasure coded pools.
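For anyone following along, the trade-off on a 3-node cluster in numbers; a simple back-of-the-envelope calculation, not a Ceph command:

```python
# Overhead and fault tolerance on a 3-node cluster, one failure domain per node.

def replicated(size):
    """size full copies: tolerates size-1 lost nodes, stores 1/size usable."""
    return {"tolerates_failures": size - 1, "usable_fraction": 1 / size}

def erasure_coded(k, m):
    """k data + m coding chunks: tolerates m lost nodes, stores k/(k+m) usable."""
    return {"tolerates_failures": m, "usable_fraction": k / (k + m)}

print("replica size=3:", replicated(3))        # 2 failures, 33% usable
print("replica size=2:", replicated(2))        # 1 failure,  50% usable
print("EC k=2, m=1   :", erasure_coded(2, 1))  # 1 failure, ~67% usable, but no
# redundancy left while a node is down - the point made above.
```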