If you want a technical discussion, you need to do it on technical terms.
I have.
In fact, the data started flowing with my second comment on this thread. You're only just getting around to this now.
Moreover, you're benchmarking one thing (Windows?), and if you know anything about storage, you know that Windows NTFS always tells the subsystem it is doing an async write (yeah, NTFS is bad for your data), whereas with Ceph + QEMU every write is sync.
I'd be able to run more tests if ceph were faster. (Heck, as of this writing, it has only just finished rebooting from the first round of Win11 post-install updates.)
The drives themselves are capable of 100 MB/s each. Clearly, ceph isn't using anything close to that capability.
You can find benchmarks from 2013, when Ceph was still very young, that already showed Ceph beating Gluster on several major points (IOPS and latency, though not necessarily throughput).
Are you talking about ceph replication or erasure coding? (Again, when people talk about ceph, most of them mean ceph with replication, not ceph with erasure coding.)
On ancient hardware, there are all kinds of reasons modern code won't do well. On consumer hardware, it's even worse. Consumer hardware also tends to lie about data consistency, and (especially SSDs) will cache writes by default; Ceph (like ZFS) tends to avoid those problems even on consumer hardware.
Again, as I've said, you can literally calculate for yourself how much of the drive you're actually using, on any system/drives you have access to where you can run your own benchmarks.
You don't have to take my word for it. Deploy your own EC(k,m) ceph pool and test it yourself. Again, this isn't rocket science: you can literally run the calculation and see for yourself.
I'd love to see your results from your own tests on your own hardware on your own EC(k,m) ceph pool.
Again, a lot of words expended, but still nothing that actually addresses the fact that a ceph EC pool only uses about 5% of a drive's capability.
Again, if you watch the video from 45Drives, the best they're able to get is about 8% of the drive's capability, and that's with access to newer hardware.
Again, run your own tests and then calculate how much of your drives' capability your own ceph EC pool actually uses.
It's super easy.
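A minimal sketch of that calculation, assuming you have measured the pool's throughput yourself and know each drive's rated sequential throughput. The function names are hypothetical, and the erasure-coding ceiling is a simplified upper bound (it ignores network, CPU, and latency effects): each byte a client writes to an EC(k,m) pool becomes (k+m)/k bytes on disk.

```python
def raw_utilisation_pct(measured_mb_s: float, per_drive_mb_s: float, n_drives: int) -> float:
    """Percent of the drives' combined rated throughput that the pool delivers."""
    if per_drive_mb_s <= 0 or n_drives <= 0:
        raise ValueError("per-drive throughput and drive count must be positive")
    return 100.0 * measured_mb_s / (per_drive_mb_s * n_drives)


def ec_write_ceiling_mb_s(per_drive_mb_s: float, n_drives: int, k: int, m: int) -> float:
    """Simplified best-case client write throughput for an EC(k,m) pool.

    Assumes writes are spread evenly over all drives and that parity is the
    only overhead: only k of every k+m bytes written to disk are client data.
    """
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return per_drive_mb_s * n_drives * k / (k + m)


if __name__ == "__main__":
    # Hypothetical numbers: 6 drives rated at 100 MB/s each, EC(4,2),
    # and a measured pool write throughput of 30 MB/s.
    measured = 30.0
    print(f"raw utilisation: {raw_utilisation_pct(measured, 100.0, 6):.1f}%")
    ceiling = ec_write_ceiling_mb_s(100.0, 6, k=4, m=2)
    print(f"EC(4,2) ceiling: {ceiling:.0f} MB/s, "
          f"achieved: {100.0 * measured / ceiling:.1f}% of it")
```

Which denominator is the fair one (raw aggregate throughput or the EC-adjusted ceiling) is itself a judgment call, so stating which one you used makes results from different setups comparable.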
So this is a consistent trope on this forum: ZFS and Ceph are 'slower' on consumer hardware than, let's call it, "naive storage", because "naive" benchmarks show a big (often unrealistic) gap. You see the opposite on real server hardware, though. Gluster is really built for hardware RAID with BBU, as an open-source alternative to GPFS and other proprietary SAN systems, whereas Ceph (and ZFS) was literally invented to manage disks directly on cheap hardware, without an intermediary. That's because, around the early 2000s, professionals started noticing from real-life disaster stories that proprietary/hardware/software RAID, even with BBU, cannot be trusted, does not provide the data guarantees, and does not scale. I can tell you that Gluster won't scale past ~8 nodes, and it will choke during rebuilds at today's scales (several TB). In real-world scenarios where things go wrong, your data won't be consistent and available at all times, and being offline while a brick gets rebuilt is not great for business.
Yes and no.
1) You're basing your opinion on your experience from, as you said, the 2000s-ish timeframe, right? (If it was 15 years ago, then it would be ca. 2011.) So that raises the question: have you tried it since then, or is your opinion still based on 15-year-old data?
(That'd be like basing your opinion of ceph on 15-year-old ceph. But that's not how you're forming your opinion of ceph. In other words, you're comparing ceph now vs. gluster 15 years ago, right?)
2) a) ceph EC has a synchronisation overhead that is no different from MPI overhead as the number of processes increases. You can look up any HPC MPI scalability plot as a function of the number of CPU cores, and you will find that going from 4096 cores to 8192 cores won't double your performance or cut your total wall-clock run time in half.
b) Thus, one way you can "mitigate" this synchronisation overhead when you have a lot of nodes and/or OSDs is to limit the number of nodes/OSDs across which you're trying to synchronise and maintain concurrency. Hybrid OpenMP/MPI solved this (for LS-DYNA) at least a decade ago.
This is no different. Therefore, if you have a ceph OSD made up of a ZFS pool, then it's the same thing that gluster recommends for a deployment, except that gluster figured this out back when it first published gluster on ZFS.
3) You state that gluster had performance issues ca. 15 years ago. ceph, especially an EC ceph pool, is having performance issues now.
5% of a HDD that's capable of 150 MB/s sequential read = 7.5 MB/s.
5% of a U.2 or E1.S EDSFF NVMe 5.0 x4 SSD that's capable of 12 GB/s is only 600 MB/s. And whilst yes, 600 MB/s is faster than 7.5 MB/s, in both cases, you're still only using 5% of what the respective drive is capable of.
Buying a U.2 or E1.S EDSFF NVMe 5.0 x4 SSD is just throwing money at the problem to mask the fact that an EC ceph pool only uses 5% of the drive. Said U.2 and/or E1.S EDSFF NVMe 5.0 x4 SSD costs more than an order of magnitude more, yet it only barely moves the needle from ~5% utilisation to ~8% utilisation.
It's still only 5-8%.
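The arithmetic in point 3, including what the ~8% figure works out to, can be reproduced in a few lines. This sketch assumes 12 GB/s means 12,000 MB/s (decimal units); the drive labels and the helper name are just illustrative.

```python
# Rated sequential throughput in MB/s (decimal units assumed).
DRIVES = {
    "HDD (150 MB/s)": 150,
    "NVMe 5.0 x4 SSD (12 GB/s)": 12_000,
}


def share_mb_s(rated_mb_s: float, pct: float) -> float:
    """Throughput corresponding to pct percent of a drive's rated speed."""
    return rated_mb_s * pct / 100


if __name__ == "__main__":
    for name, rated in DRIVES.items():
        for pct in (5, 8):
            print(f"{name}: {pct}% = {share_mb_s(rated, pct):g} MB/s")
```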
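To illustrate the scaling argument in point 2a, here is a sketch that uses Amdahl's law as a stand-in model for synchronisation overhead. It is a deliberately simplified model, not a measurement of ceph or of any particular MPI code: it assumes a fixed fraction of the work (the hypothetical `serial_fraction`) cannot be parallelised, e.g. time spent waiting on synchronisation.

```python
def amdahl_speedup(n_workers: int, serial_fraction: float) -> float:
    """Speedup over 1 worker when serial_fraction of the work cannot be parallelised."""
    if n_workers < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need n_workers >= 1 and 0 <= serial_fraction <= 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)


if __name__ == "__main__":
    s = 0.001  # hypothetical: 0.1% of the work is serial/synchronisation
    for cores in (4096, 8192):
        print(f"{cores} cores: speedup {amdahl_speedup(cores, s):.0f}x")
    gain = amdahl_speedup(8192, s) / amdahl_speedup(4096, s)
    print(f"doubling the cores gives only {gain:.2f}x more speedup")
```

Amdahl's law assumes the serial fraction stays fixed; in real MPI codes, communication cost often grows with the process count, so this model is, if anything, an optimistic bound.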
(to be continued...time for me to go put the kids to bed)