Here's the thing.
Not all data is equivalent, and it shouldn't be treated as if it were. While I don't have any details about your particular use case, there is nothing stopping you from deploying a NAS with 20 TB usable in a RAID6 configuration and parking all your "idle data" there, leaving your "production" storage needs much smaller and cheaper to deploy via Ceph.
That's absolutely a valid strategy. Most of the data that's idle is on SMB shares with files for daily work. So it's not totally idle, just not used very much, but still in use. A lot of it is documents and invoices in our document management system and financial accounting.
There is not much of a need for big performance, so Ceph on HDD is totally fine. But since this is very important archival data, we really value the resilience of Ceph with regard to disk or node failures. (Yes, there is an offline backup on tape.)
With our old NAS we already had an instance of two disks failing at once in that RAID6 system (after some road construction took place and the whole building was trembling from time to time). A few months after that, one disk after another failed, six of twelve in total. That was very, very scary.
If you can (or would like to) describe the configuration and performance expectations, and measure actual performance, it may be possible to tune it better.
Not everything can be solved by throwing more hardware at it. As I hinted above, tuning the configuration and eliminating bottlenecks would likely yield results even without new hardware (and possibly better ones).
Our server cluster was never really designed as a hyperconverged setup. The cluster is about 7-8 years old, and the concept of hyperconverged, virtualised systems hadn't really reached us back then.
We have been planning for a new cluster for a while now and hopefully next year we will get to order the hardware.
We will have terminal servers for about 150 users, file storage, databases, a mail system, application servers, everything really.
I'm not quite sure how I can describe the configuration and performance expectations, because I'm not sure which metrics might be helpful.
We run a pool of 24 mixed SAS/SATA 7200 rpm HDDs distributed across 4 of the 6 cluster nodes. The other two nodes don't have space for a DB/WAL SSD.
Our SSD pool consists of 12x 1.6 TB Kioxia PM5-V SAS 12 Gb/s SSDs, evenly distributed across all 6 nodes.
The nodes are connected via a dedicated 10G Ethernet network for Ceph, a dedicated 10G network facing the users, and a dedicated 1 Gbit network for Corosync.
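If it helps, this is roughly how I would dump the OSD layout to confirm the distribution I described above; just a small sketch around `ceph osd tree -f json` (the JSON field names are from the releases I've worked with and might differ slightly on other versions):

```python
# Sketch: summarize OSD count per host and device class from `ceph osd tree`.
# Assumes the ceph CLI is available with admin credentials on the node;
# field names (type, children, device_class) may vary by Ceph release.
import json
import subprocess
from collections import defaultdict

def osd_distribution():
    raw = subprocess.check_output(["ceph", "osd", "tree", "-f", "json"])
    tree = json.loads(raw)
    nodes = {n["id"]: n for n in tree["nodes"]}

    counts = defaultdict(lambda: defaultdict(int))  # host -> device_class -> count
    for node in tree["nodes"]:
        if node.get("type") != "host":
            continue
        for child_id in node.get("children", []):
            child = nodes.get(child_id)
            if child and child.get("type") == "osd":
                counts[node["name"]][child.get("device_class", "unknown")] += 1
    return counts

if __name__ == "__main__":
    for host, classes in sorted(osd_distribution().items()):
        summary = ", ".join(f"{cls}: {n}" for cls, n in sorted(classes.items()))
        print(f"{host}: {summary}")
```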
On the HDD pool we become bottlenecked when I/O reaches around 1200-1500, sometimes 1800 IOPS. Read and write performance scales greatly with parallelism, which is to be expected. For running applications, databases, or fetching a small document, I guess access latency is the most important metric, but I don't really know how to measure and compare this.
Sure, I can do a test with fio, and I can see the OSD commit latency, but I don't know whether a latency of around 30-50 ms is good or bad.
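For example, something like this is what I had in mind for the fio test: single-threaded 4k random reads, which should roughly approximate "fetch one small document". The path is just a placeholder for a test file on the pool in question, and the JSON field names are from fio 3.x, so they may differ on older versions:

```python
# Rough sketch of a latency probe built around fio (assumed to be installed).
# /mnt/cephfs/fio-test is a placeholder path on the pool being tested.
import json
import subprocess

FIO_CMD = [
    "fio",
    "--name=lat-probe",
    "--filename=/mnt/cephfs/fio-test",
    "--rw=randread",
    "--bs=4k",
    "--iodepth=1",          # queue depth 1 -> pure latency, no parallelism
    "--direct=1",           # bypass the page cache
    "--ioengine=libaio",
    "--size=1G",
    "--runtime=60",
    "--time_based",
    "--output-format=json",
]

def run_probe():
    result = json.loads(subprocess.check_output(FIO_CMD))
    read = result["jobs"][0]["read"]
    clat = read.get("clat_ns", {})          # fio 3.x reports latency in ns
    mean_ms = clat.get("mean", 0) / 1e6
    p99_ms = clat.get("percentile", {}).get("99.000000", 0) / 1e6
    print(f"IOPS: {read.get('iops', 0):.0f}")
    print(f"mean completion latency: {mean_ms:.2f} ms, p99: {p99_ms:.2f} ms")

if __name__ == "__main__":
    run_probe()
```

The idea would be to run it once against the HDD pool and once against the SSD pool and compare the mean and p99 numbers.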
One of the biggest pains in this setup is the slow backup speed. There are some locations with hundreds of thousands of files 4 kB-1 MB in size, and on the SMB shares there are even over 1 million files of various sizes, scattered all around.
These take a very, very long time to write to tape. We get speeds between 24 and 70 MB/s, whereas databases and other large blobs from the SSD pool come in at about 180 MB/s (I think that's the limit of the LTO-8 tape library). Blobs from the HDD pool come in at around 50-70 MB/s.
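My rough mental model for why the small files are so slow is just a back-of-envelope calculation, where every number is an assumption rather than a measurement:

```python
# Back-of-envelope model: effective backup throughput when every file costs a
# fixed per-file overhead (metadata lookup + HDD/Ceph read latency) before it
# can stream. All numbers below are illustrative assumptions, not measurements.
def effective_throughput_mb_s(avg_file_mb, per_file_latency_s, stream_mb_s):
    """Throughput when files are read strictly one after another."""
    total_time = per_file_latency_s + avg_file_mb / stream_mb_s
    return avg_file_mb / total_time

if __name__ == "__main__":
    # Assumed: 30 ms per-file overhead, 180 MB/s streaming limit of the drive.
    for avg_mb in (0.064, 0.256, 1.0):
        mbps = effective_throughput_mb_s(avg_mb, 0.030, 180)
        print(f"avg file {avg_mb:>5.3f} MB -> ~{mbps:.1f} MB/s")
```

With a per-file overhead anywhere in the tens of milliseconds, small files can't get anywhere near the drive's streaming rate, no matter how fast the tape itself is.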
Thanks very much for taking a look at this.