Hi,
First off, I'd recommend reading at least the first two sections of our Ceph documentation, if you haven't already:
https://pve.proxmox.com/pve-docs/chapter-pveceph.html
Second, generic advice to introduce an additional caching layer is not really useful; such layers often show benefits only in very specific applications. Adding another cache, and thus another layer of indirection, may help with some small or specific limitation of the underlying storage, but it cannot work magic and turn a slow snail into a fast speedster: the cache will be full at some point, and then the back-pressure of writing/reading to/from the underlying storage is the limiting factor again.
Ceph and Linux already have battle-proven and widely used caching mechanisms for general IO that should cover most needs; when they don't, it often means the limitations of the underlying storage are the real issue.
Don't get me wrong, caching can help, but there's already some present in the layer you're asking about. So I'd recommend either adding a cache in front of the application layer above, e.g. Redis if it's some web app, or actually making the underlying storage faster.
In general there are two key points for making a Ceph setup go fast: the cluster network (where Ceph replicates and exchanges data internally) and the OSD disks:
- Ceph backend network: 10G is really the recommended lower limit here. With three nodes you can do a full mesh using one dual-port NIC per node, so even 25G or 40G is possible without an additional/expensive switch (see the sketch after this list).
- Disk models for OSDs: SSDs are key here, and not too expensive any more nowadays. In any case, Ceph works better with a larger number of mid-size SSDs (say, 1 TiB to 7.68 TiB) than with a smaller number of huge spinners: that improves power usage, redundancy and parallel access (so IOPS), and reduces the amount of data affected by a failure (and thus in need of rebalancing). The cost may be higher, but IMO it's worth it, especially if that setup should last 5+ years.
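To make the full-mesh idea a bit more concrete, here is a minimal sketch of the routed variant for one node. The interface names (ens1/ens2) and the 10.15.15.0/24 range are just placeholders; the "Full Mesh Network for Ceph Server" article in the Proxmox wiki describes the variants authoritatively:

```
# /etc/network/interfaces fragment on node1 (hypothetical names/addresses).
# ens1 is cabled directly to node2, ens2 directly to node3.
auto ens1
iface ens1 inet static
    address 10.15.15.1/24
    up ip route add 10.15.15.2/32 dev ens1
    down ip route del 10.15.15.2/32

auto ens2
iface ens2 inet static
    address 10.15.15.1/24
    up ip route add 10.15.15.3/32 dev ens2
    down ip route del 10.15.15.3/32
```

Node2 and node3 get the same pattern with their own address and host routes to the other two peers; no switch is involved for this network at all.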
Here it's good to remember that Ceph is very scalable, so you do not need to put in all the storage you expect the setup to need over the next few years right now.
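For example, growing a node later is basically one command per disk (a sketch; /dev/sdX is a placeholder for the actual, unused device):

```
# On the node the new disk is plugged into: create an OSD on it.
# Ceph will then rebalance data onto the new OSD automatically.
pveceph osd create /dev/sdX
```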
Note that you can also have a mixed Ceph cluster, disk-wise, and create two pools with different rules for where to place the data objects: one pool can be told to use only SSDs and the other only HDDs. That way you get a faster, smaller pool and a slower, bigger one, if that fits your application.
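A rough sketch of how that could look using device-class based CRUSH rules; the rule and pool names here are made up, and you should first check with `ceph osd tree` that your OSDs got the expected ssd/hdd device class assigned:

```
# One replicated CRUSH rule per device class (names are arbitrary).
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd

# One pool per rule; see 'pveceph pool create --help' for more options.
pveceph pool create fast-pool --crush_rule replicated-ssd
pveceph pool create big-pool --crush_rule replicated-hdd
```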
For more concrete advice it would be good to know some key data for your planned usage:
- What workload will run on the cluster?
- how many VMs/CTs,
- what is their job: basic office infrastructure, hosting lots of databases, dev-ops VMs for developers, ...
- What's the data usage plan?
- initially,
- estimated growth rate for X years?
- What are the key limitations or fixed factors when choosing HW?
- Is the server HW already there and fixed, or are you still in the planning phase?
- Are the disks already fixed, or is there still the option to choose?
- ..?
Hope that helps.