Proxmox+Ceph and a caching mechanism

thedarkside

New Member
Apr 24, 2021
Hi,

I'm totally new to Proxmox and Ceph. What I want to achieve is a 3-server cluster configured with Proxmox, with Ceph as the distributed storage platform. However, my main goal is to use one of the available open-source caching mechanisms out there (such as EnhanceIO, OpenCAS, Bcache, ...) as an I/O cache to improve performance. Now I'm wondering: where should this mechanism be implemented? My guess is that it should be inside Proxmox itself, because that's the layer dealing with the I/O requests coming from the application layer, right? Or should I do it on the VM side? Or maybe inside Ceph (I think it can't be inside Ceph because it already has its own caching)?

I would appreciate your help, and sorry if it's a very basic and obvious question.

Thanks
 
Hi,

First off, I'd recommend reading at least the first two sections of our Ceph documentation, if you haven't already:
https://pve.proxmox.com/pve-docs/chapter-pveceph.html

Second, it is generally not very useful to give blanket advice to introduce an additional caching layer; such layers often only show benefits in very specific applications. Adding another cache, and thus another layer of indirection, may help with some small or specific limitation of the underlying storage, but it cannot work magic and turn a slow snail into a fast speedster: the cache will be full at some point, and then the back-pressure of writing to/reading from the underlying storage becomes the limiting factor again.
Ceph and Linux already have battle-proven and widely used caching mechanisms for general IO; those should cover most needs, and when they don't, it usually means the limitations of the underlying storage are the issue.

Don't get me wrong, caching can help, but there's already some present in the layer you're asking about, so I'd recommend either adding a cache in front of your application layer above (e.g., Redis if it's some web app) or actually making the underlying storage faster.

In general there are two key points to making a Ceph setup go fast: the cluster network (where it replicates and exchanges data internally) and the OSD disks:
  • Ceph backend network: 10G is really the recommended lower limit here. With three nodes you can do a full mesh with three dual-port NICs, so even 25G or 40G is possible without an additional/expensive switch.
  • Disk models for the OSDs: SSDs are key here, and not too expensive any more nowadays. In any case, Ceph works better with more, but mid-sized (say, 1 TiB to 7.68 TiB), SSDs than with a smaller number of huge spinners: that improves power usage, redundancy and parallel access (so IOPS), and reduces the amount of data affected by a failure (and thus in need of rebalancing). The cost may be higher, though IMO it's worth it, especially if the setup should last 5+ years.
    Here it's good to remember that Ceph is very scalable, so you do not need to put in all the storage you expect the setup to need over the next years right now.
Note: you can also have a mixed Ceph cluster, disk-wise, and create two pools with different rules for where to place the data objects. That means one pool can be told to use only SSDs and the other only HDDs, so you could have a faster, smaller pool and a slower, bigger one, if that fits your application (see the sketch right below for how such device-class pools could be set up).
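
As a rough illustration only (rule and pool names, PG counts and so on are placeholders, not a recommendation), such a split is usually done with CRUSH rules that filter on the device class, which Ceph assigns automatically (hdd/ssd/nvme) when an OSD is created:
Code:
# CRUSH rules that place data only on a given device class
# ('default' root and 'host' failure domain assumed)
ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd crush rule create-replicated replicated-hdd default host hdd
# one pool per rule (names and PG counts are just examples)
pveceph pool create fast-ssd --crush_rule replicated-ssd --pg_num 128
pveceph pool create big-hdd --crush_rule replicated-hdd --pg_num 128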

For more concrete advice it would be good to know some key data for your planned usage:
  1. What workload will run on the cluster?
    • How many VMs/CTs?
    • What is their job: basic office infrastructure, hosting lots of databases, dev-ops VMs for developers, ...?
  2. What's the data usage plan?
    • Initially?
    • Estimated growth rate over X years?
  3. What are the key limitations or fixed factors when choosing HW?
    • Is the server HW already there and fixed, or are you still in the planning phase?
    • Are the disks already fixed, or is there still the option to choose any?
    • ..?
Hope that helps.
 
Thank you for your very thorough response.

Yes, I had a glance at the link you provided, but it won't hurt to go over it again (I'm also studying the Ceph documentation more carefully). I was also partially aware that introducing a cache layer in this configuration may not help much in general, because of the optimizations we already have in the different layers (especially Ceph itself). But the thing is that, for my work, I have to use such a caching mechanism, maybe even just to prove its ineffectiveness. Also, my configuration is mostly fixed in terms of servers and network. The only open part is my storage, where I will use HDDs as the disk subsystem and SSDs as the cache layer.
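
For reference, my rough idea of how such an SSD-over-HDD cache would sit below the OSD layer on each node, e.g. with bcache (device names are hypothetical, and pveceph may need the ceph-volume fallback for non-standard block devices):
Code:
apt install bcache-tools                                # on each Proxmox node
make-bcache -C /dev/nvme0n1 -B /dev/sdb                 # SSD/NVMe as cache, HDD as backing; exposes /dev/bcache0
echo writeback > /sys/block/bcache0/bcache/cache_mode   # optional: cache writes as well as reads
pveceph osd create /dev/bcache0                         # OSD on top of the cached device
# fallback if pveceph refuses the device: ceph-volume lvm create --data /dev/bcache0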

The thing that caught my eye in your response was the mixed Ceph cluster. I'm now wondering: technically, would I be able to set up two different Ceph clusters, one comprised of HDDs and the other of SSDs, and use one as the cache for the other? Is this possible in your opinion?

Thanks
 
I do need to use NVMe as a cache for HDD drives for some reason.
I've previously used the SSD + HDD hybrid disk architecture in VMware vSAN, which gave me a huge read and write performance boost, so I hope to find a corresponding solution in Ceph storage.
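
The closest built-in equivalent to such a hybrid layout with Ceph/BlueStore seems to be placing the RocksDB/WAL of each HDD OSD on the NVMe device rather than adding a general block cache. A minimal sketch, with hypothetical device names and a placeholder DB size:
Code:
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_dev_size 60   # HDD data, ~60 GiB DB/WAL on NVMe
# or with plain Ceph tooling:
# ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1
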
Thanks
 
