Intel CAS/Open CAS Linux and Proxmox/ceph

Aug 4, 2020
I'm being told to install and configure some Intel Optane in some Proxmoxen, and somehow incorporate it as a disk cache in a 4-node cluster. There is a rather large MSSQL server in there, which I believe is the primary target management wants to improve.

I've been doing some searching and I'm not finding any references to anyone installing and using Open CAS Linux or Intel CAS in a proxmox environment. The Intel CAS Administration documentation has a section on using it with generic ceph, which is the storage we're using in this existing cluster.

Has anyone tested using Open CAS Linux for caching ceph storage under Proxmox?
 
I'm being told to install and configure some Intel Optane in some Proxmoxen, and somehow incorporate it as a disk cache in a 4-node cluster.
Ceph can use a separate DB/WAL device to host the OSD database on the faster device. This will improve performance on Ceph's side. Beyond that, please elaborate on your cluster setup.
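For instance, a new OSD with its RocksDB/WAL on the Optane could be created with ceph-volume. A minimal sketch, assuming the HDD shows up as /dev/sdb and a partition on the Optane as /dev/nvme0n1p1 (both placeholders):

```shell
# /dev/sdb = data HDD, /dev/nvme0n1p1 = partition on the Optane (placeholders).
# BlueStore will then keep this OSD's RocksDB and WAL on the fast device.
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
```

Existing OSDs would need to be recreated (or migrated with ceph-bluestore-tool) to pick up a new DB device.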

There is a rather large mssql server in there which I believe is the primary target management is wanting to improve.
Well, how much performance is needed for that database? Database servers can often be tuned in their own right.

I've been doing some searching and I'm not finding any references to anyone installing and using Open CAS Linux or Intel CAS in a proxmox environment. The Intel CAS Administration documentation has a section on using it with generic ceph, which is the storage we're using in this existing cluster.
In short, I can't recommend it.

The installation involves building a separate kernel module to be able to use the CAS device. Proxmox VE packages follow a rolling release, where the kernel is updated rather often. This introduces more complexity and another failure domain. Beyond that, it is something outside the Ceph ecosystem that has to be maintained.
 
Open CAS Linux is 100% compatible with PVE 6.3.3; it is very easy to compile, and it can be added to dkms.conf so it auto-compiles on kernel updates.
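As a sketch of the DKMS route, assuming the source tree builds with plain `make` and produces a `cas_cache` module (the version string, module name, and paths here are assumptions; check them against the Open CAS sources):

```shell
# Hypothetical layout: adjust the version and module paths to your checkout.
sudo mkdir -p /usr/src/open-cas-linux-21.6
sudo cp -r open-cas-linux/* /usr/src/open-cas-linux-21.6/
sudo tee /usr/src/open-cas-linux-21.6/dkms.conf >/dev/null <<'EOF'
PACKAGE_NAME="open-cas-linux"
PACKAGE_VERSION="21.6"
MAKE[0]="make -C modules KERNEL_DIR=/lib/modules/${kernelver}/build"
BUILT_MODULE_NAME[0]="cas_cache"
BUILT_MODULE_LOCATION[0]="modules/cas_cache"
DEST_MODULE_LOCATION[0]="/extra"
AUTOINSTALL="yes"
EOF
sudo dkms add -m open-cas-linux -v 21.6
sudo dkms build -m open-cas-linux -v 21.6
sudo dkms install -m open-cas-linux -v 21.6
```

With AUTOINSTALL="yes", DKMS rebuilds the module automatically whenever a new pve-kernel package lands.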

I don't see any issue with using it with PVE + Ceph 15.
 
I agree, @elurex. Installing kernel modules that rebuild via DKMS on kernel updates is quite easy. Granted, it's out of scope for Proxmox and their support contracts, but it's just Debian under the hood. So it comes down to whether you want enterprise support or not.

There's now been some testing with OpenCAS and Ceph:

https://01.org/blogs/tingjie/2020/research-performance-tuning-hdd-based-ceph-cluster-using-open-cas

To the OP (who I am sure has come up with a different solution by now), or for anyone wanting performance out of MySQL on ZFS: note that MyISAM uses an 8K block size, while InnoDB uses a 16K block size for data and 128K for logs. So set your ashift to at least 13 (8K), or even 14 (16K) for InnoDB.

And use the Optane drive as a ZFS SLOG (separate ZIL device) for a massive performance boost. lol
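A rough sketch of both suggestions, with hypothetical pool and device names (check yours before running anything):

```shell
# ashift=13 means 2^13 = 8K physical sectors; use 14 for 16K.
zpool create -o ashift=13 tank mirror /dev/sda /dev/sdb
# Match InnoDB's 16K data page size on the dataset holding the data files:
zfs create -o recordsize=16k tank/mysql
# Add the Optane as a separate log (SLOG) device to accelerate sync writes:
zpool add tank log /dev/nvme0n1
```

Note the SLOG only helps synchronous writes; reads and async writes won't touch it.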
 
Has anyone had any success or got any pointers for installing OpenCAS on Proxmox 7?

I'm trying to run some testing with it in my lab setup, on the no-subscription repo, latest version of proxmox.

I've installed all the prerequisites (I think) - commands run so far are:

apt install sed make gcc pve-headers python3 lshw libelf-dev python-argparse git
git clone https://github.com/Open-CAS/open-cas-linux
cd open-cas-linux
git submodule update --init
./configure

And at this point I get the error:
ERROR! Following steps failed while preparing config:
1_bd_first_part.conf

Any ideas what I'm missing or if there's another way that works?
 
Has anyone had any success or got any pointers for installing OpenCAS on Proxmox 7? [...] Any ideas what I'm missing or if there's another way that works?

Review the contents of config.out; the error comes from the 1_bd_first_part.conf test in ./configure.d, and it suggests your host is missing some header files. Devs tend to forget that not every production server has a complete set of kernel header packages installed.
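A minimal way to dig further, assuming the standard Open CAS source layout (file names and paths here are assumptions on my part):

```shell
cd open-cas-linux
# See which configure steps failed and in what state they finished:
cat config.out
# Re-run configure and capture all output for inspection:
./configure 2>&1 | tee configure.log
# Confirm the build tree for the running kernel actually exists:
ls -d /lib/modules/$(uname -r)/build
```

If that last directory is missing, the configure tests will fail regardless of which pve-headers meta-package is installed.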
 
Contents of config.out look like the below.

Given that pve-headers is installed any suggestions what I should try next?

Is there another kernel header package?

Code:
1_append_bio.conf 2
1_bdev_nr_sectors.conf 1
1_bdev_lookup.conf 1
1_bdev_whole.conf 1
1_bd_first_part.conf X
1_bdget_disk.conf 2
1_bio_dev.conf 1
1_bio_clone.conf 3
1_bio_discard.conf 2
1_bio_err.conf 1
1_bio_flags.conf 1
1_bio_iter.conf 1
1_bio_gendisk.conf 2
1_bio_split.conf 1
1_biovec.conf 1
1_blk_mq.conf 2
1_blk_status.conf 1
1_blk_end_req.conf 1
1_deamonize.conf 2
1_block_pc.conf 2
1_dentry.conf 1
1_discard_zeros.conf 2
1_err_no_to_blk_sts.conf 1
1_hlist.conf 1
1_global_page_state.conf 1
1_flush_flag.conf 3
1_kallsyms_on_each_symbol.conf 2
1_inode.conf 1
1_make_request.conf 3
1_mq_flags.conf 3
1_munmap.conf 3
1_queue_bounce.conf 2
1_queue_flag_set.conf 1
1_queue_chunk_sectors.conf 1
1_reread_partitions.conf 1
1_queue_lock.conf 2
1_queue_limits.conf 2
1_set_submit_bio.conf 1
1_timekeeping.conf 1
1_vfs_ioctl.conf 1
1_submit_bio.conf 1
1_vmalloc.conf 1
1_write_flag.conf 1
1_wtlh.conf 1
2_bio_barrier.conf 1
2_generic_acct.conf 1
2_bio_cmpl.conf 2
2_make_req.conf 2
2_queue_write.conf 1
 
I checked one of my production hosts, and the two .h files that script looks for are both present. The script's test may be hitting a compilation error. I'd try enabling verbose or debug output on the configure command and rerunning it.

Double-check that the pve-headers package you have installed matches the 'uname -r' value for the kernel currently running on your build host.

I haven't actually attempted to build CAS on proxmox given the enterprise support we pay for. I'm just pulling on my old dev roots here.
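For example, the pve-headers meta-package can lag behind a freshly booted kernel; installing headers for the exact running version is worth trying (the versioned package-name pattern is an assumption, verify it with apt search):

```shell
# Headers for precisely the running kernel, not just the meta-package:
KVER=$(uname -r)
echo "running kernel: $KVER"
apt install pve-headers-"$KVER"
# The configure script needs this build tree to be present:
ls -d /lib/modules/"$KVER"/build
```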
 
No, but I did do a whole bunch of testing with bcache, lvm cache and lvm write cache on a ceph lab setup. Haven't quite got round to writing it up yet and posting...

If you do have success getting opencas working would be very happy to add that to the benchmarks.
 
No, but I did do a whole bunch of testing with bcache, lvm cache and lvm write cache on a ceph lab setup. [...]

Hello from the future! 2024... did you ever get around to posting and sharing your findings?
 
Hi Giovanni

No, ultimately I never posted my findings; there are too many variables at play when combining caching with Ceph, and even setups that benchmarked well didn't actually perform particularly well in our real-world usage.

We did run with bcache-backed OSDs for a while. It was fine for hot data, but cold data could be painfully slow. Ultimately this came down to how Ceph works rather than an issue in the caching approach, and it took me a while to understand that.

The key thing to understand about Ceph is that while you might have 3 or 4 copies of every block of data on your network, Ceph will never use these additional copies to speed up reads: each block of data is read from the particular OSD that Ceph has chosen as the primary for that block.

If all of your OSDs are on hard drives, that means, best case, a single-threaded sequential read will go at the speed of one single hard drive, even if your data is spread across dozens of them with multiple copies. This is very much contrary to what you might be used to with RAID 1 or RAID 5/6 arrays, where the additional copies are used to speed up reads.

It actually gets worse than that. Even if the data you are reading is one single large file, Ceph might have spread it across several OSDs (and which of those OSDs becomes the primary for each block is somewhat random), so partway through a large read you might need to wait while Ceph seeks on a different hard drive to get your next block of data.

What this all translated to in the real world was that copying a rarely accessed video file off our Ceph store would often run at around 50 megabytes per second sustained, less than half the performance of one of the underlying drives.

This wasn't really good enough!

The solution I settled on was;
  • Instead of an OSD for each hard drive, hard drives were paired into RAID 0 arrays using LVM, and each of those two-drive arrays was set up as an OSD, with the WAL on a separate enterprise-grade SSD with PLP. This means our worst-case performance is now the speed of two hard drives in RAID 0, instead of just one hard drive. Obviously this is marginally less safe and generally discouraged, but it has proved a good compromise in our setup, and we have survived a few hard drive failures at this point without issue (we run 4 copies / 2 minimum on our Ceph storage).
    • This had the added benefit of wasting less RAM on our servers - Ceph is quite greedy in the RAM it will assign to each OSD....
  • On one of our four servers (we have four Proxmox/Ceph servers, plus one witness server) we replaced the hard drive storage with pure SSD storage, so we have 8x 3TB hard drives in three servers, and 3x 7.6TB SSDs in one server. I set the primary affinity on the SSD-hosted OSDs so that Ceph always treats them as primary; this means reads will always happen at SSD speed as long as that server is up and running.
If you have any further questions I'll do my best to answer but hopefully that makes sense!
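Roughly, the OSD pairing and read-affinity setup described above looks like this (device paths and OSD ids are examples, not our actual values):

```shell
# Stripe two HDDs into one LV and build a single OSD on it,
# with the DB/WAL on an enterprise SSD partition:
pvcreate /dev/sdb /dev/sdc
vgcreate osd-vg0 /dev/sdb /dev/sdc
lvcreate --stripes 2 --extents 100%FREE --name osd-lv0 osd-vg0
ceph-volume lvm create --bluestore --data osd-vg0/osd-lv0 --block.db /dev/nvme0n1p1

# Make the SSD-hosted OSDs the preferred primaries for reads:
ceph osd primary-affinity osd.12 1.0   # SSD-backed OSD (example id)
ceph osd primary-affinity osd.3  0.0   # HDD-backed OSD (example id)
```

With affinity 0.0 an HDD OSD is only chosen as primary when no higher-affinity replica is available, e.g. while the SSD server is down.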
 
