PVE Ceph- should I use bcache?

Drkrieger
New Member
Jul 27, 2018
Hello!
I'm in the process of testing a PoC before deploying it into our production environment. I've been comparing Bluestore and Filestore, both with SSD journals (using enterprise NVMe + 4TB consumer spinners), and I'm finding that Filestore gives better overall performance (even with a 100GB NVMe journal on both config types). I've read a few articles stating that bcache can improve performance, namely fixing the 'slow requests' errors that can occasionally pop up. I currently have the PVE/Ceph cluster as storage only, and will continue to use it as such in production. We have another separate cluster as a compute head (Intel Silver Scalable 12c/24t, 128GB RAM per node, 4 nodes), but for my PoC testing I'm just using a single box with a Ryzen 1800X and a 10Gb NIC connection.

My PVE/Ceph node configs are as follows:
CPU: Intel i5-8400 (all core turbo locked at 4GHz, C-States/Intel SpeedStep disabled)
RAM: 16GB DDR4 2666MHz C19
Mobo: Asus TUF Z370-Pro Gaming
Journal SSD: Samsung PM953 960GB NVMe M.2 (22110, PCIe x4)
OS SSD: WD Black 250GB NVMe SSD
OSDs: Seagate 4TB 7200rpm (256MB cache)
NIC: Intel X520-DA2, 10Gb fiber connection on all nodes and compute head
OS: Debian 9.5 (minimal, no gui, only SSH/System utils)

I've set my journal size to 100GB in /etc/pve/ceph.conf, and I'm currently running Filestore. Running benchmarks with programs like CrystalDiskMark sometimes causes the cluster to have delayed writes (slow requests on OSDs). Would building the OSDs on top of bcache help reduce these errors? Has anyone done this successfully with PVE 5.2/Ceph Luminous?
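(For reference, a journal size like that is set roughly as below, assuming the standard osd journal size option, which takes its value in MB, so 100GB works out to 102400; the exact section placement may differ in your config.)

Code:
# /etc/pve/ceph.conf (Filestore journal size, value in MB)
[osd]
     osd journal size = 102400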

P.S.- I never see my CPU's above 10% usage, nor the ram above ~15-20%. My I/O delays are usually in the 15-25% range though.
 
Hi,
Ceph with plain HDDs is not fast... it gets faster with more HDDs, but that means something like 6-10 nodes with 10 HDDs each
(and every additional node will speed up the Ceph cluster).

And 16GB of RAM for a Ceph OSD node is far too little.

I didn't quite follow how many nodes/OSDs you have in your PoC.

About your bcache question: I don't have experience with bcache, but I would use Ceph as it is. Ceph is completely different from normal RAID storage, so any added complexity is AFAIK not the right decision (at least at first).

Udo
 
FYI, I got bcache working. Significant improvements in overall cluster write performance (more than double in my VMs' CrystalDiskMark benchmarks). Reads were pretty much unaffected, and the load on the nodes is still minimal. The highest CPU usage I've seen is just over 20% on all nodes with 8 VMs running a 75/25 read/write IOMeter mix, and the RAM is still only around 20% used. The cluster is averaging around 6500-8000 total IOPS during this load, with ZERO slow requests.

I'll try to get around to doing a write-up on how I set this up; it pretty much involves doing everything from scratch on the command line (no web GUI to make the OSD setup easy!).
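In the meantime, the steps look roughly like the sketch below (device names and partition numbers are just examples, not my exact layout, and ceph-volume options may differ between Luminous point releases):

Code:
# Create the cache device on a spare NVMe partition and the backing device on the HDD;
# doing both in one call attaches them automatically (example devices).
make-bcache -C /dev/nvme0n1p4 -B /dev/sdb

# Switch the new bcache device to writeback caching.
echo writeback > /sys/block/bcache0/bcache/cache_mode

# Create the Filestore OSD on the bcache device with a separate NVMe journal partition
# (ceph-volume on Luminous; the journal partition here is a placeholder).
ceph-volume lvm create --filestore --data /dev/bcache0 --journal /dev/nvme0n1p5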
 
A comparison of without bcache to with:

No bcache, w/100GB NVMe Journal:
[benchmark screenshot: PotcPBU.jpg]


With 200GB bcache, 20GB NVMe Journal:
[benchmark screenshot: PziY54L.jpg]


Higher block size on the VM drives will net much higher sequential write performance (I was seeing numbers over 400MB/s with 16K sectors).
 
Other than a Ceph cache tier, I haven't really seen any improvement to speeds, so I will definitely try out bcache!

Thanks so much for this!
 
Seems to give about the same result with ZFS as the caching layer and Ceph on top.
For Ceph on ZFS, you will need to modify the systemd ordering and ceph-volume (with LVM).
 
Hello,

I know this message is old, but please, I need to solve a similar problem. I'm trying to create an OSD on top of a bcache device. If it works, I intend to use bcache on all the OSDs here. But when I try to create it, the bcache device is not available for selection in the GUI, and from the CLI the following error message appears: unable to get device info for '/dev/bcache0'

Let's see what happened:
Code:
root@pve-20:~# ls /dev/bcache*
/dev/bcache0
root@pve-20:~# ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1  /dev/nvme0n1p1  /dev/nvme0n1p2  /dev/nvme0n1p3
root@pve-20:~# pveceph osd create /dev/bcache0 -db_dev /dev/nvme0n1p3
unable to get device info for '/dev/bcache0'
root@pve-20:~#

The bcache device itself is working. I even did a test: I created a filesystem on it, mounted it in a directory, and copied files into it, and everything worked normally. It just doesn't work for creating the Ceph OSD.
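What I'm thinking of trying next (untested here, and assuming ceph-volume itself will accept the bcache device even though the pveceph wrapper's device check rejects it) is to bypass pveceph and call ceph-volume directly, something like:

Code:
# Hypothetical workaround: let ceph-volume create the Bluestore OSD directly,
# since it is pveceph's device detection that refuses /dev/bcache0.
ceph-volume lvm create --data /dev/bcache0 --block.db /dev/nvme0n1p3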

Please, could someone help?
 
