PVE Ceph - should I use bcache?

Drkrieger

New Member
Jul 27, 2018
13
7
3
38
Hello!
I'm testing a PoC before deploying it into our production environment. I've been comparing Bluestore and Filestore, both with SSD journals (enterprise NVMe plus 4TB consumer spinners), and I'm finding that Filestore gives better overall performance (even with a 100GB NVMe journal on both config types). I've read a few articles stating that bcache can improve performance, and in particular fix the 'slow requests' errors that occasionally pop up. I currently use the PVE/Ceph cluster for storage only, and will continue to do so in production. We have a separate cluster as a compute head (Intel Silver Scalable 12c/24t, 128GB RAM per node, 4 nodes), but for my PoC testing I'm just using a single box with a Ryzen 1800X and a 10Gb NIC connection.

My PVE/Ceph node configs are as follows:
CPU: Intel i5-8400 (all core turbo locked at 4GHz, C-States/Intel SpeedStep disabled)
RAM: 16GB DDR4 2666MHz C19
Mobo: Asus TUF Z370-Pro Gaming
Journal SSD: Samsung PM953 960GB NVMe M.2 (22110, PCIe x4)
OS SSD: WD Black 250GB NVMe SSD
OSDs: Seagate 4TB 7200rpm (256MB cache)
NIC: Intel X520-DA2, 10Gb fiber connection on all nodes and compute head
OS: Debian 9.5 (minimal, no gui, only SSH/System utils)

I've set my journal size to 100GB in /etc/pve/ceph.conf, and I'm currently running Filestore. Benchmarks with programs like CrystalDiskMark sometimes cause delayed writes on the cluster (slow requests on OSDs). Would building the OSDs on bcache help reduce these errors? Has anyone done this successfully with PVE 5.2/Ceph Luminous?
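For reference, Ceph's `osd journal size` option takes its value in megabytes, so a 100GB journal corresponds to 102400. A minimal sketch that prints the matching config fragment; the helper name `journal_size_mb` is made up for illustration, and putting the option under `[osd]` is an assumption, since the post does not say which section of ceph.conf was used:

```shell
#!/bin/sh
# Convert a journal size in GB to the MB value that "osd journal size" expects.
# journal_size_mb is a hypothetical helper, not part of Ceph or PVE.
journal_size_mb() {
    echo $(( $1 * 1024 ))
}

# Print a config fragment for a 100GB journal (section placement is an assumption).
printf '[osd]\nosd journal size = %s\n' "$(journal_size_mb 100)"
```

The setting only applies to Filestore OSDs created after the change; existing journals keep the size they were created with.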

P.S.: I never see my CPUs above 10% usage, nor the RAM above ~15-20%. My I/O delays are usually in the 15-25% range though.
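To tie the I/O delay to the slow requests while a benchmark runs, the cluster health output can be filtered for the relevant lines. A small sketch, assuming a node with a working `ceph` CLI and admin keyring; the function name `slow_request_lines` is hypothetical:

```shell
#!/bin/sh
# Print only the lines that mention slow requests; reads text from stdin.
# Prints nothing (and still succeeds) when there are none.
slow_request_lines() {
    grep -i 'slow request' || true
}

# Usage on a cluster node while the benchmark is running:
#   ceph health detail | slow_request_lines
```

Running it in a loop next to the benchmark shows whether the warnings line up with the periods of high I/O delay.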
 

udo

Famous Member
Apr 22, 2009
5,934
183
83
Ahrensburg; Germany
Hi,
Ceph with normal HDDs is not fast. It gets faster with more HDDs, but that means something like 6-10 nodes with 10 HDDs each (and every new node will speed up the Ceph cluster).

And 16GB for a Ceph OSD node is far too little.

I haven't understood how many nodes/OSDs you have in your PoC.

About your bcache question: I don't have experience with bcache, but I would use Ceph as it is. Ceph is completely different from normal RAID storage, so every addition of complexity is AFAIK not the right decision (at least for a first setup).

Udo
 

Drkrieger


FYI, I got bcache working. Significant improvements in overall cluster write performance (more than double in my VMs' CrystalDiskMark benchmarks). Reads were pretty much unaffected, and the load on the nodes is still minimal. The highest CPU usage I've seen is just over 20% on all nodes with 8 VMs running a 75/25 read/write IOMeter mix, and the RAM is still only around 20% used. The cluster is averaging around 6500-8000 total IOPS during this load, with ZERO slow requests.

I'll try to get around to doing a write-up on how I set this up. It pretty much involves doing everything from scratch on the command line (no web GUI to make the OSD setup easy!).
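Since the write-up is not part of this thread, here is only a rough sketch of what such a command-line setup could look like, not the steps actually used. All device names are hypothetical placeholders, writeback mode is an assumption (the post does not state the cache mode), and the thread does not confirm that `ceph-volume` accepts a bcache device on every release. The script only prints the commands unless `DRY_RUN=0` is set, because they destroy data on the named devices:

```shell
#!/bin/sh
# Sketch only: every device name below is a hypothetical placeholder.
BACKING=${BACKING:-/dev/sdX}        # spinner that will hold the OSD data
CACHE=${CACHE:-/dev/nvme0n1pY}      # NVMe partition used as the bcache cache
JOURNAL=${JOURNAL:-/dev/nvme0n1pZ}  # NVMe partition used as the Filestore journal
BCACHE=${BCACHE:-bcache0}           # name the kernel gives the new bcache device

# Print each command instead of running it unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bcache_osd_plan() {
    # Format backing and cache device together; they are attached automatically.
    run make-bcache -B "$BACKING" -C "$CACHE"
    # Assumption: writeback mode, so writes land on the NVMe cache first.
    run sh -c "echo writeback > /sys/block/$BCACHE/bcache/cache_mode"
    # Create a Filestore OSD on top of the bcache device with an NVMe journal.
    run ceph-volume lvm create --filestore --data "/dev/$BCACHE" --journal "$JOURNAL"
}

bcache_osd_plan
```

Reviewing the printed commands with the real device names filled in, before setting `DRY_RUN=0`, is the safer order of operations.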
 

Drkrieger

A comparison without bcache versus with it:

No bcache, w/100GB NVMe Journal:
PotcPBU.jpg


With 200GB bcache, 20GB NVMe Journal:
PziY54L.jpg


Higher block size on the VM drives will net much higher sequential write performance (I was seeing numbers over 400MB/s with 16K sectors).
 

AlexLup

Member
Mar 19, 2018
215
12
23
40
Other than with a Ceph cache tier, I haven't really seen any improvement in speeds, so I will definitely try out bcache!

Thanks so much for this!
 

arnaudd

New Member
Aug 4, 2017
11
0
1
47
That seems to be about the same result as using ZFS for caching with Ceph on top.
For Ceph on ZFS, you will need to modify the systemd ordering and use ceph-volume (with LVM).
 
Sep 14, 2020
37
1
8
45
Hello,

I know this message is old, but I need to solve a similar problem. I'm trying to create an OSD on a bcache device. If it works, I intend to use bcache for all OSDs here. But in the GUI, bcache devices are not offered for use, and from the CLI I get the following error message: unable to get device info for '/dev/bcache0'

Here is what happened:
Code:
root@pve-20:~# ls /dev/bcache*
/dev/bcache0
root@pve-20:~# ls /dev/nvme*
/dev/nvme0  /dev/nvme0n1  /dev/nvme0n1p1  /dev/nvme0n1p2  /dev/nvme0n1p3
root@pve-20:~# pveceph osd create /dev/bcache0 -db_dev /dev/nvme0n1p3
unable to get device info for '/dev/bcache0'
root@pve-20:~#

The bcache device itself is working. I even tested it by creating a file system, mounting it in a directory, and copying files into it. Everything worked normally, yet creating the Ceph OSD on it did not.
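One avenue that may be worth trying, offered as an untested assumption rather than a confirmed fix, is to bypass `pveceph` and call `ceph-volume` directly with the same devices. The sketch below only prints the commands unless `DRY_RUN=0` is set; the device names are the ones from the session above, and the test file system on /dev/bcache0 would have to be unmounted first:

```shell
#!/bin/sh
# Devices taken from the session above; adjust if they differ.
DATA=${DATA:-/dev/bcache0}
DB=${DB:-/dev/nvme0n1p3}

# Print each command instead of running it unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

osd_on_bcache_plan() {
    # Show how the kernel reports the device before touching it.
    run lsblk -o NAME,TYPE,SIZE,FSTYPE "$DATA"
    # Create a Bluestore OSD on the bcache device with the DB on the NVMe partition.
    run ceph-volume lvm create --bluestore --data "$DATA" --block.db "$DB"
}

osd_on_bcache_plan
```

Whether an OSD created this way shows up correctly in the PVE GUI is not something this thread answers.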

Could someone please help?
 
