Hello folks
Sorry. This is long-ish. It's been a saga in my life ...
I'm not clear on exactly how to deploy a cache disk with CEPH.
I've read a lot of stuff about setting up CEPH, from Proxmox and the CEPH site. Browsed some forum posts. Done a bunch of test builds.
I'm an old VMware guy. And I'm converting an existing VSAN to CEPH.
In VSAN, you have disk groups, and each disk group can have cache disks assigned.
In CEPH, when you set up an OSD, you have the option of putting the DB and WAL either on the OSD disk itself or on another disk.
You can use one DB/WAL disk for multiple OSDs.
That seems kinda like cache for a VMware disk group. (Correct me if that's wrong.)
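For reference, this is roughly how I've been creating OSDs with a shared DB/WAL disk on each node (the device names and the DB size are just placeholders for my setup, not recommendations):

    # Proxmox carves an LV of --db_size (GiB) per OSD out of the shared DB disk,
    # so several OSDs can point at the same DB device
    pveceph osd create /dev/sdb --db_dev /dev/sdf --db_size 60
    pveceph osd create /dev/sdc --db_dev /dev/sdf --db_size 60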
The cluster I'm converting to CEPH is performing fairly horribly. VMs get ~350 MB/s for both read and write. That's spinning-rust speed. Bad.
(EDIT ... THIS 350 MB/s WAS A BAD TEST CONFIG ON MY PART.)
I tested the original VSAN cluster before it was torn down, and it did 450 MB/s write and 4.5 GB/s read. Not great, but not this bad.
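(For anyone who wants to compare, the kind of sequential test I mean is roughly along these lines; the pool and file names are just placeholders:)

    # raw Ceph baseline, run from a node
    rados bench -p testpool 60 write --no-cleanup
    rados bench -p testpool 60 seq
    rados -p testpool cleanup
    # sequential read/write from inside a test VM
    fio --name=seqwrite --filename=/root/fio.test --rw=write --bs=1M --size=8G --direct=1
    fio --name=seqread --filename=/root/fio.test --rw=read --bs=1M --size=8G --direct=1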
If VSAN could go that fast, maybe CEPH can with the proper config...
The hosts have identical SSDs, no fast cache disks. (I just discovered this.)
The disks are SATA (not SAS), so they connect at 6 Gb/s. (Another lovely discovery.)
The network is 10 Gb. (It's always been that way. Not perfect, but it's exactly what VSAN ran on ... much faster.)
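(One thing I can at least sanity-check with this gear is the links themselves; a quick node-to-node test, hostnames are placeholders:)

    iperf3 -s                    # on one node
    iperf3 -c pve-node1 -t 30    # from another node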
It is clear why they are slow.
What is not clear is why they are so much slower than VSAN was.
I've done a lot of rebuilds, trying to tune it. Gawd, the time I've put into this already. Weeks. Many weeks.
The number of disks in the CEPH array doesn't seem to really matter. I get the same speed whether it's 2 disks per host or 6 disks per host.
I've tried using one of the disks as local cache (DB/WAL). Zero improvement over no cache disk.
I've actually read a CEPH performance tuning article that indicates CEPH on SATA (6 Gb/s) does not significantly benefit from using a SATA (6 Gb/s) cache disk.
Yet VSAN was acceptable on this very same hardware with no fast cache. Is there something I'm doing wrong when I set up the cache?
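(Is there a better way to confirm the DB/WAL actually landed on the separate disk than something like this? The OSD id is just a placeholder:)

    # list the devices/LVs behind each OSD on this node
    ceph-volume lvm list
    # or ask the cluster about one OSD
    ceph osd metadata 0 | grep -i -e bluefs -e devices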
The one thing that's really clear is that I must order a SAS (12 Gb/s) enterprise-class mixed-use SSD for each host to use as cache.
That alone I expect will make a significant difference.
But what can I do with this gear today? How can I make it perform like VSAN did?