Ceph HDDs slow

Dec 31, 2021
Hi,
I am currently experimenting with Ceph on a PVE cluster with 7 hosts.
Each host has two OSDs on 16TB SATA hard drives. Writing to the HDDs directly with dd, I get speeds of up to 270MB/s.
The storage and client networks are both 10GBit/s, which I have also verified with iperf3.
So far I have used replicated pools (3x) and erasure-coded pools (4/2 and 4/3).
Unfortunately, I never get more than 20MB/s when I copy a VM disk to the new Ceph storage.
Are there any settings that I have overlooked that could improve the speeds?
Thanks for your help.

Edit:
Here are some graphs for the 4/3 erasure-coded pool (attached as screenshots).

Do you have the RocksDB of the OSDs on an extra SSD or NVMe?

If not, 20MB/s is about the expected write speed, as the requests to the RocksDB on the HDD thrash the read/write head.
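
Moving the DB/WAL to flash means recreating the OSDs with the fast device specified. On Proxmox that would look roughly like this (device names are only examples; check man pveceph for the exact options):

pveceph osd create /dev/sdb --db_dev /dev/nvme0n1

or, with plain Ceph tooling:

ceph-volume lvm create --data /dev/sdb --block.db /dev/nvme0n1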

When you benchmark the HDDs and get 270MB/s, that is surely a sequential write test. Try a random write test with a block size of 4KB and a single thread and compare the results.
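
For example with fio, something along these lines (only run this against a disk whose contents you can destroy; the device path and runtime are just examples):

fio --name=seqwrite --filename=/dev/sdX --rw=write --bs=1M --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting
fio --name=randwrite --filename=/dev/sdX --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 --direct=1 --runtime=60 --time_based --group_reporting

On a spinner the 4K random write result is typically only a few MB/s, which is why the DB belongs on flash.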
 
Where are you writing your metadata to? If it's to these same spinners, that result is probably correct. If you're writing metadata to NVMe (or at least good SSDs) and you're seeing 20MB/s on sequential IO, you need to examine your network configuration and the total IO going through the pool; you should be seeing closer to 150MB/s.
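
A quick way to see what the pool itself can deliver, independent of the VM layer, is rados bench (pool name is just an example; the write test leaves benchmark objects behind unless you clean them up):

rados bench -p testpool 60 write --no-cleanup
rados bench -p testpool 60 seq
rados -p testpool cleanup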

EC pools take a pretty big hit on IOPS, so your results are not out of bounds; this is never going to be performant without a LOT more OSDs, plus metadata and DB on fast storage devices. Depending on the use case, I would probably lean towards a single filer with all the drives instead; 14 drives would fit easily in a Supermicro chassis, which will yield much more satisfying results.

I have EC pools running with spinners, but never with fewer than ~100 OSDs.
 
I installed some Optane P4801X I had lying around and now use them as DB/WAL disks for the spinner OSDs.
Now I have write speeds that are much, much better. Thanks for the little push in the right direction!

I use SAS HDDs in production, with the following optimizations learned through trial and error (a few of them are shown as commands after the list):

Set write cache enable (WCE) to 1 on SAS drives (sdparm -s WCE=1 -S /dev/sd[x])
Set VM cache to none
Set VM to use VirtIO-single SCSI controller and enable IO thread and discard option
Set VM CPU type to 'host'
Set VM CPU NUMA if server has 2 or more physical CPU sockets
Set VM VirtIO Multiqueue to number of cores/vCPUs
Set VM to have qemu-guest-agent software installed
Set Linux VMs IO scheduler to none/noop
Set RBD pool to use the 'krbd' option if using Ceph
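
A rough sketch of how several of these map to qm set (VMID 100, the storage name and disk volume are placeholders; check the VM config for the actual disk name first):

qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-pool:vm-100-disk-0,cache=none,iothread=1,discard=on
qm set 100 --cpu host --numa 1
qm set 100 --net0 virtio,bridge=vmbr0,queues=4
qm set 100 --agent enabled=1

Inside a Linux guest the IO scheduler can be checked with cat /sys/block/sda/queue/scheduler and switched with echo none > /sys/block/sda/queue/scheduler.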
 
