Ceph HDDs slow

Dec 31, 2021
Hi,
I am currently experimenting with Ceph on a PVE cluster with 7 hosts.
Each host has two OSDs on 16TB SATA hard drives. Writing to the HDDs directly with dd, I get speeds of up to 270MB/s.
The storage and client networks are both connected at 10GBit/s, which I have also verified with iperf3.
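For reference, the tests looked roughly like this (device path, IP and run length are just examples; the dd run overwrites the disk, so only against a disk without data):

# sequential write test directly to an unused HDD (destructive!)
dd if=/dev/zero of=/dev/sdb bs=1M count=4096 oflag=direct

# network throughput between two hosts
iperf3 -s                     # on the receiving host
iperf3 -c 10.10.10.11 -t 30   # on the sending host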
I have used replicated pools (3x) and erasure-coded pools (4/2 and 4/3) so far.
Unfortunately, I never get more than 20MB/s when I copy a VM disk to the new Ceph storage.
Are there any settings that I have overlooked that could improve the speeds?
Thanks for your help.

Edit:
Here are some graphs for the 4/3 erasure pool.

Do you have the RocksDB of the OSDs on an extra SSD or NVMe?

If not, 20MB/s is about the expected write speed, as the requests to the RocksDB on the HDD thrash the read/write head.

When you benchmark the HDDs and get 270MB/s, that is surely a sequential write test. Try a random write test with a block size of 4KB and one thread and look at the results.
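For example with fio, something like this (the device path is an example, and the run overwrites the disk, so only use an empty/unused disk):

fio --name=randwrite-4k --filename=/dev/sdb \
    --rw=randwrite --bs=4k --numjobs=1 --iodepth=1 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based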
 
Where are you writing your metadata to? If it's to these same spinners, that result is probably correct. If you're writing metadata to NVMe (or at least good SSDs) and you're seeing 20MB/s on sequential IO, you need to examine your network configuration and the total IO through the pool; you should be seeing closer to 150MB/s.
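You can check where the DB lives from any node, roughly like this (OSD ID 0 is just an example, and the exact field names can vary between Ceph releases):

ceph osd metadata 0 | grep -E 'rotational|dedicated'
# "bluefs_db_rotational": "1" means the RocksDB sits on the spinner itself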

EC pools take a pretty big hit on IOPS, so your results are not out of bounds; this is never going to be performant without a LOT more OSDs and without metadata and DB on fast storage devices. Depending on the use case, I would probably lean towards a single filer with all the drives instead. 14 drives would fit easily in a Supermicro chassis, for example, which will yield much more satisfying results.

I have EC pools running with spinners, but never with fewer than ~100 OSDs.
 
I installed some Optane P4801X drives I had lying around and now use them as DB/WAL devices for the spinner OSDs.
Now I have write speeds that are much, much better. Thanks for the little push in the right direction!
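For anyone finding this later: recreating an OSD with its DB/WAL on the fast device can be done roughly like this on PVE (device names and DB size are just examples; drain and recreate one OSD at a time and wait for rebalance):

pveceph osd destroy <osdid> --cleanup
pveceph osd create /dev/sdb --db_dev /dev/nvme0n1 --db_size 64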

I use SAS HDDs in production. I use the following optimizations, learned through trial and error (a rough command-line sketch follows the list):

Set write cache enable (WCE) to 1 on SAS drives (sdparm -s WCE=1 -S /dev/sd[x])
Set VM cache to none
Set VM to use VirtIO-single SCSI controller and enable IO thread and discard option
Set VM CPU type to 'host'
Set VM CPU NUMA if server has 2 or more physical CPU sockets
Set VM VirtIO Multiqueue to number of cores/vCPUs
Set VM to have qemu-guest-agent software installed
Set Linux VMs' IO scheduler to none/noop
Set RBD pool to use the 'krbd' option if using Ceph
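As a rough sketch of how some of these look on the command line (VM ID 100, the ceph-pool storage name, device paths and queue count are just examples; verify against your own setup):

# enable the write cache on each SAS drive
sdparm -s WCE=1 -S /dev/sdb

# VirtIO SCSI single controller; cache=none, iothread and discard on the disk
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 ceph-pool:vm-100-disk-0,cache=none,iothread=1,discard=on

# host CPU type, NUMA, multiqueue on the NIC, guest agent
qm set 100 --cpu host --numa 1
qm set 100 --net0 virtio,bridge=vmbr0,queues=4
qm set 100 --agent enabled=1

# use the kernel RBD client for the Ceph RBD storage
pvesm set ceph-pool --krbd 1

# inside the Linux guest: switch the IO scheduler to none
echo none > /sys/block/sda/queue/scheduler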
 
