Ceph poor write speed

coolwap

New Member
Sep 12, 2024
Hi,

This is my first post here. I am new to Proxmox and just set up a cluster of 3 nodes with the following configuration:

Network:
NIC#1:
1G - Corosync
1G - not used
NIC#2:
10G - VM traffic
10G - Ceph

iperf between the Ceph interfaces shows bandwidth in the range of 8.16 - 9.41 Gbits/sec.
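
The test was run roughly like this (10.10.10.2 is just a placeholder for the Ceph address of another node; the flags are illustrative):
Code:
root@PROXMOX-SERVER-01:~# iperf -c 10.10.10.2 -t 30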

Storage:
4x3.84TB SSD SAMSUNG MZWLR3T8HBLS-00007 - Ceph OSDs for RBD storage (Disk Images)
1x1.92TB Kingston DC500M - Proxmox, Backups, ISO

Advertised speeds are: Read - 7000 MB/s, Write - 3800 MB/s
I used dd to test write speed directly on the NVMe drive mounted as ext4, and it looks good:
Code:
root@PROXMOX-SERVER-01:~# dd if=/dev/zero of=/media/nvme/test1.img bs=2G count=10 oflag=dsync
dd: warning: partial read (2147479552 bytes); suggest iflag=fullblock
0+10 records in
0+10 records out
21474795520 bytes (21 GB, 20 GiB) copied, 19.0494 s, 1.1 GB/s

However, Rados benchmark shows really poor write performance (25MB/sec):
Code:
root@PROXMOX-SERVER-01:~# rados bench -p ceph 15 write -t 16 --object-size=4K
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 15 seconds or 0 objects
...
Total time run:         15.0123
Total writes made:      93145
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     24.2366
Stddev Bandwidth:       1.54596
Max bandwidth (MB/sec): 25.3398
Min bandwidth (MB/sec): 20.168
Average IOPS:           6204
Stddev IOPS:            395.765
Max IOPS:               6487
Min IOPS:               5163
Average Latency(s):     0.00257616
Stddev Latency(s):      0.00281054
Max latency(s):         0.221893
Min latency(s):         0.00132077

fio inside a VM is a bit better but still very poor (42.5MB/s):
Code:
root@ubuntu-vm:~$ fio --ioengine=psync --filename=/tmp/test_fio --size=1G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=16
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=16
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=40.0MiB/s][w=10 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=63638: Fri Sep 13 08:55:58 2024
  write: IOPS=10, BW=40.5MiB/s (42.5MB/s)(2436MiB/60110msec); 0 zone resets
    clat (msec): min=48, max=287, avg=98.30, stdev=24.50
     lat (msec): min=48, max=287, avg=98.69, stdev=24.51
    clat percentiles (msec):
     |  1.00th=[   54],  5.00th=[   67], 10.00th=[   74], 20.00th=[   82],
     | 30.00th=[   86], 40.00th=[   89], 50.00th=[   95], 60.00th=[  106],
     | 70.00th=[  111], 80.00th=[  115], 90.00th=[  121], 95.00th=[  130],
     | 99.00th=[  161], 99.50th=[  207], 99.90th=[  288], 99.95th=[  288],
     | 99.99th=[  288]
   bw (  KiB/s): min=24576, max=49250, per=100.00%, avg=41519.04, stdev=5490.84, samples=120
   iops        : min=    6, max=   12, avg=10.00, stdev= 1.36, samples=120
  lat (msec)   : 50=0.33%, 100=53.53%, 250=45.65%, 500=0.49%
  cpu          : usr=0.48%, sys=0.35%, ctx=1218, majf=0, minf=11
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,609,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=40.5MiB/s (42.5MB/s), 40.5MiB/s-40.5MiB/s (42.5MB/s-42.5MB/s), io=2436MiB (2554MB), run=60110-60110msec

Disk stats (read/write):
    dm-0: ios=0/3063, merge=0/0, ticks=0/61452, in_queue=61452, util=98.95%, aggrios=0/4277, aggrmerge=0/617, aggrticks=0/191397, aggrin_queue=191674, aggrutil=99.31%
  sda: ios=0/4277, merge=0/617, ticks=0/191397, in_queue=191674, util=99.31%

Read speeds are a bit better, but still very far from the SSD specs and the network bandwidth:
rados - 137MB/sec
fio - 236MB/sec
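
The rados read figure can be reproduced with something like this (a write bench with --no-cleanup has to populate the pool first; the durations are arbitrary):
Code:
root@PROXMOX-SERVER-01:~# rados bench -p ceph 60 write -t 16 --no-cleanup
root@PROXMOX-SERVER-01:~# rados bench -p ceph 15 seq -t 16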

Why is Ceph so much slower than both the network and the SSDs? What can I do to improve it? (I used the default settings when creating the cluster.)
 
What CPUs do the nodes have? Are the BIOS settings set to max performance / low latency? Maybe the vendor has some guides on that too.

Did you set a target_ratio for the pool? And what is the current pg_num of the pool?
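You can check both with something like this (assuming the pool is called ceph, as in your rados bench command):
Code:
ceph osd pool get ceph pg_num          # current pg_num of the pool
ceph osd pool autoscale-status         # autoscaler overview incl. TARGET RATIO
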
10 Gbit might be one bottleneck as well, especially with fast enough SSDs.

However, Rados benchmark shows really poor write performance (25MB/sec):
What if you test it with larger object sizes? rados bench works with 4M objects by default, for example.
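Something along these lines (leaving out --object-size falls back to the 4 MiB default):
Code:
rados bench -p ceph 15 write -t 16     # 4 MiB objects by default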

fio inside a VM is a bit better but still very poor (42.5MB/s):

What is the config of the VM? Please post the output of qm config {vmid}.
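For example (100 is just a placeholder VMID):
Code:
qm config 100    # replace 100 with the actual VMID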
 
