Hope someone can clarify this for me.
Proxmox 8.1.5 / Ceph Reef 18.2.1 cluster with 4 Dell R630 hosts, 6 SSD drives each, and a Ceph network with dual 10 GbE connectivity.
With rados bench I get what I would expect: wire-speed performance in a single-host test:
rados bench -p ceph01 120 write -b 4M -t 16 --run-name `hostname` --no-cleanup
Total time run: 120.032
Total writes made: 35159
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1171.65
Stddev Bandwidth: 54.414
Max bandwidth (MB/sec): 1268
Min bandwidth (MB/sec): 1004
Average IOPS: 292
Stddev IOPS: 13.6035
Max IOPS: 317
Min IOPS: 251
Average Latency(s): 0.0546073
Stddev Latency(s): 0.019493
Max latency(s): 0.301827
Min latency(s): 0.0217319
rados bench -p ceph01 600 seq -t 16 --run-name `hostname`
Total time run: 88.336
Total reads made: 35159
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 1592.06
Average IOPS: 398
Stddev IOPS: 28.5776
Max IOPS: 458
Min IOPS: 325
Average Latency(s): 0.0389745
Max latency(s): 0.450536
Min latency(s): 0.0121847
I think 1171 MB/s write and 1592 MB/s read is excellent, no complaints whatsoever!
The odd thing is that when I run a performance test inside a VM whose disk lives on the Ceph pool, I get the following results with fio:
fio --ioengine=psync --filename=/var/tmp/test_fio --size=5G --time_based --name=fio --group_reporting --runtime=600 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=1
WRITE: bw=129MiB/s (135MB/s), 129MiB/s-129MiB/s (135MB/s-135MB/s), io=75.5GiB (81.1GB), run=600017-600017msec
fio --ioengine=psync --filename=/var/tmp/test_fio --size=5G --time_based --name=fio --group_reporting --runtime=600 --direct=1 --sync=1 --rw=read --bs=4M --numjobs=1 --iodepth=1
READ: bw=368MiB/s (386MB/s), 368MiB/s-368MiB/s (386MB/s-386MB/s), io=216GiB (232GB), run=600009-600009msec
Using a bigger block size of 16M gives better results.
fio --ioengine=psync --filename=/var/tmp/test_fio --size=5G --time_based --name=fio --group_reporting --runtime=30 --direct=1 --sync=1 --rw=write --bs=16M --numjobs=1 --iodepth=1
WRITE: bw=317MiB/s (332MB/s), 317MiB/s-317MiB/s (332MB/s-332MB/s), io=9504MiB (9966MB), run=30002-30002msec
fio --ioengine=psync --filename=/var/tmp/test_fio --size=5G --time_based --name=fio --group_reporting --runtime=30 --direct=1 --sync=1 --rw=read --bs=16M --numjobs=1 --iodepth=1
READ: bw=845MiB/s (886MB/s), 845MiB/s-845MiB/s (886MB/s-886MB/s), io=24.8GiB (26.6GB), run=30005-30005msec
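If my arithmetic is right, the single-job, sync=1 runs are latency-bound rather than bandwidth-bound, since only one write is ever in flight, while rados bench keeps 16 in flight:
4M @ iodepth=1: 129 MiB/s ÷ 4 MiB ≈ 32 writes/s → roughly 31 ms per write
16M @ iodepth=1: 317 MiB/s ÷ 16 MiB ≈ 20 writes/s → roughly 50 ms per write
rados bench: 16 writes in flight ÷ 0.0546 s avg latency ≈ 293 writes/s × 4 MiB ≈ 1170 MB/s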
So: good read performance, especially with bigger block sizes, but why is the write performance so slow when the underlying Ceph cluster can easily do over 1000 MB/s?
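One thing I have not tried yet is matching the 16 in-flight operations that rados bench uses (-t 16); something along these lines (untested sketch, parameters are just an example) might be a fairer comparison:
fio --ioengine=libaio --filename=/var/tmp/test_fio --size=5G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --rw=write --bs=4M --numjobs=1 --iodepth=16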
Test VM config:
agent: 1
boot: order=scsi0;ide2;net0
cores: 4
ide2: none,media=cdrom
memory: 2048
name: testvm
net0: virtio=B6:6B:4E:CA:91:16,bridge=vmbr1
numa: 0
onboot: 1
ostype: l26
scsi0: ceph01:vm-501-disk-0,iothread=1,size=32G
scsi1: ceph01:vm-501-disk-1,iothread=1,size=160G
scsihw: virtio-scsi-single
smbios1: uuid=807fc9e6-7b1d-4af1-a2c1-882b4a0c43b9
sockets: 1
tags:
vmgenid: 0f5e7ed6-e735-4f5b-9b80-8c2a71710a52
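Neither disk has an explicit cache= option, so as far as I understand they run with the Proxmox default (no cache). If caching turns out to be part of the answer, I assume writeback could be tested with something like the following (hypothetical, untested):
qm set 501 --scsi1 ceph01:vm-501-disk-1,iothread=1,cache=writeback
I believe the VM would need a full stop/start for a cache mode change to take effect.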