Hi,
This is my first post here. I am new to Proxmox and have just set up a three-node cluster with the following configuration:
Network:
NIC#1:
1G - Corosync
1G - not used
NIC#2:
10G - VM traffic
10G - Ceph
iperf between the Ceph interfaces shows bandwidth in the range of 8.16-9.41 Gbit/s.
Storage:
4x3.84TB SSD SAMSUNG MZWLR3T8HBLS-00007 - Ceph OSDs for RBD storage (Disk Images)
1x1.92TB Kingston DC500M - Proxmox, Backups, ISO
Advertised speeds for the Samsung drives are: read 7000 MB/s, write 3800 MB/s.
I used dd to test write speed directly on the NVMe drive mounted as ext4, and it looks good:
Code:
root@PROXMOX-SERVER-01:~# dd if=/dev/zero of=/media/nvme/test1.img bs=2G count=10 oflag=dsync
dd: warning: partial read (2147479552 bytes); suggest iflag=fullblock
0+10 records in
0+10 records out
21474795520 bytes (21 GB, 20 GiB) copied, 19.0494 s, 1.1 GB/s
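(I realize dd from /dev/zero with huge blocks mostly measures sequential throughput. If a synchronous 4K test on the local drive would be more comparable to Ceph's write path, I could run something like the following against a test file on the same mount; the file name is just a placeholder.)
Code:
fio --name=nvme-4k-sync --ioengine=psync --filename=/media/nvme/test_fio --size=1G --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based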
However, rados bench with a 4 KB object size shows really poor write performance (~25 MB/s):
Code:
root@PROXMOX-SERVER-01:~# rados bench -p ceph 15 write -t 16 --object-size=4K
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 15 seconds or 0 objects
...
Total time run: 15.0123
Total writes made: 93145
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 24.2366
Stddev Bandwidth: 1.54596
Max bandwidth (MB/sec): 25.3398
Min bandwidth (MB/sec): 20.168
Average IOPS: 6204
Stddev IOPS: 395.765
Max IOPS: 6487
Min IOPS: 5163
Average Latency(s): 0.00257616
Stddev Latency(s): 0.00281054
Max latency(s): 0.221893
Min latency(s): 0.00132077
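(Note that this run uses a 4 KB object size, so it is effectively an IOPS test: the ~6200 average IOPS times 4 KiB works out to the ~24 MB/s reported above. For a raw throughput number I could repeat it with the default 4 MB object size, e.g.:)
Code:
rados bench -p ceph 15 write -t 16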
fio inside a VM (4 MB synchronous writes) is a bit better but still very poor (42.5 MB/s):
Code:
root@ubuntu-vm:~$ fio --ioengine=psync --filename=/tmp/test_fio --size=1G --time_based --name=fio --group_reporting --runtime=60 --direct=1 --sync=1 --rw=write --bs=4M --numjobs=1 --iodepth=16
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=psync, iodepth=16
fio-3.28
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=40.0MiB/s][w=10 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=63638: Fri Sep 13 08:55:58 2024
write: IOPS=10, BW=40.5MiB/s (42.5MB/s)(2436MiB/60110msec); 0 zone resets
clat (msec): min=48, max=287, avg=98.30, stdev=24.50
lat (msec): min=48, max=287, avg=98.69, stdev=24.51
clat percentiles (msec):
| 1.00th=[ 54], 5.00th=[ 67], 10.00th=[ 74], 20.00th=[ 82],
| 30.00th=[ 86], 40.00th=[ 89], 50.00th=[ 95], 60.00th=[ 106],
| 70.00th=[ 111], 80.00th=[ 115], 90.00th=[ 121], 95.00th=[ 130],
| 99.00th=[ 161], 99.50th=[ 207], 99.90th=[ 288], 99.95th=[ 288],
| 99.99th=[ 288]
bw ( KiB/s): min=24576, max=49250, per=100.00%, avg=41519.04, stdev=5490.84, samples=120
iops : min= 6, max= 12, avg=10.00, stdev= 1.36, samples=120
lat (msec) : 50=0.33%, 100=53.53%, 250=45.65%, 500=0.49%
cpu : usr=0.48%, sys=0.35%, ctx=1218, majf=0, minf=11
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,609,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16
Run status group 0 (all jobs):
WRITE: bw=40.5MiB/s (42.5MB/s), 40.5MiB/s-40.5MiB/s (42.5MB/s-42.5MB/s), io=2436MiB (2554MB), run=60110-60110msec
Disk stats (read/write):
dm-0: ios=0/3063, merge=0/0, ticks=0/61452, in_queue=61452, util=98.95%, aggrios=0/4277, aggrmerge=0/617, aggrticks=0/191397, aggrin_queue=191674, aggrutil=99.31%
sda: ios=0/4277, merge=0/617, ticks=0/191397, in_queue=191674, util=99.31%
Read speeds are a bit better but still far from the SSD specs and the network bandwidth:
rados - 137 MB/s
fio - 236 MB/s
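(If anyone wants to reproduce the read test: rados bench's seq mode needs objects left behind by a prior write run with --no-cleanup, roughly like this, followed by a cleanup of the benchmark objects:)
Code:
rados bench -p ceph 60 write -t 16 --no-cleanup
rados bench -p ceph 60 seq -t 16
rados -p ceph cleanup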
Why is Ceph performance so much slower than the network and the SSDs, and what can I do to improve it? (I used the default settings when creating the cluster.)