Ceph with 3 Nodes with IO and dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync

UK SPEED · Jun 18, 2024

Hello guys,
I have 3 VDSs running Proxmox (Proxmox inside a VPS; I don't know if this is the wrong thing to do), all linked to a Ceph pool over 10 Gb ports. My first node has a Micron 9400 Max and the others have Gen 3 Intel NVMe drives.

When I run dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync on the main server on Ceph storage, I get:

1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.97877 s, 180 MB/s
root@MariaDB:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.86801 s, 183 MB/s
root@MariaDB:~# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 5.17572 s, 207 MB/s
root@MariaDB:~#

But with the ZFS storage, outside of the Ceph storage, I get:

[root@test ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.16568 s, 921 MB/s
[root@test ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.0955 s, 980 MB/s
[root@test ~]# dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in

Does this command read and write the data across the whole Ceph cluster, or on the main node only?

Also, I'm getting IO delays of 8-14% only on the main node of the cluster. Is Ceph the main reason for that?

aaron · Jun 18, 2024

Using dd if=/dev/zero is a bad idea for performance benchmarks. ZFS, for example, will not write out all the zeroes and can therefore report a much better, but wrong, performance reading.
Use either /dev/urandom as the source or, even better, a dedicated tool for IO benchmarks, like fio.
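
To illustrate why zeroes make a poor benchmark payload, here is a minimal sketch that compresses 1 MiB of zeroes and 1 MiB of random data and compares the results. It assumes a POSIX shell with head, gzip, and wc available. gzip is only a stand-in for a storage layer that compresses or skips zero blocks; it is not what ZFS uses internally, and the variable names are made up for this example.

```shell
#!/bin/sh
# Compress 1 MiB of zeroes and 1 MiB of random bytes, then compare sizes.
# gzip is an illustrative stand-in for a compressing storage layer.
# $(( ... )) normalizes the wc output to a plain integer.
zero_size=$(( $(head -c 1048576 /dev/zero | gzip -c | wc -c) ))
rand_size=$(( $(head -c 1048576 /dev/urandom | gzip -c | wc -c) ))

echo "zeroes: ${zero_size} bytes after gzip"
echo "random: ${rand_size} bytes after gzip"
```

The zero stream should shrink to a tiny fraction of its size while the random stream stays at roughly 1 MiB, so a benchmark fed with zeroes may measure far less real disk IO than it appears to.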

UK SPEED · Jun 18, 2024

Test with Ceph:

[root@test ~]# fio --name=fiotest --filename=/home/test1 --size=16Gb --rw=randread --bs=8K --direct=1 --numjobs=8 --ioengine=libaio --iodepth=32 --group_reporting --runtime=60 --startdelay=60
fiotest: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=32
...
fio-3.35
Starting 8 processes
Jobs: 8 (f=8): [r(8)][100.0%][r=563MiB/s][r=72.0k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=8): err= 0: pid=134: Tue Jun 18 15:19:42 2024
read: IOPS=65.1k, BW=508MiB/s (533MB/s)(29.8GiB/60026msec)
slat (nsec): min=1522, max=199217k, avg=119857.70, stdev=581241.29
clat (nsec): min=590, max=238435k, avg=3814595.10, stdev=7055736.42
lat (usec): min=2, max=240508, avg=3934.45, stdev=7146.28
clat percentiles (usec):
| 1.00th=[ 9], 5.00th=[ 29], 10.00th=[ 43], 20.00th=[ 78],
| 30.00th=[ 1123], 40.00th=[ 1647], 50.00th=[ 2089], 60.00th=[ 2573],
| 70.00th=[ 3195], 80.00th=[ 4146], 90.00th=[ 6390], 95.00th=[17433],
| 99.00th=[36963], 99.50th=[40109], 99.90th=[50070], 99.95th=[56361],
| 99.99th=[84411]
bw ( KiB/s): min=308832, max=624172, per=100.00%, avg=521117.11, stdev=6318.96, samples=952
iops : min=38604, max=78020, avg=65138.03, stdev=789.84, samples=952
lat (nsec) : 750=0.26%, 1000=0.03%
lat (usec) : 2=0.01%, 4=0.22%, 10=0.64%, 20=1.79%, 50=10.03%
lat (usec) : 100=8.41%, 250=1.19%, 500=1.15%, 750=1.82%, 1000=2.64%
lat (msec) : 2=19.84%, 4=30.80%, 10=14.41%, 20=2.20%, 50=4.45%
lat (msec) : 100=0.09%, 250=0.01%
cpu : usr=0.67%, sys=3.55%, ctx=995774, majf=0, minf=630
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=3904692,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=508MiB/s (533MB/s), 508MiB/s-508MiB/s (533MB/s-533MB/s), io=29.8GiB (32.0GB), run=60026-60026msec

Disk stats (read/write):
rbd0: ios=1432407/12, merge=0/0, ticks=7518743/151, in_queue=7518894, util=100.00%

Without Ceph:

# fio --name=fiotest --filename=/home/test1 --size=16Gb --rw=randread --bs=8K --direct=1 --numjobs=8 --ioengine=libaio --iodepth=32 --group_reporting --runtime=60 --startdelay=60
fiotest: (g=0): rw=randread, bs=(R) 8192B-8192B, (W) 8192B-8192B, (T) 8192B-8192B, ioengine=libaio, iodepth=32
...
fio-3.35
Starting 8 processes
Jobs: 8 (f=8): [r(8)][100.0%][r=451MiB/s][r=57.8k IOPS][eta 00m:00s]
fiotest: (groupid=0, jobs=8): err= 0: pid=125: Tue Jun 18 15:28:00 2024
read: IOPS=56.3k, BW=440MiB/s (461MB/s)(25.8GiB/60004msec)
slat (usec): min=4, max=49025, avg=139.96, stdev=1134.92
clat (nsec): min=1762, max=49743k, avg=4408015.33, stdev=5613049.85
lat (usec): min=13, max=49762, avg=4547.97, stdev=5677.38
clat percentiles (usec):
| 1.00th=[ 453], 5.00th=[ 474], 10.00th=[ 486], 20.00th=[ 506],
| 30.00th=[ 529], 40.00th=[ 553], 50.00th=[ 627], 60.00th=[ 3556],
| 70.00th=[ 5800], 80.00th=[ 8356], 90.00th=[12387], 95.00th=[16188],
| 99.00th=[23987], 99.50th=[26084], 99.90th=[30802], 99.95th=[33162],
| 99.99th=[40109]
bw ( KiB/s): min=384273, max=497712, per=100.00%, avg=450576.36, stdev=2574.49, samples=952
iops : min=48032, max=62213, avg=56320.66, stdev=321.82, samples=952
lat (usec) : 2=0.01%, 4=0.01%, 20=0.01%, 50=0.01%, 100=0.01%
lat (usec) : 250=0.01%, 500=16.27%, 750=34.58%, 1000=0.69%
lat (msec) : 2=2.76%, 4=7.54%, 10=23.05%, 20=12.57%, 50=2.54%
cpu : usr=0.58%, sys=11.86%, ctx=86248, majf=11, minf=608
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=3376721,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
READ: bw=440MiB/s (461MB/s), 440MiB/s-440MiB/s (461MB/s-461MB/s), io=25.8GiB (27.7GB), run=60004-60004msec

Is that good, @aaron?

Jordan.zhang · Jun 19, 2024

Can you list your hardware configuration? Your test results seem to be much better than mine. By the way, did you mount /dev/rbd0 to /home?

UK SPEED · Jun 25, 2024

@Jordan.zhang
Hello mate, do you mean the test without Ceph?
