What Ceph performance can I expect?


Dec 12, 2021
We have a new Proxmox cluster with 5 nodes and a backup server. I just have no real way of judging what performance I should expect.

4 x Dell DC NVMe ISE 7450 RI U.2 7.68TB
4 x 10 Gbit network card
2 x 1 Gbit network card
2 x 100 Gbit network card

Ceph runs on the 100 Gbit cards.

root@pve5:~# fio --ioengine=libaio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=296MiB/s][w=75.8k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=16447: Thu Jul 25 14:09:27 2024
  write: IOPS=75.6k, BW=295MiB/s (310MB/s)(17.3GiB/60001msec); 0 zone resets
    slat (nsec): min=2220, max=46981, avg=2428.28, stdev=309.09
    clat (nsec): min=620, max=620751, avg=10537.05, stdev=1044.35
     lat (usec): min=9, max=630, avg=12.97, stdev= 1.10
    clat percentiles (nsec):
     |  1.00th=[10176],  5.00th=[10176], 10.00th=[10304], 20.00th=[10304],
     | 30.00th=[10304], 40.00th=[10432], 50.00th=[10432], 60.00th=[10432],
     | 70.00th=[10560], 80.00th=[10560], 90.00th=[10688], 95.00th=[10944],
     | 99.00th=[13248], 99.50th=[15936], 99.90th=[24192], 99.95th=[24960],
     | 99.99th=[31872]
   bw (  KiB/s): min=298736, max=305360, per=100.00%, avg=302530.22, stdev=1471.07, samples=119
   iops        : min=74684, max=76340, avg=75632.61, stdev=367.76, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.08%, 20=99.51%, 50=0.41%
  lat (usec)   : 100=0.01%, 250=0.01%, 750=0.01%
  cpu          : usr=6.92%, sys=39.76%, ctx=4534644, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4535227,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=295MiB/s (310MB/s), 295MiB/s-295MiB/s (310MB/s-310MB/s), io=17.3GiB (18.6GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=350/4541653, merge=8/2916, ticks=56/36713, in_queue=36769, util=99.91%

In a VM running Windows Server 2022 I got 24000 read IOPS.

Linux with Kubernetes:

1. with OpenEBS Local PV (that is the host's local disk):

./kubestr fio -s openebs-hostpath
PVC created kubestr-fio-pvc-tkhlg
Pod created kubestr-fio-pod-hjbwb
Running FIO test (default-fio) on StorageClass (openebs-hostpath) with a PVC of Size (100Gi)
Elapsed time- 24.549538772s
FIO test results:
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
  IOPS=42727.835938 BW(KiB/s)=170928
  iops: min=33366 max=45955 avg=42794.929688
  bw(KiB/s): min=133464 max=183823 avg=171180.000000

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
  IOPS=34595.921875 BW(KiB/s)=138400
  iops: min=26228 max=38732 avg=34435.414062
  bw(KiB/s): min=104912 max=154928 avg=137741.859375

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
  IOPS=32374.083984 BW(KiB/s)=4144420
  iops: min=24112 max=34513 avg=32431.792969
  bw(KiB/s): min=3086336 max=4417667 avg=4151284.750000

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
  IOPS=28725.218750 BW(KiB/s)=3677365
  iops: min=14702 max=30946 avg=28716.689453
  bw(KiB/s): min=1881856 max=3961088 avg=3675741.000000

Disk stats (read/write):
  sdb: ios=1273852/1068990 merge=0/16 ticks=1965217/1675217 in_queue=3647005, util=42.120041%
  -  OK

2. with the ceph-csi block driver:

./kubestr fio -s ceph-block
PVC created kubestr-fio-pvc-m9qzw
Pod created kubestr-fio-pod-4cbd2
Running FIO test (default-fio) on StorageClass (ceph-block) with a PVC of Size (100Gi)
Elapsed time- 26.418651031s
FIO test results:
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
  IOPS=2961.541016 BW(KiB/s)=11862
  iops: min=2899 max=3046 avg=2966.466553
  bw(KiB/s): min=11599 max=12184 avg=11866.266602

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
  IOPS=1739.318481 BW(KiB/s)=6974
  iops: min=1510 max=1814 avg=1742.433350
  bw(KiB/s): min=6040 max=7256 avg=6969.799805

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
  IOPS=2925.999756 BW(KiB/s)=375064
  iops: min=2854 max=3018 avg=2931.933350
  bw(KiB/s): min=365312 max=386304 avg=375301.875000

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
  IOPS=1756.755005 BW(KiB/s)=225401
  iops: min=1724 max=1808 avg=1757.933350
  bw(KiB/s): min=220672 max=231424 avg=225020.406250

Disk stats (read/write):
  rbd0: ios=100394/59699 merge=0/830 ticks=2173550/1308741 in_queue=3482292, util=99.479439%
  -  OK

How should I rate this? ceph-csi seems too slow to me.
In this test you benchmarked a single NVMe directly, without Ceph. But these numbers don't look like normal NVMe performance.
How are the NVMe drives attached? Directly onboard or via a PERC controller? Yes, I do see that now and then, depending on who did the sizing at Dell.
If the NVMe itself doesn't perform, the additional network latency won't make things any better.

P.S. You might run the test once more without iodepth=1. The NVMe protocol is built for parallelism and benefits greatly from it.
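
Why queue depth matters can be estimated with Little's law: sustained IOPS is roughly the number of outstanding I/Os divided by the mean latency per I/O. The following is only a rough sketch; the function name is made up for illustration, and the latency value is the average `lat` from the iodepth=1 fio run above.

```python
def expected_iops(iodepth: int, mean_latency_us: float) -> float:
    """Little's law: throughput = outstanding I/Os / mean latency per I/O."""
    return iodepth / (mean_latency_us * 1e-6)


# iodepth=1 run above: fio reported avg lat = 12.97 usec and IOPS=75.6k.
estimate = expected_iops(1, 12.97)
print(f"estimated IOPS at iodepth=1: {estimate:.0f}")
```

The estimate comes out at roughly 77k, close to what fio measured. With a single outstanding I/O the drive can never go faster than one divided by its per-I/O latency, no matter how much internal parallelism it has.
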
root@pve5:~# fio --ioengine=libaio --filename=/dev/nvme4n1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 -runtime=60 --time_based --name=fio -iodepth=16
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=794MiB/s][w=203k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=375347: Tue Oct  1 10:27:15 2024
  write: IOPS=216k, BW=845MiB/s (886MB/s)(49.5GiB/60001msec); 0 zone resets
    slat (usec): min=2, max=668, avg= 3.79, stdev= 1.14
    clat (usec): min=12, max=738, avg=69.97, stdev= 4.95
     lat (usec): min=16, max=741, avg=73.76, stdev= 5.07
    clat percentiles (usec):
     |  1.00th=[   67],  5.00th=[   68], 10.00th=[   69], 20.00th=[   69],
     | 30.00th=[   69], 40.00th=[   69], 50.00th=[   69], 60.00th=[   69],
     | 70.00th=[   70], 80.00th=[   71], 90.00th=[   76], 95.00th=[   81],
     | 99.00th=[   89], 99.50th=[   93], 99.90th=[  105], 99.95th=[  113],
     | 99.99th=[  133]
   bw (  KiB/s): min=696776, max=877664, per=100.00%, avg=866072.40, stdev=24259.75, samples=119
   iops        : min=174194, max=219416, avg=216518.18, stdev=6064.95, samples=119
  lat (usec)   : 20=0.01%, 50=0.01%, 100=99.81%, 250=0.19%, 500=0.01%
  lat (usec)   : 750=0.01%
  cpu          : usr=12.48%, sys=87.50%, ctx=990, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,12980872,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=845MiB/s (886MB/s), 845MiB/s-845MiB/s (886MB/s-886MB/s), io=49.5GiB (53.2GB), run=60001-60001msec

Disk stats (read/write):
  nvme4n1: ios=0/12948135, merge=0/0, ticks=0/97522, in_queue=97522, util=99.52%
The drives are Micron:

DC NVMe ISE 7450 RI U.2 7.68TB    Micron Technology Inc

And the drives are attached directly.
I just wonder why ceph-csi is so much slower.

ceph-csi: IOPS=2961.541016 BW(KiB/s)=11862
virtual disk on the same system, on Ceph: IOPS=42727.835938 BW(KiB/s)=170928
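
To put the two 4K randread results side by side, here is a small sketch that computes the slowdown factor and the per-I/O latency implied by Little's law. It assumes the queue of 64 was kept full for the whole run (kubestr runs fio with gtod_reduce=1, so fio itself reports no latencies); the IOPS figures are rounded from the kubestr output in this thread, and the names are illustrative only.

```python
# 4K randread, iodepth=64, rounded from the kubestr results in this thread.
IODEPTH = 64
results = {"openebs-hostpath": 42727.8, "ceph-block": 2961.5}


def implied_latency_ms(iops: float, iodepth: int = IODEPTH) -> float:
    """Mean time per I/O in ms, assuming the queue stays full (Little's law)."""
    return iodepth / iops * 1000.0


for name, iops in results.items():
    print(f"{name}: {iops:.0f} IOPS, ~{implied_latency_ms(iops):.1f} ms per I/O")

slowdown = results["openebs-hostpath"] / results["ceph-block"]
print(f"ceph-block is ~{slowdown:.1f}x slower than openebs-hostpath")
```

Under that assumption the ceph-block volume needs on the order of 20 ms per 4K read, versus about 1.5 ms for the hostpath volume, a factor of roughly 14.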


