Was kann ich an Ceph Performance erwarten

fwinkler · Oct 1, 2024

wir haben einen neuen Proxmox Cluster mit 5 Nodes und einem Backupserver. Ich kann nur überhaupt nicht wirklich einschätzen was ich an Leistung erwarten kann.

Server:
4 x Dell DC NVMe ISE 7450 RI U.2 7.68TB
4 x 10gb Netzwerkkarte
2 x 1gb Netzwerkkarte
2 x 100gb Netzwerk

Auf den 100gb Karten läuft Ceph.

Bash:

root@pve5:~# fio --ioengine=libaio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=296MiB/s][w=75.8k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=16447: Thu Jul 25 14:09:27 2024
  write: IOPS=75.6k, BW=295MiB/s (310MB/s)(17.3GiB/60001msec); 0 zone resets
    slat (nsec): min=2220, max=46981, avg=2428.28, stdev=309.09
    clat (nsec): min=620, max=620751, avg=10537.05, stdev=1044.35
     lat (usec): min=9, max=630, avg=12.97, stdev= 1.10
    clat percentiles (nsec):
     |  1.00th=[10176],  5.00th=[10176], 10.00th=[10304], 20.00th=[10304],
     | 30.00th=[10304], 40.00th=[10432], 50.00th=[10432], 60.00th=[10432],
     | 70.00th=[10560], 80.00th=[10560], 90.00th=[10688], 95.00th=[10944],
     | 99.00th=[13248], 99.50th=[15936], 99.90th=[24192], 99.95th=[24960],
     | 99.99th=[31872]
   bw (  KiB/s): min=298736, max=305360, per=100.00%, avg=302530.22, stdev=1471.07, samples=119
   iops        : min=74684, max=76340, avg=75632.61, stdev=367.76, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.08%, 20=99.51%, 50=0.41%
  lat (usec)   : 100=0.01%, 250=0.01%, 750=0.01%
  cpu          : usr=6.92%, sys=39.76%, ctx=4534644, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4535227,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=295MiB/s (310MB/s), 295MiB/s-295MiB/s (310MB/s-310MB/s), io=17.3GiB (18.6GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=350/4541653, merge=8/2916, ticks=56/36713, in_queue=36769, util=99.91%

auf einer VM mit Windows 2022 hatte ich im read 24000 iops.

Linux mit kubernetes:

- mit OpenEBS Local PV(das ist dann die Lokal Platte vom Host):

Bash:

./kubestr fio -s openebs-hostpath
PVC created kubestr-fio-pvc-tkhlg
Pod created kubestr-fio-pod-hjbwb
Running FIO test (default-fio) on StorageClass (openebs-hostpath) with a PVC of Size (100Gi)
Elapsed time- 24.549538772s
FIO test results:
 
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=42727.835938 BW(KiB/s)=170928
  iops: min=33366 max=45955 avg=42794.929688
  bw(KiB/s): min=133464 max=183823 avg=171180.000000

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=34595.921875 BW(KiB/s)=138400
  iops: min=26228 max=38732 avg=34435.414062
  bw(KiB/s): min=104912 max=154928 avg=137741.859375

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=32374.083984 BW(KiB/s)=4144420
  iops: min=24112 max=34513 avg=32431.792969
  bw(KiB/s): min=3086336 max=4417667 avg=4151284.750000

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=28725.218750 BW(KiB/s)=3677365
  iops: min=14702 max=30946 avg=28716.689453
  bw(KiB/s): min=1881856 max=3961088 avg=3675741.000000

Disk stats (read/write):
  sdb: ios=1273852/1068990 merge=0/16 ticks=1965217/1675217 in_queue=3647005, util=42.120041%
  -  OK

2. mit ceph-csi block Treiber:

Bash:

./kubestr fio -s ceph-block
PVC created kubestr-fio-pvc-m9qzw
Pod created kubestr-fio-pod-4cbd2
Running FIO test (default-fio) on StorageClass (ceph-block) with a PVC of Size (100Gi)
Elapsed time- 26.418651031s
FIO test results:
 
FIO version - fio-3.36
Global options - ioengine=libaio verify=0 direct=1 gtod_reduce=1

JobName: read_iops
  blocksize=4K filesize=2G iodepth=64 rw=randread
read:
  IOPS=2961.541016 BW(KiB/s)=11862
  iops: min=2899 max=3046 avg=2966.466553
  bw(KiB/s): min=11599 max=12184 avg=11866.266602

JobName: write_iops
  blocksize=4K filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=1739.318481 BW(KiB/s)=6974
  iops: min=1510 max=1814 avg=1742.433350
  bw(KiB/s): min=6040 max=7256 avg=6969.799805

JobName: read_bw
  blocksize=128K filesize=2G iodepth=64 rw=randread
read:
  IOPS=2925.999756 BW(KiB/s)=375064
  iops: min=2854 max=3018 avg=2931.933350
  bw(KiB/s): min=365312 max=386304 avg=375301.875000

JobName: write_bw
  blocksize=128k filesize=2G iodepth=64 rw=randwrite
write:
  IOPS=1756.755005 BW(KiB/s)=225401
  iops: min=1724 max=1808 avg=1757.933350
  bw(KiB/s): min=220672 max=231424 avg=225020.406250

Disk stats (read/write):
  rbd0: ios=100394/59699 merge=0/830 ticks=2173550/1308741 in_queue=3482292, util=99.479439%
  -  OK

Wie kann ich das einstufen, ceph-csi erscheint mir zu langsam?

Falk R. · Oct 1, 2024

fwinkler said:

Bash:

root@pve5:~# fio --ioengine=libaio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 --iodepth=1 --runtime=60 --time_based --name=fio
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=296MiB/s][w=75.8k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=16447: Thu Jul 25 14:09:27 2024
  write: IOPS=75.6k, BW=295MiB/s (310MB/s)(17.3GiB/60001msec); 0 zone resets
    slat (nsec): min=2220, max=46981, avg=2428.28, stdev=309.09
    clat (nsec): min=620, max=620751, avg=10537.05, stdev=1044.35
     lat (usec): min=9, max=630, avg=12.97, stdev= 1.10
    clat percentiles (nsec):
     |  1.00th=[10176],  5.00th=[10176], 10.00th=[10304], 20.00th=[10304],
     | 30.00th=[10304], 40.00th=[10432], 50.00th=[10432], 60.00th=[10432],
     | 70.00th=[10560], 80.00th=[10560], 90.00th=[10688], 95.00th=[10944],
     | 99.00th=[13248], 99.50th=[15936], 99.90th=[24192], 99.95th=[24960],
     | 99.99th=[31872]
   bw (  KiB/s): min=298736, max=305360, per=100.00%, avg=302530.22, stdev=1471.07, samples=119
   iops        : min=74684, max=76340, avg=75632.61, stdev=367.76, samples=119
  lat (nsec)   : 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=0.08%, 20=99.51%, 50=0.41%
  lat (usec)   : 100=0.01%, 250=0.01%, 750=0.01%
  cpu          : usr=6.92%, sys=39.76%, ctx=4534644, majf=0, minf=12
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,4535227,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=295MiB/s (310MB/s), 295MiB/s-295MiB/s (310MB/s-310MB/s), io=17.3GiB (18.6GB), run=60001-60001msec

Disk stats (read/write):
  nvme0n1: ios=350/4541653, merge=8/2916, ticks=56/36713, in_queue=36769, util=99.91%

Bei diesem Test hast du direkt eine NVMe getestet, ohne Ceph. Diese Werte sehen aber nicht nach normaler NVMe Performance aus.
Wie sind die NVMe angebunden? Dirket onboard oder über einen Perc Controller. Ja soetwas sehe ich ab und zu, je nachdem wer bei DELL das Sizing gemacht hat.
Wenn die NVMe schon nicht performt, wird das mit den zusätzlichen Latenzen des Netzwerks nicht besser.

P.S. eventuell lässt du den Test noch einmal ohne iodepth=1 laufen. Das NVMe Protokoll ist für Parallelisierung gebaut und profitiert sehr stark davon.

fwinkler · Oct 1, 2024

Bash:

root@pve5:~# fio --ioengine=libaio --filename=/dev/nvme4n1 --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1 -runtime=60 --time_based --name=fio -iodepth=16
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=794MiB/s][w=203k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=375347: Tue Oct  1 10:27:15 2024
  write: IOPS=216k, BW=845MiB/s (886MB/s)(49.5GiB/60001msec); 0 zone resets
    slat (usec): min=2, max=668, avg= 3.79, stdev= 1.14
    clat (usec): min=12, max=738, avg=69.97, stdev= 4.95
     lat (usec): min=16, max=741, avg=73.76, stdev= 5.07
    clat percentiles (usec):
     |  1.00th=[   67],  5.00th=[   68], 10.00th=[   69], 20.00th=[   69],
     | 30.00th=[   69], 40.00th=[   69], 50.00th=[   69], 60.00th=[   69],
     | 70.00th=[   70], 80.00th=[   71], 90.00th=[   76], 95.00th=[   81],
     | 99.00th=[   89], 99.50th=[   93], 99.90th=[  105], 99.95th=[  113],
     | 99.99th=[  133]
   bw (  KiB/s): min=696776, max=877664, per=100.00%, avg=866072.40, stdev=24259.75, samples=119
   iops        : min=174194, max=219416, avg=216518.18, stdev=6064.95, samples=119
  lat (usec)   : 20=0.01%, 50=0.01%, 100=99.81%, 250=0.19%, 500=0.01%
  lat (usec)   : 750=0.01%
  cpu          : usr=12.48%, sys=87.50%, ctx=990, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,12980872,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=845MiB/s (886MB/s), 845MiB/s-845MiB/s (886MB/s-886MB/s), io=49.5GiB (53.2GB), run=60001-60001msec

Disk stats (read/write):
  nvme4n1: ios=0/12948135, merge=0/0, ticks=0/97522, in_queue=97522, util=99.52%

fwinkler · Oct 1, 2024

Die Platten sind Micron:

Bash:

DC NVMe ISE 7450 RI U.2 7.68TB    Micron Technology Inc

Und die Platten sind direkt angeschlossen.

fwinkler · Oct 1, 2024

Ich frage mich halt warum ceph-csi so viel langsamer ist.

ceph-csi: IOPS=2961.541016 BW(KiB/s)=11862
virtuelle Platte auf dem geiche System: IOPS=42727.835938 BW(KiB/s)=170928 auf Ceph

UdoB · Oct 1, 2024

fwinkler said:
Ich frage mich halt warum ceph-csi so viel langsamer ist.

Na, weil die Daten durch den Netzwerk-Stack, dann durch das Kabel, den Switch und das andere Kabel, nochmal durch die Netzwerktreiber und dann auf die Replikat-Destination müssen?

Lesen mag (manchmal) schnell gehen, aber Schreiben erzwingt dies gleich mehrfach, je nach size/min_size. Erst wenn ein Datum überall geschrieben wurde, ist der Vorgang beendet.

Lokal ist immer schneller als irgendein Netzwerkdateisystem...

Disclaimer: ich habe von Ceph keine Ahnung... ;-)

Falk R. · Oct 1, 2024

Ich habe von Kubernetes keine Ahnung, aber schon mehrfach gehört, dass CSI nicht ganz so schnell wie RBD ist.
Aber auch mit iodepth=16 kommt weniger raus als ich erwartet hätte. Was für ein Server ist das denn? Ich habe aber auch schon NVMe Performanceverbesserungen messen können, wenn man die C-States (Stromsparen) deaktiviert.

gurubert · Oct 1, 2024

Mach den Test mit fio mal direkt auf einem RBD zum Vergleich. Und spiele mal mit den Parameter --bs und --iodepth.

fwinkler · Oct 1, 2024

Modul

Beschreibung

SKU

Steuertyp

Menge

Komponenten

Basis

PowerEdge R6615

210-BFUO

SR

1

Smart Selection

Smart Selection PowerEdge R6615

486-82082

SR

1

FRONT STORAGE

2,5""-Gehäuse

379-BDTF

SR

1

Rückwand

NVMe-Rückwandplatine

379-BDSX

SR

1

REAR STORAGE

Kein Storage (Rückseite)

379-BDTE

SR

1

Trusted Platform Module

Trusted Platform Module 2.0 V3

461-AAIG

SR

1

Gehäuse-Konfiguration

2,5""-Gehäuse mit bis zu 10 NVMe-Direct-Laufwerken

321-BIFY

SR

1

Prozessor

AMD EPYC 9354P, 3,25 GHz, 32C/64T, 256 MB Cache (280 W) DDR5-4800

338-CGWZ

SR

1

Thermische Konfiguration des Prozessors

Kühlkörper mit hoher Performance

412-BBGB

SR

1

Typ von Speicherkonfiguration

Leistungsoptimierung

370-AHLL

SR

1

Speicher DIMM Typ und Geschwindigkeit

4.800 MT/s RDIMMs

370-AHCL

SR

1

Speicherkapazität

64 GB RDIMM, 4.800 MT/s, Dual Rank

370-AGZR

SR

12

RAID-Konfigurationen

C30, Kein RAID für NVME-Gehäuse

780-BCDO

SR

1

RAID/Interne Speichercontroller

Kein Controller

405-AACD

SR

1

Festplatte

Ohne Festplatte

400-ABHL

SR

1

Festplatten (PCIe SSD/Flex Bay)

7,68 TB Rechenzentrum, NVMe, leseoptimiert, AG-Laufwerk U2 der 4. Generation mit Träger

345-BJNW

SR

4

BIOS und erweiterte Systemkonfigurationseinstellungen

Energiesparende BIOS-Einstellungen

384-BBBH

SR

1

Erweiterte Systemkonfigurationen

Ohne Energy Star

387-BBEY

SR

1

Kühlung

4 Lüfter mit sehr hoher Performance für 1 CPU

384-BDHS

SR

1

Stromversorgung

Dual, redundant (1 + 1), Hot-Plug-Netzteil, 1.100 W MM (100–240 V AC) Titanium

450-AKLF

SR

1

Netzkabel

C13 zu C14, PDU Stil, 10 A, 2m (6,5 Fuß), Netzkabel

450-AADY

SR

2

PCIe-Riser

Riser-Konfiguration 3, 2 x16 FH (Gen 5)

330-BCBX

SR

1

Hauptplatine

PowerEdge R6615-Hauptplatine V2

329-BJSB

SR

1

Ethernet-Mezzanine-Adapter

Broadcom 57508, 2 Anschlüsse, 100 GbE, QSFP56, OCP NIC 3.0

540-BDXQ

SR

1

Ethernet-Mezzanine-Adapter

R6615/R7615, OCP 3.0-Kabel

540-BFDX

SR

1

Zusätzliche Netzwerkadapter

Broadcom 5720, 2 Anschlüsse, 1 GbE, LOM

540-BDKD

SR

1

Zusätzliche Netzwerkadapter

Broadcom 57454-BASE-T-Adapter, 4 Anschlüsse, 10 GbE, PCIe gesamte Höhe

540-BDLK

SR

1

Blende

Standardblende für x8/x10-Gehäuse, R6615

325-BETN

SR

1

Bootoptimierte Speicherkarten

BOSS-N1-Controller-Karte + mit 2 M.2 480 GB (RAID 1)

403-BCRU

SR

1

Bootoptimierte Speicherkarten

BOSS-Kabel, Klammer für R6615

470-AFNB

SR

1

Eingebetteter System-Management

iDRAC9, Enterprise 16G

528-CTIC

SR

1

fwinkler · Oct 1, 2024

c-stats habe ich schon aus

System Profile	Performance Per Watt (OS)PerformanceCustom
CPU Power Management	Maximum Performance
Memory Frequency	Maximum Performance
Turbo Boost	Enabled
C-States	Disabled
Memory Patrol Scrub	Standard
Memory Refresh Rate	1x
Workload Profile	Not ConfiguredHPC Profile
PCI ASPM L1 Link Power Management	Disabled
Determinism Slider	Power Determinism
Power Profile Select	High Performance Mode
PCIE Speed PMM Control	Auto
EQ Bypass To Highest Rate	Disabled
DF PState Frequency Optimizer	Enabled
DF PState Latency Optimizer	Enabled
DF CState	Enabled
Host System Management Port (HSMP) Support	Enabled
Boost FMax	0 - Auto
Algorithm Performance Boost Disable (ApbDis)	Disabled

fwinkler · Oct 1, 2024

gurubert said:
Mach den Test mit fio mal direkt auf einem RBD zum Vergleich. Und spiele mal mit den Parameter --bs und --iodepth.

Wie meinst du das? in einer vm oder auf dem proxmox host direkt?

gurubert · Oct 1, 2024

Direkt auf dem Proxmox-Knoten. Fio hat auch die Möglichkeit, mit RBDs zu sprechen.

fwinkler · Oct 1, 2024

Bash:

root@pve5:~# fio -ioengine=rbd  -pool=test -ioengine=rbd  -pool=test -direct=1 -sync=1 -rw=write -bs=4K -numjobs=1 -runtime=60 -time_based -name=fio -iodepth=4
 -rbdname=vm-113-disk-0
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=4
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=33.4MiB/s][w=8555 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=298365: Tue Oct  1 15:01:51 2024
  write: IOPS=8451, BW=33.0MiB/s (34.6MB/s)(1981MiB/60001msec); 0 zone resets
    slat (nsec): min=510, max=44300, avg=4646.06, stdev=2108.97
    clat (usec): min=307, max=13419, avg=468.33, stdev=156.84
     lat (usec): min=318, max=13423, avg=472.98, stdev=156.83
    clat percentiles (usec):
     |  1.00th=[  392],  5.00th=[  408], 10.00th=[  416], 20.00th=[  424],
     | 30.00th=[  433], 40.00th=[  441], 50.00th=[  449], 60.00th=[  457],
     | 70.00th=[  465], 80.00th=[  478], 90.00th=[  498], 95.00th=[  537],
     | 99.00th=[ 1037], 99.50th=[ 1172], 99.90th=[ 1844], 99.95th=[ 2311],
     | 99.99th=[ 7504]
   bw (  KiB/s): min=30368, max=35768, per=100.00%, avg=33829.65, stdev=918.50, samples=119
   iops        : min= 7592, max= 8942, avg=8457.41, stdev=229.62, samples=119
  lat (usec)   : 500=90.11%, 750=7.36%, 1000=1.40%
  lat (msec)   : 2=1.05%, 4=0.05%, 10=0.03%, 20=0.01%
  cpu          : usr=4.00%, sys=2.63%, ctx=277256, majf=0, minf=19
  IO depths    : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,507081,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
  WRITE: bw=33.0MiB/s (34.6MB/s), 33.0MiB/s-33.0MiB/s (34.6MB/s-34.6MB/s), io=1981MiB (2077MB), run=60001-60001msec

Disk stats (read/write):
    dm-1: ios=7/3407, merge=0/0, ticks=1/109, in_queue=110, util=1.19%, aggrios=47/1545, aggrmerge=0/0, aggrticks=15/26, aggrin_queue=41, aggrutil=0.06%
  nvme2n1: ios=47/1545, merge=0/0, ticks=15/26, in_queue=41, util=0.06%

so?

gurubert · Oct 1, 2024

Ich hoffe, auf dem VM-Image vm-113-disk-0 war nichts wichtiges drauf…

Für 4K Blocksize und 4 Threads sieht das ganz OK aus. Setz die iodepth mal auf 128 und danach die bs auf 4M mit einer iodepth von 8 oder 16.

fwinkler · Oct 1, 2024

nein

Bash:

root@pve5:~# fio -ioengine=rbd  -pool=test -ioengine=rbd  -pool=test -direct=1 -sync=1 -rw=write -bs=4K -numjobs=1 -runtime=60 -time_based -name=fio -iodepth=128 -rbdname=vm-113-disk-0
fio: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=rbd, iodepth=128
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=262MiB/s][w=67.0k IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=318694: Tue Oct  1 15:22:08 2024
  write: IOPS=66.6k, BW=260MiB/s (273MB/s)(15.3GiB/60002msec); 0 zone resets
    slat (nsec): min=770, max=374811, avg=6234.89, stdev=2866.94
    clat (usec): min=431, max=41339, avg=1914.04, stdev=446.25
     lat (usec): min=437, max=41344, avg=1920.28, stdev=446.07
    clat percentiles (usec):
     |  1.00th=[ 1074],  5.00th=[ 1467], 10.00th=[ 1614], 20.00th=[ 1696],
     | 30.00th=[ 1762], 40.00th=[ 1811], 50.00th=[ 1844], 60.00th=[ 1893],
     | 70.00th=[ 1958], 80.00th=[ 2040], 90.00th=[ 2376], 95.00th=[ 2671],
     | 99.00th=[ 3195], 99.50th=[ 3425], 99.90th=[ 4621], 99.95th=[ 5604],
     | 99.99th=[ 8029]
   bw (  KiB/s): min=230752, max=283400, per=100.00%, avg=266698.89, stdev=8539.38, samples=119
   iops        : min=57688, max=70850, avg=66674.76, stdev=2134.81, samples=119
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.74%
  lat (msec)   : 2=75.87%, 4=23.21%, 10=0.16%, 20=0.01%, 50=0.01%
  cpu          : usr=42.82%, sys=14.02%, ctx=344658, majf=0, minf=289
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued rwts: total=0,3998730,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
  WRITE: bw=260MiB/s (273MB/s), 260MiB/s-260MiB/s (273MB/s-273MB/s), io=15.3GiB (16.4GB), run=60002-60002msec

Disk stats (read/write):
    dm-1: ios=0/3411, merge=0/0, ticks=0/213, in_queue=213, util=0.07%, aggrios=36/1700, aggrmerge=0/0, aggrticks=10/59, aggrin_queue=69, aggrutil=0.07%
  nvme2n1: ios=36/1700, merge=0/0, ticks=10/59, in_queue=69, util=0.07%

Bash:

root@pve5:~# fio -ioengine=rbd  -pool=test -ioengine=rbd  -pool=test -direct=1 -sync=1 -rw=write -bs=4M -numjobs=1 -runtime=60 -time_based -name=fio -iodepth=8 -rbdname=vm-113-disk-0
fio: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=rbd, iodepth=8
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=3219MiB/s][w=804 IOPS][eta 00m:00s]
fio: (groupid=0, jobs=1): err= 0: pid=320506: Tue Oct  1 15:24:02 2024
  write: IOPS=700, BW=2804MiB/s (2940MB/s)(164GiB/60011msec); 0 zone resets
    slat (usec): min=189, max=3059, avg=1158.16, stdev=501.84
    clat (usec): min=4342, max=65173, avg=10252.40, stdev=2559.82
     lat (usec): min=4913, max=66848, avg=11410.57, stdev=2664.94
    clat percentiles (usec):
     |  1.00th=[ 6063],  5.00th=[ 6915], 10.00th=[ 7439], 20.00th=[ 8225],
     | 30.00th=[ 8848], 40.00th=[ 9372], 50.00th=[10028], 60.00th=[10683],
     | 70.00th=[11207], 80.00th=[11863], 90.00th=[13173], 95.00th=[14615],
     | 99.00th=[17957], 99.50th=[19530], 99.90th=[25297], 99.95th=[30802],
     | 99.99th=[57934]
   bw (  MiB/s): min= 2400, max= 3392, per=100.00%, avg=2804.64, stdev=280.48, samples=119
   iops        : min=  600, max=  848, avg=701.16, stdev=70.12, samples=119
  lat (msec)   : 10=49.96%, 20=49.61%, 50=0.42%, 100=0.01%
  cpu          : usr=80.82%, sys=0.66%, ctx=10984, majf=0, minf=9221
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,42066,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=8

Run status group 0 (all jobs):
  WRITE: bw=2804MiB/s (2940MB/s), 2804MiB/s-2804MiB/s (2940MB/s-2940MB/s), io=164GiB (176GB), run=60011-60011msec

Disk stats (read/write):
    dm-1: ios=0/3184, merge=0/0, ticks=0/926, in_queue=926, util=0.31%, aggrios=36/1921, aggrmerge=0/0, aggrticks=7/30999, aggrin_queue=31006, aggrutil=0.30%
  nvme2n1: ios=36/1921, merge=0/0, ticks=7/30999, in_queue=31006, util=0.30%

gurubert · Oct 1, 2024

Max 67000 IOPS bzw 3,2GB/s sieht doch gar nicht so schlecht aus.

Das ist dann die summierte Performance, die über alle VMs zu erwarten ist.

Die einzelne VM erhält eher das Ergebnis mit 4K und 1 IO-Thread.

fwinkler · Oct 1, 2024

ja, da kommt auch noch ein zweiter 100gb Switch dazu.

Was mich halt irritiert ist das ceph-csi im Kubernetes. Das os ist talos und die iops auf der lokalen Platte, was ja auch schon ceph ist, ist ja auch ok.
Aber warum der ceph-csi Treiber so schlecht ist verstehe ich nicht.

Oder ist teste ich falsch?

Search

Search

Was kann ich an Ceph Performance erwarten

fwinkler

Member

Falk R.

Distinguished Member

fwinkler

Member

fwinkler

Member

fwinkler

Member

UdoB

Distinguished Member

Falk R.

Distinguished Member

gurubert

Famous Member

fwinkler

Member

fwinkler

Member

fwinkler

Member

gurubert

Famous Member

fwinkler

Member

gurubert

Famous Member

fwinkler

Member

gurubert

Famous Member

fwinkler

Member