fio with ioengine=rbd doesn't work with Proxmox's Ceph?

victorhooi

I have a new Proxmox cluster set up, with Ceph configured as well.

I have created my OSDs, and my Ceph pool.

I'm now trying to use fio with ioengine=rbd to benchmark the setup, based on some of the examples here.

However, it doesn't appear to be working on Proxmox's Ceph setup out of the box:

Code:
# fio -ioengine=rbd -direct=1 -name=test -bs=4M -iodepth=16 -rw=write -pool=vm_storage -runtime=60 -rbdname=testimg
test: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=rbd, iodepth=16
fio-3.12
Starting 1 process
rbd_open failed.
fio_rbd_connect failed.


Run status group 0 (all jobs):
I know that Proxmox doesn't create the default Ceph pool named "rbd". But in this case, I am specifying the pool explicitly with -pool=vm_storage.

Any ideas why it's not working?

(I also have no /dev/rbd* devices, but I'm not sure if that is expected.)
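
(As far as I can tell, /dev/rbd* devices only appear for images that have been mapped through the kernel client, and fio's rbd engine goes through librbd instead, so their absence is probably expected; a quick check:)

Code:
# /dev/rbdX devices only exist for images mapped through the kernel client
rbd showmapped
# fio's rbd engine uses librbd directly, so no mapping (and no /dev/rbd*) is needed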
 
Hi,

You have to tell fio a bit more than just the rbd engine:

Code:
       (rbd,rados)clustername=str
              Specifies the name of the Ceph cluster.

       (rbd)rbdname=str
              Specifies the name of the RBD.

       (rbd,rados)pool=str
              Specifies the name of the Ceph pool containing RBD or RADOS data.

       (rbd,rados)clientname=str
              Specifies the username (without the 'client.' prefix) used to access the Ceph cluster. If the clustername is specified, the clientname shall be the full *type.id* string. If no type. prefix is given, fio will add 'client.' by default.

       (rbd,rados)busy_poll=bool
              Poll store instead of waiting for completion. Usually this provides better throughput at cost of higher (up to 100%) CPU utilization.


see man fio for more details
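
For example, something along these lines should work once the image exists (the image name fio-test is just a placeholder here, created with the rbd CLI first):

Code:
# create a throwaway test image in the pool, then point fio's rbd engine at it
rbd create vm_storage/fio-test --size 10G
fio --ioengine=rbd --clientname=admin --pool=vm_storage --rbdname=fio-test \
    --direct=1 --rw=write --bs=4M --iodepth=16 --runtime=60 --name=test
# remove the test image again afterwards
rbd rm vm_storage/fio-test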
 
Thanks wolfgang and spirit for the pointer! =)

The issue was the rbdname - I needed to point it to an actual RBD volume.

The client name is just the Ceph username (e.g. "admin"). I assume fio must use a default of admin, as it seems to work without it (and I assume Proxmox creates the user "admin" as part of the Ceph setup).
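
(For anyone who wants to double-check which users exist, the ceph CLI can list them; this is just a sanity check, not something the fio job itself needs:)

Code:
# list the auth entities known to the cluster; client.admin is the usual default
ceph auth ls | grep '^client\.'
# show the key and capabilities for client.admin
ceph auth get client.admin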

For rbdname (the RBD volume) - as this was a new cluster, I didn't have any set up yet. However, I created a new VM in Proxmox with a 32GB disk on the Ceph pool, and that created a new RBD volume for me (vm-100-disk-0).
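
(Listing the pool with the rbd CLI should confirm the volume name, e.g.:)

Code:
# list the RBD images in the Proxmox storage pool
rbd ls -p vm_storage
# show size/features of the disk Proxmox created
rbd info vm_storage/vm-100-disk-0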

Below is my working command-line:

Code:
# fio -ioengine=rbd -direct=1 -name=test -bs=4M -iodepth=16 -rw=write -pool=vm_storage -runtime=60 -rbdname=vm-100-disk-0
test: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=rbd, iodepth=16
fio-3.12
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=721MiB/s][w=180 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=904720: Wed Feb  5 01:03:52 2020
  write: IOPS=194, BW=779MiB/s (817MB/s)(32.0GiB/42072msec); 0 zone resets
    slat (usec): min=858, max=12005, avg=1125.99, stdev=456.70
    clat (msec): min=28, max=358, avg=81.00, stdev=29.10
     lat (msec): min=29, max=360, avg=82.12, stdev=29.10
    clat percentiles (msec):
     |  1.00th=[   37],  5.00th=[   45], 10.00th=[   50], 20.00th=[   57],
     | 30.00th=[   63], 40.00th=[   69], 50.00th=[   75], 60.00th=[   83],
     | 70.00th=[   92], 80.00th=[  104], 90.00th=[  123], 95.00th=[  136],
     | 99.00th=[  165], 99.50th=[  176], 99.90th=[  211], 99.95th=[  234],
     | 99.99th=[  359]
   bw (  KiB/s): min=696320, max=884736, per=99.95%, avg=797124.33, stdev=44100.14, samples=84
   iops        : min=  170, max=  216, avg=194.56, stdev=10.77, samples=84
  lat (msec)   : 50=11.02%, 100=66.13%, 250=22.83%, 500=0.02%
  cpu          : usr=21.20%, sys=1.21%, ctx=5381, majf=14, minf=225749
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.8%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  WRITE: bw=779MiB/s (817MB/s), 779MiB/s-779MiB/s (817MB/s-817MB/s), io=32.0GiB (34.4GB), run=42072-42072msec

This was on a 3-node hyperconverged cluster.

Each node has 8 x Samsung SM863a drives, for a total of 24 SSDs.

We created a single OSD on each SSD, for a total of 24 OSDs.

The above IOPS figure for sequential writes seems a little low to me, but we haven't done much tuning yet.
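
One way to sanity-check those numbers might be a raw RADOS benchmark against the same pool, which takes RBD out of the picture; a rough sketch with a matching object size and queue depth:

Code:
# 60 s write benchmark straight to the pool: 4 MiB objects, 16 in flight
rados bench -p vm_storage 60 write -b 4194304 -t 16 --no-cleanup
# remove the benchmark objects afterwards
rados -p vm_storage cleanup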