PVE 6.3 with HA Ceph iSCSI

David Herselman · Jan 7, 2021

Hi,

To start, I would not recommend that people use this to somehow cook together PVE using a remote cluster via iSCSI as storage for VMs. In our case we have a secondary cluster which used to host a multi-tenant, internet-based backup service; it comprises 6 servers with 310 TiB available to Ceph. We now use this cluster to do daily incremental Ceph backups of VMs in other clusters and use Benji (https://benji-backup.me/) to deduplicate, compress and encrypt 4 MiB data chunks and then store them in Ceph via the S3-compatible RADOS gateway (rgw) service. Benji uses rolling snapshots to essentially be told what data has changed on the images being backed up, so there is no scanning of drives (it has its own daily data verification logic), and it is multi-threaded in that multiple 4 MiB data chunks can be processed simultaneously. This is working well, so well in fact that it requires a fraction of the resources to do backups and the required storage space is minuscule. The best thing is that this cluster has zero load during standard office hours.

Right, so after a year we now store 31 days of daily backups, 12 weekly and 12 monthly backups of images from other Proxmox Ceph clusters. Projecting this forward for 3 years, with a surprisingly linear space increase pattern, means that we safely have 200 TiB available for sale to customers. Each of the 6 nodes can swallow 12 x 3.5 inch discs and has the OS installed on a relatively small software RAID partition spanning 2 separate SSDs in each system. The remaining space on the SSDs is used as Ceph OSDs to provide a flash-only data pool and cache tiering pools for the erasure coded pools (k=3, m=2) on rotational media. All drives are identically distributed across nodes and essentially come from repurposed equipment. We can subsequently easily upgrade storage by replacing the ancient 2 TiB spinners with much larger discs.

PS: We run ec32 and not ec42 as we wish to sustain the failure of an entire node and have Ceph be able to heal itself afterwards, again providing for the failure of a second node before the first is repaired.
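
To put rough numbers on that trade-off, here is a minimal sketch that compares the two profiles on a 6 node cluster. It assumes one chunk per host (failure domain = host); the helper name ec_summary is made up for illustration and is not a Ceph command.

```shell
#!/bin/sh
# ec_summary K M HOSTS
# Prints the usable share of raw capacity, k / (k + m), and how many hosts
# remain unused by a placement group when each host holds one chunk.
# Assumption: the CRUSH failure domain is "host".
ec_summary() {
    awk -v k="$1" -v m="$2" -v h="$3" 'BEGIN {
        printf "k=%d m=%d usable=%.1f%% spare_hosts=%d\n", k, m, 100 * k / (k + m), h - (k + m)
    }'
}

ec_summary 3 2 6   # ec32, the profile used on this cluster
ec_summary 4 2 6   # ec42, the alternative we decided against
```

With ec32 there is one host left over, so after a node failure Ceph has somewhere to rebuild the missing chunks. With ec42 every host already holds a chunk, so placement groups would stay degraded until the failed node returns, in exchange for roughly 7% more usable space.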

Herewith a sample of the OSD placements:

Code:

ID     CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
  100    hdd  1.81898   1.00000  1.8 TiB  236 GiB  234 GiB  4.3 MiB  1.9 GiB  1.6 TiB  12.66  0.99   46      up
  101    hdd  1.81898   1.00000  1.8 TiB  257 GiB  255 GiB  3.8 MiB  2.2 GiB  1.6 TiB  13.79  1.08   51      up
  102    hdd  5.45799   1.00000  5.5 TiB  707 GiB  702 GiB   10 MiB  5.1 GiB  4.8 TiB  12.65  0.99  141      up
  103    hdd  5.45799   1.00000  5.5 TiB  702 GiB  697 GiB  7.7 MiB  5.1 GiB  4.8 TiB  12.56  0.99  139      up
  104    hdd  9.09599   1.00000  9.1 TiB  1.1 TiB  1.1 TiB  8.0 MiB  7.5 GiB  8.0 TiB  12.43  0.98  228      up
  105    hdd  9.09599   1.00000  9.1 TiB  1.1 TiB  1.1 TiB  7.4 MiB  7.7 GiB  8.0 TiB  12.42  0.98  225      up
  106    hdd  9.09599   1.00000  9.1 TiB  1.1 TiB  1.1 TiB  4.6 MiB  7.6 GiB  8.0 TiB  12.47  0.98  235      up
  107    hdd  9.09599   1.00000  9.1 TiB  1.1 TiB  1.1 TiB  2.3 MiB  7.5 GiB  8.0 TiB  12.35  0.97  224      up
10100    ssd  0.37700   1.00000  386 GiB  113 GiB  110 GiB  2.5 GiB  846 MiB  273 GiB  29.27  2.30   51      up
10101    ssd  0.37700   1.00000  386 GiB  105 GiB  103 GiB  1.4 GiB  958 MiB  281 GiB  27.28  2.14   55      up
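
As a rough sanity check on the placements above, the PGS column should scale with the CRUSH WEIGHT column. A minimal sketch using three HDD rows from the table; the helper name pgs_per_weight is mine and not part of any Ceph tooling.

```shell
#!/bin/sh
# pgs_per_weight PGS WEIGHT
# Prints the number of placement groups per unit of CRUSH weight.
pgs_per_weight() {
    awk -v pgs="$1" -v weight="$2" 'BEGIN { printf "%.1f\n", pgs / weight }'
}

pgs_per_weight 46 1.81898    # osd.100, 1.8 TiB drive
pgs_per_weight 141 5.45799   # osd.102, 5.5 TiB drive
pgs_per_weight 228 9.09599   # osd.104, 9.1 TiB drive
```

All three HDDs come out at roughly 25 PGs per unit of weight regardless of drive size, which matches the near identical %USE values. The SSD OSDs carry different pools, so their ratio is not comparable.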

Our clusters that host customer workloads consist purely of flash with 3-way RBD replication, so costs aren't cheap but performance is excellent. Having redundant iSCSI connectivity would be really useful in some scenarios where we host physical equipment for customers. We have some benchmark data from running a 2 vCPU Windows 2012r2 VM against multi-path (active/backup) ceph-iscsi gateways and it really doesn't look bad. This is intended as a slow bulk storage tier for some customers that have essentially taken their on-premise file servers into the cloud before transitioning to something like SharePoint. We intend to access RBD images in this cluster directly via external-ceph, from the current production clusters in the same data centre, and will most probably use this very sparingly.

We have the following Ceph pools:

At the time of writing, PVE 6.3 is layered on top of Debian 10.7, with Proxmox's Ceph Octopus packages being 15.2.8. We installed the ceph-iscsi package from the Debian Buster (10) backports repository and got python3-rtslib-fb_2.1.71 from Debian Sid.
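
If you would rather not pass -t buster-backports every time, an APT pin for just this one package is an option. This is only a sketch and not something we ran ourselves: the function name and the target file name are assumptions, and dependencies that also need to come from backports may still have to be named explicitly.

```shell
#!/bin/sh
# write_backports_pin FILE
# Writes an apt_preferences stanza that raises ceph-iscsi from
# buster-backports to the normal priority of 500, so that a plain
# "apt-get install ceph-iscsi" can select it.
write_backports_pin() {
    cat > "$1" <<'EOF'
Package: ceph-iscsi
Pin: release a=buster-backports
Pin-Priority: 500
EOF
}

# Demonstration against a temporary file. On a real node the target would be
# something like /etc/apt/preferences.d/ceph-iscsi (hypothetical file name).
demo_file=$(mktemp)
write_backports_pin "$demo_file"
cat "$demo_file"
rm -f "$demo_file"
```

Running apt-cache policy ceph-iscsi afterwards should show whether the pin took effect.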

The following are my speed notes. You'll want to review content at the following locations as part of setting this up:

PS: We tested everything using real wildcard certificates so others may need to turn off SSL verification.

Herewith my notes:

Code:

  # Ceph iSCSI stores configuration metadata in a rbd pool. We typically rename the default 'rbd' pool as
  # rbd_hdd or rbd_ssd so we create a pool to hold the config and the images. We'll create the images manually
  # to place the actual data in the ec_hdd data pool:
    ceph osd pool create iscsi 8 replicated replicated_ssd;
    ceph osd pool application enable iscsi rbd
  # Adjust Ceph OSD timers:
    # Default:
    #   osd_client_watch_timeout: 30
    #   osd_heartbeat_grace     : 20
    #   osd_heartbeat_interval  : 6
    # NB: $setting is left unquoted on purpose so that it splits into the option name and its value
    for setting in 'osd_client_watch_timeout 15' 'osd_heartbeat_grace 20' 'osd_heartbeat_interval 5'; do ceph config set osd $setting; done
    #for setting in 'osd_client_watch_timeout 15' 'osd_heartbeat_grace 20' 'osd_heartbeat_interval 5'; do echo ceph tell osd.* injectargs "--$setting"; done
    for OSD in 100; do for setting in osd_client_watch_timeout osd_heartbeat_grace osd_heartbeat_interval; do ceph daemon osd.$OSD config get $setting; done; done
    iscsi_api_passwd=`cat /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 32 | head -n 1`;
    echo $iscsi_api_passwd;
    echo -e "[config]\ncluster_name = ceph\npool = iscsi\ngateway_keyring = ceph.client.admin.keyring\napi_secure = true\napi_user = admin\napi_password = $iscsi_api_passwd" > /etc/pve/iscsi-gateway.cfg;

  # Ceph Dashboard:
    ceph dashboard iscsi-gateway-list;
    echo "ceph dashboard iscsi-gateway-add https://admin:$iscsi_api_passwd@kvm7a.fqdn:5000 kvm7a.fqdn" | sh;
    echo "ceph dashboard iscsi-gateway-add https://admin:$iscsi_api_passwd@kvm7b.fqdn:5000 kvm7b.fqdn" | sh;
    echo "ceph dashboard iscsi-gateway-add https://admin:$iscsi_api_passwd@kvm7e.fqdn:5000 kvm7e.fqdn" | sh;
    #ceph dashboard iscsi-gateway-rm kvm7a.fqdn;
    #ceph dashboard set-iscsi-api-ssl-verification false;
    # Testing communications to api:
    #   curl --user admin:******************************** -X GET https://kvm7a.fqdn:5000/api/_ping;
    #   Add --insecure to skip x509 SSL certificate verification

  # On each node:
    apt-get update; apt-get -y dist-upgrade; apt-get autoremove; apt-get autoclean;
    pico /etc/apt/sources.list;
      # Backports
      deb http://deb.debian.org/debian buster-backports main contrib non-free
      # Packages from the Backports repository do not unnecessarily replace packages so you
      # have to specify the repository when wanting to install packages. For example:
      #   apt-get -t buster-backports install "package"
    apt-get -t buster-backports install ceph-iscsi;
    wget http://debian.mirror.ac.za/debian/pool/main/p/python-rtslib-fb/python3-rtslib-fb_2.1.71-3_all.deb;
    dpkg -i python3-rtslib-fb_2.1.71-3_all.deb && rm -f python3-rtslib-fb_2.1.71-3_all.deb;
    ln -s /etc/pve/iscsi-gateway.cfg /etc/ceph/iscsi-gateway.cfg;
    ln -s /etc/pve/local/pve-ssl.pem /etc/ceph/iscsi-gateway.crt;
    ln -s /etc/pve/local/pve-ssl.key /etc/ceph/iscsi-gateway.key;
    systemctl status rbd-target-gw;
    systemctl status rbd-target-api;


Notes:
  Create iSCSI Target:
    gwcli
      /iscsi-targets create iqn.1998-12.reversefqdn.kvm7:ceph-iscsi
      goto gateways
      create kvm7a.fqdn 198.19.35.2 skipchecks=true
      create kvm7b.fqdn 198.19.35.3 skipchecks=true
      create kvm7e.fqdn 198.19.35.6 skipchecks=true
  Pre-Create Disk and define it under a new client host:
    rbd create iscsi/vm-169-disk-2 --data-pool ec_hdd --size 500G;
    #                ^^^^^^^^^^^^^             ^^^^^^        ^^^
    gwcli
      /disks/ create pool=iscsi image=vm-169-disk-2 size=500g
      #                               ^^^^^^^^^^^^^      ^^^
      # NB: This command will create new RBD images in the pool if they don't exist yet, make sure to pre-create it to place data in an erasure coded pool
      /iscsi-targets/iqn.1998-12.reversefqdn.kvm7:ceph-iscsi/hosts
      create iqn.1991-05.com.microsoft:win-test
      auth username=ceph-iscsi password=********************************    # cat /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 32 | head -n 1
      disk add iscsi/vm-169-disk-2
      #        ^^^^^^^^^^^^^^^^^^^
  Restart Ceph iSCSI API service if it was disabled due to it repeatedly failing before node joins cluster:
    systemctl stop rbd-target-api;
    systemctl reset-failed;
    systemctl start rbd-target-api;
    systemctl status rbd-target-api;
  Connecting Windows (tested with 2012r2 although documentation indicated 2016 being required):
    https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html/block_device_guide/using_an_iscsi_gateway#the_iscsi_initiator_for_microsoft_windows
  Increase disk's kernel cache:
    gwcli
      /disks/iscsi/vm-169-disk-2
      info
      reconfigure max_data_area_mb 128
  Delete client:
    gwcli
      /iscsi-targets/iqn.1998-12.reversefqdn.kvm7:ceph-iscsi/hosts
      delete iqn.1991-05.com.microsoft:win-test
      /iscsi-targets/iqn.1998-12.reversefqdn.kvm7:ceph-iscsi/disks
      delete disk=iscsi/vm-169-disk-2
      /disks
      #delete image_id=iscsi/vm-169-disk-2    # NB: This deletes the RBD image! Rename it before deleting to keep the image
      #  eg:  rbd mv iscsi/vm-169-disk-2 iscsi/vm-169-disk-2_KEEP
      #       gwcli /disks delete image_id=iscsi/vm-169-disk-2
  Dump config:
    gwcli export copy
  Debug:
    tail -f /var/log/rbd-target-api/rbd-target-api.log
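
Once all gateways are defined it is handy to check that each API endpoint answers. A minimal sketch that wraps the curl test from the notes above in a loop; the function names and the ISCSI_API_PASSWD variable are my own, and port 5000 plus the admin user come from the iscsi-gateway.cfg shown earlier.

```shell
#!/bin/sh
# gateway_ping_url HOST
# Builds the URL of the rbd-target-api ping endpoint for one gateway.
gateway_ping_url() {
    printf 'https://%s:5000/api/_ping\n' "$1"
}

# ping_gateways HOST...
# Calls the ping endpoint on every gateway and reports ok/FAILED per host.
# Expects the API password in the ISCSI_API_PASSWD environment variable.
# Add --insecure to the curl call if you use self-signed certificates.
ping_gateways() {
    for host in "$@"; do
        if curl --silent --fail --max-time 5 \
                --user "admin:${ISCSI_API_PASSWD}" \
                --output /dev/null "$(gateway_ping_url "$host")"; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example, using the host names from the notes above:
#   export ISCSI_API_PASSWD='<api password>'
#   ping_gateways kvm7a.fqdn kvm7b.fqdn kvm7e.fqdn
```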

The Ceph Dashboard integration isn't necessary but it is pretty:

Quick speed notes on how to set up Ceph Dashboard:

Code:

ceph mgr module enable dashboard;
  #ceph dashboard create-self-signed-cert;
  ceph dashboard set-ssl-certificate -i /etc/pve/local/pve-ssl.pem;
  ceph dashboard set-ssl-certificate-key -i /etc/pve/local/pve-ssl.key;
  ceph dashboard ac-user-create admin ***************** administrator
    # URL eg:  https://kvm7a.fqdn:8443/
  #ceph config set mgr mgr/dashboard/ssl true;

Update: Forgot to include a snippet of the gwcli which represents the pools in a pretty way with loads of information:

David Herselman · Jan 7, 2021

We used Microsoft's TechNet Diskspd utility (https://gallery.technet.microsoft.com/DiskSpd-a-robust-storage-6cd2f223) to benchmark performance.

The tool is command line based. Herewith the parameters it was run with:

Code:

@echo off
cd "C:\Users\Administrator\Desktop\Diskspd-v2.0.17\amd64fre"
echo "Flushing each write:"
diskspd.exe -b8k -d120 -Suw -L -o2 -t4 -r -w30 -c250M e:\io.dat
echo "Default Windows flush behaviour:"
diskspd.exe -b8k -d120 -Sb -L -o2 -t4 -r -w30 -c250M e:\io.dat
pause

Notes:
Both tests use a 30% write / 70% read distribution with an 8 KB block size (typical of NTFS file systems hosting I/O transactions for a SQL server). The tests run for a duration of 2 minutes with 4 worker threads and 2 outstanding I/O requests per thread. We purposefully ran 4 threads with only 2 vCPUs to simulate contention and get worst-case scenario results.
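
With 4 threads and 2 outstanding requests each, there are at most 8 I/Os in flight, so the average latency puts a ceiling on the achievable IOPS (in-flight requests divided by latency). A minimal sketch of that arithmetic, fed with the AvgLat totals of the two HDD flush runs from the detailed output further down; the helper name expected_iops is made up for illustration.

```shell
#!/bin/sh
# expected_iops THREADS OUTSTANDING AVG_LATENCY_MS
# Estimates IOPS as (threads * outstanding I/Os) / average latency.
expected_iops() {
    awk -v t="$1" -v o="$2" -v lat_ms="$3" 'BEGIN { printf "%.0f\n", t * o * 1000 / lat_ms }'
}

expected_iops 4 2 11.474   # HDD RBD, flushing each write (measured: 697.32)
expected_iops 4 2 8.553    # HDD EC, flushing each write (measured: 935.09)
```

Both estimates land within a fraction of a percent of the measured values, which suggests that the flush runs are bound by per-request latency rather than by bandwidth.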

The first run sends a flush instruction with each write, simulating the write behaviour of an ACID-compliant database. The second run simply commits data and relies on standard Windows flush behaviour, usually every 8 seconds; Windows additionally flushes the cache when it truncates the file system journal or updates the file system structure.

NB: These results may appear underwhelming, especially when compared to a PC or laptop with an SSD. It is important to note that the storage subsystem will only confirm writes when they have reached stable media in triplicate (RBD) or four data placements with erasure coding, physically distributed on separate storage servers. We do not use a writeback caching layer, so committed data is always actually committed.

Our recommendation is that you run the same benchmark on your existing systems and thereafter compare results.

Herewith a summary from the benchmark info further below:

Code:

Type                                      IOPs     MB/s
Ceph iSCSI - HDD - RBD - Flush             697     5.45
Ceph iSCSI - HDD - RBD - Normal Bulk    130367  1018.49
Ceph iSCSI - HDD - EC - Flush              935     7.31
Ceph iSCSI - HDD - EC - Normal Bulk     105476   824.03
Ceph - SSD - RBD - Flush                 10324    80.66
Ceph - SSD - RBD - Normal Bulk          117200   915.63

Increasing the max_data_area_mb from 8 to 128 MiB made little difference in our tests:
Type                                      IOPs     MB/s
Ceph iSCSI - HDD - EC - Flush              904     7.06
Ceph iSCSI - HDD - EC - Normal Bulk     129243  1009.71
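
The MB/s column appears to be nothing more than IOPS multiplied by the 8 KiB block size, expressed in MiB. A minimal sketch to convert between the two when comparing against your own runs; the helper name iops_to_mbps is mine.

```shell
#!/bin/sh
# iops_to_mbps IOPS BLOCK_SIZE_BYTES
# Converts an I/O rate into throughput in MiB/s (1 MiB = 1048576 bytes).
iops_to_mbps() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.2f\n", iops * bs / 1048576 }'
}

iops_to_mbps 697.32 8192      # HDD RBD, flush       -> 5.45
iops_to_mbps 130367.31 8192   # HDD RBD, normal bulk -> 1018.49
```

The IOPS inputs are the unrounded totals from the detailed DiskSpd output; the results match the MB/s values reported in the summary.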

Ceph iSCSI remote HDD RBD exported and accessed via MultiPath iSCSI - Flushing each write:

Code:

Command Line: diskspd.exe -b8k -d120 -Suw -L -o2 -t4 -r -w30 -c250M e:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'e:\io.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.01s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|   6.03%|   0.25%|    5.78%|  93.98%
   1|   2.25%|   0.21%|    2.04%|  97.75%
-------------------------------------------
avg.|   4.14%|   0.23%|    3.91%|  95.86%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |       171540480 |        20940 |       1.36 |     174.48 |   11.468 |  19.479 | e:\io.dat (250MB)
     1 |       169156608 |        20649 |       1.34 |     172.06 |   11.623 |  21.085 | e:\io.dat (250MB)
     2 |       172638208 |        21074 |       1.37 |     175.60 |   11.391 |  19.840 | e:\io.dat (250MB)
     3 |       172220416 |        21023 |       1.37 |     175.18 |   11.418 |  19.883 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         685555712 |        83686 |       5.45 |     697.32 |   11.474 |  20.077

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |       119693312 |        14611 |       0.95 |     121.75 |    9.484 |  14.762 | e:\io.dat (250MB)
     1 |       117768192 |        14376 |       0.94 |     119.79 |    9.442 |  15.152 | e:\io.dat (250MB)
     2 |       121323520 |        14810 |       0.96 |     123.41 |    9.565 |  16.018 | e:\io.dat (250MB)
     3 |       120266752 |        14681 |       0.96 |     122.33 |    9.444 |  15.006 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         479051776 |        58478 |       3.81 |     487.27 |    9.484 |  15.245

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |        51847168 |         6329 |       0.41 |      52.74 |   16.049 |  26.875 | e:\io.dat (250MB)
     1 |        51388416 |         6273 |       0.41 |      52.27 |   16.622 |  30.023 | e:\io.dat (250MB)
     2 |        51314688 |         6264 |       0.41 |      52.20 |   15.708 |  26.289 | e:\io.dat (250MB)
     3 |        51953664 |         6342 |       0.41 |      52.85 |   15.987 |  27.555 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         206503936 |        25208 |       1.64 |     210.05 |   16.091 |  27.722


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      2.281 |      3.972 |      2.281
   25th |      4.852 |      7.045 |      5.379
   50th |      6.482 |      9.307 |      7.360
   75th |      9.654 |     15.067 |     11.062
   90th |     15.740 |     27.561 |     19.333
   95th |     22.279 |     41.725 |     28.215
   99th |     48.117 |    129.090 |     74.746
3-nines |    298.370 |    383.684 |    321.961
4-nines |    383.915 |    503.075 |    460.005
5-nines |    486.193 |    570.680 |    570.680
6-nines |    486.193 |    570.680 |    570.680
7-nines |    486.193 |    570.680 |    570.680
8-nines |    486.193 |    570.680 |    570.680
9-nines |    486.193 |    570.680 |    570.680
    max |    486.193 |    570.680 |    570.680

Ceph iSCSI remote HDD RBD exported and accessed via MultiPath iSCSI - Default Windows flush behaviour:

Code:

Command Line: diskspd.exe -b8k -d120 -Sb -L -o2 -t4 -r -w30 -c250M e:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'e:\io.dat'
                think time: 0ms
                burst size: 0
                using software cache
                using hardware write cache, writethrough off
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.01s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  86.71%|   7.10%|   79.62%|  13.29%
   1|  79.27%|   6.74%|   72.52%|  20.74%
-------------------------------------------
avg.|  82.99%|   6.92%|   76.07%|  17.02%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     34099658752 |      4162556 |     270.99 |   34686.30 |    0.057 |   1.063 | e:\io.dat (250MB)
     1 |     31984738304 |      3904387 |     254.18 |   32535.00 |    0.061 |   1.221 | e:\io.dat (250MB)
     2 |     30441316352 |      3715981 |     241.91 |   30965.02 |    0.029 |   0.686 | e:\io.dat (250MB)
     3 |     31636725760 |      3861905 |     251.41 |   32181.00 |    0.030 |   0.770 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:      128162439168 |     15644829 |    1018.49 |  130367.31 |    0.045 |   0.965

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     23859044352 |      2912481 |     189.61 |   24269.51 |    0.040 |   0.748 | e:\io.dat (250MB)
     1 |     22385115136 |      2732558 |     177.89 |   22770.22 |    0.043 |   0.946 | e:\io.dat (250MB)
     2 |     21292826624 |      2599222 |     169.21 |   21659.14 |    0.012 |   0.354 | e:\io.dat (250MB)
     3 |     22132506624 |      2701722 |     175.88 |   22513.27 |    0.013 |   0.379 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       89669492736 |     10945983 |     712.59 |   91212.14 |    0.027 |   0.662

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     10240614400 |      1250075 |      81.38 |   10416.79 |    0.097 |   1.566 | e:\io.dat (250MB)
     1 |      9599623168 |      1171829 |      76.29 |    9764.77 |    0.103 |   1.697 | e:\io.dat (250MB)
     2 |      9148489728 |      1116759 |      72.70 |    9305.88 |    0.069 |   1.127 | e:\io.dat (250MB)
     3 |      9504219136 |      1160183 |      75.53 |    9667.73 |    0.069 |   1.280 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       38492946432 |      4698846 |     305.90 |   39155.17 |    0.085 |   1.441


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      0.001 |      0.002 |      0.001
   25th |      0.007 |      0.013 |      0.008
   50th |      0.012 |      0.039 |      0.015
   75th |      0.023 |      0.090 |      0.038
   90th |      0.049 |      0.148 |      0.092
   95th |      0.079 |      0.192 |      0.133
   99th |      0.157 |      0.300 |      0.230
3-nines |      0.299 |      4.655 |      0.729
4-nines |     10.397 |     28.153 |     16.651
5-nines |    132.805 |    201.166 |    183.607
6-nines |    203.206 |    320.630 |    246.811
7-nines |    379.525 |    433.931 |    433.929
8-nines |    433.929 |    433.931 |    433.931
9-nines |    433.929 |    433.931 |    433.931
    max |    433.929 |    433.931 |    433.931

David Herselman · Jan 7, 2021

Ceph iSCSI remote HDD EC exported and accessed via MultiPath iSCSI - Flushing each write:

Code:

Command Line: diskspd.exe -b8k -d120 -Suw -L -o2 -t4 -r -w30 -c250M e:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'e:\io.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.00s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|   9.01%|   0.29%|    8.72%|  90.99%
   1|   1.82%|   0.20%|    1.63%|  98.18%
-------------------------------------------
avg.|   5.42%|   0.24%|    5.18%|  94.58%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |       229441536 |        28008 |       1.82 |     233.40 |    8.566 |   3.015 | e:\io.dat (250MB)
     1 |       229785600 |        28050 |       1.83 |     233.75 |    8.554 |   3.130 | e:\io.dat (250MB)
     2 |       229826560 |        28055 |       1.83 |     233.79 |    8.552 |   3.054 | e:\io.dat (250MB)
     3 |       230178816 |        28098 |       1.83 |     234.15 |    8.539 |   2.855 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         919232512 |       112211 |       7.31 |     935.09 |    8.553 |   3.015

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |       159940608 |        19524 |       1.27 |     162.70 |    8.364 |   2.474 | e:\io.dat (250MB)
     1 |       160112640 |        19545 |       1.27 |     162.87 |    8.364 |   2.700 | e:\io.dat (250MB)
     2 |       161447936 |        19708 |       1.28 |     164.23 |    8.377 |   2.665 | e:\io.dat (250MB)
     3 |       160948224 |        19647 |       1.28 |     163.72 |    8.336 |   2.434 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         642449408 |        78424 |       5.11 |     653.53 |    8.360 |   2.571

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |        69500928 |         8484 |       0.55 |      70.70 |    9.032 |   3.951 | e:\io.dat (250MB)
     1 |        69672960 |         8505 |       0.55 |      70.87 |    8.992 |   3.908 | e:\io.dat (250MB)
     2 |        68378624 |         8347 |       0.54 |      69.56 |    8.967 |   3.785 | e:\io.dat (250MB)
     3 |        69230592 |         8451 |       0.55 |      70.42 |    9.011 |   3.607 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:         276783104 |        33787 |       2.20 |     281.56 |    9.000 |   3.816


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      1.881 |      3.612 |      1.881
   25th |      7.315 |      7.523 |      7.425
   50th |      8.578 |      9.044 |      8.923
   75th |      9.423 |     10.508 |     10.218
   90th |     10.765 |     10.989 |     10.801
   95th |     12.091 |     12.146 |     12.110
   99th |     13.661 |     15.013 |     13.737
3-nines |     20.700 |     64.020 |     37.480
4-nines |     74.637 |    134.935 |    107.905
5-nines |    136.838 |    156.179 |    136.838
6-nines |    136.838 |    156.179 |    156.179
7-nines |    136.838 |    156.179 |    156.179
8-nines |    136.838 |    156.179 |    156.179
9-nines |    136.838 |    156.179 |    156.179
    max |    136.838 |    156.179 |    156.179

Ceph iSCSI remote HDD EC exported and accessed via MultiPath iSCSI - Default Windows flush behaviour:

Code:

Command Line: diskspd.exe -b8k -d120 -Sb -L -o2 -t4 -r -w30 -c250M e:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'e:\io.dat'
                think time: 0ms
                burst size: 0
                using software cache
                using hardware write cache, writethrough off
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.01s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  63.50%|   3.84%|   59.66%|  36.51%
   1|  57.12%|   4.62%|   52.49%|  42.89%
-------------------------------------------
avg.|  60.31%|   4.23%|   56.08%|  39.70%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     27328241664 |      3335967 |     217.16 |   27796.89 |    0.035 |   0.585 | e:\io.dat (250MB)
     1 |     26315603968 |      3212354 |     209.12 |   26766.89 |    0.037 |   0.665 | e:\io.dat (250MB)
     2 |     24676802560 |      3012305 |     196.09 |   25099.98 |    0.039 |   0.714 | e:\io.dat (250MB)
     3 |     25377193984 |      3097802 |     201.66 |   25812.39 |    0.038 |   0.698 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:      103697842176 |     12658428 |     824.03 |  105476.15 |    0.037 |   0.665

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     19128893440 |      2335070 |     152.01 |   19456.93 |    0.022 |   0.348 | e:\io.dat (250MB)
     1 |     18407858176 |      2247053 |     146.28 |   18723.53 |    0.022 |   0.366 | e:\io.dat (250MB)
     2 |     17258315776 |      2106728 |     137.14 |   17554.28 |    0.024 |   0.343 | e:\io.dat (250MB)
     3 |     17754095616 |      2167248 |     141.08 |   18058.56 |    0.023 |   0.372 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       72549163008 |      8856099 |     576.51 |   73793.30 |    0.023 |   0.357

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      8199348224 |      1000897 |      65.16 |    8339.96 |    0.067 |   0.925 | e:\io.dat (250MB)
     1 |      7907745792 |       965301 |      62.84 |    8043.36 |    0.070 |   1.077 | e:\io.dat (250MB)
     2 |      7418486784 |       905577 |      58.95 |    7545.71 |    0.076 |   1.191 | e:\io.dat (250MB)
     3 |      7623098368 |       930554 |      60.58 |    7753.83 |    0.073 |   1.140 | e:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       31148679168 |      3802329 |     247.52 |   31682.85 |    0.071 |   1.084


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      0.001 |      0.002 |      0.001
   25th |      0.005 |      0.010 |      0.005
   50th |      0.006 |      0.052 |      0.007
   75th |      0.010 |      0.088 |      0.021
   90th |      0.018 |      0.124 |      0.076
   95th |      0.025 |      0.149 |      0.106
   99th |      0.044 |      0.212 |      0.171
3-nines |      6.022 |      2.021 |      5.985
4-nines |      7.532 |     17.944 |      9.974
5-nines |     18.457 |    193.415 |     99.376
6-nines |    134.396 |    331.140 |    209.154
7-nines |    207.625 |    389.880 |    378.218
8-nines |    207.625 |    389.880 |    389.880
9-nines |    207.625 |    389.880 |    389.880
    max |    207.625 |    389.880 |    389.880

David Herselman · Jan 7, 2021

Ceph local SSD RBD - Flushing each write:

Code:

Command Line: diskspd.exe -b8k -d120 -Suw -L -o2 -t4 -r -w30 -c250M c:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'c:\io.dat'
                think time: 0ms
                burst size: 0
                software cache disabled
                hardware write cache disabled, writethrough on
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.00s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|   7.79%|   0.81%|    6.98%|  92.21%
   1|  12.89%|   1.15%|   11.74%|  87.11%
-------------------------------------------
avg.|  10.34%|   0.98%|    9.36%|  89.66%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      2536603648 |       309644 |      20.16 |    2580.33 |    0.774 |   3.428 | c:\io.dat (250MB)
     1 |      2545811456 |       310768 |      20.23 |    2589.69 |    0.771 |   3.613 | c:\io.dat (250MB)
     2 |      2526208000 |       308375 |      20.08 |    2569.75 |    0.777 |   3.613 | c:\io.dat (250MB)
     3 |      2540822528 |       310159 |      20.19 |    2584.62 |    0.773 |   3.477 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       10149445632 |      1238946 |      80.66 |   10324.39 |    0.774 |   3.534

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      1778016256 |       217043 |      14.13 |    1808.66 |    0.139 |   1.124 | c:\io.dat (250MB)
     1 |      1781284864 |       217442 |      14.16 |    1811.99 |    0.131 |   0.991 | c:\io.dat (250MB)
     2 |      1771233280 |       216215 |      14.08 |    1801.76 |    0.140 |   1.107 | c:\io.dat (250MB)
     3 |      1775509504 |       216737 |      14.11 |    1806.11 |    0.133 |   1.493 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:        7106043904 |       867437 |      56.47 |    7228.53 |    0.136 |   1.193

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |       758587392 |        92601 |       6.03 |     771.66 |    2.262 |   5.761 | c:\io.dat (250MB)
     1 |       764526592 |        93326 |       6.08 |     777.70 |    2.263 |   6.164 | c:\io.dat (250MB)
     2 |       754974720 |        92160 |       6.00 |     767.99 |    2.273 |   6.132 | c:\io.dat (250MB)
     3 |       765313024 |        93422 |       6.08 |     778.50 |    2.259 |   5.640 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:        3043401728 |       371509 |      24.19 |    3095.86 |    2.264 |   5.928


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      0.025 |      0.050 |      0.025
   25th |      0.053 |      1.314 |      0.056
   50th |      0.060 |      1.589 |      0.072
   75th |      0.076 |      2.080 |      1.236
   90th |      0.308 |      3.259 |      1.874
   95th |      0.375 |      4.434 |      2.544
   99th |      0.877 |     10.488 |      5.356
3-nines |      4.124 |     89.044 |     52.577
4-nines |     56.958 |    148.352 |    122.022
5-nines |    125.211 |    673.218 |    230.320
6-nines |    447.851 |    702.707 |    682.620
7-nines |    447.851 |    702.707 |    702.707
8-nines |    447.851 |    702.707 |    702.707
9-nines |    447.851 |    702.707 |    702.707
    max |    447.851 |    702.707 |    702.707
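
As a sanity check on the figures above, the totals can be recomputed from the raw byte and I/O counters. The sketch below is not part of the original post. It copies the totals from this write-through run and assumes that diskspd's "MB/s" column means MiB/s (2^20 bytes), which is what the numbers suggest.

```python
# Totals copied from the write-through (-Suw) run above: 120 s, 8 KiB blocks.
DURATION_S = 120.0
BLOCK_SIZE = 8192

total_bytes, total_ios = 10_149_445_632, 1_238_946
read_ios, write_ios = 867_437, 371_509

# Every I/O should be exactly one block in size.
bytes_per_io = total_bytes / total_ios

# Throughput and IOPS recomputed from the raw counters.
# Assumption: "MB/s" in the report is MiB/s (divide by 2**20).
mib_per_s = total_bytes / DURATION_S / 2**20
iops = total_ios / DURATION_S

# Share of reads, which should sit close to the requested 70/30 mix (-w30).
read_share = read_ios / total_ios

print(f"bytes per I/O: {bytes_per_io:.0f}")
print(f"throughput:    {mib_per_s:.2f} MiB/s")
print(f"IOPS:          {iops:.0f}")
print(f"read share:    {read_share:.1%}")
```

The recomputed IOPS differs very slightly from the reported 10324.39 because the report rounds the actual test time to 120.00s.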

Ceph local SSD RBD - Default Windows flush behaviour:

Code:

Command Line: diskspd.exe -b8k -d120 -Sb -L -o2 -t4 -r -w30 -c250M c:\io.dat

Input parameters:

        timespan:   1
        -------------
        duration: 120s
        warm up time: 5s
        cool down time: 0s
        measuring latency
        random seed: 0
        path: 'c:\io.dat'
                think time: 0ms
                burst size: 0
                using software cache
                using hardware write cache, writethrough off
                performing mix test (read/write ratio: 70/30)
                block size: 8192
                using random I/O (alignment: 8192)
                number of outstanding I/O operations: 2
                thread stride size: 0
                threads per file: 4
                using I/O Completion Ports
                IO priority: normal



Results for timespan 1:
*******************************************************************************

actual test time:       120.00s
thread count:           4
proc count:             2

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
   0|  68.93%|   3.95%|   64.99%|  31.07%
   1|  62.07%|   4.90%|   57.17%|  37.93%
-------------------------------------------
avg.|  65.50%|   4.42%|   61.08%|  34.50%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     28751347712 |      3509686 |     228.49 |   29246.53 |    0.034 |   0.157 | c:\io.dat (250MB)
     1 |     28883697664 |      3525842 |     229.54 |   29381.16 |    0.034 |   0.130 | c:\io.dat (250MB)
     2 |     28762890240 |      3511095 |     228.58 |   29258.27 |    0.034 |   0.144 | c:\io.dat (250MB)
     3 |     28817760256 |      3517793 |     229.02 |   29314.08 |    0.034 |   0.134 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:      115215695872 |     14064416 |     915.63 |  117200.04 |    0.034 |   0.142

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |     20125032448 |      2456669 |     159.93 |   20471.64 |    0.009 |   0.026 | c:\io.dat (250MB)
     1 |     20199899136 |      2465808 |     160.53 |   20547.80 |    0.009 |   0.116 | c:\io.dat (250MB)
     2 |     20120338432 |      2456096 |     159.90 |   20466.87 |    0.009 |   0.162 | c:\io.dat (250MB)
     3 |     20158824448 |      2460794 |     160.20 |   20506.02 |    0.009 |   0.123 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       80604094464 |      9839367 |     640.57 |   81992.33 |    0.009 |   0.118

Write IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  AvgLat  | LatStdDev |  file
-----------------------------------------------------------------------------------------------------
     0 |      8626315264 |      1053017 |      68.55 |    8774.89 |    0.092 |   0.276 | c:\io.dat (250MB)
     1 |      8683798528 |      1060034 |      69.01 |    8833.36 |    0.091 |   0.143 | c:\io.dat (250MB)
     2 |      8642551808 |      1054999 |      68.68 |    8791.40 |    0.091 |   0.057 | c:\io.dat (250MB)
     3 |      8658935808 |      1056999 |      68.81 |    8808.07 |    0.091 |   0.142 | c:\io.dat (250MB)
-----------------------------------------------------------------------------------------------------
total:       34611601408 |      4225049 |     275.06 |   35207.71 |    0.092 |   0.173


  %-ile |  Read (ms) | Write (ms) | Total (ms)
----------------------------------------------
    min |      0.002 |      0.002 |      0.002
   25th |      0.005 |      0.065 |      0.006
   50th |      0.006 |      0.085 |      0.008
   75th |      0.009 |      0.110 |      0.058
   90th |      0.016 |      0.139 |      0.100
   95th |      0.022 |      0.159 |      0.123
   99th |      0.034 |      0.206 |      0.171
3-nines |      0.054 |      0.298 |      0.241
4-nines |      0.091 |      2.126 |      0.617
5-nines |      1.368 |      5.460 |      3.314
6-nines |     26.688 |    131.930 |     26.688
7-nines |    179.204 |    177.863 |    177.863
8-nines |    179.204 |    177.863 |    179.204
9-nines |    179.204 |    177.863 |    179.204
    max |    179.204 |    177.863 |    179.204
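
To put the two Ceph local SSD RBD runs side by side, the short sketch below computes a few ratios from the totals reported above. It is only an illustration, with the dictionary names chosen here for convenience. Note that the cached run (-Sb) uses a 250 MB test file that most likely fits in the Windows file cache, so its figures probably say more about guest memory than about Ceph.

```python
# Totals copied from the two runs above (8 KiB random I/O, 70/30 read/write).
writethrough = {"iops": 10324.39, "write_avg_ms": 2.264, "write_p50_ms": 1.589}
cached = {"iops": 117200.04, "write_avg_ms": 0.092, "write_p50_ms": 0.085}

# How many times more I/Os per second the cached run achieved.
iops_ratio = cached["iops"] / writethrough["iops"]

# How many times slower a write is when it must reach stable storage.
avg_lat_ratio = writethrough["write_avg_ms"] / cached["write_avg_ms"]
p50_lat_ratio = writethrough["write_p50_ms"] / cached["write_p50_ms"]

print(f"IOPS, cached vs write-through:         {iops_ratio:.1f}x")
print(f"avg write latency, write-through:      {avg_lat_ratio:.1f}x higher")
print(f"median write latency, write-through:   {p50_lat_ratio:.1f}x higher")
```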
