CEPH Single Node High IO

mcdowellster

Well-Known Member
Jun 13, 2018
Hello All,

With any writes or reads running, I seem to see high IO delay but low CPU usage.

Currently, while copying data to an SMB share on a VM on another node over 10Gbit, I see 2.66% CPU usage and 46% IO delay. Is this normal?

Two pools are set up, one HDD and one SSD, both with the correct PG counts etc.

The only thing I've done differently with this build compared to my last single node is that I used the SD card controller to install the OS. It's slow; I knew it would be a mistake, so I will be copying the data off the SD cards and booting from a 128GB SSD in the future. Watching iostat doesn't suggest much activity on the OS disk (sdk) anyway.

Any help is much appreciated.

Single Node Spec:
4 x Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz (1 Socket)
Linux 4.15.17-1-pve #1 SMP PVE 4.15.17-9 (Wed, 9 May 2018 13:31:43 +0200)
40GB DDR3 ECC RAM



IOSTAT:
Code:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util

sda               0.16     0.45   20.41    5.09  2676.92    92.73   217.20     0.09    3.66    4.47    0.39   1.49   3.80
sdb               0.26     0.43   26.16    3.48  3511.54    72.33   241.81     0.12    4.07    4.56    0.42   1.81   5.36
sdc               0.29     0.72   28.66    6.98  3277.77   138.20   191.72     0.16    4.54    5.57    0.35   1.65   5.89
sdd               0.28     0.61   35.01    4.54  4846.56    91.61   249.71     0.25    6.23    7.01    0.17   2.75  10.88
sde               0.51     0.65   43.79    7.35  4201.57   147.55   170.08     0.28    5.43    6.27    0.42   1.97  10.08
sdf               0.34     0.81   40.80    7.12  4848.41   171.47   209.52     0.28    5.78    6.71    0.45   2.18  10.42
sdg               0.36     0.60   33.59    5.20  3058.07   109.42   163.30     0.14    3.71    4.25    0.20   1.29   5.01
sdh               0.49     0.89   51.78    7.72  6375.04   186.49   220.56     0.22    3.72    4.19    0.58   1.60   9.53
sdi               8.75     3.81    8.79   11.19   713.27   269.65    98.37     0.02    0.78    0.74    0.81   0.50   1.01
sdj               8.76     3.81    8.62   11.15   702.28   262.81    97.66     0.03    1.38    1.10    1.59   0.90   1.77
sdk               0.08     5.91    0.13    6.76     4.79   165.84    49.54     1.11  161.11   14.49  163.93   6.02   4.14
dm-0              0.00     0.00    0.00    0.00     0.06     0.00    48.91     0.00    5.60    5.60    0.00   3.52   0.00
dm-1              0.00     0.00    0.14   11.78     4.53   165.84    28.57     0.21   17.31   15.90   17.33   3.47   4.14
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     8.83     0.00    1.24    1.29    0.00   1.24   0.00

CRUSH MAP:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class ssd
device 9 osd.9 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host isolinear {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    id -5 class ssd        # do not change unnecessarily
    # weight 24.556
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.819
    item osd.1 weight 1.819
    item osd.2 weight 2.728
    item osd.3 weight 2.728
    item osd.4 weight 3.638
    item osd.5 weight 3.638
    item osd.6 weight 3.638
    item osd.7 weight 3.638
    item osd.8 weight 0.455
    item osd.9 weight 0.455
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    # weight 24.556
    alg straw2
    hash 0    # rjenkins1
    item isolinear weight 24.556
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default class hdd
    step choose firstn 0 type osd
    step emit
}
rule FastStoreage {
    id 1
    type replicated
    min_size 1
    max_size 10
    step take default class ssd
    step choose firstn 0 type osd
    step emit
}

# end crush map
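Both rules above replicate across OSDs rather than hosts (`step choose firstn 0 type osd`), which is what makes replication work on a single node. For reference, if the map ever needs hand-editing, the usual round trip with the standard `ceph`/`crushtool` commands looks roughly like this (file names are arbitrary examples):

```shell
# Export the current CRUSH map, decompile it to the text form shown
# above, edit, recompile, and inject it back into the cluster.
ceph osd getcrushmap -o crushmap.bin        # binary map from the cluster
crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
# ... edit crushmap.txt ...
crushtool -c crushmap.txt -o crushmap.new   # recompile
ceph osd setcrushmap -i crushmap.new        # inject the edited map
```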


CEPH CONF
Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 10.10.10.0/24
     debug asok = 0/0
     debug auth = 0/0
     debug buffer = 0/0
     debug client = 0/0
     debug context = 0/0
     debug crush = 0/0
     debug filer = 0/0
     debug filestore = 0/0
     debug finisher = 0/0
     debug heartbeatmap = 0/0
     debug journal = 0/0
     debug journaler = 0/0
     debug lockdep = 0/0
     debug mds = 0/0
     debug mds balancer = 0/0
     debug mds locker = 0/0
     debug mds log = 0/0
     debug mds log expire = 0/0
     debug mds migrator = 0/0
     debug mon = 0/0
     debug monc = 0/0
     debug ms = 0/0
     debug objclass = 0/0
     debug objectcacher = 0/0
     debug objecter = 0/0
     debug optracker = 0/0
     debug osd = 0/0
     debug paxos = 0/0
     debug perfcounter = 0/0
     debug rados = 0/0
     debug rbd = 0/0
     debug rgw = 0/0
     debug throttle = 0/0
     debug timer = 0/0
     debug tp = 0/0
     fsid = e791fb44-9c38-4e86-8edf-cdbc5a3f7d63
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd crush chooseleaf type = 0
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 10.10.10.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.DS9]
     host = DS9
     mon addr = 10.10.10.3:6789

[mon.isolinear]
     host = isolinear
     mon addr = 10.10.10.4:6789
 
Hi,
With any writes or reads running I seem to see high IO delays but low cpu usage.
This is normal; see the kernel documentation:

Code:
- iowait: In a word, iowait stands for waiting for I/O to complete. But there
  are several problems:
  1. Cpu will not wait for I/O to complete, iowait is the time that a task is
     waiting for I/O to complete. When cpu goes into idle state for
     outstanding task io, another task will be scheduled on this CPU.
  2. In a multi-core CPU, the task waiting for I/O to complete is not running
     on any CPU, so the iowait of each CPU is difficult to calculate.
  3. The value of iowait field in /proc/stat will decrease in certain
     conditions.
  So, the iowait is not reliable by reading from /proc/stat.
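To see that number yourself without iostat, you can sample the `iowait` field of the `cpu` summary line in /proc/stat twice and take the difference; a minimal sketch (field layout per proc(5), the 2-second interval is arbitrary):

```shell
# Sample the "cpu" line of /proc/stat twice and compute the percentage
# of elapsed CPU time spent in iowait between the two samples.
# Fields after "cpu": user nice system idle iowait irq softirq steal ...
read -r _ u1 n1 s1 i1 w1 q1 sq1 st1 _ < /proc/stat
sleep 2
read -r _ u2 n2 s2 i2 w2 q2 sq2 st2 _ < /proc/stat
total=$(( (u2+n2+s2+i2+w2+q2+sq2+st2) - (u1+n1+s1+i1+w1+q1+sq1+st1) ))
iowait_pct=$(( 100 * (w2 - w1) / total ))
# Per point 3 in the kernel doc above, the counter can occasionally
# decrease, so clamp at zero.
if [ "$iowait_pct" -lt 0 ]; then iowait_pct=0; fi
echo "iowait over the interval: ${iowait_pct}%"
```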

The only thing I've done differently with this build over my last single node is that I use the SD Card controller to install the OS
This has to be changed very soon: PVE writes a lot, so the SD card will reach its wearout limit quickly.
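One quick way to see how much has been written to the boot device is the write-sectors counter in /proc/diskstats (field 10 on each line, in 512-byte sectors); `sdk` is the OS disk from the iostat output above and may differ on your system:

```shell
# Total data written per sdX disk since boot, from /proc/diskstats.
# Field 3 is the device name, field 10 is sectors written (512 B each).
awk '$3 ~ /^sd[a-z]+$/ {
    printf "%s: %.1f GiB written since boot\n", $3, $10 * 512 / 1024^3
}' /proc/diskstats
```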