Hello All,
Whenever reads or writes are running I see high IO delay but low CPU usage.
Currently, with data copying to an SMB share on a VM on another node over 10 Gbit, I see 2.66% CPU usage and 46% IO delay. Is this normal?
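To take the SMB/VM layer out of the picture I can also benchmark the pools directly with rados bench; something along these lines (the pool name is just a placeholder for my HDD pool):
Code:
# write test, keeping the objects so the read test has data
rados bench -p <hdd-pool> 60 write --no-cleanup
# sequential read test against the objects written above
rados bench -p <hdd-pool> 60 seq
# remove the benchmark objects afterwards
rados -p <hdd-pool> cleanup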
Two pools are set up, one HDD and one SSD. Both are configured correctly with the correct PG counts etc.
The only thing I've done differently with this build compared to my last single node is that I used the SD card controller to install the OS. It's slow; I knew it would be a mistake, so I will be copying the data off the SD cards and booting from a 128 GB SSD in the future. Watching iostat doesn't suggest much activity on the OS disk (sdk) anyway.
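For reference, I've been watching the OS disk with something like this (sdk is the OS disk here):
Code:
# extended per-device stats for the OS disk, refreshed every 5 seconds
iostat -dx sdk 5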
Any help is much appreciated.
Single Node Spec:
4 x Intel(R) Xeon(R) CPU E5-2407 0 @ 2.20GHz (1 Socket)
Linux 4.15.17-1-pve #1 SMP PVE 4.15.17-9 (Wed, 9 May 2018 13:31:43 +0200)
40 GB DDR3 ECC RAM
IOSTAT:
Code:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.16 0.45 20.41 5.09 2676.92 92.73 217.20 0.09 3.66 4.47 0.39 1.49 3.80
sdb 0.26 0.43 26.16 3.48 3511.54 72.33 241.81 0.12 4.07 4.56 0.42 1.81 5.36
sdc 0.29 0.72 28.66 6.98 3277.77 138.20 191.72 0.16 4.54 5.57 0.35 1.65 5.89
sdd 0.28 0.61 35.01 4.54 4846.56 91.61 249.71 0.25 6.23 7.01 0.17 2.75 10.88
sde 0.51 0.65 43.79 7.35 4201.57 147.55 170.08 0.28 5.43 6.27 0.42 1.97 10.08
sdf 0.34 0.81 40.80 7.12 4848.41 171.47 209.52 0.28 5.78 6.71 0.45 2.18 10.42
sdg 0.36 0.60 33.59 5.20 3058.07 109.42 163.30 0.14 3.71 4.25 0.20 1.29 5.01
sdh 0.49 0.89 51.78 7.72 6375.04 186.49 220.56 0.22 3.72 4.19 0.58 1.60 9.53
sdi 8.75 3.81 8.79 11.19 713.27 269.65 98.37 0.02 0.78 0.74 0.81 0.50 1.01
sdj 8.76 3.81 8.62 11.15 702.28 262.81 97.66 0.03 1.38 1.10 1.59 0.90 1.77
sdk 0.08 5.91 0.13 6.76 4.79 165.84 49.54 1.11 161.11 14.49 163.93 6.02 4.14
dm-0 0.00 0.00 0.00 0.00 0.06 0.00 48.91 0.00 5.60 5.60 0.00 3.52 0.00
dm-1 0.00 0.00 0.14 11.78 4.53 165.84 28.57 0.21 17.31 15.90 17.33 3.47 4.14
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 8.83 0.00 1.24 1.29 0.00 1.24 0.00
CRUSH MAP:
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host isolinear {
id -3 # do not change unnecessarily
id -4 class hdd # do not change unnecessarily
id -5 class ssd # do not change unnecessarily
# weight 24.556
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.819
item osd.1 weight 1.819
item osd.2 weight 2.728
item osd.3 weight 2.728
item osd.4 weight 3.638
item osd.5 weight 3.638
item osd.6 weight 3.638
item osd.7 weight 3.638
item osd.8 weight 0.455
item osd.9 weight 0.455
}
root default {
id -1 # do not change unnecessarily
id -2 class hdd # do not change unnecessarily
id -6 class ssd # do not change unnecessarily
# weight 24.556
alg straw2
hash 0 # rjenkins1
item isolinear weight 24.556
}
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step choose firstn 0 type osd
step emit
}
rule FastStoreage {
id 1
type replicated
min_size 1
max_size 10
step take default class ssd
step choose firstn 0 type osd
step emit
}
# end crush map
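(In case anyone wants to poke at the rules themselves: the map above was pulled and decompiled roughly like this, and crushtool can also sanity-check a rule's mappings; file names are just examples.)
Code:
# dump the compiled CRUSH map and decompile it to text
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# check how rule 1 (FastStoreage) maps with 3 replicas
crushtool -i crushmap.bin --test --rule 1 --num-rep 3 --show-mappings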
CEPH CONF:
Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.10.10.0/24
debug asok = 0/0
debug auth = 0/0
debug buffer = 0/0
debug client = 0/0
debug context = 0/0
debug crush = 0/0
debug filer = 0/0
debug filestore = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug journal = 0/0
debug journaler = 0/0
debug lockdep = 0/0
debug mds = 0/0
debug mds balancer = 0/0
debug mds locker = 0/0
debug mds log = 0/0
debug mds log expire = 0/0
debug mds migrator = 0/0
debug mon = 0/0
debug monc = 0/0
debug ms = 0/0
debug objclass = 0/0
debug objectcacher = 0/0
debug objecter = 0/0
debug optracker = 0/0
debug osd = 0/0
debug paxos = 0/0
debug perfcounter = 0/0
debug rados = 0/0
debug rbd = 0/0
debug rgw = 0/0
debug throttle = 0/0
debug timer = 0/0
debug tp = 0/0
fsid = e791fb44-9c38-4e86-8edf-cdbc5a3f7d63
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd crush chooseleaf type = 0
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.10.10.0/24
[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring
[mon.DS9]
host = DS9
mon addr = 10.10.10.3:6789
[mon.isolinear]
host = isolinear
mon addr = 10.10.10.4:6789
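For completeness, the two pools are pointed at those rules in the usual way; the pool names below are placeholders for my HDD and SSD pools:
Code:
# assign the CRUSH rules to the pools
ceph osd pool set <hdd-pool> crush_rule replicated_rule
ceph osd pool set <ssd-pool> crush_rule FastStoreage
# confirm which rule a pool is using
ceph osd pool get <ssd-pool> crush_rule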