Ceph I/O very, very low

haiwan

Active Member
Hi,
I don't know what the issue is. We tested VM I/O and it is very, very slow.

Demo 1:
3 nodes, each node with 2x 6TB enterprise disks, connected over 10G.
As you can see in the screenshots, write I/O is very bad. I would appreciate help tracking down this problem.

Demo 2:
We also tested with all SSDs; writes show only 4 MB/s, which is just as slow.
 

Attachments

  • 微信截图_20190604200635.png (26.8 KB)
  • 微信截图_20190604200816.png (68.6 KB)
  • 微信截图_20190604200829.png (18.4 KB)
  • 微信截图_20190604201058.png (12 KB)
Did you activate writeback on the VM disks? Ceph works better with bigger writes; the default RADOS object size is 4 MB.
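For reference, a hedged sketch of setting that cache mode on an existing VM disk from the CLI; the VM ID 100, the scsi0 slot and the storage/volume name are only placeholders, and the same setting is available in the GUI under the disk's Cache option:
Code:
# re-specify the existing volume with cache=writeback (IDs and names are examples)
qm config 100 | grep scsi0
qm set 100 --scsi0 ceph-vm:vm-100-disk-0,cache=writeback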
 
Did you activate writeback on the VM disks? Ceph works better with bigger writes; the default RADOS object size is 4 MB.
Hi, thanks.
Yes, we tried enabling writeback and writes really got faster. But I still don't know why the I/O is so slow. We are using all new disks.
 
Please provide more information about your hardware and cluster setup. Otherwise we are just fishing in the dark.
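A hedged starting list of commands that usually gives enough detail for that (all standard Ceph/Proxmox VE tools; the NIC name is a placeholder):
Code:
pveversion -v                      # Proxmox VE and Ceph package versions
ceph -s                            # overall cluster health
ceph osd df tree                   # OSD layout, usage and weights
lsblk -o NAME,MODEL,ROTA,SIZE      # disk models and whether they are spinners
ethtool <nic> | grep Speed         # confirm the 10G link speed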
 
Please provide more information about your hardware and cluster setup. Otherwise we are just fishing in the dark.
Code:
root@pve20:~# ceph osd crush tree --show-shadow
ID CLASS WEIGHT   TYPE NAME         
-2   hdd 38.20520 root default~hdd   
-4   hdd 10.91577     host pve20~hdd
 0   hdd  5.45789         osd.0     
 1   hdd  5.45789         osd.1     
-6   hdd 10.91577     host pve21~hdd
 2   hdd  5.45789         osd.2     
 3   hdd  5.45789         osd.3     
-8   hdd 16.37366     host pve22~hdd
 4   hdd  5.45789         osd.4     
 5   hdd  5.45789         osd.5     
 6   hdd  5.45789         osd.6     
-1       38.20520 root default       
-3       10.91577     host pve20     
 0   hdd  5.45789         osd.0     
 1   hdd  5.45789         osd.1     
-5       10.91577     host pve21     
 2   hdd  5.45789         osd.2     
 3   hdd  5.45789         osd.3     
-7       16.37366     host pve22     
 4   hdd  5.45789         osd.4     
 5   hdd  5.45789         osd.5     
 6   hdd  5.45789         osd.6     
root@pve20:~#
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host pve20 {
    id -3        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    # weight 10.916
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 5.458
    item osd.1 weight 5.458
}
host pve21 {
    id -5        # do not change unnecessarily
    id -6 class hdd        # do not change unnecessarily
    # weight 10.916
    alg straw2
    hash 0    # rjenkins1
    item osd.2 weight 5.458
    item osd.3 weight 5.458
}
host pve22 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    # weight 16.374
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 5.458
    item osd.5 weight 5.458
    item osd.6 weight 5.458
}
root default {
    id -1        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    # weight 38.205
    alg straw2
    hash 0    # rjenkins1
    item pve20 weight 10.916
    item pve21 weight 10.916
    item pve22 weight 16.374
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
Code:
[global]
     auth client required = cephx
     auth cluster required = cephx
     auth service required = cephx
     cluster network = 192.168.100.0/24
     fsid = 3d4c74e0-07ac-4bbb-b270-973a7beaa1c9
     keyring = /etc/pve/priv/$cluster.$name.keyring
     mon allow pool delete = true
     osd journal size = 5120
     osd pool default min size = 2
     osd pool default size = 3
     public network = 192.168.100.0/24

[osd]
     keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.pve20]
     host = pve20
     mon addr = 192.168.100.20:6789

[mon.pve21]
     host = pve21
     mon addr = 192.168.100.21:6789

[mon.pve22]
     host = pve22
     mon addr = 192.168.100.22:6789
 
Every node is connected to a 10G switch; no bonding is used. These are enterprise SATA disks.
 

Attachments

  • 微信截图_20190605123428.png (18.5 KB)
  • 微信截图_20190605123524.png (17.7 KB)
  • 微信截图_20190605123444.png (12 KB)
  • 微信截图_20190605123806.png (16.7 KB)
  • 微信截图_20190605123825.png (29.9 KB)
And this is a test after enabling writeback. Does this look OK to you?
 

Attachments

  • 微信截图_20190605125245.png (22.1 KB)
root@pve20:~# rados bench -p rbd 60 write -b 4M -t 16 --no-cleanup
error opening pool rbd: (2) No such file or directory
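That error only means there is no pool called rbd in this cluster; the benchmark has to be pointed at an existing pool, as done with pveceph-vm in the next post. A quick check:
Code:
ceph osd lspools        # list the pools that actually exist, then use one of them with -p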
 

Attachments

  • 微信截图_20190605202648.png (6 KB)
Code:
root@pve20:~# rados bench -p pveceph-vm 60 write -b 4M -t 16 --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
Object prefix: benchmark_data_pve20_1221018
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        39        23   91.9968        92    0.525026    0.453288
    2      16        71        55   109.987       128    0.388673    0.484536
    3      16        99        83   110.654       112    0.279539    0.506827
    4      16       125       109   108.986       104    0.893676    0.532952
    5      16       152       136   108.786       108   0.0923541    0.545537
    6      16       160       144   95.9876        32     1.10482    0.554353
    7      16       189       173   98.8444       116    0.167148     0.61635
    8      16       208       192   95.9877        76      0.1381    0.628111
    9      16       230       214   95.0991        88    0.900637    0.640316
   10      16       249       233   93.1882        76     1.05682    0.650741
   11      16       265       249    90.534        64     1.21001    0.674377
   12      16       278       262    87.322        52   0.0790534    0.672922
   13      16       305       289   88.9115       108     0.83334    0.699524
   14      16       319       303   86.5603        56    0.113261    0.692738
   15      16       343       327   87.1889        96    0.331241    0.714442
   16      16       356       340   84.9892        52   0.0955234    0.711637
   17      16       372       356   83.7541        64   0.0960042     0.72174
   18      16       394       378   83.9891        88    0.379105    0.723566
   19      16       410       394   82.9367        64    0.267056    0.741247
2019-06-05 20:24:43.698592 min lat: 0.073624 max lat: 3.40872 avg lat: 0.751385
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   20      16       426       410   81.9894        64    0.193852    0.751385
   21      16       443       427   81.3227        68    0.277608    0.755047
   22      16       453       437   79.4442        40     0.11136    0.768414
   23      16       464       448   77.9029        44    0.205865     0.78516
   24      16       482       466   77.6566        72     1.62623    0.795868
   25      16       493       477     76.31        44   0.0918577    0.794046
   26      16       507       491   75.5285        56   0.0958611    0.808335
   27      16       526       510   75.5457        76     0.26046    0.813982
   28      16       539       523   74.7045        52    0.147487    0.822418
   29      16       558       542   74.7489        76     1.33057    0.840701
   30      16       577       561   74.7902        76      1.2884    0.835475
   31      16       590       574   74.0548        52     1.10672    0.841937
   32      16       606       590   73.7402        64    0.848479    0.844206
   33      16       622       606   73.4449        64    0.267696     0.83981
   34      16       648       632    74.343       104     0.21442    0.844791
   35      16       656       640   73.1331        32     2.79732    0.856335
   36      16       675       659   73.2124        76    0.341708    0.860593
   37      16       686       670   72.4228        44      1.1331    0.863092
   38      16       701       685   72.0957        60       1.118    0.869825
   39      16       706       690   70.7598        20    0.200203    0.873931
2019-06-05 20:25:03.701289 min lat: 0.0735826 max lat: 5.10807 avg lat: 0.879367
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       713       697   69.6908        28     2.12733    0.879367
   41      16       727       711   69.3566        56    0.148251    0.896475
   42      16       742       726   69.1336        60    0.264372    0.901768
   43      16       760       744      69.2        72    0.286994    0.906682
   44      16       777       761   69.1726        68    0.250539    0.912612
   45      16       801       785   69.7685        96    0.611254    0.904167
   46      16       820       804   69.9037        76     1.73497    0.902356
   47      16       844       828   70.4586        96    0.312386    0.897403
   48      16       857       841   70.0739        52    0.209539     0.89851
   49      16       868       852   69.5417        44   0.0920531    0.901282
   50      16       889       873   69.8307        84    0.100025    0.902767
   51      16       905       889   69.7162        64    0.301183    0.906768
   52      16       920       904   69.5292        60     0.22323    0.907437
   53      16       935       919   69.3493        60    0.210478    0.908166
   54      16       951       935     69.25        64     1.00415    0.909389
   55      16       968       952   69.2271        68    0.133412     0.90768
   56      16       982       966   68.9908        56    0.104931    0.917286
   57      16       996       980   68.7627        56    0.233982    0.914207
   58      16      1011       995   68.6115        60    0.170382    0.920822
   59      16      1030      1014   68.7365        76     0.18077     0.92112
2019-06-05 20:25:23.704031 min lat: 0.0721173 max lat: 5.10807 avg lat: 0.921952
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   60      16      1050      1034   68.9241        80    0.316779    0.921952
^C
 
Code:
root@pve20:~# rados bench -p pveceph-vm 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_pve20_1222678
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        40        24   95.9962        96    0.228331    0.443699
    2      16        69        53   105.989       116    0.221838    0.503378
    3      16        80        64   85.3229        44   0.0908387    0.514024
    4      16        94        78   77.9902        56    0.105896    0.639713
    5      16       123       107   85.5891       116     1.05938    0.693312
    6      16       141       125    83.323        72    0.291186    0.687986
    7      16       158       142    81.133        68     1.31452    0.739318
    8      16       169       153   76.4907        44    0.460282    0.773746
    9      16       176       160   71.1025        28     2.01795    0.788584
   10      16       189       173   69.1917        52    0.275826    0.849904
Total time run:         10.672805
Total writes made:      190
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     71.209
Stddev Bandwidth:       30.8141
Max bandwidth (MB/sec): 116
Min bandwidth (MB/sec): 28
Average IOPS:           17
Stddev IOPS:            7
Max IOPS:               29
Min IOPS:               7
Average Latency(s):     0.898248
Stddev Latency(s):      0.683704
Max latency(s):         3.83125
Min latency(s):         0.080931
root@pve20:~#
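For completeness, since --no-cleanup leaves the benchmark objects in the pool, the matching read tests can be run against them afterwards and the objects removed; a sketch with the same pool and thread count as above:
Code:
rados bench -p pveceph-vm 60 seq -t 16     # sequential reads of the objects written above
rados bench -p pveceph-vm 60 rand -t 16    # random reads
rados -p pveceph-vm cleanup                # remove the benchmark objects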
 
You are using 3x nodes with SATA spinners and a replica of 3. To gain performance, either add more nodes or add SSDs for a separate fast pool.

Again, check out the Ceph benchmark paper (PDF) and its thread.
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/
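Once SSD OSDs exist, a separate fast pool can be bound to the ssd device class with its own CRUSH rule; a minimal sketch, with the pool name and PG count as examples only:
Code:
# replicated rule that only picks OSDs of device class ssd
ceph osd crush rule create-replicated ssd_rule default host ssd
# example pool on that rule (adjust the PG count to your cluster)
ceph osd pool create fast-pool 128 128 replicated ssd_rule
ceph osd pool application enable fast-pool rbd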
Okay, thanks, Alwin.
So do we understand this right: with only SATA and no SSD, write and read performance will not be good?
If we choose between these:
Plan A: 3 nodes, 2 OSDs per node, each OSD a 6TB SATA disk
Plan B: 5 nodes, 5 OSDs per node, each OSD a 6TB SATA disk
Plan C: 5 nodes, 3 OSDs per node, each OSD a 6TB SATA disk
Which do you think is better? Right now we only have basic SATA disks.
 
Plan D: 5 nodes, 4 OSDs per node, each OSD a 6TB SATA disk, plus 1 SSD per node for DB/WAL

This way, small writes will go to the SSD, as the DB and WAL of the OSDs will be located there, and only the data is written to the 6TB disks.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_bluestore

EDIT: it needs to be an enterprise SSD with sustainable IO rates.
 
And do you think the test we did before enabling writeback looks OK, considering it is SATA?
If I add 1 SSD per node to the testing servers, will it improve the read and write speed?
 
Plan D: 5 nodes, 4 OSDs per node, each OSD a 6TB SATA disk, plus 1 SSD per node for DB/WAL

This way, small writes will go to the SSD, as the DB and WAL of the OSDs will be located there, and only the data is written to the 6TB disks.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_bluestore

EDIT: it needs to be an enterprise SSD with sustainable IO rates.
hi
If you want to use a separate DB/WAL device for your OSDs, you can specify it through the -journal_dev option. The WAL is placed with the DB, if not specified separately.

pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
I don't know how to use this. We have already added the SSD disks.
 
Plan D: 5 nodes, 4 OSDs per node, each OSD a 6TB SATA disk, plus 1 SSD per node for DB/WAL

This way, small writes will go to the SSD, as the DB and WAL of the OSDs will be located there, and only the data is written to the 6TB disks.
https://pve.proxmox.com/pve-docs/chapter-pveceph.html#_ceph_bluestore

EDIT: it needs to be an enterprise SSD with sustainable IO rates.
And I have a question about
pveceph createosd /dev/sd[X] -journal_dev /dev/sd[Y]
We have 2-3 OSDs per node. Which OSD will the new SSD serve as the log (DB/WAL) device for?
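A hedged sketch of how the command from the docs quote above is typically applied when several OSDs should share one SSD: the -journal_dev option is given per OSD at creation time, so every HDD OSD that should use the SSD is created with the same SSD as its DB/WAL device, and a separate DB volume for each OSD ends up on that SSD. Device names below are examples only, and recreating an existing OSD destroys its data, so it has to be drained and removed first:
Code:
# sdb, sdc = 6TB SATA data disks, sdd = the new enterprise SSD (example names)
pveceph createosd /dev/sdb -journal_dev /dev/sdd
pveceph createosd /dev/sdc -journal_dev /dev/sdd
ceph-volume lvm list        # verify which DB/WAL volumes ended up on the SSD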
 
