[SOLVED] Ceph timeouts and lost disks

Discussion in 'Proxmox VE: Installation and configuration' started by kaltsi, Jun 26, 2018.

  1. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Hi all,
    I often get "timeout" errors from Ceph.

    Some VMs lost their disks and cannot boot from them anymore.

    I haven't analyzed the logs yet, but I'll update this post.

    Add:
    The Ceph log file contains a lot of entries like this:
    check update: 59 slow requests are blocked > 32 sec
     
    #1 kaltsi, Jun 26, 2018
    Last edited: Jun 26, 2018
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    Elaborate more on your Ceph setup: hardware, configuration, status. For now, I can only say that you have pending requests from Ceph clients; these have been in the queue for more than 32 seconds already.
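
    The usual commands to gather that overview on a PVE node would be (a sketch; the output of all of them helps):
    Code:
    pveversion -v        # PVE package versions
    ceph -s              # overall Ceph status
    ceph osd tree        # OSD layout per host
    ceph health detail   # shows which OSDs currently report slow requests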
     
  3. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Thanks for your reply.

    The system has a 10G optical network, therefore I don't think there is a network issue.
    The average speed between the nodes is 418.4 MB/s (including disk IO).
    But I don't know where the problem may be.

    Could you help me with how I can use this to get more details about these errors?
    https://github.com/ceph/ceph/pull/21750
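    For example, is the OSD admin socket the right place to look? A sketch of what I have in mind (osd.0 is just an example):
    Code:
    # run on the node that hosts the OSD; osd.0 is just an example
    ceph daemon osd.0 dump_blocked_ops    # requests currently blocked > 32 sec
    ceph daemon osd.0 dump_historic_ops   # slowest recently completed operations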

    And I have another question: which is better, lots of smaller disks or a few bigger disks?
     
    #3 kaltsi, Jun 28, 2018
    Last edited: Jun 28, 2018
  4. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
  5. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    We have 4 identical nodes.
    CPU(s) 48 x Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (2 Sockets)
    Mem: 256GB/node

    Code:
    root@vCNT-host-4:~# ceph -s
      cluster:
        id:     fb926dd9-17b9-42fb-88d6-27f4944fd554
        health: HEALTH_OK
    
      services:
        mon: 4 daemons, quorum vCNT-host-1,vCNT-host-2,vCNT-host-3,vCNT-host-4
        mgr: vCNT-host-3(active), standbys: vCNT-host-4, vCNT-host-1, vCNT-host-2
        osd: 16 osds: 16 up, 16 in
    
      data:
        pools:   1 pools, 512 pgs
        objects: 1102k objects, 3339 GB
        usage:   8299 GB used, 4994 GB / 13294 GB avail
        pgs:     512 active+clean
    
      io:
        client:   15524 kB/s rd, 6982 kB/s wr, 765 op/s rd, 267 op/s wr
    
    Code:
    create linked clone of drive sata0 (vCNT-Storage_vm:base-279-disk-1)
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    trying to aquire cfs lock 'storage-vCNT-Storage_vm' ...
    TASK ERROR: clone failed: error with cfs lock 'storage-vCNT-Storage_vm': got lock request timeout
    The Ceph status was "overall HEALTH_OK" at that time.
    In the syslog:
    Code:
    Jun 28 15:07:52 vCNT-host-4 ceph-osd[10770]: 2018-06-28 15:07:52.761655 7ff73acb5700 -1 osd.13 pg_epoch: 1421 pg[2.174( v 1421'355128 (1421'353606,1421'355128] local-lis/les=938/939 n=2187 ec=99/72 lis/c 938/938 les/c/f 939/939/0 105/938/938) [1,13,8] r=1 lpr=938 luod=0'0 crt=1421'355128 lcod 1421'355127 active] _scan_snaps no head for 2:2eae9ed9:::rbd_data.1008782eb141f2.000000000000412b:155 (have MIN)
    
    Thanks for the URLs you sent, I'll read them.
     
  6. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    Is access to the storage working? Does the PVE cluster have quorum?

    More than 3 MONs/MGRs are not needed; only when you go into the thousands of Ceph nodes/clients may additional MONs be required. At least two MGRs, in case one dies.
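
    If you want to drop the extra MON, that can be done with pveceph; a sketch (verify the exact syntax with 'pveceph help destroymon' on your version):
    Code:
    # a sketch; vCNT-host-2 is only an example, use the monitor you want to remove
    pveceph destroymon vCNT-host-2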
     
  7. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Yes, the cluster is working well.
    Code:
    root@vCNT-host-4:~# pvecm status
    Quorum information
    ------------------
    Date:             Thu Jun 28 17:25:36 2018
    Quorum provider:  corosync_votequorum
    Nodes:            4
    Node ID:          0x00000002
    Ring ID:          1/436
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   4
    Highest expected: 4
    Total votes:      4
    Quorum:           3
    Flags:            Quorate
    
    Membership information
    ----------------------
        Nodeid      Votes Name
    0x00000001          1 10.102.166.130
    0x00000004          1 10.102.166.131
    0x00000003          1 10.102.166.132
    0x00000002          1 10.102.166.133 (local)
    
    Access to the storage usually works, but sometimes there is a timeout.
    The next try usually succeeds.
    And some VMs lost their disk data (the disk was not bootable anymore), but after a rollback they work again.

    For example, I converted a VM to a template and there was a timeout while the system converted the disk from the "vm" to the "base" type. The VM configuration became a template, but the disk stayed in vm format. In this situation the "template: 1" line has to be removed from the VM configuration by hand, and the next try usually works.
     
  8. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    How is the Ceph health then? I assume the slow requests appear.

    What does a 'rados -p <POOL> bench 60 write' give you?

    Are there orphan images on the rbd pool?
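
    For the orphan check, a simple sketch would be to compare the pool contents against the VM configs, e.g.:
    Code:
    # list all images in the pool, then look up a given image name in the PVE configs
    rbd -p <POOL> ls -l
    grep -r 'vm-<VMID>-disk' /etc/pve/nodes/*/qemu-server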

    Hardware details are still missing, like the disk models and whether RAID or HBA is used.
     
  9. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    It is an HP Apollo server. The hosts have a Smart Array P440 RAID controller, but all disks are in RAID 0 for Ceph and RAID 1 for the OS.
    The Ceph health is usually HEALTH_OK, but there are a lot of slow request events in the Ceph log file.
    For example:
    Code:
    cluster [WRN] Health check failed: 6 slow requests are blocked > 32 sec (REQUEST_SLOW)
    cluster [WRN] Health check update: 5 slow requests are blocked > 32 sec (REQUEST_SLOW)
    cluster [WRN] Health check cleared: REQUEST_SLOW (was: 5 slow requests are blocked > 32 sec)
    cluster [INF]Cluster is now healthy
    
    Code:
    root@vCNT-host-4:~# rados -p vCNT-Storage bench 60 write
    hints = 1
    Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
    Object prefix: benchmark_data_vCNT-host-4_3583797
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
        0       0         0         0         0         0           -           0
        1      16       137       121   483.963       484   0.0160639   0.0582168
        2      16       137       121   241.972         0           -   0.0582168
        3      16       137       121   161.316         0           -   0.0582168
        4      16       137       121   120.987         0           -   0.0582168
        5      16       161       145   115.987        24   0.0161211    0.100628
        6      16       245       229    152.65       336   0.0177876    0.261065
        7      16       377       361   206.263       528   0.0177685    0.284036
        8      16       377       361    180.48         0           -    0.284036
        9      16       377       361   160.427         0           -    0.284036
       10      16       377       361   144.384         0           -    0.284036
       11      16       377       361   131.258         0           -    0.284036
       12      16       437       421   140.318        48   0.0363651    0.360888
       13      16       562       546   167.982       500   0.0134725    0.284943
       14      16       562       546   155.983         0           -    0.284943
       15      16       649       633   168.781       174   0.0132016    0.344774
       16      16       649       633   158.233         0           -    0.344774
       17      16       649       633   148.925         0           -    0.344774
       18      16       654       638   141.763   6.66667   0.0232407    0.365441
       19      16       703       687   144.616       196   0.0152398    0.352621
    2018-06-29 09:26:59.014971 min lat: 0.0130459 max lat: 8.11459 avg lat: 0.352621
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       20      16       703       687   137.385         0           -    0.352621
       21      16       723       707   134.652        40   0.0238429    0.387419
       22      16       801       785   142.712       312   0.0166144    0.351341
       23      16       831       815   141.724       120   0.0153901    0.339042
       24      16       903       887   147.817       288   0.0154528    0.401815
       25      16       909       893   142.864        24   0.0134731    0.399216
       26      16       909       893    137.37         0           -    0.399216
       27      16       929       913   135.245        40   0.0151433    0.407127
       28      16       929       913   130.415         0           -    0.407127
       29      16       930       914   126.056         2     5.29795    0.412478
       30      16      1016      1000   133.319       344   0.0126818    0.391834
       31      16      1016      1000   129.019         0           -    0.391834
       32      16      1035      1019   127.362        38   0.0163367    0.409573
       33      16      1035      1019   123.502         0           -    0.409573
       34      16      1042      1026   120.693        14   0.0167332     0.41071
       35      16      1042      1026   117.245         0           -     0.41071
       36      16      1113      1097   121.876       142   0.0153936    0.418399
       37      16      1113      1097   118.583         0           -    0.418399
       38      16      1113      1097   115.462         0           -    0.418399
       39      16      1146      1130   115.886        44   0.0129354    0.411341
    2018-06-29 09:27:19.016944 min lat: 0.0126818 max lat: 16.8512 avg lat: 0.414792
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       40      16      1172      1156   115.588       104   0.0224196    0.414792
       41      16      1205      1189   115.988       132    0.121636    0.404011
       42      16      1205      1189   113.226         0           -    0.404011
       43      16      1244      1228   114.221        78   0.0140761    0.425268
       44      16      1244      1228   111.625         0           -    0.425268
       45      16      1358      1342   119.277       228   0.0169134    0.412128
       46      16      1591      1575   136.943       932   0.0804751    0.466864
       47      16      1800      1784   151.814       836   0.0211982    0.420438
       48      16      1940      1924   160.317       560   0.0529476    0.398188
       49      16      2112      2096   171.084       688   0.0412349    0.373047
       50      16      2188      2172   173.742       304   0.0255911     0.36615
       51      16      2257      2241   175.747       276    0.022596    0.362043
       52      16      2329      2313   177.905       288   0.0273724    0.357763
       53      16      2380      2364   178.397       204    0.361083    0.356179
       54      16      2449      2433   180.204       276     0.35389    0.353385
       55      16      2487      2471    179.69       152   0.0724898    0.352563
       56      16      2533      2517   179.767       184    0.630852    0.353883
       57      16      2610      2594   182.016       308   0.0455911    0.350184
       58      16      2728      2712   187.015       472   0.0278062    0.340587
       59      16      2788      2772   187.912       240   0.0206987    0.338436
    2018-06-29 09:27:39.019146 min lat: 0.0126818 max lat: 30.0243 avg lat: 0.339645
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       60      16      2821      2805    186.98       132     1.32027    0.339645
    Total time run:         60.819601
    Total writes made:      2822
    Write size:             4194304
    Object size:            4194304
    Bandwidth (MB/sec):     185.598
    Stddev Bandwidth:       219.711
    Max bandwidth (MB/sec): 932
    Min bandwidth (MB/sec): 0
    Average IOPS:           46
    Stddev IOPS:            54
    Max IOPS:               233
    Min IOPS:               0
    Average Latency(s):     0.344473
    Stddev Latency(s):      1.58766
    Max latency(s):         30.0243
    Min latency(s):         0.0126818
    Cleaning up (deleting benchmark objects)
    Removed 2822 objects
    Clean up completed and total clean up time :12.508660
    

    I found some interesting disks.
    For example:
    VM-170
    Code:
    root@vCNT-host-1:~# rbd ls vCNT-Storage | grep 170
    vm-170-disk-1
    vm-170-disk-2
    vm-170-disk-3
    
    Code:
    root@vCNT-host-1:~# cat /etc/pve/nodes/vCNT-host-2/qemu-server/170.conf
    balloon: 4096
    bootdisk: sata0
    cores: 2
    memory: 6144
    name: RobertH-TNL-v3.16-1-FED01
    net0: e1000=3E:4A:AB:E3:6A:AE,bridge=vmbr1,tag=3908
    net1: e1000=BA:95:70:35:68:BC,bridge=vmbr1,tag=3908
    net2: e1000=8E:18:93:8F:15:1A,bridge=vmbr1,tag=3908
    net3: e1000=6A:86:6E:0C:C2:42,bridge=vmbr1,tag=3908
    numa: 0
    ostype: l26
    parent: Init_snapshot
    sata0: vCNT-Storage_vm:base-791-disk-1/vm-170-disk-3,size=70G
    smbios1: uuid=17229829-3cda-467b-8d92-84204b61677f
    sockets: 1
    
    [Init_snapshot]
    #Init_comment
    balloon: 4096
    bootdisk: sata0
    cores: 2
    memory: 6144
    name: RobertH-TNL-v3.16-1-FED01
    net0: e1000=3E:4A:AB:E3:6A:AE,bridge=vmbr1,tag=3908
    net1: e1000=BA:95:70:35:68:BC,bridge=vmbr1,tag=3908
    net2: e1000=8E:18:93:8F:15:1A,bridge=vmbr1,tag=3908
    net3: e1000=6A:86:6E:0C:C2:42,bridge=vmbr1,tag=3908
    numa: 0
    ostype: l26
    sata0: vCNT-Storage_vm:base-791-disk-1/vm-170-disk-3,size=70G
    smbios1: uuid=17229829-3cda-467b-8d92-84204b61677f
    snaptime: 1529666334
    sockets: 1
    
     
  10. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    Put the controller into HBA (IT-mode).
    Code:
       4      16       137       121   120.987         0           -   0.0582168
       5      16       161       145   115.987        24   0.0161211    0.100628
       6      16       245       229    152.65       336   0.0177876    0.261065
       7      16       377       361   206.263       528   0.0177685    0.284036
       8      16       377       361    180.48         0           -    0.284036
       9      16       377       361   160.427         0           -    0.284036
    
    This is an indication that the OSDs are starving, as the RAID controller interferes with the writes.

    In general, avoid RAID and use an HBA with Ceph (or ZFS).
     
  11. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Thanks for your hints.

    I reconfigured the whole system.
    All RAID controllers now use HBA mode.

    But it does not seem to be any better.

    Code:
    2018-07-06 07:00:00.000157 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 62593 : cluster [WRN] overall HEALTH_WARN 1 nearfull osd(s); 1 pool(s) nearfull
    2018-07-06 08:00:00.000151 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 63480 : cluster [WRN] overall HEALTH_WARN 1 nearfull osd(s); 1 pool(s) nearfull
    2018-07-06 09:00:00.000164 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64406 : cluster [WRN] overall HEALTH_WARN 1 nearfull osd(s); 1 pool(s) nearfull
    2018-07-06 09:07:04.909589 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64505 : cluster [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-07-06 09:07:11.207034 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64510 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec)
    2018-07-06 09:07:25.117456 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64511 : cluster [WRN] Health check failed: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-07-06 09:07:31.177075 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64513 : cluster [WRN] Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-07-06 09:07:41.281063 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 64517 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 4 slow requests are blocked > 32 sec)
    2018-07-06 09:26:01.027166 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 65186 : cluster [WRN] Health check failed: 10 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-07-06 09:26:10.533493 mon.vCNT-host-1 mon.0 10.102.166.130:6789/0 65192 : cluster [INF] Health check cleared: REQUEST_SLOW (was: 7 slow requests are blocked > 32 sec)
    
    Code:
    root@vCNT-host-4:~# rados -p vCNT-Storage bench 60 write
    hints = 1
    Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects
    Object prefix: benchmark_data_vCNT-host-4_2695674
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
        0       0         0         0         0         0           -           0
        1      16        42        26   103.994       104    0.442838    0.460067
        2      16        63        47   93.9896        84    0.372876    0.520925
        3      16        85        69   91.9913        88    0.310366    0.567248
        4      16       123       107   106.988       152    0.400993    0.556787
        5      16       148       132   105.588       100    0.402739    0.553861
        6      16       170       154   102.655        88    0.639312    0.571264
        7      16       202       186   106.274       128    0.487974    0.574715
        8      16       232       216   107.988       120    0.525585    0.563723
        9      16       261       245   108.876       116    0.479361    0.563683
       10      16       298       282   112.787       148    0.415036    0.558593
       11      16       327       311   113.078       116    0.917769    0.556129
       12      16       350       334    111.32        92    0.412973    0.547002
       13      16       384       368   113.218       136     0.48994    0.552062
       14      16       409       393   112.273       100    0.130571    0.563136
       15      16       436       420   111.987       108    0.334692    0.556066
       16      16       469       453   113.236       132    0.283018    0.552478
       17      16       501       485   114.104       128    0.314325     0.54894
       18      16       533       517   114.876       128     0.37995    0.548436
       19      16       561       545   114.724       112    0.360336    0.544479
    2018-07-06 09:44:53.157006 min lat: 0.128848 max lat: 1.43031 avg lat: 0.54302
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       20      16       592       576   115.187       124    0.182168     0.54302
       21      16       615       599   114.082        92    0.234216    0.551228
       22      16       644       628   114.168       116    0.752318    0.548768
       23      16       670       654   113.726       104    0.683905    0.551409
       24      16       701       685   114.153       124    0.277923    0.555179
       25      16       730       714   114.226       116    0.374056    0.553792
       26      16       759       743   114.294       116    0.220685    0.552543
       27      16       783       767   113.616        96    0.338427    0.552144
       28      16       811       795   113.558       112    0.308912    0.553524
       29      16       843       827   114.055       128    0.138018    0.557699
       30      16       862       846   112.787        76    0.236558      0.5596
       31      16       894       878   113.277       128    0.529105    0.560271
       32      16       916       900   112.487        88    0.368218    0.561754
       33      16       945       929   112.593       116    0.565941    0.563599
       34      16       976       960   112.928       124    0.226921    0.561286
       35      16      1004       988   112.901       112    0.454622     0.56244
       36      16      1033      1017   112.987       116    0.537357    0.563696
       37      16      1053      1037   112.095        80     1.15554    0.565556
       38      16      1079      1063   111.882       104    0.412322    0.566183
       39      16      1108      1092   111.987       116    0.730968    0.567856
    2018-07-06 09:45:13.159330 min lat: 0.127764 max lat: 1.49225 avg lat: 0.569049
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       40      16      1131      1115   111.487        92    0.515967    0.569049
       41      16      1163      1147   111.889       128    0.468317    0.567324
       42      16      1185      1169    111.32        88    0.484569    0.569685
       43      16      1217      1201   111.708       128    0.519618    0.570298
       44      16      1237      1221   110.987        80    0.761683    0.571563
       45      16      1251      1235   109.765        56    0.554703    0.573311
       46      16      1276      1260   109.552       100    0.526327    0.579003
       47      16      1303      1287   109.519       108    0.469738    0.580984
       48      16      1329      1313   109.404       104    0.743499    0.579879
       49      16      1356      1340   109.375       108    0.295312    0.580171
       50      16      1379      1363   109.027        92    0.471967    0.580474
       51      16      1407      1391   109.085       112    0.773703    0.584233
       52      16      1434      1418   109.064       108    0.154409    0.583459
       53      16      1462      1446   109.119       112    0.238878      0.5841
       54      16      1487      1471    108.95       100    0.558933    0.583237
       55      16      1524      1508    109.66       148    0.317236    0.580367
       56      16      1555      1539   109.916       124    0.544948    0.580141
       57      16      1578      1562   109.601        92    0.524091    0.579395
       58      16      1609      1593   109.849       124    0.504469    0.580712
       59      16      1635      1619    109.75       104    0.534606     0.57997
    2018-07-06 09:45:33.161760 min lat: 0.127764 max lat: 1.69057 avg lat: 0.578658
      sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
       60      16      1671      1655    110.32       144    0.873459    0.578658
    Total time run:         60.960629
    Total writes made:      1672
    Write size:             4194304
    Object size:            4194304
    Bandwidth (MB/sec):     109.71
    Stddev Bandwidth:       19.0954
    Max bandwidth (MB/sec): 152
    Min bandwidth (MB/sec): 56
    Average IOPS:           27
    Stddev IOPS:            4
    Max IOPS:               38
    Min IOPS:               14
    Average Latency(s):     0.581926
    Stddev Latency(s):      0.252402
    Max latency(s):         1.69057
    Min latency(s):         0.127764
    Cleaning up (deleting benchmark objects)
    Removed 1672 objects
    Clean up completed and total clean up time :8.814471
    
    Code:
    root@vCNT-host-4:~# ceph -s
      cluster:
        id:     fb926dd9-17b9-42fb-88d6-27f4944fd554
        health: HEALTH_WARN
                1 nearfull osd(s)
                1 pool(s) nearfull
    
      services:
        mon: 3 daemons, quorum vCNT-host-1,vCNT-host-3,vCNT-host-4
        mgr: vCNT-host-4(active), standbys: vCNT-host-3, vCNT-host-1
        osd: 12 osds: 12 up, 12 in
    
      data:
        pools:   1 pools, 512 pgs
        objects: 977k objects, 2881 GB
        usage:   8311 GB used, 2865 GB / 11176 GB avail
        pgs:     511 active+clean
                 1   active+clean+scrubbing+deep
    
      io:
        client:   26351 B/s rd, 1240 kB/s wr, 90 op/s rd, 141 op/s wr
    
    Code:
    root@vCNT-host-4:~# ceph osd tree
    ID CLASS WEIGHT   TYPE NAME              STATUS REWEIGHT PRI-AFF
    -1       10.91510 root default
    -3        2.72878     host vCNT-host-1
     0   hdd  0.90959         osd.0              up  1.00000 1.00000
     1   hdd  0.90959         osd.1              up  1.00000 1.00000
     2   hdd  0.90959         osd.2              up  1.00000 1.00000
    -5        2.72878     host vCNT-host-2
     3   hdd  0.90959         osd.3              up  1.00000 1.00000
     4   hdd  0.90959         osd.4              up  1.00000 1.00000
     5   hdd  0.90959         osd.5              up  1.00000 1.00000
    -7        2.72878     host vCNT-host-3
     8   hdd  0.90959         osd.8              up  1.00000 1.00000
     9   hdd  0.90959         osd.9              up  1.00000 1.00000
    10   hdd  0.90959         osd.10             up  1.00000 1.00000
    -9        2.72878     host vCNT-host-4
    12   hdd  0.90959         osd.12             up  1.00000 1.00000
    13   hdd  0.90959         osd.13             up  1.00000 1.00000
    14   hdd  0.90959         osd.14             up  1.00000 1.00000
    

    Could you please suggest anything else I can do to improve the performance?
     
  12. Datanat

    Datanat New Member
    Proxmox VE Subscriber

    Joined:
    Apr 11, 2018
    Messages:
    16
    Likes Received:
    3
    Hello @kaltsi, it's a tough one.

    Slow requests can be caused by network issues, disk issues, or even controller issues, as stated by @Alwin.

    It looks like you have 12x 7200 RPM SATA drives. Do you use filestore or bluestore?

    In the lab we tested filestore with 12 SATA drives too, and it was not good until we put the journal on SSD.

    And then there is this:

    If the drives are full, the performance drop is catastrophic: SATA drive performance can fall to a few IOPS with indecent latency when near full.

    So assuming a 3/2 pool and 12x 70 nominal IOPS, but here more like 20-30 IOPS per drive with 30+ ms latency, you could end up with the results you have.

    Rule of thumb: (12x30/3/3) = between 27 and 40 large IOPS issued.
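
    A rough reading of that calculation (my assumptions: 3 replicas and roughly 3 backend writes per client write):
    Code:
    # 12 drives at 20-30 effective IOPS each, divided by replicas and backend write overhead
    echo $(( 12 * 20 / 3 / 3 ))   # ~26 large client IOPS at 20 IOPS per drive
    echo $(( 12 * 30 / 3 / 3 ))   # 40 large client IOPS at 30 IOPS per drive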

    Regards,
     
  13. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    I would say the contrary: it got better, as your rados bench now writes at a stable rate. But I agree with @Datanat that you will need faster disks (incl. SSD/NVMe). You also need to consider that you can't fill up your cluster, as you need free space in Ceph if a host fails and its data has to be replicated to the other nodes.
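
    To keep an eye on that, Ceph reports the usage per pool and per OSD:
    Code:
    ceph df           # pool usage vs. raw capacity
    ceph osd df tree  # per-OSD fill level, to spot the nearfull OSD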
     
  14. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Hi all,

    Thank you for your helpful answers.

    The system has 12x 1TB 7.2k SATA HDDs (4 nodes x 3 disks for Ceph).
    Yes, you are right, the system is better than before. But the disks are very, very slow.
    The system uses bluestore OSDs, therefore I cannot use an SSD for the journal.

    I moved a 10 GB file with scp between two nodes and between a node and a VM.
    Code:
    10GB 306.9MB/s   00:33 from node OS disk to node disk
    10GB  11.5MB/s   14:47 from node OS disk to Ceph VM disk
    10GB  46.4MB/s   03:4 from node OS disk to Disk VM disk
    I can transfer data between two nodes at more than 1 GB/s, so in this case there is no network issue.
    (attached graphs: Network-1.PNG, Network-2.PNG)
    But the network is barely used.

    I looked at some performance reports on the internet. They show 4 times better performance with hardware similar to mine.

    Code:
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               4.10    0.00    1.68    0.69    0.00   93.53
    
    Device:            tps    MB_read/s    MB_wrtn/s    MB_read    MB_wrtn
    sdb              80.73         3.20         2.91    2895238    2633180
    sdc              82.50         3.14         2.89    2839708    2611683
    sdd              84.38         3.39         3.00    3060710    2705882
    


    It seems I have to find another solution.
     
  15. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    Bluestore allows you, at OSD creation time, to put the DB/WAL on different devices (pveceph help createosd).
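    A sketch of what that could look like (the exact option names may differ between versions; 'pveceph help createosd' lists them):
    Code:
    # a sketch: create an OSD on /dev/sdX with its bluestore DB on a faster /dev/sdY
    pveceph createosd /dev/sdX --journal_dev /dev/sdY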

    Depending on the available cache, the results may look very different on each run. It is better to use fio to test the IO performance.
    Look at this thread for more information: https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

    Please check with iperf what bandwidth you have between the links, to verify the graphs above.
     
  16. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Hi,

    iperf
    Code:
    [ ID] Interval       Transfer     Bandwidth
    [  6]  0.0-10.0 sec  2.73 GBytes  2.35 Gbits/sec
    [  4]  0.0-10.0 sec  2.73 GBytes  2.35 Gbits/sec
    [  5]  0.0-10.0 sec  2.73 GBytes  2.35 Gbits/sec
    [  3]  0.0-10.0 sec  2.73 GBytes  2.35 Gbits/sec
    [SUM]  0.0-10.0 sec  10.9 GBytes  9.39 Gbits/sec
    
    fio
    4ktest
    Code:
    fio --filename=/dev/sdd --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=4k --rwmixread=100 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=4ktest
    4ktest: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
    ...
    fio-2.16
    Starting 16 processes
    Jobs: 16 (f=16): [r(16)] [100.0% done] [920KB/0KB/0KB /s] [230/0/0 iops] [eta 00m:00s]
    4ktest: (groupid=0, jobs=16): err= 0: pid=1373324: Mon Jul 16 16:21:11 2018
      read : io=56844KB, bw=959234B/s, iops=234, runt= 60682msec
        slat (usec): min=1, max=1016.5K, avg=61651.56, stdev=216010.11
        clat (msec): min=9, max=2878, avg=1025.78, stdev=413.95
         lat (msec): min=14, max=2951, avg=1087.44, stdev=466.56
        clat percentiles (msec):
         |  1.00th=[   46],  5.00th=[  490], 10.00th=[  791], 20.00th=[  840],
         | 30.00th=[  865], 40.00th=[  889], 50.00th=[  914], 60.00th=[  938],
         | 70.00th=[  979], 80.00th=[ 1074], 90.00th=[ 1778], 95.00th=[ 1844],
         | 99.00th=[ 1958], 99.50th=[ 1991], 99.90th=[ 2540], 99.95th=[ 2606],
         | 99.99th=[ 2868]
        lat (msec) : 10=0.01%, 20=0.10%, 50=1.06%, 100=2.14%, 250=1.44%
        lat (msec) : 500=0.27%, 750=1.75%, 1000=66.63%, 2000=26.21%, >=2000=0.39%
      cpu          : usr=0.01%, sys=0.00%, ctx=2526, majf=0, minf=148
      IO depths    : 1=0.1%, 2=0.2%, 4=0.5%, 8=0.9%, 16=98.3%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=14211/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    
    Run status group 0 (all jobs):
       READ: io=56844KB, aggrb=936KB/s, minb=936KB/s, maxb=936KB/s, mint=60682msec, maxt=60682msec
    
    Disk stats (read/write):
      sdd: ios=14343/0, merge=0/0, ticks=8086248/0, in_queue=8090536, util=99.87%
    
    8k7030test
    Code:
    fio --filename=/dev/sdd --direct=1 --rw=randrw --refill_buffers --norandommap --randrepeat=0 --ioengine=libaio --bs=8k --rwmixread=70 --iodepth=16 --numjobs=16 --runtime=60 --group_reporting --name=8k7030test
    8k7030test: (g=0): rw=randrw, bs=8K-8K/8K-8K/8K-8K, ioengine=libaio, iodepth=16
    ...
    fio-2.16
    Starting 16 processes
    Jobs: 14 (f=14): [f(2),_(1),f(12),_(1)] [100.0% done] [1712KB/536KB/0KB /s] [214/67/0 iops] [eta 00m:00s]
    8k7030test: (groupid=0, jobs=16): err= 0: pid=1375632: Mon Jul 16 16:24:58 2018
      read : io=69456KB, bw=1140.1KB/s, iops=142, runt= 60879msec
        slat (usec): min=1, max=1186.1K, avg=72396.31, stdev=252419.78
        clat (msec): min=13, max=3253, avg=1182.40, stdev=478.54
         lat (msec): min=13, max=3526, avg=1254.80, stdev=543.08
        clat percentiles (msec):
         |  1.00th=[   49],  5.00th=[  237], 10.00th=[  914], 20.00th=[  971],
         | 30.00th=[ 1004], 40.00th=[ 1037], 50.00th=[ 1057], 60.00th=[ 1090],
         | 70.00th=[ 1139], 80.00th=[ 1237], 90.00th=[ 2057], 95.00th=[ 2147],
         | 99.00th=[ 2278], 99.50th=[ 2343], 99.90th=[ 3064], 99.95th=[ 3130],
         | 99.99th=[ 3261]
      write: io=29104KB, bw=489536B/s, iops=59, runt= 60879msec
        slat (usec): min=1, max=1182.2K, avg=68616.06, stdev=247002.88
        clat (msec): min=19, max=4203, avg=1195.66, stdev=479.54
         lat (msec): min=19, max=4203, avg=1264.28, stdev=534.50
        clat percentiles (msec):
         |  1.00th=[   61],  5.00th=[  865], 10.00th=[  922], 20.00th=[  971],
         | 30.00th=[ 1012], 40.00th=[ 1037], 50.00th=[ 1057], 60.00th=[ 1090],
         | 70.00th=[ 1139], 80.00th=[ 1221], 90.00th=[ 2057], 95.00th=[ 2147],
         | 99.00th=[ 2409], 99.50th=[ 3097], 99.90th=[ 3326], 99.95th=[ 3490],
         | 99.99th=[ 4228]
        lat (msec) : 20=0.02%, 50=0.92%, 100=1.73%, 250=1.95%, 500=0.28%
        lat (msec) : 750=0.40%, 1000=22.09%, 2000=59.75%, >=2000=12.86%
      cpu          : usr=0.01%, sys=0.01%, ctx=2133, majf=0, minf=164
      IO depths    : 1=0.1%, 2=0.3%, 4=0.5%, 8=1.0%, 16=98.1%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=8682/w=3638/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=16
    
    Run status group 0 (all jobs):
       READ: io=69456KB, aggrb=1140KB/s, minb=1140KB/s, maxb=1140KB/s, mint=60879msec, maxt=60879msec
      WRITE: io=29104KB, aggrb=478KB/s, minb=478KB/s, maxb=478KB/s, mint=60879msec, maxt=60879msec
    
    Disk stats (read/write):
      sdd: ios=8777/3637, merge=0/0, ticks=5571560/2499196, in_queue=8108924, util=99.89%
    
     
  17. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,571
    Likes Received:
    138
    When you run the test from page three of the benchmark paper and compare the results, they are at the level of the Seagate Constellation. There is not really much more to get out of those 7.2k SATA disks.

    For faster results, you need either more SATA disks or SSDs/NVMe. With the latter, you can use a separate pool to get a high-speed pool alongside the high-capacity one.
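
    With device classes (Luminous and later), a separate fast pool could be built roughly like this (a sketch; rule name, pool name and PG count are only examples):
    Code:
    # SSD/NVMe OSDs normally get their device class assigned automatically, see 'ceph osd tree'
    ceph osd crush rule create-replicated ssd-rule default host ssd
    ceph osd pool create fast-pool 128 128 replicated ssd-rule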
     
  18. kaltsi

    kaltsi Member

    Joined:
    Mar 22, 2013
    Messages:
    32
    Likes Received:
    2
    Thank you for your help and hints.
    I replaced some SATA disks with SAS 10k disks, and I added one SSD to every node to handle the Ceph DB.
    Ceph is now about 10 times faster. This configuration is enough for my needs.
    It was not easy, because I could not shut down the cluster.
     
    AlexLup and Alwin like this.