Ceph error: slow and stuck requests are blocked

Discussion in 'Proxmox VE: Installation and configuration' started by klowet, Apr 20, 2019.

  1. klowet

    klowet New Member

    Hi

    I'm running a hyper-converged cluster of 4 nodes, each with 3 disks for Ceph. In the Ceph logs I'm getting the errors below. Sometimes the errors disappear for a few hours and then start coming back again.

    Does anyone know where these errors are coming from? I have looked in the OSD logs, but they don't have much more information. All disks are new SSDs.

    Code:
    2019-04-20 20:46:48.621869 osd.1 osd.1 10.0.15.1:6808/4634 45 : cluster [WRN] slow request 30.781274 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:46:48.621872 osd.1 osd.1 10.0.15.1:6808/4634 46 : cluster [WRN] slow request 30.780455 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:46:48.621874 osd.1 osd.1 10.0.15.1:6808/4634 47 : cluster [WRN] slow request 30.779906 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:46:48.621877 osd.1 osd.1 10.0.15.1:6808/4634 48 : cluster [WRN] slow request 30.779369 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:46:48.621879 osd.1 osd.1 10.0.15.1:6808/4634 49 : cluster [WRN] slow request 30.778854 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:18.484178 osd.1 osd.1 10.0.15.1:6808/4634 50 : cluster [WRN] 8 slow requests, 5 included below; oldest blocked for > 2398.468450 secs
    2019-04-20 20:47:18.484187 osd.1 osd.1 10.0.15.1:6808/4634 51 : cluster [WRN] slow request 60.643557 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:18.484205 osd.1 osd.1 10.0.15.1:6808/4634 52 : cluster [WRN] slow request 60.642739 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:18.484214 osd.1 osd.1 10.0.15.1:6808/4634 53 : cluster [WRN] slow request 60.642190 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:18.484222 osd.1 osd.1 10.0.15.1:6808/4634 54 : cluster [WRN] slow request 60.641652 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:18.484227 osd.1 osd.1 10.0.15.1:6808/4634 55 : cluster [WRN] slow request 60.641138 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:47:44.131137 osd.0 osd.0 10.0.15.1:6804/4458 21 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.162172 secs
    2019-04-20 20:47:44.131141 osd.0 osd.0 10.0.15.1:6804/4458 22 : cluster [WRN] slow request 480.162172 seconds old, received at 2019-04-20 20:39:43.968933: osd_op(client.317399.0:83947 1.13e 1:7ce8dce9:::rbd_data.cade6b8b4567.000000000000047d:head [write 3903488~8192] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 8
    2019-04-20 20:47:38.969191 osd.5 osd.5 10.0.15.2:6800/4117 47 : cluster [WRN] 3 slow requests, 1 included below; oldest blocked for > 2418.973605 secs
    2019-04-20 20:47:38.969206 osd.5 osd.5 10.0.15.2:6800/4117 48 : cluster [WRN] slow request 480.606888 seconds old, received at 2019-04-20 20:39:38.362245: osd_op(client.317399.0:83856 1.b5 1:ad3befa2:::rbd_data.cade6b8b4567.0000000000000aff:head [write 3215360~8192] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 7
    2019-04-20 20:48:10.564216 osd.11 osd.11 10.0.15.4:6800/3880 17 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.697435 secs
    2019-04-20 20:48:10.564223 osd.11 osd.11 10.0.15.4:6800/3880 18 : cluster [WRN] slow request 480.697435 seconds old, received at 2019-04-20 20:40:09.866755: osd_op(client.317399.0:83959 1.167 1:e68d1179:::rbd_data.cade6b8b4567.0000000000000b03:head [write 1552384~4096] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 6
    2019-04-20 20:48:18.209207 osd.1 osd.1 10.0.15.1:6808/4634 56 : cluster [WRN] 8 slow requests, 5 included below; oldest blocked for > 2458.193526 secs
    2019-04-20 20:48:18.209210 osd.1 osd.1 10.0.15.1:6808/4634 57 : cluster [WRN] slow request 120.368633 seconds old, received at 2019-04-20 20:46:17.840538: osd_op(client.330079.0:2826 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 1635328~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:48:18.209213 osd.1 osd.1 10.0.15.1:6808/4634 58 : cluster [WRN] slow request 120.367814 seconds old, received at 2019-04-20 20:46:17.841356: osd_op(client.330079.0:2827 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2159616~516096] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:48:18.209215 osd.1 osd.1 10.0.15.1:6808/4634 59 : cluster [WRN] slow request 120.367266 seconds old, received at 2019-04-20 20:46:17.841905: osd_op(client.330079.0:2828 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 2675712~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:48:18.209219 osd.1 osd.1 10.0.15.1:6808/4634 60 : cluster [WRN] slow request 120.366728 seconds old, received at 2019-04-20 20:46:17.842442: osd_op(client.330079.0:2829 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3200000~524288] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:48:18.209222 osd.1 osd.1 10.0.15.1:6808/4634 61 : cluster [WRN] slow request 120.366214 seconds old, received at 2019-04-20 20:46:17.842957: osd_op(client.330079.0:2830 1.a5 1:a51d67d6:::rbd_data.cade6b8b4567.0000000000001181:head [write 3724288~8192] snapc 0=[] ondisk+write+known_if_redirected e129) currently sub_op_commit_rec from 6
    2019-04-20 20:48:31.307597 osd.7 osd.7 10.0.15.3:6800/4070 46 : cluster [WRN] 1 slow requests, 1 included below; oldest blocked for > 480.182633 secs
    2019-04-20 20:48:31.307617 osd.7 osd.7 10.0.15.3:6800/4070 47 : cluster [WRN] slow request 480.182633 seconds old, received at 2019-04-20 20:40:31.124905: osd_op(client.316949.0:27762 1.20 1:0478ad06:::rbd_data.a41f6b8b4567.000000000000105c:head [write 3428352~4096] snapc 0=[] ondisk+write+known_if_redirected e128) currently sub_op_commit_rec from 2
    
    Code:
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-12-pve)
    pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
    pve-kernel-4.15: 5.3-3
    pve-kernel-4.15.18-12-pve: 4.15.18-35
    ceph: 12.2.11-pve1
    
    Code:
    ID CLASS WEIGHT   TYPE NAME                STATUS REWEIGHT PRI-AFF
    -1       20.95789 root default                                
    -3        5.23947     host anr1-a-hcz01s01                    
     0   ssd  1.74649         osd.0                up  1.00000 1.00000
     1   ssd  1.74649         osd.1                up  1.00000 1.00000
     2   ssd  1.74649         osd.2                up  1.00000 1.00000
    -5        5.23947     host anr1-a-hcz01s02                    
     3   ssd  1.74649         osd.3                up  1.00000 1.00000
     4   ssd  1.74649         osd.4                up  1.00000 1.00000
     5   ssd  1.74649         osd.5                up  1.00000 1.00000
    -7        5.23947     host anr1-a-hcz01s03                    
     6   ssd  1.74649         osd.6                up  1.00000 1.00000
     7   ssd  1.74649         osd.7                up  1.00000 1.00000
     8   ssd  1.74649         osd.8                up  1.00000 1.00000
    -9        5.23947     host anr1-a-hcz01s04                    
     9   ssd  1.74649         osd.9                up  1.00000 1.00000
    10   ssd  1.74649         osd.10               up  1.00000 1.00000
    11   ssd  1.74649         osd.11               up  1.00000 1.00000
    
    Code:
    HEALTH_ERR 10 slow requests are blocked > 32 sec. Implicated osds 0,7,11; 6 stuck requests are blocked > 4096 sec. Implicated osds 1,5,8
    REQUEST_SLOW 10 slow requests are blocked > 32 sec. Implicated osds 0,7,11
        9 ops are blocked > 2097.15 sec
        1 ops are blocked > 262.144 sec
        osds 0,7,11 have blocked requests > 2097.15 sec
    REQUEST_STUCK 6 stuck requests are blocked > 4096 sec. Implicated osds 1,5,8
        6 ops are blocked > 4194.3 sec
        osds 1,5,8 have stuck requests > 4194.3 sec
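
    To dig further into where these requests are hanging, the implicated OSDs can be queried directly. A sketch (osd.1 is just an example taken from the health output above; the "ceph daemon" commands have to be run on the node that hosts that OSD):

    Code:
    # Operations currently in flight on the OSD, with their age and the
    # step they are waiting on ("sub_op_commit_rec from 6" in the log
    # means the primary is waiting for the replica on osd.6 to commit):
    ceph daemon osd.1 dump_ops_in_flight

    # Recently completed slow operations, with a per-step timeline:
    ceph daemon osd.1 dump_historic_ops

    # Per-OSD commit/apply latency, to spot a single slow disk:
    ceph osd perf

    # Overall cluster state and the detailed health message:
    ceph -s
    ceph health detail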
    
    Code:
    [global]
        auth client required = cephx
        auth cluster required = cephx
        auth service required = cephx
        cluster network = 10.0.15.0/24
        fsid = 7d745d6c-5523-4c5e-89dd-a19a4306f8f0
        keyring = /etc/pve/priv/$cluster.$name.keyring
        mon allow pool delete = true
        osd journal size = 5120
        osd pool default min size = 2
        osd pool default size = 3
        public network = 10.0.15.0/24
    
    [osd]
        keyring = /var/lib/ceph/osd/ceph-$id/keyring
    
    [mon.anr1-a-hcz01s02]
        host = anr1-a-hcz01s02
        mon addr = 10.0.15.2:6789
    
    [mon.anr1-a-hcz01s03]
        host = anr1-a-hcz01s03
        mon addr = 10.0.15.3:6789
    
    [mon.anr1-a-hcz01s01]
        host = anr1-a-hcz01s01
        mon addr = 10.0.15.1:6789
    
     
  2. aychprox

    aychprox Member

    How did you define your pools?

    Erasure coded with a cache tier, or replicated?
     
  3. klowet

    klowet New Member

    Code:
    root@anr1-a-hcz01s01:/var/log/ceph# ceph osd pool get rbd all
    size: 3
    min_size: 2
    crash_replay_interval: 0
    pg_num: 512
    pgp_num: 512
    crush_rule: replicated_rule
    hashpspool: true
    nodelete: false
    nopgchange: false
    nosizechange: false
    write_fadvise_dontneed: false
    noscrub: false
    nodeep-scrub: false
    use_gmt_hitset: 1
    auid: 0
    fast_read: 0
    
    Code:
    root@anr1-a-hcz01s01:/var/log/ceph# ceph osd crush rule dump
    [
        {
            "rule_id": 0,
            "rule_name": "replicated_rule",
            "ruleset": 0,
            "type": 1,
            "min_size": 1,
            "max_size": 10,
            "steps": [
                {
                    "op": "take",
                    "item": -1,
                    "item_name": "default"
                },
                {
                    "op": "chooseleaf_firstn",
                    "num": 0,
                    "type": "host"
                },
                {
                    "op": "emit"
                }
            ]
        }
    ]
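
    As a side note, 512 PGs on 12 OSDs with size 3 is within the usual rule-of-thumb range, so the PG count itself is unlikely to be the cause. A rough check (rule-of-thumb arithmetic, not output from this cluster):

    Code:
    # Target roughly 100 PGs per OSD, rounded up to a power of two:
    #   total PGs ≈ (OSDs × 100) / replica size = (12 × 100) / 3 = 400 → 512
    # Resulting PG copies per OSD: 512 PGs × 3 replicas / 12 OSDs = 128
    ceph osd pool get rbd pg_num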
    
    
     
  4. sb-jw

    sb-jw Active Member

    Is it happening across all nodes, or is it more related to specific nodes / OSDs? Is it occurring during backups or other heavy load? Do you have only one pool with 512 PGs and a replica of 3? Could you send us the output of "ceph -s" (I'm interested in the workload of the cluster) and "ceph pg dump"?

    What hardware do you use for the Ceph cluster?
    Do you have any metrics (Check_MK, Graphite / Grafana, etc.)?
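
    For reference, that information can be gathered like this (a sketch; "ceph pg dump summary" is shown instead of the full dump, which is very large):

    Code:
    # Cluster state, including current client IOPS / throughput:
    ceph -s

    # Compact PG and usage statistics:
    ceph pg dump summary

    # Per-OSD latency, useful for spotting one misbehaving disk:
    ceph osd perf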
     
  5. klowet

    klowet New Member

    I think I may have found the cause and the solution. The nodes are connected to multiple networks: management, backup and Ceph. The Ceph public (and cluster) network has its own physical network. The management network has the default gateway and a DNS server defined. So if I ping another node by name, it replies from its management interface. So when Ceph was looking for its peers, the replies came from the management IP instead of the Ceph interface. I have now added the Ceph IPs of all nodes to the /etc/hosts file on every node, so that replies come from the Ceph interface.

    Since that change - yesterday evening, just after my first post here - I haven't seen any errors.
    I'll report back tomorrow or later next week.
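
    For anyone hitting the same thing, a sketch of what those /etc/hosts entries could look like (the name-to-IP mapping here is an assumption based on the mon addresses and OSD log lines earlier in the thread; adjust to your own addressing), plus a quick check that the names now resolve to the Ceph network:

    Code:
    # /etc/hosts on every node (assumed mapping, derived from the
    # 10.0.15.x addresses shown earlier in this thread)
    10.0.15.1   anr1-a-hcz01s01
    10.0.15.2   anr1-a-hcz01s02
    10.0.15.3   anr1-a-hcz01s03
    10.0.15.4   anr1-a-hcz01s04

    # Verify which address a host name resolves to now:
    getent hosts anr1-a-hcz01s02
    ping -c 1 anr1-a-hcz01s02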
     