slow requests are blocked - very slow VMs

ssaman

Oct 28, 2015
Hi all,
since today we have had an issue with our Proxmox / Ceph cluster: our VMs are very slow and Ceph reports blocked slow requests.
Code:
  cluster:
    id:     e999c2ba-bd91-41d1-92b1-c7874b4b2b40
    health: HEALTH_WARN
            498 slow requests are blocked > 32 sec. Implicated osds 0,2,7,9,11,12,13

  services:
    mon: 3 daemons, quorum node1,node2,node3
    mgr: node1(active), standbys: c6-node1, node3, node2
    osd: 14 osds: 14 up, 14 in

  data:
    pools:   1 pools, 512 pgs
    objects: 880.41k objects, 3.18TiB
    usage:   9.71TiB used, 65.4TiB / 75.1TiB avail
    pgs:     512 active+clean

  io:
    client:   316KiB/s rd, 210KiB/s wr, 25op/s rd, 20op/s wr
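To narrow down where the requests are stuck, the implicated OSDs (0, 2, 7, 9, 11, 12, 13 in the warning above) can be inspected directly. A sketch of the usual drill-down; the admin-socket commands must be run on the node that hosts the given OSD, and osd.0 is just one example from the list:
Code:
# List the slow requests and the OSDs involved
ceph health detail

# On the node hosting an implicated OSD (e.g. osd.0 on node1),
# dump its recent slowest operations via the admin socket
ceph daemon osd.0 dump_historic_ops

# Current in-flight ops, to see at which step a request is blocked
ceph daemon osd.0 dump_ops_in_flight
If the slow ops cluster on a few disks, it often points at a failing HDD or a saturated device rather than a cluster-wide problem.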
Code:
ID  CLASS WEIGHT   REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS TYPE NAME
 -1       75.09892        - 75.1TiB 9.71TiB 65.4TiB 12.93 1.00   - root default
 -3       31.43779        - 31.4TiB 3.35TiB 28.1TiB 10.65 0.82   -     host node1
  0   hdd  7.27730  1.00000 7.28TiB  789GiB 6.51TiB 10.58 0.82 124         osd.0
  1   hdd  7.27730  1.00000 7.28TiB  797GiB 6.50TiB 10.69 0.83 125         osd.1
  2   hdd  7.27730  1.00000 7.28TiB  821GiB 6.48TiB 11.02 0.85 129         osd.2
  3   hdd  7.27730  1.00000 7.28TiB  853GiB 6.44TiB 11.45 0.89 134         osd.3
  4   ssd  1.45540  1.00000 1.46TiB 87.3GiB 1.37TiB  5.86 0.45   0         osd.4
  5   ssd  0.87320  1.00000  894GiB 79.7GiB  814GiB  8.92 0.69   0         osd.5
 -7       21.83057        - 21.8TiB 3.18TiB 18.6TiB 14.58 1.13   -     host node2
  6   hdd  5.45740  1.00000 5.46TiB  811GiB 4.67TiB 14.52 1.12 127         osd.6
  7   hdd  5.45740  1.00000 5.46TiB  794GiB 4.68TiB 14.21 1.10 125         osd.7
  8   hdd  5.45789  1.00000 5.46TiB  776GiB 4.70TiB 13.89 1.07 122         osd.8
  9   hdd  5.45789  1.00000 5.46TiB  877GiB 4.60TiB 15.70 1.21 138         osd.9
-10       21.83057        - 21.8TiB 3.18TiB 18.7TiB 14.56 1.13   -     host node3
 10   hdd  5.45740  1.00000 5.46TiB  834GiB 4.64TiB 14.93 1.15 131         osd.10
 11   hdd  5.45789  1.00000 5.46TiB  766GiB 4.71TiB 13.71 1.06 121         osd.11
 12   hdd  5.45740  1.00000 5.46TiB  889GiB 4.59TiB 15.90 1.23 140         osd.12
 13   hdd  5.45789  1.00000 5.46TiB  766GiB 4.71TiB 13.71 1.06 120         osd.13
                      TOTAL 75.1TiB 9.71TiB 65.4TiB 12.93
MIN/MAX VAR: 0.45/1.23  STDDEV: 2.79
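One thing stands out in the output above: the two SSD OSDs (osd.4 and osd.5) carry 0 PGs, so the single pool places all data on the HDDs. To confirm which CRUSH rule the pool actually uses (the pool name below is a placeholder, replace it with the real one):
Code:
# Which CRUSH rule does the pool use?
ceph osd pool get <poolname> crush_rule

# Inspect the rules, including any device-class restriction
ceph osd crush rule dump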
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host node1 {
    id -3        # do not change unnecessarily
    id -2 class hdd        # do not change unnecessarily
    id -5 class ssd        # do not change unnecessarily
    # weight 31.438
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 7.277
    item osd.1 weight 7.277
    item osd.2 weight 7.277
    item osd.3 weight 7.277
    item osd.4 weight 1.455
    item osd.5 weight 0.873
}
host node2 {
    id -7        # do not change unnecessarily
    id -8 class hdd        # do not change unnecessarily
    id -9 class ssd        # do not change unnecessarily
    # weight 21.831
    alg straw2
    hash 0    # rjenkins1
    item osd.6 weight 5.457
    item osd.7 weight 5.457
    item osd.8 weight 5.458
    item osd.9 weight 5.458
}
host node3 {
    id -10        # do not change unnecessarily
    id -11 class hdd        # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    # weight 21.831
    alg straw2
    hash 0    # rjenkins1
    item osd.10 weight 5.457
    item osd.13 weight 5.458
    item osd.12 weight 5.457
    item osd.11 weight 5.458
}
root default {
    id -1        # do not change unnecessarily
    id -4 class hdd        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    # weight 75.099
    alg straw2
    hash 0    # rjenkins1
    item node1 weight 31.438
    item node2 weight 21.831
    item node3 weight 21.831
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule replicated_ssd {
    id 1
    type replicated
    min_size 1
    max_size 10
We have already activated the Ceph balancer:
Code:
{
    "active": true,
    "plans": [],
    "mode": "none"
}
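Note that the status above shows "mode": "none" and an empty plan list, so the balancer is switched on but will never generate a plan. A hedged sketch of enabling it in upmap mode (upmap requires all clients to be Luminous or newer):
Code:
# upmap needs luminous+ clients cluster-wide
ceph osd set-require-min-compat-client luminous

# Pick a mode, (re)enable the balancer, and verify
ceph balancer mode upmap
ceph balancer on
ceph balancer status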
I hope someone can help us.
 
