How to stop or remove slow ops in Ceph

huky

Renowned Member
Jul 1, 2016
Chongqing, China
My Ceph cluster is unhealthy:
Code:
            1 filesystem is degraded
            11 PGs pending on creation
            Reduced data availability: 202 pgs inactive, 6 pgs down
            Degraded data redundancy: 269/10009374 objects degraded (0.003%), 17 pgs degraded, 3 pgs undersized
            2 daemons have recently crashed
            17 slow ops, oldest one blocked for 6512 sec, daemons [osd.30,osd.32,osd.35] have slow ops.

How can I find and stop these ops?
 
Hi,

daemons [osd.30,osd.32,osd.35] have slow ops.

Those integers are the OSD IDs, so the first thing would be to check the health and status of those disks (e.g., SMART health data) and of the hosts those OSDs reside on; also check dmesg (the kernel log) and the journal for any errors from the disks or the Ceph daemons.
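For example (the device name is a placeholder, adjust it and the OSD ID to the affected disks):

Code:
# SMART health of the disk backing an OSD (replace sdX with the real device)
smartctl -a /dev/sdX

# kernel log with human-readable timestamps, filtered for disk errors
dmesg -T | grep -iE 'error|fail|sd'

# journal of one of the affected OSD daemons
journalctl -u ceph-osd@30 --since "1 hour ago"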

Which Ceph and PVE version is in use in that setup?
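Both can be read out on any node with:

Code:
pveversion -v
ceph versions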

What does the setup look like in general? I.e., how many nodes, what networks (and bandwidth), how many OSDs per node, which type of disk tech (NVMe, SSD, or spinner), ...?
Sometimes this can stem from an overloaded part of the cluster.
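A quick way to look for an overloaded or unusually slow OSD, for instance:

Code:
# per-OSD commit/apply latency in ms; one OSD standing out is suspicious
ceph osd perf

# space usage and PG count per OSD, grouped by host
ceph osd df tree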

You may get more details on why those operations are slow by following:
https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-osd/#debugging-slow-requests
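For instance, on the node where one of the affected OSDs runs (osd.30 taken from the health output above):

Code:
# operations currently stuck in the OSD, with their age and current step
ceph daemon osd.30 dump_ops_in_flight

# recently completed slow operations, with per-event timestamps
ceph daemon osd.30 dump_historic_ops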
 
Thanks.
The disks' SMART data is healthy.
PVE is v6, upgraded from v5; 9 nodes.
Ceph is Nautilus, upgraded from Luminous.
The cluster has 43 OSDs (most of them 2 TB), including 9 SSDs (one SSD per node), and it was working normally.

I added a 4 TB OSD on January 13, and the process went smoothly.
Then I added a 10 TB OSD on January 14, and the whole cluster and all VMs and CTs became very slow. Now, after two days, it is still not usable.
I want to stop the Ceph rebalance so that I can use the VMs and CTs now (one way to pause it is sketched below, after the status output).

Thanks again.

Code:
  cluster:
    id:     225397cb-7b69-4c24-8c34-f43951f42974
    health: HEALTH_WARN
            1 filesystem is degraded
            12 PGs pending on creation
            Reduced data availability: 202 pgs inactive
            Degraded data redundancy: 269/9985113 objects degraded (0.003%), 17 pgs degraded, 3 pgs undersized
            2 daemons have recently crashed
            68 slow ops, oldest one blocked for 1930 sec, daemons [osd.11,osd.30,osd.32,osd.35] have slow ops.
 
  services:
    mon: 3 daemons, quorum node003,node009,node008 (age 31m)
    mgr: node003(active, since 31m), standbys: node009, node008
    mds: cephfs1:3/3 {0=node003=up:replay,1=node009=up:resolve,2=node008=up:resolve}
    osd: 44 osds: 44 up (since 32m), 44 in (since 2d); 177 remapped pgs
 
  task status:
    scrub status:
        mds.node003: idle
        mds.node008: idle
        mds.node009: idle
 
  data:
    pools:   7 pools, 2720 pgs
    objects: 3.33M objects, 12 TiB
    usage:   38 TiB used, 57 TiB / 94 TiB avail
    pgs:     0.368% pgs unknown
             7.059% pgs not active
             269/9985113 objects degraded (0.003%)
             307894/9985113 objects misplaced (3.084%)
             2518 active+clean
             174  activating+remapped
             14   activating+degraded
             10   unknown
             3    activating+undersized+degraded+remapped
             1    activating
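To pause the rebalance itself, Ceph has cluster-wide OSD flags; a minimal sketch (note this only pauses data movement, the inactive/activating PGs still have to finish peering before I/O to them resumes):

Code:
# stop backfill, recovery, and rebalancing; client I/O to active PGs continues
ceph osd set norebalance
ceph osd set nobackfill
ceph osd set norecover

# later, when the cluster is responsive again, let it heal
ceph osd unset norecover
ceph osd unset nobackfill
ceph osd unset norebalance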