Proxmox Cluster stuck

marco011ET

Member
Mar 5, 2021
Hi all, I am having several problems with a cluster.
The cluster has:
6 nodes, each with 10 OSDs (3 TB each)
1 node with 3 OSDs (10 TB each)

The cluster was left unmanaged for approximately a year and a half.
After a failure in the air-conditioning system, many Ceph OSDs went down and the cluster stopped working properly.
I can start some VMs, but many others will not start.
I updated each node to the newest version the repositories allowed.
All nodes are now on pve-manager/7.4-18/b1f94095 except one; on that node the updates seem to fail, and the only way I can bring it up is by booting the old kernel 5.11.27 (see the sketch below for how I check and pin it).
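For reference, this is roughly how I compare versions on each node and keep the broken node on the old kernel. A minimal sketch; the exact 5.11.27 ABI suffix is a guess on my part, check proxmox-boot-tool kernel list for the real one.
Code:
# show the installed PVE version and the running kernel on each node
pveversion
uname -r

# list the kernels the boot tool knows about, then pin the one that boots
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 5.11.27-1-pve   # ABI suffix is a guess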

This is the Ceph situation:
Code:
 cluster:
    id:     f17eb6a9-5bfd-4a22-b064-eaa0204a4892
    health: HEALTH_WARN
            clock skew detected on mon.CAARPVE4, mon.CAARPVE5, mon.CAARPVE6
            2/6 mons down, quorum CAARPVE3,CAARPVE4,CAARPVE5,CAARPVE6
            4 osds down
            6 nearfull osd(s)
            all OSDs are running pacific or later but require_osd_release < pacific
            Reduced data availability: 123 pgs inactive, 7 pgs down, 1 pg stale
            Low space hindering backfill (add storage if this doesn't resolve itself): 16 pgs backfill_toofull
            Degraded data redundancy: 6714845/33588930 objects degraded (19.991%), 533 pgs degraded, 584 pgs undersized
            6 pgs not deep-scrubbed in time
            3 pgs not scrubbed in time
            2 pool(s) nearfull
            1 daemons have recently crashed
            96651 slow ops, oldest one blocked for 7752 sec, mon.CAARPVE4 has slow ops

  services:
    mon: 6 daemons, quorum CAARPVE3,CAARPVE4,CAARPVE5,CAARPVE6 (age 71m), out of quorum: CAARPVE2, CAARPVE7
    mgr: CAARPVE5(active, since 2h), standbys: CAARPVE1, CAARPVE3
    osd: 66 osds: 45 up (since 42m), 49 in (since 32m); 713 remapped pgs

  data:
    pools:   2 pools, 1088 pgs
    objects: 11.20M objects, 27 TiB
    usage:   88 TiB used, 59 TiB / 146 TiB avail
    pgs:     11.305% pgs not active
             6714845/33588930 objects degraded (19.991%)
             5838798/33588930 objects misplaced (17.383%)
             300 active+undersized+degraded+remapped+backfill_wait
             245 active+clean
             189 active+remapped+backfill_wait
             107 active+undersized+degraded
             87  undersized+degraded+remapped+backfill_wait+peered
             58  active+clean+remapped
             27  active+undersized+remapped+backfill_wait
             12  active+undersized+remapped
             10  undersized+degraded+remapped+backfilling+peered
             8   active+undersized+degraded+remapped+backfill_wait+backfill_toofull
             8   active+undersized
             7   undersized+degraded+peered
             7   active+undersized+degraded+remapped+backfilling
             7   down
             6   undersized+degraded+remapped+backfill_wait+backfill_toofull+peered
             5   undersized+remapped+backfill_wait+peered
             2   active+remapped+backfill_wait+backfill_toofull
             1   stale+active+clean
             1   undersized+peered
             1   active+recovery_wait+degraded+remapped

  io:
    recovery: 230 MiB/s, 57 objects/s

  progress:
    Global Recovery Event (2h)
      [=======.....................] (remaining: 5h)
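For completeness, these are the kinds of commands I have been using to dig into the warnings above. A minimal sketch; I am assuming chrony is the time-sync daemon for the clock-skew check.
Code:
ceph health detail     # expanded view of every warning in the summary
ceph versions          # confirm all daemons really run Pacific
ceph osd df tree       # shows which OSDs are nearfull and by how much
chronyc sources -v     # check time sync on the mons reporting clock skew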
I tried to map a VM disk with the rbd map command, but when I try to export the data nothing happens; it just stays stuck.
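Roughly what I tried; the pool and image names below are placeholders for my real ones.
Code:
# map the image as a block device (rbd map prints the device, e.g. /dev/rbd0)
rbd map vm-pool/vm-100-disk-0

# then copy the data off the mapped device; this is the step that hangs
dd if=/dev/rbd0 of=/mnt/rescue/vm-100-disk-0.raw bs=4M status=progress

# alternative: export directly without mapping
rbd export vm-pool/vm-100-disk-0 /mnt/rescue/vm-100-disk-0.raw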
Is there anything I can do to fix this? Also, the monitor frequently goes into an error (500) state, which leaves me stuck.
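When the 500 errors appear, this is where I have been looking for clues. A sketch, assuming the errors come from the Proxmox web proxy/daemon rather than Ceph itself.
Code:
systemctl status pveproxy pvedaemon
journalctl -u pveproxy -u pvedaemon --since "1 hour ago"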
 
