Recent content by hahosting

  1.

    Failed OSD for ceph stale pg

    As an FYI, this is the output when activating the OSD:
    root@vms-ceph112:~# journalctl -u ceph-osd@1203.service -n 50
    -- Journal begins at Fri 2023-06-30 13:25:22 BST, ends at Sat 2023-11-25 14:19:28 GMT. --
    Nov 25 14:18:56 vms-ceph112 ceph-osd[116397]: 20: (RocksDBStore::do_open(std::ostream&...
  2.

    Failed OSD for ceph stale pg

    Interestingly, I have run an fsck with the command you sent:
    root@vms-ceph112:/var/lib/ceph/osd# ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-1203
    fsck success
    And I get a success.
  3.

    Failed OSD for ceph stale pg

    Yes:
    root@vms-ceph112:/var/lib/ceph/osd/ceph-1203# ls
    block  block.wal  ceph_fsid  fsid  keyring  ready  type  whoami
    root@vms-ceph112:/var/lib/ceph/osd/ceph-1203#
  4.

    Failed OSD for ceph stale pg

    I'm afraid the new disk shows no valid partitions and won't activate. I have put the broken OSD back in and it does show, albeit it won't start. If I try to extract the PG as above I get: root@vms-ceph112:/var/lib/ceph/osd# ceph-objectstore-tool --op export --pgid 8.11 --data-path... (see the export sketch after this list)
  5.

    Failed OSD for ceph stale pg

    The disk has been cloned to a new one and I'm heading back to the DC with it now, so I will update shortly.
  6.

    Failed OSD for ceph stale pg

    Thanks for your detailed reply. Yes, all our pools are now 3/2 except this old one. We have set different CRUSH rules for the different disk sizes across the various pools we have created, so the 10 TB disks are in a 3/2 backup CRUSH rule pool whereas the 2 TB disks are in a CRUSH rule... (a CRUSH-rule sketch follows this list)
  7.

    Failed OSD for ceph stale pg

    Great stuff. Yes, I found the above commands before drive removal, but without the disk mounted I couldn't do anything with it. I will update here when the drive returns.
  8.

    Failed OSD for ceph stale pg

    OK, that's great, thanks for your help. I have set noscrub and nodeep-scrub for now (see the scrub-flag sketch after this list). I think at this point all I can do is wait for the recovered OSD device to return and see if (a) it can be activated and, if not, (b) I can extract the PGs. I can't even start what looks like all the VMs with disks...
  9.

    Failed OSD for ceph stale pg

    Is it worth disabling scrubbing at this point?
  10.

    Failed OSD for ceph stale pg

    If we decide the data is lost, can I just delete the dead PGs, see what does and doesn't start from a VM point of view, and restore what doesn't from backup? (a sketch of recreating lost PGs follows this list)
  11.

    Failed OSD for ceph stale pg

    Yes, I agree, thank you. We are currently trying to recover the data from the disk to a new device. If I can recover the data, is it possible to extract the missing PGs and add them back to the cluster, or am I just wasting my time? (see the export/import sketches after this list)
  12.

    Failed OSD for ceph stale pg

    root@vms-ceph112:/home/richard.admin# ceph pg dump_stuck stale
    PG_STAT  STATE                             UP      UP_PRIMARY  ACTING  ACTING_PRIMARY
    8.1e0    stale+active+undersized+degraded  [1203]  1203        [1203]  1203
    8.11     stale+active+undersized+degraded  [1203]  1203...
  13.

    Failed OSD for ceph stale pg

    Yes, the query command just sits there forever. I will try it again to be sure.
  14.

    Failed OSD for ceph stale pg

    Yes, 1203 being the issue. We have removed the disk and it is readable from a data recovery point of view, but very slowly, so I'm assuming at this point it's a bad disk, and being on an old 2/1 pool has ruined my day. I am trying to get the drive recovered to a new one and perhaps it might activate?
  15.

    Failed OSD for ceph stale pg

    root@vms-ceph112:/home/richard.admin# ceph osd dump | grep ratio
    full_ratio 0.95
    backfillfull_ratio 0.9
    nearfull_ratio 0.85
    root@vms-ceph112:/home/richard.admin#
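
For item 4, a minimal sketch of a full PG export, assuming the failed OSD's data directory is mounted at /var/lib/ceph/osd/ceph-1203 and the OSD service is stopped; the output file path is only an example:

    # stop the OSD so ceph-objectstore-tool has exclusive access to the store
    systemctl stop ceph-osd@1203.service

    # export the stale PG to a file on separate, healthy storage
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1203 \
        --pgid 8.11 --op export --file /root/pg-8.11.export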
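
For item 6, a rough sketch of the per-disk-class CRUSH layout described there, assuming the large and small disks carry distinct device classes; the class, rule, and pool names are made up for illustration:

    # replicated rule limited to a hypothetical "big" device class, host failure domain
    ceph osd crush rule create-replicated backup-rule default host big

    # point a pool at that rule and bring it to 3/2
    ceph osd pool set backup-pool crush_rule backup-rule
    ceph osd pool set backup-pool size 3
    ceph osd pool set backup-pool min_size 2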
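
For items 8 and 9, the scrub flags are cluster-wide; a minimal sketch of setting them while the cluster is degraded and clearing them afterwards:

    # pause scrubbing while recovery is in doubt
    ceph osd set noscrub
    ceph osd set nodeep-scrub

    # re-enable once the PGs are healthy again
    ceph osd unset noscrub
    ceph osd unset nodeep-scrub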
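
For item 10, if the data is written off, one possible path is to mark the failed OSD lost, recreate the stale PGs empty, and restore the affected VMs from backup. A destructive sketch, on the assumption that OSD 1203 really is unrecoverable:

    # tell the cluster the failed OSD's data is gone for good
    ceph osd lost 1203 --yes-i-really-mean-it

    # recreate one of the stale PGs as empty; anything in it is permanently lost
    ceph osd force-create-pg 8.11 --yes-i-really-mean-it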
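
For item 11, re-adding an extracted PG is possible in principle: the export file from the sketch above can be imported into a healthy OSD while that OSD is stopped, after which the cluster can peer and recover the PG. A sketch, assuming a hypothetical surviving OSD with ID 1210:

    # stop the target OSD before touching its object store
    systemctl stop ceph-osd@1210.service

    # import the previously exported PG, then restart the OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1210 \
        --op import --file /root/pg-8.11.export
    systemctl start ceph-osd@1210.service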