Destroy OSD wiped journal/db drive partition table

Discussion in 'Proxmox VE: Installation and configuration' started by flamozzle, Jan 11, 2019 at 20:48.

  1. flamozzle

    This happened on Proxmox 5.3-6, in a cluster of 3 servers with Ceph OSDs on two of them. I was destroying a BlueStore OSD to re-create it as a FileStore OSD; the system had a mixture of FileStore and BlueStore OSDs.
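
    Since the cluster mixes backends, here is a quick way to confirm which objectstore a given OSD uses (a one-liner sketch; "5" stands in for the OSD id):

    ceph osd metadata 5 | grep osd_objectstore    # the osd_objectstore field reports bluestore or filestore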

    I did everything using the web UI: I stopped the OSD, marked it out, and then destroyed it.

    The *really big* problem is that it wiped the partition table of the journal/db drive in the process.

    In the output below, sdc is the OSD disk, and sdf is the journal/db drive (an SSD).
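
    For reference, the same sequence expressed as CLI commands (a rough sketch of what I did through the GUI, not necessarily what the GUI runs internally; "5" is an example OSD id):

    systemctl stop ceph-osd@5    # stop the OSD daemon
    ceph osd out 5               # mark the OSD out
    # ...then "Destroy" on the OSD in the GUI, which is the step that wiped the journal/db disk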

    Here's what happened: after the destroy, the primary GPT on sdf was gone. Luckily, I was paying close attention and noticed the problem right away.

    I was able to recover from the backup GPT.

    I used gdisk to verify the backup GPT looked good:
    root@pm0:~# gdisk -l /dev/sdf
    GPT fdisk (gdisk) version 1.0.1

    Caution: invalid main GPT header, but valid backup; regenerating main header
    from backup!

    Caution! After loading partitions, the CRC doesn't check out!
    Warning! Main partition table CRC mismatch! Loaded backup partition table
    instead of main partition table!

    Warning! One or more CRCs don't match. You should repair the disk!

    Partition table scan:
    MBR: not present
    BSD: not present
    APM: not present
    GPT: damaged

    Found invalid MBR and corrupt GPT. What do you want to do? (Using the
    GPT MAY permit recovery of GPT data.)
    1 - Use current GPT
    2 - Create blank GPT

    Your answer: 1
    Disk /dev/sdf: 234441648 sectors, 111.8 GiB
    Logical sector size: 512 bytes
    Disk identifier (GUID): B56DBFD0-6FC3-48D8-9095-A66F94512F70
    Partition table holds up to 128 entries
    First usable sector is 34, last usable sector is 234441614
    Partitions will be aligned on 2048-sector boundaries
    Total free space is 171527021 sectors (81.8 GiB)

    Number  Start (sector)    End (sector)  Size      Code  Name
       3        20973568        31459327   5.0 GiB    8300
       4        31459328        41945087   5.0 GiB    8300
      10        94373888       104859647   5.0 GiB    F802  ceph journal
      12       115345408       125831167   5.0 GiB    F802  ceph journal
      13       125831168       136316927   5.0 GiB    F802  ceph journal
      15       138414080       148899839   5.0 GiB    F802  ceph journal
    root@pm0:~#
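
    As an aside, sgdisk's verify option should report the same header/CRC problems in a scriptable, prompt-free way (I used interactive gdisk here):

    sgdisk --verify /dev/sdf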

    and then I was able to use gdisk to recover the backup GPT:
    root@pm0:~# gdisk /dev/sdf
    GPT fdisk (gdisk) version 1.0.1

    Caution: invalid main GPT header, but valid backup; regenerating main header
    from backup!

    Caution! After loading partitions, the CRC doesn't check out!
    Warning! Main partition table CRC mismatch! Loaded backup partition table
    instead of main partition table!

    Warning! One or more CRCs don't match. You should repair the disk!

    Partition table scan:
    MBR: not present
    BSD: not present
    APM: not present
    GPT: damaged

    Found invalid MBR and corrupt GPT. What do you want to do? (Using the
    GPT MAY permit recovery of GPT data.)
    1 - Use current GPT
    2 - Create blank GPT

    Your answer: 1

    Command (? for help): v

    Problem: The CRC for the main partition table is invalid. This table may be
    corrupt. Consider loading the backup partition table ('c' on the recovery &
    transformation menu). This report may be a false alarm if you've already
    corrected other problems.

    Identified 1 problems!

    Command (? for help): r

    Recovery/transformation command (? for help): c
    Warning! This will probably do weird things if you've converted an MBR to
    GPT form and haven't yet saved the GPT! Proceed? (Y/N): y

    Recovery/transformation command (? for help): v

    No problems found. 171527021 free sectors (81.8 GiB) available in 5
    segments, the largest of which is 85541775 (40.8 GiB) in size.

    Recovery/transformation command (? for help): w

    Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
    PARTITIONS!!

    Do you want to proceed? (Y/N): y
    OK; writing new GUID partition table (GPT) to /dev/sdf.
    Warning: The kernel is still using the old partition table.
    The new table will be used at the next reboot or after you
    run partprobe(8) or kpartx(8)
    The operation has completed successfully.
    root@pm0:~# sfdisk -d /dev/sdf
    label: gpt
    label-id: B56DBFD0-6FC3-48D8-9095-A66F94512F70
    device: /dev/sdf
    unit: sectors
    first-lba: 34
    last-lba: 234441614

    /dev/sdf3 : start= 20973568, size= 10485760, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=2A40747F-1AA6-4A5E-A734-C16294DA01B8
    /dev/sdf4 : start= 31459328, size= 10485760, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=E83ED104-8726-487E-AB94-E88E4D7F9474
    /dev/sdf10 : start= 94373888, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=27266775-967F-444E-8E3A-A9A5CA372692, name="ceph journal"
    /dev/sdf12 : start= 115345408, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=89FB93BD-43DD-4FC2-BAE2-89746272FAFE, name="ceph journal"
    /dev/sdf13 : start= 125831168, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=44ED149F-CEB1-48B6-96D8-DC48A83D71B7, name="ceph journal"
    /dev/sdf15 : start= 138414080, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=3E75D1C3-45F0-4B3E-9A08-760C1FA467CF, name="ceph journal"
    root@pm0:~#
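
    Incidentally, that same sfdisk dump is worth saving to a file ahead of time, so a recovery doesn't have to depend on the backup GPT at the end of the disk surviving. A minimal sketch (file names are just examples):

    sfdisk -d /dev/sdf > sdf-table.sfdisk         # text dump of the partition table
    # sfdisk /dev/sdf < sdf-table.sfdisk          # restores it later
    sgdisk --backup=sdf-gpt.bin /dev/sdf          # binary GPT backup via sgdisk
    # sgdisk --load-backup=sdf-gpt.bin /dev/sdf   # restores from that backup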

    I am not in a hurry to see if I can reproduce this problem. Experiencing it once was enough excitement for today.
     
  2. flamozzle

    I have since realized that the damage was more extensive than I originally thought.

    Because the disk wipe writes 200 MiB at the beginning of the disk, it would also corrupt the first partition on the journal/db disk, not just its partition table.

    In my case, I was incredibly fortunate to have an old unused partition as the first one on that disk.
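
    A quick way to check whether a journal/db disk would actually lose partition data to such a wipe is to compare partition start offsets against 200 MiB. A rough sketch, assuming 512-byte sectors (so 200 MiB = 409600 sectors) and the device name from this thread:

    parted /dev/sdf unit MiB print    # eyeball the partition start offsets in MiB
    # or, scripted against the sfdisk dump:
    sfdisk -d /dev/sdf | awk '/start=/ { s = $4; gsub(",", "", s); if (s + 0 < 409600) print "starts inside the first 200 MiB:", $1 }'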
     
  3. flamozzle

    I have now reproduced this on a different server in the same (Proxmox and Ceph) cluster.

    The second occurrence followed the same procedure as before, but on a different server and with a different OSD (though it was again a BlueStore OSD). In this case /dev/sde was the OSD and /dev/sdd was the journal/db disk.

    I used the same method to recover from it as before.
     
  4. Alwin

    Alwin Proxmox Staff Member