This happened on Proxmox 5.3-6, in a cluster of 3 servers with Ceph OSDs on two of them. I was destroying a BlueStore OSD in order to re-create it as a FileStore OSD; the system had a mixture of FileStore and BlueStore OSDs.
I did everything through the web UI: I stopped the OSD, marked it out, and then destroyed it.
The *really big* problem is that it wiped the partition table of the journal/db drive in the process.
In the output below, sdc is the OSD disk, and sdf is the journal/db drive (an SSD).
Here's what happened:
destroy OSD osd.6
Remove osd.6 from the CRUSH map
Remove the osd.6 authentication key.
Remove OSD osd.6
Unmount OSD osd.6 from /var/lib/ceph/osd/ceph-6
remove partition /dev/sdc1 (disk '/dev/sdc', partnum 1)
The operation has completed successfully.
remove partition /dev/sdc2 (disk '/dev/sdc', partnum 2)
The operation has completed successfully.
remove partition /dev/sdf14 (disk '/dev/sdf', partnum 14)
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
wipe disk: /dev/sdf
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 2.06369 s, 102 MB/s
wipe disk: /dev/sdc
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.04237 s, 68.9 MB/s
TASK OK
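For reference, my reading of that task log is that it boils down to roughly the following manual steps (a sketch of what I think the task did, not the actual code Proxmox runs; the stop/out steps are the ones I had already done in the UI):

systemctl stop ceph-osd@6           # stop the daemon (done via the UI)
ceph osd out 6                      # mark it out (done via the UI)
ceph osd crush remove osd.6         # "Remove osd.6 from the CRUSH map"
ceph auth del osd.6                 # "Remove the osd.6 authentication key."
ceph osd rm 6                       # "Remove OSD osd.6"
umount /var/lib/ceph/osd/ceph-6     # "Unmount OSD osd.6 from /var/lib/ceph/osd/ceph-6"
sgdisk --delete=1 /dev/sdc          # "remove partition /dev/sdc1"
sgdisk --delete=2 /dev/sdc          # "remove partition /dev/sdc2"
sgdisk --delete=14 /dev/sdf         # "remove partition /dev/sdf14" - fine, that was osd.6's journal
dd if=/dev/zero of=/dev/sdf bs=1M count=200   # "wipe disk: /dev/sdf" - this is what killed the GPT on the shared journal SSD
dd if=/dev/zero of=/dev/sdc bs=1M count=200   # "wipe disk: /dev/sdc" - expected, sdc was the OSD disk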
After this, the primary GPT on sdf was gone:
root@pm0:~# sfdisk -d /dev/sdf
sfdisk: /dev/sdf: does not contain a recognized partition table
Luckily, I was paying close attention and noticed the problem right away.
Because the wipe only hit the first 200 MiB of the drive, the backup GPT stored at the end of the disk was still intact, and I was able to recover from it.
I used gdisk to verify the backup GPT looked good:
root@pm0:~# gdisk -l /dev/sdf
GPT fdisk (gdisk) version 1.0.1
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: damaged
Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
1 - Use current GPT
2 - Create blank GPT
Your answer: 1
Disk /dev/sdf: 234441648 sectors, 111.8 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): B56DBFD0-6FC3-48D8-9095-A66F94512F70
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 234441614
Partitions will be aligned on 2048-sector boundaries
Total free space is 171527021 sectors (81.8 GiB)
Number Start (sector) End (sector) Size Code Name
3 20973568 31459327 5.0 GiB 8300
4 31459328 41945087 5.0 GiB 8300
10 94373888 104859647 5.0 GiB F802 ceph journal
12 115345408 125831167 5.0 GiB F802 ceph journal
13 125831168 136316927 5.0 GiB F802 ceph journal
15 138414080 148899839 5.0 GiB F802 ceph journal
root@pm0:~#
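(For a non-interactive check, I believe sgdisk's verify mode reports the same CRC problems, e.g.:

sgdisk --verify /dev/sdf

but gdisk -l was enough to tell me the backup table was usable.)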
I was then able to use gdisk to restore the main GPT from the backup:
root@pm0:~# gdisk /dev/sdf
GPT fdisk (gdisk) version 1.0.1
Caution: invalid main GPT header, but valid backup; regenerating main header
from backup!
Caution! After loading partitions, the CRC doesn't check out!
Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!
Warning! One or more CRCs don't match. You should repair the disk!
Partition table scan:
MBR: not present
BSD: not present
APM: not present
GPT: damaged
Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
1 - Use current GPT
2 - Create blank GPT
Your answer: 1
Command (? for help): v
Problem: The CRC for the main partition table is invalid. This table may be
corrupt. Consider loading the backup partition table ('c' on the recovery &
transformation menu). This report may be a false alarm if you've already
corrected other problems.
Identified 1 problems!
Command (? for help): r
Recovery/transformation command (? for help): c
Warning! This will probably do weird things if you've converted an MBR to
GPT form and haven't yet saved the GPT! Proceed? (Y/N): y
Recovery/transformation command (? for help): v
No problems found. 171527021 free sectors (81.8 GiB) available in 5
segments, the largest of which is 85541775 (40.8 GiB) in size.
Recovery/transformation command (? for help): w
Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!
Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sdf.
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.
root@pm0:~# sfdisk -d /dev/sdf
label: gpt
label-id: B56DBFD0-6FC3-48D8-9095-A66F94512F70
device: /dev/sdf
unit: sectors
first-lba: 34
last-lba: 234441614
/dev/sdf3 : start= 20973568, size= 10485760, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=2A40747F-1AA6-4A5E-A734-C16294DA01B8
/dev/sdf4 : start= 31459328, size= 10485760, type=0FC63DAF-8483-4772-8E79-3D69D8477DE4, uuid=E83ED104-8726-487E-AB94-E88E4D7F9474
/dev/sdf10 : start= 94373888, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=27266775-967F-444E-8E3A-A9A5CA372692, name="ceph journal"
/dev/sdf12 : start= 115345408, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=89FB93BD-43DD-4FC2-BAE2-89746272FAFE, name="ceph journal"
/dev/sdf13 : start= 125831168, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=44ED149F-CEB1-48B6-96D8-DC48A83D71B7, name="ceph journal"
/dev/sdf15 : start= 138414080, size= 10485760, type=45B0969E-9B03-4F30-B4C6-B4B80CEFF106, uuid=3E75D1C3-45F0-4B3E-9A08-760C1FA467CF, name="ceph journal"
root@pm0:~#
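Since the kernel was still using the old (now empty) table, I'd also want it to re-read the partitions without a reboot; partprobe, which the warning itself suggests, should do that, e.g.:

partprobe /dev/sdf
lsblk /dev/sdf     # confirm sdf3, sdf4 and the ceph journal partitions show up again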
I am not in a hurry to see if I can reproduce this problem. Experiencing it once was enough excitement for today.
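One precaution I'll be taking before the next destroy: saving the journal SSD's GPT to a file with sgdisk, so there is a recovery path even if both the primary and backup tables get wiped. Just a sketch (the backup file path is whatever you choose):

sgdisk --backup=/root/sdf-gpt.backup /dev/sdf        # save the GPT headers and partition table to a file
# and, if it ever gets wiped again:
sgdisk --load-backup=/root/sdf-gpt.backup /dev/sdf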