OK, I have rescued my partitions*** and am up and running again. However, there's still this nagging question...
So "sgdisk <healthy bootable device> -R <new device>" will clone the partition table from "<healthy bootable device>" (so the existing and working disk with PVE on it) to "<new device>" (your empty or factory-prepartitioned new disk).
Reverse that and you will wipe your working PVE installation.
Maybe you just got confused and used the wrong disks/partitions with those commands?
Everyone else had no problems doing the same stuff with those commands.
I understand. That's honestly how it is supposed to work. And I seriously questioned my sanity.
Honestly, it didn't work that way for me. Makes me nervous about sgdisk and/or parameter parsing and/or something deeper in Linux...or in the hardware/firmware. Why would I suggest this?
As a reasonably experienced person with these tools (I've been working with disks at a low level for many decades), I have gone back over everything in my head, in the logs, etc.
There is one anomaly that stands out for me even now, and it made me nervous from the get-go:
- Platform I'm on: HP Z2 G5 SFF
- Motherboard has two NVMe slots, labeled SSD0 and SSD1
- Original (HP/Samsung) storage is in SSD0
- I added new (Crucial) storage in SSD1
- In the BIOS, strangely, it is always listing the Crucial first. Even in the list of PCIe channels.
- In ProxMox Linux, /dev/nvme1n1 is the Samsung, /dev/nvme0n1 is the Crucial
- Using SystemRescue 11.00 the devices are swapped: /dev/nvme0n1 is the Samsung, /dev/nvme1n1 is the Crucial
That just made me nervous.
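Since the nvme0n1/nvme1n1 names evidently differ between boot environments, one way to guard against picking the wrong disk is to identify it by model and serial rather than by kernel name. A minimal sketch, assuming lsblk and the usual /dev/disk/by-id links are available in both environments; the function names and the by-id path in the comment are placeholders, not real devices:

```shell
# List whole disks (no partitions) with model and serial number.
# Run this in every environment you boot (PVE, SystemRescue, ...).
show_disks() { lsblk -d -o NAME,MODEL,SERIAL,SIZE; }

# Resolve a stable by-id link to whatever kernel name it has right now.
resolve_disk() { readlink -f "$1"; }

# Example (placeholder path, substitute your own model and serial):
#   show_disks
#   resolve_disk /dev/disk/by-id/nvme-<model>_<serial>
```

Passing the by-id path straight to sgdisk or gdisk should sidestep the question of which disk is nvme0n1 today.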
To be safe in the future with such a crucial step, I'm going to do it by saving the partition table to a file off-disk, then loading that file onto the new disk. gdisk does that quite nicely.
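The save-to-file approach might look roughly like this with sgdisk. This is a sketch, not a tested procedure: SRC, DST and BACKUP are placeholder values, and the run helper is a hypothetical wrapper that only prints the commands unless RUN=1 is set, so nothing touches a disk by accident.

```shell
# Dry-run by default: print each command. Set RUN=1 to actually execute.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

# Placeholders: by-id paths are safer than nvmeXn1 names.
SRC=${SRC:-/dev/disk/by-id/SOURCE-DISK}
DST=${DST:-/dev/disk/by-id/NEW-DISK}
BACKUP=${BACKUP:-/mnt/usb/pve-gpt.bin}   # keep this file off both disks

clone_table() {
  run sgdisk --print "$SRC"                   # check this really is the healthy disk
  run sgdisk --backup="$BACKUP" "$SRC"        # save its GPT to a file
  run sgdisk --load-backup="$BACKUP" "$DST"   # write the saved table to the new disk
  run sgdisk --randomize-guids "$DST"         # give the copy its own disk/partition GUIDs
}
clone_table
```

Because the table passes through a file, the source disk is only ever read, so mixing up the two arguments of a single -R command can no longer overwrite it.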
*** For those who are interested:
How to rescue the disk partitions on a ProxMox EFI boot drive formatted in the standard way:
- As of today, while the ProxMox installer has a "rescue" mode, it assumes you have good partitions on your boot drive
- Any of several bootable-USB rescue systems can take care of things. I used SystemRescue
Partition Setup is as follows:
Number, Start (Sector), End (sector), Type Code
1, 34, 2047, EF02
2, 2048, 1050623, EF00
3, 1050624, <end>, BF01
Note: the type codes are gdisk/sgdisk shorthand for the standard GPT partition type GUIDs. The lookup table is available online.
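As a sanity check on the sector numbers above, the partition sizes can be worked out from the start and end sectors. A small sketch, assuming 512-byte logical sectors (that is an assumption; blockdev --getss /dev/<device> reports the real value). part_bytes is just an illustrative helper name:

```shell
# Assumed logical sector size in bytes; verify with: blockdev --getss /dev/<device>
SECTOR_SIZE=512

# Size in bytes of a partition given its first and last sector (inclusive).
part_bytes() { echo $(( ($2 - $1 + 1) * SECTOR_SIZE )); }

echo "Partition 1 (EF02): $(( $(part_bytes 34 2047) / 1024 )) KiB"
echo "Partition 2 (EF00): $(( $(part_bytes 2048 1050623) / 1024 / 1024 )) MiB"
```

With 512-byte sectors this gives 1007 KiB for partition 1 and 512 MiB for partition 2. If your numbers come out differently, the sector size or the table is not the standard one.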
Steps to recovery:
- Do you have the GUID of your boot disk? If so, you can 100% recover without having any other data. The one absolutely necessary tool is gdisk. (NOTE: I tried using testdisk. It was seriously confused by the drive. A waste of time.)
- Install a recovery system on a USB and configure the host hardware so you can boot from the USB
General Sequence:
- Ensure you're working on the correct disk, then run gdisk
- (Create a blank GPT table if needed)
- Change sector alignment to 1 to load the standard table
- Create all partitions
- Change sector alignment to 8 (the standard used by ProxMox)
- If you have the disk GUID, set it
- Verify correctness
- Write to disk and exit
- Done
The above steps are not too hard:
- lsblk shows all devices. Run gdisk /dev/<device>
- use p to verify no partitions. Use o to make fresh GPT table if needed.
- use x for expert menu, then l ("L") and 1 to set alignment to 1. Then "m" returns to the normal menu.
- "n" for new, provide partition number (1 to 3), start and end sectors, and type code. For partition 3 accept the default end-of-disk number.
- use x for expert menu, then l ("L") and 8 to set alignment to 8.
- If you have disk GUID, "g" in expert menu lets you set it.
- "p" in any menu prints the restored partition table and disk GUID. Verify everything.
- "w" writes to disk and exits.
- Reboot as normal!
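For anyone who prefers a script over the interactive menus, the same table could probably be rebuilt with sgdisk. This is an untested sketch under stated assumptions: DISK and DISK_GUID are placeholders, run is a hypothetical dry-run wrapper that only prints the commands unless RUN=1 is set, and the final "set alignment to 8" step is left out on the assumption that the alignment value only affects partitions created after it is set.

```shell
# Dry-run by default: print each command. Set RUN=1 to actually execute.
run() { if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi; }

DISK=${DISK:-/dev/disk/by-id/BOOT-DISK}   # placeholder, use your real disk
DISK_GUID=${DISK_GUID:-}                  # optional: original disk GUID if known

rebuild_table() {
  run sgdisk --clear "$DISK"              # fresh, empty GPT (gdisk "o")
  # Alignment 1 so that start sector 34 is accepted as-is; end 0 = end of disk.
  run sgdisk --set-alignment=1 \
      --new=1:34:2047      --typecode=1:EF02 \
      --new=2:2048:1050623 --typecode=2:EF00 \
      --new=3:1050624:0    --typecode=3:BF01 "$DISK"
  if [ -n "$DISK_GUID" ]; then
    run sgdisk --disk-guid="$DISK_GUID" "$DISK"   # gdisk expert menu "g"
  fi
  run sgdisk --print "$DISK"              # gdisk "p": check before rebooting
  run sgdisk --verify "$DISK"
}
rebuild_table
```

Read the printed commands first, compare the device against lsblk, and only then rerun with RUN=1.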