Hi all,
During a recent minor upgrade of the Proxmox VE (PVE) host I experienced a serious issue with a RAID‑0 array that was managed by mdadm on an AlmaLinux 9.X guest. The disks had been presented to the VM via PCI‑e passthrough, so the guest owned the controllers directly. Disks were used directly without partitioning.
After the host rebooted following the upgrade (performed with apt), the array could no longer be assembled. Inspection of the block devices showed that the md super‑blocks had been overwritten by GPT headers, which destroyed the RAID metadata - including the original disk order.
I backed up the raw disks before attempting any further manipulation and I crafted script (with the help of an AI assistant) that:
The script eventually succeeded in re‑creating a usable filesystem, yet i have no logs as I was working under pressure of time.
My questions
1. Is this a known edge case? Have other users observed mdraid metadata being overwritten by GPT after a PVE minor upgrade, especially when PCI‑e passthrough is involved?
2. What preventive measures could be taken? Are there best‑practice steps?
Any insight, references to existing bug reports, or suggestions for a more robust recovery workflow would be greatly appreciated.Thank you for your help.
At least i have one conclusion - I should have created raid on GPT partitioned disk instead on raw device.
During a recent minor upgrade of the Proxmox VE (PVE) host I experienced a serious issue with a RAID‑0 array that was managed by mdadm on an AlmaLinux 9.X guest. The disks had been presented to the VM via PCI‑e passthrough, so the guest owned the controllers directly. Disks were used directly without partitioning.
After the host rebooted following the upgrade (performed with apt), the array could no longer be assembled. Inspection of the block devices showed that the md super‑blocks had been overwritten by GPT headers, which destroyed the RAID metadata - including the original disk order.
I backed up the raw disks before attempting any further manipulation and I crafted script (with the help of an AI assistant) that:
- created a differential overlay using qemu‑img,
- iteratively forced the md header onto the overlay, tried to assemble the array, and attempted a mount,
- on any critical mount error rolled back the overlay, swapped the disk order, and repeated the process.
The script eventually succeeded in re‑creating a usable filesystem, yet i have no logs as I was working under pressure of time.
My questions
1. Is this a known edge case? Have other users observed mdraid metadata being overwritten by GPT after a PVE minor upgrade, especially when PCI‑e passthrough is involved?
2. What preventive measures could be taken? Are there best‑practice steps?
Any insight, references to existing bug reports, or suggestions for a more robust recovery workflow would be greatly appreciated.Thank you for your help.
At least i have one conclusion - I should have created raid on GPT partitioned disk instead on raw device.