I Destroyed My main-zfs Pool – Seeking Recovery Advice

drd2aiki

Member
Apr 21, 2022
I hope this is on topic for this forum.
I made a critical mistake while trying to replace a failed drive in my main-zfs pool, and now I fear I’ve completely corrupted the metadata. I’m hoping someone here has a suggestion to recover my data, or at least confirm if a professional data recovery service might be able to help.

My Setup:

  • ZFS Pool Name: main-zfs
  • Configuration: RAIDZ1 with three 4TB Seagate IronWolf drives
  • System: Proxmox server running on Gigabyte Z87X-UD5H, 32GB RAM
  • Storage Connection: Drives are directly connected via SATA
  • OS Drive: Separate 1TB NVMe SSD (Proxmox is installed here)
  • Spare Drive: I had a 4TB replacement drive available in case of failure.

What Went Wrong:

  1. Drive Failure: One of my three 4TB Seagate drives in main-zfs failed (or at least I thought it did). ZFS reported errors, and zpool status showed it as degraded.
  2. Drive Replacement – But the WRONG One: I mistakenly removed the wrong, healthy drive instead of the actually failed one.
  3. Forced a Rebuild: I inserted the new 4TB drive and issued a zpool replace command, assuming I was correctly swapping the failed disk.
  4. Metadata Corruption: Once the resilvering process started, I realized my mistake. zpool status started showing checksum errors, and soon after, the pool became unmountable.
  5. Attempted Recovery: I tried:
    • zpool import -f main-zfs (no luck)
    • zpool import -F main-zfs (still nothing)
    • Booting into a live recovery environment – no importable pools found
    • Checking /dev/disk/by-id/ to confirm the correct drive order – by then, metadata was likely trashed.
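
For anyone following along, the import attempts above can be sketched as a small script. This is only an illustration: it prints each command instead of running it unless RUN=1 is set, and the last two commands (scanning /dev/disk/by-id and a read-only dry run with -n) are assumptions about a sensible next step, not something from the list above.

```shell
#!/bin/sh
# Sketch of the import attempts listed above. Commands are only printed
# unless RUN=1 is set, so nothing touches the disks by accident.
POOL="main-zfs"

run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

import_attempts() {
    # Forced import, then recovery-mode import (tries to rewind to an earlier txg).
    run zpool import -f "$POOL"
    run zpool import -F "$POOL"
    # Assumption: scan by stable device IDs instead of /dev/sdX names.
    run zpool import -d /dev/disk/by-id
    # Assumption: read-only dry run; -n only reports whether -F could succeed.
    run zpool import -o readonly=on -f -F -n "$POOL"
}

import_attempts
```

The -n flag only has an effect together with -F, and readonly=on keeps an import from writing to the pool if it does succeed.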

Current State:

  • zpool status doesn’t list main-zfs at all.
  • zpool import lists the pool as corrupt/unreadable.
  • I have NOT formatted or written anything to the drives since this mistake.
  • Pool was previously working fine before this mess.

My Questions:

  1. Is there any way to recover the pool metadata?
  2. Would something like zdb help in identifying if any metadata remains?
  3. Would a professional ZFS recovery service (like DriveSavers or Ontrack) be able to help, or is this a lost cause?
  4. Any last-ditch commands I should try before considering recovery specialists?
I fully understand that RAIDZ1 has no redundancy once a drive is removed, and I take full responsibility for this mistake. But if there’s even a small chance of getting my data back, I’d like to exhaust all possible options before accepting the loss.

Any advice or recommendations would be greatly appreciated!
 
1.) take 1:1 backups of your physical disks if you plan on attempting professional recovery
2.) you can try 'zdb' with -X to see if even earlier, potentially no longer consistent txgs are found
3.) you are probably better off asking in ZFS-specific channels; the likelihood of somebody having used such services in a similar situation will probably be higher there
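
A rough sketch of points 1.) and 2.), in case it helps: image every disk first, then look at the labels and try the rewind with zdb while the pool is not imported. The disk IDs, the /mnt/backup target, and the -part1 partition suffix are placeholders and assumptions about your layout, GNU ddrescue is assumed to be installed, and the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholders (assumptions): replace with the real by-id names and a
# backup target that has room for three full 4TB images.
DISKS="ata-DISK1 ata-DISK2 ata-DISK3"
TARGET="/mnt/backup"
POOL="main-zfs"

run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

plan() {
    for d in $DISKS; do
        # 1:1 image plus a map file so an interrupted copy can be resumed.
        run ddrescue -d "/dev/disk/by-id/$d" "$TARGET/$d.img" "$TARGET/$d.map"
        # Dump the ZFS labels; assumes the data sits on partition 1.
        run zdb -l "/dev/disk/by-id/$d-part1"
    done
    # Inspect the non-imported pool, trying progressively older txgs
    # (-F) and then the "extreme" rewind (-X).
    run zdb -e -p /dev/disk/by-id -F -X "$POOL"
}

plan
```

zdb only reads from the disks, so it is a reasonable thing to try before any further zpool import attempts, but working from the images rather than the original drives is the safer order.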