Corrupt zpool disks not being discovered. How do I rebuild?

user73937393

Feb 2, 2024
I have a Proxmox server with a few zpools. One of them, `rust01`, is a 4-disk pool whose special (metadata) device and write cache were NVMe M.2 drives on the motherboard, one drive for each (I know, stupid, but that's what was done).

It appears that `rust01` has suffered a catastrophic failure.
  • When I click on `rust01` in the Server View, I get the following error:
    ```
    could not activate storage 'rust01', zfs error: cannot import 'rust01': I/O error (500)
    ```
  • When I go to { Server } > Disks > ZFS, I do not see the `rust01` zpool.
  • When I go to { Server } > Disks, I don't even see the four disks, the special metadata device, or the read/write cache drives.
  • When I run `zpool status -x` I get `all pools are healthy`
  • When I run `zpool import rust01` I get the following message:
    ```
    cannot import 'rust01': I/O error
    Destroy and re-create the pool from
    a backup source.
    ```
  • When I run `zpool status rust01`, I get `cannot open 'rust01': no such pool`.
  • When I reboot the server, I receive the following email:
    ```
    ZFS has detected that a device was removed.
    impact: Fault tolerance of the pool may be compromised.
    eid: 10
    class: statechange
    state: UNAVAIL
    host: pve01
    time: 2024-09-14 21:20:32-0500
    vpath: /dev/nvme2n1p1
    vphys: pci-0000:41:00.0-nvme-1
    vguid: 0x297D516B1F1D6494
    devid: nvme-Samsung_SSD_970_EVO_Plus_2TB_S6S2NS0T815592K-part1
    pool: rust01 (0xE4AAC2680D8B6A7E)
    ```
  • When I run `zpool destroy rust01` I get the following error `cannot open 'rust01': no such pool`.
Ideally, I would like to get `rust01` back online. I am fairly certain the problem is the special metadata device named in the email above (`/dev/nvme2n1p1`). That said, I would be happy to destroy and re-create `rust01`: all of the VMs on it are backed up, so I can easily restore them. My problem is that I can't find a way to get Proxmox/ZFS to release the disks that belonged to the corrupt `rust01` pool.
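To check whether the drive named in that email is still visible to the kernel at all, a minimal shell sketch like the one below might help. It assumes the email body has been saved to a text file and that the `devid:` field matches a link name under `/dev/disk/by-id/`, as it appears to in the email above. `zed_devid_path` and `check_zed_device` are hypothetical helper names, not part of ZFS or Proxmox, and the sketch assumes the email reports a single device.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the /dev/disk/by-id/ path built from the
# "devid:" field of a saved ZED email (leading whitespace is tolerated).
zed_devid_path() {
    sed -n 's/^[[:space:]]*devid:[[:space:]]*//p' "$1" | sed 's|^|/dev/disk/by-id/|'
}

# Hypothetical helper: report whether that device node exists right now.
check_zed_device() {
    path=$(zed_devid_path "$1")
    if [ -e "$path" ]; then
        echo "present: $path"
    else
        echo "missing: $path"
    fi
}
```

For example, if `check_zed_device zed-mail.txt` (a hypothetical file name) prints `missing: ...`, that would suggest the NVMe drive is no longer being enumerated at all, rather than only having a damaged ZFS label.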

Other VMs on this host run on other zpools that appear fine, so reinstalling everything is not an option I want to entertain.

Any ideas on how to proceed?
 
I gave up and destroyed the corrupt `rust01` zpool. Sorry this post has remained on the forums; I couldn't figure out how to delete the thread. Any forum administrators, feel free to delete it.
 
Better to keep this thread as another example of why NOT to use consumer SSDs as a special device, or for ZFS in general...
And there was no rescuing your pool, by the way. Wipe the disks and start fresh, hopefully with a lesson learned.
 
In case this helps someone else: when trying to destroy a corrupt zpool used in Proxmox, you may find that the disks appear to be permanently in use, and both command-line and UI attempts to wipe them fail.

For me, the fix was to comment out the zpool's entry in /etc/pve/storage.cfg. After that, you should be able to wipe all of the disks associated with your failed zpool.
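The storage.cfg edit can be sketched as a small shell function. This is a sketch, assuming the usual storage.cfg layout: a `type: storeid` header at column 0 (for example `zfspool: rust01`), indented property lines beneath it, and blank lines between entries. `comment_out_storage` is a hypothetical helper name. It only prints the modified file, so the result can be reviewed before anything in /etc/pve is overwritten.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print storage.cfg with the stanza for one storage ID
# commented out. Nothing is written to disk; redirect stdout to review it.
comment_out_storage() {
    cfg=$1       # path to the config, e.g. /etc/pve/storage.cfg
    storeid=$2   # storage ID to disable, e.g. rust01
    awk -v id="$storeid" '
        # A stanza header sits at column 0 and looks like "zfspool: rust01".
        /^[A-Za-z0-9_-]+: / {
            split($0, parts, ": ")
            in_target = (parts[2] == id)
        }
        # A blank line ends the current stanza.
        /^[ \t]*$/ { in_target = 0 }
        # Prefix "#" to every line that belongs to the target stanza.
        { print (in_target ? "#" $0 : $0) }
    ' "$cfg"
}
```

For example, run `comment_out_storage /etc/pve/storage.cfg rust01 > /root/storage.cfg.new`, check the result with `diff /etc/pve/storage.cfg /root/storage.cfg.new`, then copy it back with `cp /root/storage.cfg.new /etc/pve/storage.cfg`. Removing the entry with `pvesm remove rust01`, or under Datacenter > Storage in the GUI, should have a similar effect, except that the definition is deleted rather than kept as comments.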
 
