[SOLVED] ZFS pool UNAVAILABLE disaster

Serhioromano

Member
Jun 12, 2023
Recently I had a FAULTED disk in my RAIDZ2 pool of 5 drives, which is unfortunately connected through a USB bay.

I was trying to fix it, but with no success. I rebooted PVE and another disk became FAULTED, even though `lsblk` showed all the disks the whole time. After another reboot a third disk got an invalid label error and the pool became UNAVAILABLE. I tried to export and re-import it, again without success. I found the suggestion `zpool import -d /dev/disk/by-id tank` to build the pool from disk IDs rather than device names, something I did not know about when the pool was created.

Anyway, the situation now is this: there is no pool. When I try to import it, it fails with

Code:
root@pve:~# zpool import -d /dev/disk/by-id tank
cannot import 'tank': I/O error
        Destroy and re-create the pool from
        a backup source.

And I am afraid this time I will not escape trouble.

Although if I run `zpool import` without arguments, I can see the pool

Code:
root@pve:~# zpool import
   pool: tank
     id: 11619534766374015595
  state: DEGRADED
status: One or more devices contains corrupted data.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
 config:

        tank                                          DEGRADED
          raidz2-0                                    DEGRADED
            sdb                                       ONLINE
            usb-TOSHIBA_HDWT860_000000123AE8-0:0      ONLINE
            sdd                                       ONLINE
            usb-ST6000VX_001-2BD186_000000123AE8-0:0  ONLINE
            8235686996370194259                       FAULTED  corrupted data

and I can see all the disks

Code:
root@pve:~# fdisk -l /dev/sdb
Disk /dev/sdb: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HDWT860        
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 6E99B31A-75AB-8549-8CFD-1DD096B1DA6E
Device           Start         End     Sectors  Size Type
/dev/sdb1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sdb9  11721027584 11721043967       16384    8M Solaris reserved 1

root@pve:~# fdisk -l /dev/sdc
Disk /dev/sdc: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HDWT860        
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: CC072369-CE09-CA4F-9071-4464AEBE98C7
Device           Start         End     Sectors  Size Type
/dev/sdc1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sdc9  11721027584 11721043967       16384    8M Solaris reserved 1

root@pve:~# fdisk -l /dev/sdd
Disk /dev/sdd: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HDWT860        
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: BE6080F5-A820-364A-9698-25C5FB26AE8E
Device           Start         End     Sectors  Size Type
/dev/sdd1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sdd9  11721027584 11721043967       16384    8M Solaris reserved 1

root@pve:~# fdisk -l /dev/sde
Disk /dev/sde: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: 001-2BD186    
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: 23C5E1F1-8163-C84C-8318-52D2E1D29FB8
Device           Start         End     Sectors  Size Type
/dev/sde1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sde9  11721027584 11721043967       16384    8M Solaris reserved 1

root@pve:~# fdisk -l /dev/sdf
Disk /dev/sdf: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: HDWT860        
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: EEC1AB22-1A73-8F44-81E9-A85833A7E38C
Device           Start         End     Sectors  Size Type
/dev/sdf1         2048 11721027583 11721025536  5.5T Solaris /usr & Apple ZFS
/dev/sdf9  11721027584 11721043967       16384    8M Solaris reserved 1
 
That's not looking too bad.

Read "man zpool-import". There are some options to import a damaged pool. I have no experience with this, but "-F -n" looks interesting.


Disclaimer: I am not responsible if those drives explode ;-)
 
Full stop, if you are running Proxmox 24/7 then you need to get off USB3. It is notoriously unreliable, especially for ZFS.

Obtain an HBA PCIe card in IT mode (or at the very least eSATA) and try importing the pool with that. You might get away with detaching the bad disk and importing without it, since you have raidz2.
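Roughly along these lines, as a sketch only (the GUID of the faulted member is taken from your zpool import output):

Code:
# import the degraded pool; raidz2 can survive two missing/bad members
zpool import tank
# then take the corrupted member offline by its GUID so the pool stops trying to use it
zpool offline tank 8235686996370194259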
 
Full stop, if you are running Proxmox 24/7 then you need to get off USB3. It is notoriously unreliable, especially for ZFS.

Obtain an HBA PCIe card in IT mode (or at the very least eSATA) and try importing the pool with that. You might get away with detaching the bad disk and importing without it, since you have raidz2.
My system runs on an Intel NUC, so I have no option to use SATA, only USB. At least I purchased a bay that presents every disk as a separate USB disk instead of putting all disks behind one USB device. I have a plan to buy a dedicated machine for the server, but that is not yet possible with my limited budget. Anyway, thank you for the suggestion; it convinces me once more that I need to invest in a more reliable system.
 
That's not looking too bad.

Read "man zpool-import". There are some options to import a damaged pool. I have no experience with this, but "-F -n" looks interesting.


Disclaimer: I am not responsible if those drives explode ;-)
Thank you. You were right. I imported without `-d` and it worked. I was able to replace one disk with itself and it is online now. So I have only one failed disk left. I am one step further from disaster!
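For the record, the replace-with-itself step was roughly this (placeholder device, not my exact command):

Code:
# with no new device given, zpool replace resilvers the faulted member back onto the same disk
# <faulted-dev> is a placeholder for the member shown as FAULTED in zpool status
zpool replace -f tank <faulted-dev>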
 
@Kingneutron @UdoB

What is the best way to rebuild the zpool so the disks are identified by ID rather than by device name? As you know, my disks are on USB, and if I add or remove a disk the names may change, which causes a mess. I want the disks to be identified by ID.

My pool looks right now

Code:
root@pve:~# zpool status -v tank
  pool: tank
 state: ONLINE
  scan: resilvered 30.9G in 00:06:47 with 0 errors on Mon Feb  3 17:33:09 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            sdb                                       ONLINE       0     0     0
            usb-ST6000VX_008-2ZP186_000000123AE8-0:0  ONLINE       0     0     0
            sdd                                       ONLINE       0     0     0
            usb-ST6000VX_001-2BD186_000000123AE8-0:0  ONLINE       0     0     0
            sdf                                       ONLINE       0     0     0

errors: No known data errors

My idea is to run `zpool replace -f tank sdb /dev/disk/by-id/mydiskid` for sdb, and the same for sdd and sdf. But perhaps there is a better way to do that.
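Or maybe the simpler route is to just export and re-import the whole pool by ID, like the command from my first post; not sure which way is cleaner:

Code:
# export the pool, then import it again scanning /dev/disk/by-id instead of /dev/sdX names
zpool export tank
zpool import -d /dev/disk/by-id tank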