zfs error: cannot open: no such pool


Feb 10, 2024
Good evening everyone,

I left my machine turned off for 6 days, and when I turned it on today one of my HDDs showed the following error:

could not activate storage 'SAMSUNG500GB', zfs error: cannot import 'SAMSUNG500GB': no such pool available (500)

When I access the server from my cell phone, I get this error when starting a VM:

zfs error: cannot open 'SAMSUNG500GB': no such pool

Can anyone tell me how to solve it?
root@srv-oliveira:~# zpool status
  pool: SAMSUNG1TB
 state: ONLINE
  scan: scrub repaired 0B in 00:28:03 with 0 errors on Sun Jan 14 00:52:04 2024

        NAME                                  STATE     READ WRITE CKSUM
        SAMSUNG1TB                            ONLINE       0     0     0
          ata-SAMSUNG_HD103SI_S23ZJ50Z821074  ONLINE       0     0     0

errors: No known data errors

root@srv-oliveira:~# zpool import
   pool: pfSense
     id: 5583103113661670571
  state: ONLINE
status: One or more devices are configured to use a non-native block size.
        Expect reduced performance.
 action: The pool can be imported using its name or numeric identifier.

        pfSense     ONLINE
          zd48      ONLINE

   pool: PoolOliveiraNet
     id: 4740071158673129699
  state: ONLINE
status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY

        PoolOliveiraNet                         ONLINE
          5ea514dd-b627-4338-892b-6a788e4a2f65  ONLINE

root@srv-oliveira:~# zpool import SAMSUNG500GB
cannot import 'SAMSUNG500GB': no such pool available

root@srv-oliveira:~# pvesm status
zfs error: cannot open 'SAMSUNG500GB': no such pool

zfs error: cannot open 'SAMSUNG500GB': no such pool

could not activate storage 'SAMSUNG500GB', command 'zpool import -d /dev/disk/by-id/ -o 'cachefile=none' SAMSUNG500GB' failed: got timeout

Name                Type     Status           Total            Used       Available        %
SAMSUNG1TB       zfspool     active       942931968       756206980       186724988   80.20%
SAMSUNG500GB     zfspool   inactive               0               0               0    0.00%
local                dir     active        38593904        27220680         9380520   70.53%
local-lvm        lvmthin     active        51736576        17466268        34270307   33.76%

What does smartctl -a /dev/sdc show? You might also want to run a SMART self-test using smartctl -t long /dev/sdc.

root@srv-oliveira:~# smartctl -a /dev/sdc
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-8-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Model Family:     SAMSUNG SpinPoint M7E (AF)
Device Model:     SAMSUNG HM501II
Serial Number:    S2QDJ56BA01828
LU WWN Device Id: 5 0024e9 400d2f6e8
Firmware Version: 2AJ10003
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database 7.3/5319
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Feb 14 19:45:20 2024 -03
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (  41) The self-test routine was interrupted
                                        by the host with a hard or soft reset.
Total time to complete Offline
data collection:                ( 7920) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 132) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 52651
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   089   085   025    Pre-fail  Always       -       3627
  4 Start_Stop_Count        0x0032   095   095   000    Old_age   Always       -       5247
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       9996
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   098   098   000    Old_age   Always       -       2324
 12 Power_Cycle_Count       0x0032   095   095   000    Old_age   Always       -       5356
 13 Read_Soft_Error_Rate    0x003a   100   100   000    Old_age   Always       -       0
191 G-Sense_Error_Rate      0x0022   091   091   000    Old_age   Always       -       92105
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   090   090   000    Old_age   Always       -       104242
194 Temperature_Celsius     0x0002   063   044   000    Old_age   Always       -       37 (Min/Max 11/56)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       11
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       9339
240 Head_Flying_Hours       0x0032   100   100   000    Old_age   Always       -       9734
241 Total_LBAs_Written      0x0032   092   089   000    Old_age   Always       -       11348674
242 Total_LBAs_Read         0x0032   094   088   000    Old_age   Always       -       8670509
254 Free_Fall_Sensor        0x0032   252   252   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      90%      9975         -
# 2  Short offline       Completed without error       00%       468         -
# 3  Short offline       Completed without error       00%       281         -
# 4  Short offline       Completed without error       00%       142         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
    1        0        0  Interrupted [90% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@srv-oliveira:~# smartctl -t long /dev/sdc
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.5.11-8-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 132 minutes for test to complete.
Test will complete after Wed Feb 14 21:58:09 2024 -03
Use smartctl -X to abort test.

Does this mean I lost everything? Is there a way to recover my virtual machines?

That's why you should always have recent backups. While RAID is not a backup, I personally wouldn't use a 14-year-old HDD model without redundancy when HDDs have a life expectancy of 5-10 years.

You could check again whether a long SMART self-test is able to complete (the last one was interrupted before it finished). But yes, if it really is a failing disk, there aren't many cheap options for ZFS data rescue.
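
To see whether the long self-test finished and which attributes have tripped their thresholds, the smartctl output can be filtered with awk. This is only a minimal sketch: latest_selftest and failing_attrs are made-up helper names, not smartmontools commands, and the column layout is assumed to match the smartctl 7.3 output posted in this thread.

```shell
#!/bin/sh
# Both helpers are hypothetical names for this sketch, not smartmontools commands.

# Read `smartctl -l selftest` output on stdin and print the status of entry
# "# 1" (the most recent test). Assumes columns are separated by 2+ spaces.
latest_selftest() {
    awk -F'  +' '$1 == "# 1" { print $3; exit }'
}

# Read `smartctl -A` output on stdin and print every attribute whose
# WHEN_FAILED column (9th field) is not "-".
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && $3 ~ /^0x/ && $9 != "-" {
        printf "%s value=%s thresh=%s %s\n", $2, $4, $6, $9
    }'
}

# Demo on lines copied from the smartctl output earlier in the thread.
latest_selftest <<'EOF'
# 1  Extended offline    Interrupted (host reset)      90%      9975         -
# 2  Short offline       Completed without error       00%       468         -
EOF

failing_attrs <<'EOF'
  1 Raw_Read_Error_Rate     0x002f   001   001   051    Pre-fail  Always   FAILING_NOW 52651
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
EOF
```

On the host itself, the same helpers would be fed from the live disk once the 132 minutes are up, for example smartctl -l selftest /dev/sdc | latest_selftest and smartctl -A /dev/sdc | failing_attrs. If the first one still reports an interrupted test or a read failure, that supports the failing-disk diagnosis.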
