Proxmox boots up but doesn't mount zpool

3fficacious

Nov 30, 2022

For some reason my Proxmox server rebooted a couple of days ago, and ever since then it no longer mounts the zpool.

I can still log into the system, and ZFS itself looks fine: the pool status is ONLINE and about half of it is in use, which checks out.

I have already run a scrub and a resilver, to no effect.

Does anyone have an idea what I could try next, short of reinstalling everything?

Attempting to manually mount everything

Bash:
root@proxmox:~# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
rpool                        182G   178G      104K  /rpool
rpool/ROOT                  18.3G   178G       96K  /rpool/ROOT
rpool/ROOT/pve-1            18.3G   178G     18.3G  /
rpool/data                   164G   178G       96K  /rpool/data
rpool/data/base-100-disk-0  11.8G   178G     11.8G  -
rpool/data/base-110-disk-0  1.90G   178G     1.90G  -
rpool/data/base-130-disk-0  1.66G   178G     1.66G  -
rpool/data/vm-111-disk-0    28.7G   178G     28.7G  -
rpool/data/vm-111-disk-1      56K   178G       56K  -
rpool/data/vm-111-disk-2    98.9G   178G     98.9G  -
rpool/data/vm-120-disk-0    2.82G   178G     2.82G  -
rpool/data/vm-120-disk-1      56K   178G       56K  -
rpool/data/vm-131-disk-0    6.12G   178G     6.12G  -
rpool/data/vm-201-disk-0    11.9G   178G     11.9G  -

root@proxmox:~# zfs mount
rpool/ROOT/pve-1                /
rpool                           /rpool
rpool/ROOT                      /rpool/ROOT
rpool/data                      /rpool/data

root@proxmox:~# zfs mount -a

root@proxmox:~# ls -lah /rpool/ROOT/pve-1/
total 1.0K
drwxr-xr-x 2 root root 2 Oct  8 11:46 .
drwxr-xr-x 3 root root 3 Oct  8 11:46 ..

root@proxmox:~# ls -lah /rpool/ROOT/
total 1.5K
drwxr-xr-x 3 root root 3 Oct  8 11:46 .
drwxr-xr-x 4 root root 4 Oct  8 11:46 ..
drwxr-xr-x 2 root root 2 Oct  8 11:46 pve-1

root@proxmox:~# ls -lah /rpool/data
total 1.0K
drwxr-xr-x 2 root root 2 Oct  8 11:46 .
drwxr-xr-x 4 root root 4 Oct  8 11:46 ..

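A sketch for telling mountable filesystems apart from volumes in the listing above. Entries whose MOUNTPOINT is `-` are ZFS volumes (zvols), which are block devices rather than filesystems, so `zfs mount` never mounts them; on Linux they normally show up under `/dev/zvol/<pool>/...` instead of under `/rpool/data`. Likewise, `zfs mount` reports `rpool/ROOT/pve-1` mounted at `/`, not at `/rpool/ROOT/pve-1`. The helper below assumes its input is the tab-separated output of `zfs list -H -o name,type,mountpoint`; the function name `classify_datasets` is hypothetical.

```shell
#!/bin/sh
# classify_datasets (hypothetical helper): reads the tab-separated output of
#   zfs list -H -o name,type,mountpoint
# on stdin and prints one line per dataset saying where its data can be found.
classify_datasets() {
  tab=$(printf '\t')
  while IFS="$tab" read -r name type mountpoint; do
    case "$type" in
      # Volumes (zvols) have no mountpoint; they are block devices under /dev/zvol.
      volume) printf '%s zvol /dev/zvol/%s\n' "$name" "$name" ;;
      # Filesystems are mounted at their mountpoint property.
      filesystem) printf '%s filesystem %s\n' "$name" "$mountpoint" ;;
      # Anything else is passed through unchanged.
      *) printf '%s %s %s\n' "$name" "$type" "$mountpoint" ;;
    esac
  done
}

# Usage on the Proxmox host:
#   zfs list -H -o name,type,mountpoint | classify_datasets
```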
ZPool Status

Bash:
root@proxmox:~# zpool status
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:09:23 with 0 errors on Wed Nov 30 09:47:37 2022
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-INTEL_SSDSC2BA400G3_BTTV3245063N400HGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2BA400G3_BTTV325004PC400HGN-part3  ONLINE       0     0     0

errors: No known data errors

root@proxmox:~# zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool   372G   182G   190G        -         -    27%    48%  1.00x    ONLINE  -

root@proxmox:~# zpool status -v rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:09:23 with 0 errors on Wed Nov 30 09:47:37 2022
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          mirror-0                                            ONLINE       0     0     0
            ata-INTEL_SSDSC2BA400G3_BTTV3245063N400HGN-part3  ONLINE       0     0     0
            ata-INTEL_SSDSC2BA400G3_BTTV325004PC400HGN-part3  ONLINE       0     0     0

errors: No known data errors

ZPool history

Bash:
2022-11-03.11:45:54 zfs snapshot rpool/data/base-130-disk-0@__base__
2022-11-03.11:46:24 zfs create -s -V 52428800k rpool/data/vm-131-disk-0
2022-11-04.17:31:47 zpool import -N rpool
2022-11-04.17:59:54 zpool import -N rpool
2022-11-05.19:01:07 zpool import -N rpool
2022-11-08.07:39:01 zpool import -N rpool
2022-11-08.08:31:18 zpool import -N rpool
2022-11-08.08:53:18 zpool import -N rpool
2022-11-08.09:51:25 zpool import -N rpool
2022-11-11.09:52:43 zpool import -N rpool
2022-11-13.00:24:02 zpool scrub rpool
2022-11-28.02:37:22 zpool import -N rpool
2022-11-29.09:10:08 zpool import -N rpool
2022-11-29.10:56:15 zpool import -N rpool
2022-11-30.09:38:15 zpool scrub rpool
2022-11-30.11:10:17 zpool resilver rpool
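`zpool import -N` imports a pool without mounting any of its filesystems, so the repeated entries in this history are presumably the boot-time imports, with the datasets mounted in a separate step afterwards. A small sketch that pulls just the import timestamps out of `zpool history` output, so they can be lined up against the list of boots; `list_imports` is a hypothetical name, and the input is assumed to use the `DATE.TIME command args` layout shown above.

```shell
#!/bin/sh
# list_imports (hypothetical helper): reads `zpool history` output on stdin and
# prints the timestamp (first field) of every "zpool import" event, one per line.
list_imports() {
  awk '$2 == "zpool" && $3 == "import" { print $1 }'
}

# Usage on the Proxmox host (compare with the boots listed by journalctl):
#   zpool history rpool | list_imports
#   journalctl --list-boots
```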
 
smartctl output for both drives
Bash:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.74-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BA400G3
Serial Number:    BTTV3245063N400HGN
LU WWN Device Id: 5 001517 8f3646cb0
Firmware Version: 5DV10265
User Capacity:    400,088,457,216 bytes [400 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Nov 30 11:18:46 2022 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    2) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       53855
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       111
170 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       94
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       636 (312 1457)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   081   077   000    Old_age   Always       -       19 (Min/Max 13/28)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       94
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       19
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       9763621
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       0
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       93
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       1457
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       9763621
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       16143567

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     52553         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Bash:
=== START OF INFORMATION SECTION ===
Model Family:     Intel 730 and DC S35x0/3610/3700 Series SSDs
Device Model:     INTEL SSDSC2BA400G3
Serial Number:    BTTV325004PC400HGN
LU WWN Device Id: 5 001517 8f364a11d
Firmware Version: 5DV10265
User Capacity:    400,088,457,216 bytes [400 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Nov 30 11:20:44 2022 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    2) seconds.
Offline data collection
capabilities:                    (0x79) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       1
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       53857
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       114
170 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       1
174 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       97
175 Power_Loss_Cap_Test     0x0033   100   100   010    Pre-fail  Always       -       633 (312 1459)
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
190 Temperature_Case        0x0022   083   079   000    Old_age   Always       -       17 (Min/Max 13/25)
192 Unsafe_Shutdown_Count   0x0032   100   100   000    Old_age   Always       -       97
194 Temperature_Internal    0x0022   100   100   000    Old_age   Always       -       17
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
199 CRC_Error_Count         0x003e   100   100   000    Old_age   Always       -       0
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       11824184
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       0
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       93
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       1459
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   Always       -       0
234 Thermal_Throttle        0x0032   100   100   000    Old_age   Always       -       0/0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       11824184
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       16634579

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     52555         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
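Comparing the two attribute tables by eye is tedious. A minimal sketch that prints only selected error counters with a non-zero raw value, assuming the ten-column layout that `smartctl -A` prints above (RAW_VALUE is the tenth field). The helper name `flag_nonzero_attrs` and the list of watched attributes are hypothetical choices, not an official health check. Fed the second drive's table it would print `Reallocated_Sector_Ct=1` and `Erase_Fail_Count=1`; the first drive's table produces no output.

```shell
#!/bin/sh
# flag_nonzero_attrs (hypothetical helper): reads the attribute table printed by
# `smartctl -A /dev/sdX` on stdin and prints NAME=RAW for each watched
# error counter whose raw value (10th column) is not 0.
flag_nonzero_attrs() {
  awk '
    $2 ~ /^(Reallocated_Sector_Ct|Program_Fail_Count|Erase_Fail_Count|Reported_Uncorrect|Current_Pending_Sector|CRC_Error_Count)$/ && $10 != "0" {
      print $2 "=" $10
    }'
}

# Usage on the Proxmox host (device names are placeholders):
#   for dev in /dev/sda /dev/sdb; do echo "$dev"; smartctl -A "$dev" | flag_nonzero_attrs; done
```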