DegradedArray alert day after upgrading pve8 to 9

CycloneB

I need some help understanding the alert I received today. I'll give some background, but I'll hide it as a spoiler in case the extra detail isn't relevant.

I have multiple nodes in a cluster, and one of the nodes runs a TrueNAS VM. That VM has three NVMe drives and four spinners passed to it via PCIe passthrough, giving TrueNAS total control. They are set up as two zfs arrays, one per drive type. This has been the case since I built this node; those drives have never been used by pve. I stayed current on pve8 for some time and only recently started upgrading in place to pve9. Each weekend I upgraded one node, going from the least to the most important. Yesterday I did the last node, which hosts the NAS. In all cases, I ran pve8to9 --full and addressed every callout, but none were major (install microcode updates, turn off autoactivation on lvm-thin volumes, etc.).

Today I received the following alert:
Code:
DegradedArray event detected on md device /dev/md126
The /proc/mdstat file currently contains the following:

Personalities : [raid0] [raid1] [raid4] [raid5] [raid6] [raid10] [linear]
md126 : broken (auto-read-only) raid1 sdd1[0]
     2095040 blocks super 1.2 [3/1] [U__]

md127 : broken (auto-read-only) raid1 nvme2n1p1[1]
     2095040 blocks super 1.2 [2/1] [_U]

unused devices: <none>

sdd1 and nvme2n1p1 are partitions on drives that are passed through entirely to TrueNAS's control. In TrueNAS, I have no alerts and it says all disks are healthy.

Code:
nvmE zfs raidz1 array
Pool Status: ONLINE
Used Space: 63.54%
Disks with Errors: 0 of 3

spinner zfs raidz1 array
Pool Status: ONLINE
Used Space: 13.91%
Disks with Errors: 0 of 4

Natively on the host, if I run mdadm --detail /dev/md126 and mdadm --detail /dev/md127, I get the following. Sadly, I am not well versed in mdadm (which is why I used TrueNAS to do the heavy lifting for me!). Could someone help me understand the results? Do I really have a problem? It seems odd that 24 hours after the upgrade I'm getting this alert from pve, but not from TrueNAS.

mdadm --detail /dev/md126
Code:
/dev/md126:
           Version : 1.2
     Creation Time : Sat Jun 24 13:57:47 2023
        Raid Level : raid1
        Array Size : 2095040 (2045.94 MiB 2145.32 MB)
     Used Dev Size : 2095040 (2045.94 MiB 2145.32 MB)
      Raid Devices : 3
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Sat Mar 21 15:48:26 2026
             State : clean, FAILED
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

    Number   Major   Minor   RaidDevice State
       0       8       49        0      active sync   missing
       -       0        0        1      removed
       -       0        0        2      removed

mdadm --detail /dev/md127
Code:
/dev/md127:
           Version : 1.2
     Creation Time : Mon Jun 12 22:28:44 2023
        Raid Level : raid1
        Array Size : 2095040 (2045.94 MiB 2145.32 MB)
     Used Dev Size : 2095040 (2045.94 MiB 2145.32 MB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Sat Mar 21 15:48:26 2026
             State : clean, FAILED
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : resync

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1     259        5        1      active sync   missing

This node is running pveversion pve-manager/9.1.7/16b139a017452f16 (running kernel: 6.17.13-2-pve)

smartctl -H on the devices complains that there is no such disk, which makes sense as the VM is active and has the disks.
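
For anyone else unfamiliar with the /proc/mdstat notation in the alert: as I understand the md documentation, [3/1] means the array is defined with three devices and one is currently present, and in [U__] each U is a member that is up and each _ is a slot that is missing. Below is a minimal Python sketch that pulls those fields out of the alert text. The function name parse_mdstat and the dictionary keys are my own invention, and the sample is just the alert pasted in, so treat it as an illustration rather than a proper tool.

```python
import re

# The mdstat text from the alert above, pasted in as sample input.
SAMPLE = """\
Personalities : [raid0] [raid1] [raid4] [raid5] [raid6] [raid10] [linear]
md126 : broken (auto-read-only) raid1 sdd1[0]
     2095040 blocks super 1.2 [3/1] [U__]

md127 : broken (auto-read-only) raid1 nvme2n1p1[1]
     2095040 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def parse_mdstat(text):
    """Return a dict keyed by array name with state, members and slot counts."""
    arrays = {}
    current = None
    for line in text.splitlines():
        # Array header, e.g. "md126 : broken (auto-read-only) raid1 sdd1[0]"
        header = re.match(r"(md\d+) : (\S+)(.*)", line)
        if header:
            name, state, rest = header.groups()
            # Member devices look like "sdd1[0]"; keep only the device name.
            members = re.findall(r"(\S+?)\[\d+\]", rest)
            current = arrays[name] = {"state": state, "members": members}
            continue
        # Status line, e.g. "... [3/1] [U__]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current is not None:
            current["expected"] = int(status.group(1))
            current["present"] = int(status.group(2))
            current["slots"] = status.group(3)
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE).items():
        missing = info["expected"] - info["present"]
        print(f"{name}: {info['state']}, members={info['members']}, "
              f"{missing} of {info['expected']} slots missing")
```

On a live host the same function could be fed the contents of /proc/mdstat instead of the pasted sample.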
 
I disabled the startup option on the TrueNAS VM, rebooted the host, and ran lsblk and smartctl to check things:

Code:
NAME                           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda                              8:0    0   7.3T  0 disk 
├─sda1                           8:1    0     2G  0 part 
│ └─md126                        9:126  0     2G  0 raid1
└─sda2                           8:2    0   7.3T  0 part 
sdb                              8:16   0   7.3T  0 disk 
├─sdb1                           8:17   0     2G  0 part 
│ └─md126                        9:126  0     2G  0 raid1
└─sdb2                           8:18   0   7.3T  0 part 
sdc                              8:32   0   9.1T  0 disk 
├─sdc1                           8:33   0     2G  0 part 
└─sdc2                           8:34   0   9.1T  0 part 
sdd                              8:48   0   9.1T  0 disk 
├─sdd1                           8:49   0     2G  0 part 
│ └─md126                        9:126  0     2G  0 raid1
└─sdd2                           8:50   0   9.1T  0 part 
nvme2n1                        259:0    0   1.9T  0 disk 
├─nvme2n1p1                    259:2    0     2G  0 part 
│ └─md127                        9:127  0     2G  0 raid1
└─nvme2n1p2                    259:3    0   1.9T  0 part 
nvme1n1                        259:1    0   1.9T  0 disk 
├─nvme1n1p1                    259:4    0     2G  0 part 
│ └─md127                        9:127  0     2G  0 raid1
└─nvme1n1p2                    259:5    0   1.9T  0 part 
nvme3n1                        259:6    0   1.9T  0 disk 
├─nvme3n1p1                    259:7    0     2G  0 part 
└─nvme3n1p2                    259:8    0   1.9T  0 part 
nvme0n1                        259:9    0 931.5G  0 disk 
├─nvme0n1p1                    259:10   0  1007K  0 part 
├─nvme0n1p2                    259:11   0     1G  0 part  /boot/efi
└─nvme0n1p3                    259:12   0   439G  0 part 
  ├─pve-swap                   252:0    0    64G  0 lvm   [SWAP]
  ├─pve-root                   252:1    0    96G  0 lvm   /
  ├─pve-data_tmeta             252:2    0   2.6G  0 lvm
... <snipped> ...

nvme0n1 is the boot SSD, so I'm ignoring that drive.
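
To make the listing above easier to read, here is a rough Python sketch that maps each md array to the partitions it sits on. It assumes the JSON shape that lsblk produces with --json (a top-level "blockdevices" list with nested "children"). The sample JSON is hand-written from the listing above, trimmed to names and types; it is not real command output, and the function name md_members is just something I made up.

```python
import json

# Hand-written from the lsblk listing above (names and types only).
SAMPLE = """
{"blockdevices": [
  {"name": "sda", "type": "disk", "children": [
    {"name": "sda1", "type": "part",
     "children": [{"name": "md126", "type": "raid1"}]},
    {"name": "sda2", "type": "part"}]},
  {"name": "sdb", "type": "disk", "children": [
    {"name": "sdb1", "type": "part",
     "children": [{"name": "md126", "type": "raid1"}]},
    {"name": "sdb2", "type": "part"}]},
  {"name": "sdc", "type": "disk", "children": [
    {"name": "sdc1", "type": "part"},
    {"name": "sdc2", "type": "part"}]},
  {"name": "sdd", "type": "disk", "children": [
    {"name": "sdd1", "type": "part",
     "children": [{"name": "md126", "type": "raid1"}]},
    {"name": "sdd2", "type": "part"}]},
  {"name": "nvme2n1", "type": "disk", "children": [
    {"name": "nvme2n1p1", "type": "part",
     "children": [{"name": "md127", "type": "raid1"}]},
    {"name": "nvme2n1p2", "type": "part"}]},
  {"name": "nvme1n1", "type": "disk", "children": [
    {"name": "nvme1n1p1", "type": "part",
     "children": [{"name": "md127", "type": "raid1"}]},
    {"name": "nvme1n1p2", "type": "part"}]},
  {"name": "nvme3n1", "type": "disk", "children": [
    {"name": "nvme3n1p1", "type": "part"},
    {"name": "nvme3n1p2", "type": "part"}]}
]}
"""


def md_members(tree):
    """Map each md array name to the list of partitions it is built on."""
    result = {}

    def walk(node, parent=None):
        # lsblk reports md arrays with a TYPE such as "raid1".
        if parent is not None and node.get("type", "").startswith("raid"):
            result.setdefault(node["name"], []).append(parent["name"])
        for child in node.get("children", []):
            walk(child, node)

    for device in tree["blockdevices"]:
        walk(device)
    return result


if __name__ == "__main__":
    for array, parts in md_members(json.loads(SAMPLE)).items():
        print(array, "->", ", ".join(parts))
```

Read this way, the listing shows md126 sitting on sda1, sdb1 and sdd1, and md127 on nvme2n1p1 and nvme1n1p1, while sdc1 and nvme3n1p1 are not part of any assembled array.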

Code:
=== START OF INFORMATION SECTION ===
Model Number:                       SOLIDIGM SSDPFKNU020TZ
Serial Number:                      <snipped>
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x025e
IEEE OUI Identifier:                0xc8d6b7
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sun Apr  5 17:55:42 2026 EDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.60W       -        -    1  1  1  1     4000    5000
 2 +     2.60W       -        -    2  2  2  2    22000   37000
 3 -   0.0250W       -        -    3  3  3  3      225    2000
 4 -   0.0040W       -        -    4  4  4  4     3000   11999

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    53,844,490 [27.5 TB]
Data Units Written:                 38,379,267 [19.6 TB]
Host Read Commands:                 261,703,597
Host Write Commands:                1,138,815,227
Controller Busy Time:               14,513
Power Cycles:                       126
Power On Hours:                     17,065
Unsafe Shutdowns:                   61
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

Code:
=== START OF INFORMATION SECTION ===
Model Number:                       SOLIDIGM SSDPFKNU020TZ
Serial Number:                      <snipped>
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x025e
IEEE OUI Identifier:                0xc8d6b7
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sun Apr  5 17:58:13 2026 EDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.60W       -        -    1  1  1  1     4000    5000
 2 +     2.60W       -        -    2  2  2  2    22000   37000
 3 -   0.0250W       -        -    3  3  3  3      225    2000
 4 -   0.0040W       -        -    4  4  4  4     3000   11999

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        30 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    53,804,276 [27.5 TB]
Data Units Written:                 38,383,892 [19.6 TB]
Host Read Commands:                 261,591,961
Host Write Commands:                1,141,053,479
Controller Busy Time:               14,547
Power Cycles:                       154
Power On Hours:                     17,083
Unsafe Shutdowns:                   66
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

Code:
=== START OF INFORMATION SECTION ===
Model Number:                       SOLIDIGM SSDPFKNU020TZ
Serial Number:                      <snipped>
Firmware Version:                   002C
PCI Vendor/Subsystem ID:            0x025e
IEEE OUI Identifier:                0xc8d6b7
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Sun Apr  5 17:59:43 2026 EDT
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1f):         S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     77 Celsius
Critical Comp. Temp. Threshold:     80 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     5.50W       -        -    0  0  0  0        0       0
 1 +     3.60W       -        -    1  1  1  1     4000    5000
 2 +     2.60W       -        -    2  2  2  2    22000   37000
 3 -   0.0250W       -        -    3  3  3  3      225    2000
 4 -   0.0040W       -        -    4  4  4  4     3000   11999

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0
 1 -    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        31 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Data Units Read:                    53,905,267 [27.5 TB]
Data Units Written:                 38,351,632 [19.6 TB]
Host Read Commands:                 262,407,198
Host Write Commands:                1,140,567,875
Controller Busy Time:               14,532
Power Cycles:                       154
Power On Hours:                     17,042
Unsafe Shutdowns:                   67
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Read Self-test Log failed: Invalid Field in Command (0x002)

So the NVMe drives look good, right? The spinners are in the next post due to character limits.
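
In case it helps anyone compare the three reports quickly, below is a small Python sketch that reads the SMART data section of an NVMe smartctl report and applies a few sanity checks. The checks (overall result PASSED, Critical Warning of 0x00, zero media errors, spare above its threshold) are my own rough criteria and not an official rule, and the function names are made up. The sample is the health section of the first drive above.

```python
# SMART data section of the first NVMe report above, pasted in as sample input.
SAMPLE = """\
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    4%
Unsafe Shutdowns:                   61
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
"""


def nvme_fields(text):
    """Split 'Key: value' lines of a smartctl report into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def percent(value):
    """Turn a string like '100%' into the integer 100."""
    return int(value.rstrip("%"))


def looks_healthy(fields):
    """Apply a few rough checks; missing fields count as unhealthy."""
    try:
        return (
            fields["SMART overall-health self-assessment test result"] == "PASSED"
            and int(fields["Critical Warning"], 16) == 0
            and int(fields["Media and Data Integrity Errors"].replace(",", "")) == 0
            and percent(fields["Available Spare"])
            > percent(fields["Available Spare Threshold"])
        )
    except (KeyError, ValueError):
        return False


if __name__ == "__main__":
    fields = nvme_fields(SAMPLE)
    print("healthy by these checks:", looks_healthy(fields))
    print("wear (Percentage Used):", fields["Percentage Used"])
```

Note that this says nothing about the mdadm arrays; it only summarizes what the drives report about themselves.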
 
Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (SMR)
Device Model:     ST8000DM004-2CX188
Serial Number:    <snipped>
LU WWN Device Id: 5 000c50 0b68f70cf
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  5 18:00:27 2026 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 976) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x30a5)    SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   064   006    Pre-fail  Always       -       231010080
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1007
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   076   060   045    Pre-fail  Always       -       37994118
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25453h+01m+28.653s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       157
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       43 43 43
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   057   040    Old_age   Always       -       29 (Min/Max 23/39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       107
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1160
194 Temperature_Celsius     0x0022   029   043   000    Old_age   Always       -       29 (0 17 0 0 0)
195 Hardware_ECC_Recovered  0x001a   084   064   000    Old_age   Always       -       231010080
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1964h+56m+06.110s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       21199868247
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       33156475701

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     25416         -
# 2  Short offline       Completed without error       00%     25248         -
# 3  Short offline       Completed without error       00%     25080         -
# 4  Short offline       Completed without error       00%     24912         -
# 5  Short offline       Completed without error       00%     24857         -
# 6  Short offline       Completed without error       00%     24857         -
# 7  Short offline       Completed without error       00%     24857         -
# 8  Short offline       Completed without error       00%     24727         -
# 9  Short offline       Completed without error       00%     24559         -
#10  Short offline       Completed without error       00%     24391         -
#11  Short offline       Completed without error       00%     24223         -
#12  Short offline       Completed without error       00%     24055         -
#13  Short offline       Completed without error       00%     23887         -
#14  Short offline       Completed without error       00%     23719         -
#15  Short offline       Completed without error       00%     23551         -
#16  Short offline       Completed without error       00%     23383         -
#17  Short offline       Completed without error       00%     23215         -
#18  Short offline       Completed without error       00%     23047         -
#19  Short offline       Completed without error       00%     22879         -
#20  Short offline       Completed without error       00%     22711         -
#21  Short offline       Completed without error       00%     22543         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate BarraCuda 3.5 (SMR)
Device Model:     ST8000DM004-2CX188
Serial Number:    <snipped>
LU WWN Device Id: 5 000c50 0b66db595
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database 7.3/5528
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  5 18:02:52 2026 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 984) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x30a5)    SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   062   006    Pre-fail  Always       -       275471
  3 Spin_Up_Time            0x0003   092   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1044
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       31580299
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       25804h+29m+09.104s
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       146
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   093   000    Old_age   Always       -       101 101 101
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   071   056   040    Old_age   Always       -       29 (Min/Max 23/40)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       102
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1173
194 Temperature_Celsius     0x0022   029   044   000    Old_age   Always       -       29 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       275471
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2256h+33m+30.624s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       31240456649
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       33538807961

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     25767         -
# 2  Short offline       Completed without error       00%     25599         -
# 3  Short offline       Completed without error       00%     25431         -
# 4  Short offline       Completed without error       00%     25263         -
# 5  Short offline       Completed without error       00%     25209         -
# 6  Short offline       Completed without error       00%     25208         -
# 7  Short offline       Completed without error       00%     25078         -
# 8  Short offline       Completed without error       00%     24910         -
# 9  Short offline       Completed without error       00%     24742         -
#10  Short offline       Completed without error       00%     24574         -
#11  Short offline       Completed without error       00%     24406         -
#12  Short offline       Completed without error       00%     24238         -
#13  Short offline       Completed without error       00%     24070         -
#14  Short offline       Completed without error       00%     23902         -
#15  Short offline       Completed without error       00%     23735         -
#16  Short offline       Completed without error       00%     23566         -
#17  Short offline       Completed without error       00%     23398         -
#18  Short offline       Completed without error       00%     23230         -
#19  Short offline       Completed without error       00%     23062         -
#20  Short offline       Completed without error       00%     22895         -
#21  Short offline       Completed without error       00%     22727         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more
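
For skimming the attribute tables above, here is a minimal Python sketch that pulls the raw values of a handful of attributes out of an ATA smartctl report. The set of IDs I picked (5, 187, 197, 198, 199) is only my assumption about which counters are worth watching for failing sectors or cabling trouble, and the function name is made up. The sample rows are copied from the first Seagate report.

```python
# Attribute IDs chosen for illustration; this selection is an assumption.
WATCHED = {5, 187, 197, 198, 199}

# A few rows copied from the first Seagate attribute table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   084   064   006    Pre-fail  Always       -       231010080
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def watched_raw_values(text):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows have ten columns and start with a numeric ID.
        if len(parts) >= 10 and parts[0].isdigit() and int(parts[0]) in WATCHED:
            values[parts[1]] = int(parts[9])
    return values


if __name__ == "__main__":
    for name, raw in watched_raw_values(SAMPLE).items():
        flag = "ok" if raw == 0 else "check"
        print(f"{name}: {raw} ({flag})")
```

All of these counters are zero on the two Seagate drives shown so far.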
 
Code:
=== START OF INFORMATION SECTION ===
Device Model:     HUH721010ALE601
Serial Number:    <snipped>
LU WWN Device Id: 5 000cca 266f6a93c
Firmware Version: LHGL0003
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  5 18:04:41 2026 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (   93) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (   1) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       104
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       429 (Average 447)
  4 Start_Stop_Count        0x0012   090   090   000    Old_age   Always       -       42419
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       62248
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       138
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
 45 Unknown_Attribute       0x0023   100   100   001    Pre-fail  Always       -       1095233372415
192 Power-Off_Retract_Count 0x0032   064   064   000    Old_age   Always       -       43394
193 Load_Cycle_Count        0x0012   064   064   000    Old_age   Always       -       43394
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 15/51)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       3535215874925
242 Total_LBAs_Read         0x0012   100   100   000    Old_age   Always       -       5985205885276

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     62211         -
# 2  Short offline       Completed without error       00%     62043         -
# 3  Short offline       Completed without error       00%     61875         -
# 4  Short offline       Completed without error       00%     61707         -
# 5  Short offline       Completed without error       00%     61653         -
# 6  Short offline       Completed without error       00%     61653         -
# 7  Short offline       Completed without error       00%     61514         -
# 8  Short offline       Completed without error       00%     61346         -
# 9  Short offline       Completed without error       00%     61178         -
#10  Short offline       Completed without error       00%     61010         -
#11  Short offline       Completed without error       00%     60842         -
#12  Short offline       Completed without error       00%     60674         -
#13  Short offline       Completed without error       00%     60506         -
#14  Short offline       Completed without error       00%     60338         -
#15  Short offline       Completed without error       00%     60170         -
#16  Short offline       Completed without error       00%     60002         -
#17  Short offline       Completed without error       00%     59834         -
#18  Short offline       Completed without error       00%     59666         -
#19  Short offline       Completed without error       00%     59498         -
#20  Short offline       Completed without error       00%     59331         -
#21  Short offline       Completed without error       00%     59163         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

Code:
=== START OF INFORMATION SECTION ===
Device Model:     HUH721010ALE601
Serial Number:    <snipped>
LU WWN Device Id: 5 000cca 26bc690c2
Firmware Version: LHGL0003
User Capacity:    10,000,831,348,736 bytes [10.0 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database 7.3/5528
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Apr  5 18:05:39 2026 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (   93) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (   1) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       453 (Average 421)
  4 Start_Stop_Count        0x0012   090   090   000    Old_age   Always       -       43673
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   092   092   000    Old_age   Always       -       59212
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1735
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
 45 Unknown_Attribute       0x0023   100   100   001    Pre-fail  Always       -       1095233372415
192 Power-Off_Retract_Count 0x0032   063   063   000    Old_age   Always       -       44606
193 Load_Cycle_Count        0x0012   063   063   000    Old_age   Always       -       44606
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 14/56)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
231 Temperature_Celsius     0x0032   100   100   000    Old_age   Always       -       0
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       2188205270379
242 Total_LBAs_Read         0x0012   100   100   000    Old_age   Always       -       3237503502456

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     59175         -
# 2  Short offline       Completed without error       00%     59007         -
# 3  Short offline       Completed without error       00%     58839         -
# 4  Short offline       Completed without error       00%     58671         -
# 5  Short offline       Completed without error       00%     58616         -
# 6  Short offline       Completed without error       00%     58616         -
# 7  Short offline       Completed without error       00%     58616         -
# 8  Short offline       Completed without error       00%     58486         -
# 9  Short offline       Completed without error       00%     58318         -
#10  Short offline       Completed without error       00%     58150         -
#11  Short offline       Completed without error       00%     57982         -
#12  Short offline       Completed without error       00%     57814         -
#13  Short offline       Completed without error       00%     57646         -
#14  Short offline       Completed without error       00%     57478         -
#15  Short offline       Completed without error       00%     57310         -
#16  Short offline       Completed without error       00%     57142         -
#17  Short offline       Completed without error       00%     56974         -
#18  Short offline       Completed without error       00%     56806         -
#19  Short offline       Completed without error       00%     56638         -
#20  Short offline       Completed without error       00%     56470         -
#21  Short offline       Completed without error       00%     56302         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

The above only provides legacy SMART information - try 'smartctl -x' for more

And the spinners look good too, if I'm reading this right.
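
To double-check my reading of the attribute tables, here is a minimal Python sketch that parses rows in the `smartctl -A` layout shown above and flags anything suspicious. The function names `parse_attributes` and `find_problems` and the set of watched IDs are my own choices for illustration, not part of smartmontools. The only rule taken from smartctl itself is that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH. The sample rows are copied from the second spinner's output.

```python
import re

# A few rows copied from the smartctl output above (second spinner).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       453 (Average 421)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 14/56)
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
"""

# Assumption: raw counters that are commonly watched for non-zero values.
WATCHED_RAW_IDS = {5, 196, 197, 198, 199}


def parse_attributes(text):
    """Turn attribute table rows into dicts; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 9)  # RAW_VALUE may contain spaces, so keep it whole
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "when_failed": parts[8],
            "raw": parts[9].strip(),
        })
    return rows


def find_problems(rows):
    """Return a list of human-readable findings; an empty list means nothing was flagged."""
    problems = []
    for r in rows:
        # smartctl treats VALUE <= THRESH as a failed attribute (THRESH 0 never fails).
        if r["thresh"] > 0 and r["value"] <= r["thresh"]:
            problems.append(f"{r['name']}: value {r['value']} <= threshold {r['thresh']}")
        if r["when_failed"] != "-":
            problems.append(f"{r['name']}: WHEN_FAILED is {r['when_failed']}")
        if r["id"] in WATCHED_RAW_IDS:
            match = re.match(r"\d+", r["raw"])
            if match and int(match.group()) > 0:
                problems.append(f"{r['name']}: raw count {match.group()}")
    return problems


if __name__ == "__main__":
    findings = find_problems(parse_attributes(SAMPLE))
    print(findings if findings else "nothing flagged")
```

Nothing is flagged for the rows above, which matches the PASSED self-assessment. This only covers the legacy attribute table, so it is a rough check rather than a full health verdict.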

When I spin up the VM again, I get these messages:
Code:
Apr 05 18:22:36 <node> kernel: md/raid1:md126: Disk failure on sda1, disabling device.
                               md/raid1:md126: Operation continuing on 2 devices.
Apr 05 18:22:36 <node> kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr 05 18:22:36 <node> kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 05 18:22:36 <node> mdadm[809]: mdadm: Fail event detected on md device /dev/md126, component device /dev/sda1
Apr 05 18:22:36 <node> kernel: md/raid1:md126: Disk failure on sdb1, disabling device.
                               md/raid1:md126: Operation continuing on 1 devices.
Apr 05 18:22:36 <node> kernel: sd 1:0:0:0: [sdb] Synchronizing SCSI cache
Apr 05 18:22:36 <node> kernel: sd 1:0:0:0: [sdb] Stopping disk
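
To make it easier to see which members md drops as the VM starts, here is a small Python sketch that pulls the array name, the dropped component, and the remaining device count out of kernel lines like the ones above. The regular expressions are written against the exact wording in my log, and the function name `summarize` is hypothetical. It is only a summary aid and says nothing about why the members were dropped.

```python
import re
from collections import defaultdict

# Kernel and mdadm lines copied from the log above.
LOG = """\
Apr 05 18:22:36 <node> kernel: md/raid1:md126: Disk failure on sda1, disabling device.md/raid1:md126: Operation continuing on 2 devices.
Apr 05 18:22:36 <node> mdadm[809]: mdadm: Fail event detected on md device /dev/md126, component device /dev/sda1
Apr 05 18:22:36 <node> kernel: md/raid1:md126: Disk failure on sdb1, disabling device.md/raid1:md126: Operation continuing on 1 devices.
"""

# Patterns matched against the message wording seen in this log.
FAIL_RE = re.compile(r"md/raid\d+:(md\d+): Disk failure on (\w+)")
REMAINING_RE = re.compile(r"md/raid\d+:(md\d+): Operation continuing on (\d+) devices")


def summarize(log):
    """Return (dropped members per array, last reported device count per array)."""
    dropped = defaultdict(list)
    remaining = {}
    for line in log.splitlines():
        for array, device in FAIL_RE.findall(line):
            dropped[array].append(device)
        for array, count in REMAINING_RE.findall(line):
            remaining[array] = int(count)  # later lines overwrite earlier counts
    return dict(dropped), remaining


if __name__ == "__main__":
    dropped, remaining = summarize(LOG)
    for array, devices in dropped.items():
        print(f"{array}: dropped {', '.join(devices)}; {remaining.get(array)} device(s) left")
```

For the log above this reports that md126 lost sda1 and sdb1 and was left with one device, which lines up with the `[3/1] [U__]` state in /proc/mdstat.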