[SOLVED] replace disk in a functional raidz-1 with 5 disks

McChaos · Aug 14, 2023

Hello,

I have a pool running with 5 x 4 TB disks in a raidz-1 configuration and would like to replace one drive with a brand-new one. The pool has been running for 2 to 3 years in an 8-bay server housing. The backplane is connected via an adapter from 2x SAS(??) to 8x SATA ports on the mainboard.
Starting situation:

Code:

root@pve1:~# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 07:45:36 with 0 errors on Mon Aug 14 19:03:18 2023
config:

    NAME                                   STATE     READ WRITE CKSUM
    data                                   ONLINE       0     0     0
      raidz1-0                             ONLINE       0     0     0
        ata-HGST_HDN726040ALE614_K7GSY59B  ONLINE       0     0     0
        ata-HGST_HDN726040ALE614_K7GWBU4L  ONLINE       0     0     0
        ata-HGST_HDN726040ALE614_K7GWWEDL  ONLINE       0     0     0
        ata-ST4000VN008-2DR166_ZDH6SYB8    ONLINE       0     0     0
        ata-ST4000VN008-2DR166_ZGY445KR    ONLINE       0     0     0

errors: No known data errors

root@pve1:~#

Now I would like to replace the lowermost drive (because of increasing SMART failures) with another one:
zpool replace data ata-ST4000VN008-2DR166_ZGY445KR /dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D

Resilvering starts and runs for a while (15 to 30 minutes), but unfortunately it ends with "too many errors" on the brand-new (!) drive:

Code:

  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug 14 20:21:17 2023
    4.66T scanned at 816M/s, 3.19T issued at 558M/s, 15.6T total
    32.7G resilvered, 20.42% done, 06:29:20 to go
config:

    NAME                                            STATE     READ WRITE CKSUM
    data                                            DEGRADED     0     0     0
      raidz1-0                                      DEGRADED     0     0     0
        ata-HGST_HDN726040ALE614_K7GSY59B           ONLINE       0     0     0
        ata-HGST_HDN726040ALE614_K7GWBU4L           ONLINE       0     0     0
        ata-HGST_HDN726040ALE614_K7GWWEDL           ONLINE       0     0     0
        ata-ST4000VN008-2DR166_ZDH6SYB8             ONLINE       0     0     0
        replacing-4                                 DEGRADED     0     0     0
          ata-ST4000VN008-2DR166_ZGY445KR           ONLINE       0     0     0
          ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D  FAULTED      0    49     0  too many errors

errors: No known data errors
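
For anyone who wants to watch for this state from a script, here is a minimal sketch that filters `zpool status` text for devices that are not ONLINE and prints their write-error count. The function name `unhealthy_vdevs` is made up for illustration, and the filter assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM).

```shell
# Hypothetical helper: list vdevs that are not ONLINE, with their
# write-error count. Reads `zpool status` text on stdin and assumes
# the NAME/STATE/READ/WRITE/CKSUM column layout.
unhealthy_vdevs() {
    awk '
        # The config table starts right after its header line.
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        # A blank line ends the table.
        in_cfg && NF == 0             { in_cfg = 0 }
        # Inside the table, report every row whose state is not ONLINE.
        in_cfg && $2 != "ONLINE"      { print $1, $2, "write_errors=" $4 }
    '
}
```

It could be used as `zpool status data | unhealthy_vdevs`. With the output above it would list the pool, the raidz1-0 and replacing-4 vdevs as DEGRADED, and the new WD drive as FAULTED with 49 write errors.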

I have already tested this with a second (brand-new) drive of the same brand and in different bays of the 8-bay case; every time the same failure.

The SMART tests on the new drive are of course perfect, without any errors (same on the second one):

Code:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       53
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   105   103   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -
# 2  Conveyance offline  Completed without error       00%         0         -
# 3  Short offline       Completed without error       00%         0         -

Does anyone have an idea what is going wrong? What else can I do?
To be honest, I do not expect two brand-new drives to be defective without showing any SMART test failure.

Thank you all!
Thomas

leesteken · Aug 14, 2023

WD60EFAX is an SMR drive, cannot handle sustained writes, and is therefore not suitable for ZFS (or NAS use in general, even though WD claims it is). Please search the forum for issues with SMR drives. In the case of WD, you need Red Plus instead of Red.
EDIT: You are getting ZFS write errors because the drive is responding too slowly to the writes, because of SMR. Maybe use it for cold storage like ISOs and such.
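
Drive-managed SMR is usually not reported by the drive itself, so checking the model number is the practical route. Below is a minimal sketch that matches the by-id device names (which embed the model, as in the `zpool status` output above) against a hand-maintained list. The function names `is_listed_smr` and `scan_by_id` are hypothetical, and the list contains only the model named in this thread; it would need to be extended from the vendors' published CMR/SMR tables.

```shell
# Hypothetical helper: succeed if the given name contains a model from
# a hand-maintained SMR list. Only WD60EFAX comes from this thread;
# any further entries must be added from vendor documentation.
is_listed_smr() {
    case "$1" in
        *WD60EFAX*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Print every whole-disk by-id name whose model is on the list.
# The directory argument is optional and defaults to /dev/disk/by-id.
scan_by_id() {
    dir=${1:-/dev/disk/by-id}
    for dev in "$dir"/ata-*; do
        [ -e "$dev" ] || continue                  # glob matched nothing
        case "$dev" in *-part*) continue ;; esac   # skip partition links
        name=${dev##*/}
        if is_listed_smr "$name"; then
            echo "$name"
        fi
    done
}
```

Running `scan_by_id` before a `zpool replace` would flag a listed drive before any resilver starts. An empty result only means no model on the list was found, not that every drive is CMR.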

Dunuin · Aug 15, 2023

I fully agree with leesteken. Try a CMR disk. And keep in mind that of that 6TB only 4TB would be usable, unless you replace all 5 disks with models of 6TB or more. So if you intend to replace all of them with bigger disks later, this would be fine. If not, you could save some bucks by just getting a new 4TB disk.
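
The capacity point can be checked with quick arithmetic: in a raidz vdev every member contributes only as much as the smallest disk, and the equivalent of one disk per parity level goes to parity. A rough sketch, ignoring padding and metadata overhead, with a made-up function name `raidz_usable_tb` and whole-TB sizes as an assumption:

```shell
# Hypothetical helper: raidz_usable_tb PARITY SIZE_TB...
# Approximate usable capacity = (disks - parity) * smallest disk.
# Ignores padding and metadata overhead; sizes are whole TB.
raidz_usable_tb() {
    parity=$1
    shift
    min=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then
            min=$size
        fi
    done
    echo $(( ($# - parity) * min ))
}

raidz_usable_tb 1 4 4 4 4 6   # four 4TB disks plus one 6TB: prints 16
raidz_usable_tb 1 6 6 6 6 6   # all five replaced by 6TB: prints 24
```

So a single 6TB disk leaves the pool at roughly 16TB usable, the same as with five 4TB disks; the extra space only becomes available once every member is 6TB or larger.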

McChaos · Aug 16, 2023

I had never seen a hard disk be incompatible with a file system before.

OK, thanks for your hints. I sent the disks back and will look at other products.
