Hello,
I have a pool of 5 x 4 TB disks in a raidz1 configuration and would like to replace one drive with a brand-new one. The pool has been running for 2 to 3 years in an 8-bay server chassis. The backplane is connected to the 8 SATA ports on the mainboard via an adapter cable (2x SAS(??) to 8x SATA).
Starting situation:
Code:
root@pve1:~# zpool status
  pool: data
 state: ONLINE
  scan: scrub repaired 0B in 07:45:36 with 0 errors on Mon Aug 14 19:03:18 2023
config:

        NAME                                   STATE     READ WRITE CKSUM
        data                                   ONLINE       0     0     0
          raidz1-0                             ONLINE       0     0     0
            ata-HGST_HDN726040ALE614_K7GSY59B  ONLINE       0     0     0
            ata-HGST_HDN726040ALE614_K7GWBU4L  ONLINE       0     0     0
            ata-HGST_HDN726040ALE614_K7GWWEDL  ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH6SYB8    ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZGY445KR    ONLINE       0     0     0

errors: No known data errors
root@pve1:~#
Now I would like to replace the lowermost drive (because of an increasing number of SMART errors) with a new one:
zpool replace data ata-ST4000VN008-2DR166_ZGY445KR /dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D
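For reference, `zpool replace` takes the pool name first, followed by the device being replaced and then the new device. As a minimal sketch (the helper `build_replace_cmd` is hypothetical, for illustration only, and needs Python 3.9+), the snippet below assembles the arguments in that order and prints the resulting command line for review instead of executing it:

```python
import shlex


def build_replace_cmd(pool: str, old_dev: str, new_dev: str) -> list[str]:
    """Return the argv for `zpool replace <pool> <old_dev> <new_dev>`.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    for label, value in (("pool", pool), ("old_dev", old_dev), ("new_dev", new_dev)):
        if not value:
            raise ValueError(f"{label} must not be empty")
    # Pool name first, then the device being replaced, then its replacement.
    return ["zpool", "replace", pool, old_dev, new_dev]


if __name__ == "__main__":
    cmd = build_replace_cmd(
        "data",
        "ata-ST4000VN008-2DR166_ZGY445KR",
        "/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D",
    )
    # Print a shell-quoted command line for review instead of running it.
    print(shlex.join(cmd))
```

With the pool and device names from this post, it prints `zpool replace data ata-ST4000VN008-2DR166_ZGY445KR /dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D`.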
Resilvering starts and runs for a few minutes (15 to 30), but unfortunately it ends with "too many errors" on the brand-new (!) drive:
Code:
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug 14 20:21:17 2023
        4.66T scanned at 816M/s, 3.19T issued at 558M/s, 15.6T total
        32.7G resilvered, 20.42% done, 06:29:20 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            ata-HGST_HDN726040ALE614_K7GSY59B           ONLINE       0     0     0
            ata-HGST_HDN726040ALE614_K7GWBU4L           ONLINE       0     0     0
            ata-HGST_HDN726040ALE614_K7GWWEDL           ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZDH6SYB8             ONLINE       0     0     0
            replacing-4                                 DEGRADED     0     0     0
              ata-ST4000VN008-2DR166_ZGY445KR           ONLINE       0     0     0
              ata-WDC_WD60EFAX-68JH4N1_WD-WX42D51N5Y1D  FAULTED      0    49     0  too many errors

errors: No known data errors
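To pick the affected device out of longer `zpool status` output, a small Python sketch can scan the `config:` table and report every vdev with a nonzero READ, WRITE or CKSUM counter. It assumes the column layout shown above and plain integer counters (large counts may be printed abbreviated, e.g. with a `K` suffix, which this sketch does not handle); the names `VdevRow`, `parse_config` and `with_errors` are hypothetical.

```python
import re
from typing import NamedTuple


class VdevRow(NamedTuple):
    name: str
    state: str
    read: int
    write: int
    cksum: int
    note: str  # trailing text such as "too many errors", may be empty


# NAME  STATE  READ  WRITE  CKSUM  [note]; assumes plain integer counters.
_ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*(.*)$")


def parse_config(status_text: str) -> list[VdevRow]:
    """Parse the rows of every `config:` table in `zpool status` output."""
    rows: list[VdevRow] = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped == "config:":
            in_config = True
            continue
        if not in_config:
            continue
        if stripped.startswith("errors:"):
            in_config = False  # end of this pool's table
            continue
        match = _ROW.match(line)
        if match:  # the NAME/STATE header and blank lines do not match
            name, state, read, write, cksum, note = match.groups()
            rows.append(VdevRow(name, state, int(read), int(write), int(cksum), note.strip()))
    return rows


def with_errors(rows: list[VdevRow]) -> list[VdevRow]:
    """Keep only the rows that report at least one READ/WRITE/CKSUM error."""
    return [row for row in rows if row.read or row.write or row.cksum]
```

Fed the status output above (for example the `stdout` of `subprocess.run(["zpool", "status", "data"], capture_output=True, text=True)`), `with_errors(parse_config(...))` returns only the new WD drive: state FAULTED, 49 write errors, note "too many errors".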
I have already tried a second brand-new drive of the same brand, in different bays of the 8-bay case, and got the same failure every time.
Of course, SMART tests on the new drive pass without any errors (the same goes for the second one):
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   253   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   100   253   021    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       4
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       53
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   105   103   000    Old_age   Always       -       45
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -
# 2  Conveyance offline  Completed without error       00%         0         -
# 3  Short offline       Completed without error       00%         0         -
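A similar Python sketch for the SMART output: it reads the attribute table (as printed by `smartctl -A` or as part of `smartctl -a`) and reports any nonzero raw value among a few watched attributes. The selection is an illustrative assumption, not a standard list: 5, 197 and 198 for media problems, plus 199 (`UDMA_CRC_Error_Count`), which counts CRC errors on the SATA link and is commonly associated with cable, adapter or backplane trouble. The function names are hypothetical and the code needs Python 3.9+.

```python
import re

# Illustrative selection of attributes to watch (an assumption, not a standard list).
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

# ID ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
_ATTR = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def parse_attributes(smart_text: str) -> dict[int, tuple[str, int]]:
    """Map attribute ID -> (name, leading integer of RAW_VALUE)."""
    attrs: dict[int, tuple[str, int]] = {}
    for line in smart_text.splitlines():
        match = _ATTR.match(line)  # headers and self-test log lines do not match
        if match:
            attrs[int(match.group(1))] = (match.group(2), int(match.group(3)))
    return attrs


def nonzero_watched(attrs: dict[int, tuple[str, int]]) -> dict[int, tuple[str, int]]:
    """Return the watched attributes whose raw value is not zero."""
    return {
        attr_id: attrs[attr_id]
        for attr_id in WATCHED
        if attr_id in attrs and attrs[attr_id][1] != 0
    }
```

For the table above, all four watched attributes have a raw value of 0, so `nonzero_watched` returns an empty dict, consistent with the clean self-test log.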
Does anyone have an idea what is going wrong? What else can I try?
To be honest, I do not believe that two brand-new drives are both defective without a single SMART test showing a failure.
Thank you all!
Thomas