[SOLVED] ZFS: Cannot replace a FAULTED disk, every new disk becomes FAULTED too in a RAIDZ2 rpool

May 28, 2018
I'm having trouble replacing a FAULTED disk in my ZFS RAIDZ2 rpool.

I have now bought 2 disks to replace the 1 that is FAULTED, but during the resilvering process, the new disk becomes FAULTED as well.

At first I thought this was a coincidence and that the new disk must have been damaged somehow, so I ordered a 2nd new drive, and this one also becomes FAULTED during the resilvering.

Both new disks were added as hot spares and I've let the resilvering run without a reboot. The new disk becomes FAULTED when the resilvering is about 70% complete.

Any ideas what could be going on?

I already replaced the power supply and connected the disks using more power cables (so fewer disks on the same cable). The hot spares are on different SATA ports using different SATA cables...

So yeah, it appears I cannot get the rpool fixed with any number of new disks.

I'm not using any kind of RAID controller. Just the 8 SATA ports on my Supermicro X11SCA-F motherboard: https://www.supermicro.com/en/products/motherboard/X11SCA-F
 
This does sound like a tricky error ;)

Have you tried replacing the SATA cables or using a different port on the motherboard?

Did you do a memory test?
 
And is there anything in the dmesg output or /var/log/syslog that indicates what kind of error is encountered for that drive?

Another thing that just came to mind. What kind of drives are we talking about? Which manufacturer and model?
 
@aaron I have 2 kinds of SATA cables. The hot spares are using the new cables, but fail too. Same for the SATA ports. The hot spares are on previously unused ports. All SATA ports are now in use (8x 6TB disks, 2 hot spares that are trying to replace the original faulted disk).

Memtest86 5.13 won't run (using Ubuntu 20.04 live USB). I tried to test the RAM on 2 different motherboards, but it freezes about 5 seconds into the test. I tried the single CPU and failsafe modes, all without luck. I also tried testing 1 stick of RAM at a time, with the same issue (though it fails at a higher % due to less total RAM).

I'm running out of options here.

Could the 4 permanent errors be causing the problem? Should I attempt to fix these first?
root@o1:~# zpool status -v
  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Sep 21 20:39:48 2020
        10.8T scanned at 218M/s, 9.94T issued at 202M/s, 12.0T total
        1.66T resilvered, 82.90% done, 0 days 02:57:55 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            wwn-0x50014ee2bb7efdb5-part3                ONLINE       0     0     0
            wwn-0x50014ee2bb7f0661-part3                ONLINE       0     0     0
            spare-2                                     DEGRADED     0     0    75
              wwn-0x50014ee210d3fa3a-part3              DEGRADED     0     0     0  too many errors (resilvering)
              ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K  FAULTED      0    41     0  too many errors (resilvering)
              ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40ED9P9  ONLINE       0     1     0  (resilvering)
            wwn-0x50014ee210d3fc99-part3                ONLINE       0     0     0
            wwn-0x50014ee2bb8b61ab-part3                ONLINE       0     0     0
            wwn-0x50014ee266294f11-part3                ONLINE       0     0     0
        logs
          nvme-INTEL_SSDPE21D480GA_PHM28134004Q480BGN   ONLINE       0     0     0
        cache
          nvme0n1                                       ONLINE       0     0     0
        spares
          ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40ED9P9      INUSE     currently in use
          ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K      INUSE     currently in use

errors: Permanent errors have been detected in the following files:

        rpool/ROOT/pve-1/a6e8244673f14deed356339b3cd9b03c3e31a596f5cd4ed7928862b47197f3f9:<0x0>
        rpool/ROOT/pve-1/2997c7e8253323806e979ba26ba86af0d9ac86d621e2b9db0732e3ed7daeb227:<0x0>
        rpool/ROOT/pve-1/f065eac675c65d812221a5bbdcca186017d875035a10172ec8e26e9440ea0241:<0x0>
        rpool/ROOT/pve-1/914314486dfc1609e4f16364c2336221fea70f23cc70ea9b666cbb872b10eb10:<0x0>
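For anyone who wants to pick the problem devices out of a status listing like the one above without reading it row by row, here is a minimal Python sketch. The function name `suspicious_devices` and the regular expression are my own and not part of any ZFS tooling. It assumes plain integer error counters; real `zpool status` output may abbreviate large counts (for example with a K suffix), which this sketch would skip.

```python
import re

# Matches config rows such as "wwn-... ONLINE 0 0 0", with an optional trailing note.
# Header, "spares" and free-text lines do not match because they lack three integers.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)(?:\s+(?P<note>.*))?$"
)

def suspicious_devices(status_text):
    """Return (name, state, read, write, cksum) for every row that is not clean."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        errs = tuple(int(m.group(k)) for k in ("read", "write", "cksum"))
        # A row is suspicious if it is not ONLINE or has any nonzero counter.
        if m.group("state") != "ONLINE" or any(errs):
            found.append((m.group("name"), m.group("state")) + errs)
    return found

# A few rows copied from the status output in this post.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
spare-2 DEGRADED 0 0 75
wwn-0x50014ee210d3fa3a-part3 DEGRADED 0 0 0 too many errors (resilvering)
ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K FAULTED 0 41 0 too many errors (resilvering)
ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40ED9P9 ONLINE 0 1 0 (resilvering)
wwn-0x50014ee210d3fc99-part3 ONLINE 0 0 0
"""

for row in suspicious_devices(SAMPLE):
    print(row)
```

On the sample rows this flags both WD60EFAX spares: the faulted one with 41 write errors and the second one, which is still ONLINE but has already logged a write error.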
 
Dmesg output.... yes there is something here...
[ 256.192415] device vethbf91461 entered promiscuous mode
[ 256.484239] eth0: renamed from veth716a00f
[ 256.520143] IPv6: ADDRCONF(NETDEV_CHANGE): vethbf91461: link becomes ready
[ 256.520172] br-4a44e939efcc: port 11(vethbf91461) entered blocking state
[ 256.520172] br-4a44e939efcc: port 11(vethbf91461) entered forwarding state
[ 438.835830] sda: sda1 sda9
[ 479.650183] sde: sde1 sde9
[ 794.371419] sda: sda1 sda9
[19066.181650] perf: interrupt took too long (2535 > 2500), lowering kernel.perf_event_max_sample_rate to 78750
[25013.478229] INFO: task zfs:3085 blocked for more than 120 seconds.
[25013.478246] Tainted: P O 5.4.60-1-pve #1
[25013.478257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[25013.478271] zfs D 0 3085 2798 0x00000000
[25013.478272] Call Trace:
[25013.478276] __schedule+0x2e6/0x6f0
[25013.478278] schedule+0x33/0xa0
[25013.478279] io_schedule+0x16/0x40
[25013.478283] cv_wait_common+0xb5/0x130 [spl]
[25013.478284] ? wait_woken+0x80/0x80
[25013.478286] __cv_wait_io+0x18/0x20 [spl]
[25013.478320] txg_wait_synced_impl+0xc9/0x110 [zfs]
[25013.478346] txg_wait_synced+0x10/0x40 [zfs]
[25013.478369] dsl_sync_task_common+0x1b5/0x290 [zfs]
[25013.478390] ? dsl_dataset_hold+0x20/0x20 [zfs]
[25013.478411] ? dsl_dataset_snapshot_sync_impl+0x800/0x800 [zfs]
[25013.478431] ? dsl_dataset_hold+0x20/0x20 [zfs]
[25013.478450] ? dsl_dataset_snapshot_sync_impl+0x800/0x800 [zfs]
[25013.478471] dsl_sync_task+0x1a/0x20 [zfs]
[25013.478490] dsl_dataset_snapshot+0x131/0x360 [zfs]
[25013.478514] ? spa_name_compare+0xe/0x20 [zfs]
[25013.478516] ? avl_find+0x5f/0x90 [zavl]
[25013.478518] ? security_capable+0x3f/0x60
[25013.478520] ? ns_capable_common+0x2f/0x50
[25013.478521] ? capable+0x19/0x20
[25013.478543] ? priv_policy.isra.3.part.4+0x11/0x20 [zfs]
[25013.478567] ? secpolicy_zinject+0x3a/0x40 [zfs]
[25013.478590] ? secpolicy_zfs+0xe/0x10 [zfs]
[25013.478616] ? zfs_secpolicy_write_perms+0x32/0xe0 [zfs]
[25013.478643] zfs_ioc_snapshot+0x270/0x360 [zfs]
[25013.478670] zfsdev_ioctl+0x1e0/0x8f0 [zfs]
[25013.478671] do_vfs_ioctl+0xa9/0x640
[25013.478673] ? handle_mm_fault+0xc9/0x1f0
[25013.478674] ksys_ioctl+0x67/0x90
[25013.478674] __x64_sys_ioctl+0x1a/0x20
[25013.478676] do_syscall_64+0x57/0x190
[25013.478677] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[25013.478678] RIP: 0033:0x7f1e4529b427
[25013.478681] Code: Bad RIP value.
[25013.478681] RSP: 002b:00007ffd5a9d5ac8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[25013.478682] RAX: ffffffffffffffda RBX: 00007ffd5a9d5af0 RCX: 00007f1e4529b427
[25013.478682] RDX: 00007ffd5a9d5af0 RSI: 0000000000005a23 RDI: 0000000000000005
[25013.478683] RBP: 00007ffd5a9d90d0 R08: 0000000000000003 R09: 00007f1e45367420
[25013.478683] R10: 00005638fce4b010 R11: 0000000000000246 R12: 00007ffd5a9d9248
[25013.478683] R13: 0000000000005a23 R14: 0000000000000005 R15: 0000000000005a23
[26349.085608] device tap102i0 entered promiscuous mode
[26349.107353] fwbr102i0: port 1(fwln102i0) entered blocking state
[26349.107355] fwbr102i0: port 1(fwln102i0) entered disabled state
[26349.107423] device fwln102i0 entered promiscuous mode
[26349.107458] fwbr102i0: port 1(fwln102i0) entered blocking state
[26349.107460] fwbr102i0: port 1(fwln102i0) entered forwarding state
[26349.114161] vmbr0: port 6(fwpr102p0) entered blocking state
[26349.114162] vmbr0: port 6(fwpr102p0) entered disabled state
[26349.114210] device fwpr102p0 entered promiscuous mode
[26349.114225] vmbr0: port 6(fwpr102p0) entered blocking state
[26349.114226] vmbr0: port 6(fwpr102p0) entered forwarding state
[26349.116692] fwbr102i0: port 2(tap102i0) entered blocking state
[26349.116693] fwbr102i0: port 2(tap102i0) entered disabled state
[26349.116758] fwbr102i0: port 2(tap102i0) entered blocking state
[26349.116760] fwbr102i0: port 2(tap102i0) entered forwarding state
[27853.083747] fwbr102i0: port 2(tap102i0) entered disabled state
[27853.111630] fwbr102i0: port 1(fwln102i0) entered disabled state
[27853.112557] vmbr0: port 6(fwpr102p0) entered disabled state
[27853.113427] device fwln102i0 left promiscuous mode
[27853.113428] fwbr102i0: port 1(fwln102i0) entered disabled state
[27853.133786] device fwpr102p0 left promiscuous mode
[27853.133788] vmbr0: port 6(fwpr102p0) entered disabled state
[30142.424030] perf: interrupt took too long (3171 > 3168), lowering kernel.perf_event_max_sample_rate to 63000
[40492.199816] ata1.00: exception Emask 0x0 SAct 0x10008 SErr 0x0 action 0x0
[40492.199836] ata1.00: irq_stat 0x40000008
[40492.199844] ata1.00: failed command: WRITE FPDMA QUEUED
[40492.199855] ata1.00: cmd 61/08:18:20:4f:0c/00:00:27:02:00/40 tag 3 ncq dma 4096 out
res 41/10:00:20:4f:0c/00:00:27:02:00/00 Emask 0x481 (invalid argument) <F>
[40492.199881] ata1.00: status: { DRDY ERR }
[40492.199889] ata1.00: error: { IDNF }
[40492.203569] ata1.00: configured for UDMA/133
[40492.203590] sd 0:0:0:0: [sda] tag#3 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40492.203591] sd 0:0:0:0: [sda] tag#3 Sense Key : Illegal Request [current]
[40492.203592] sd 0:0:0:0: [sda] tag#3 Add. Sense: Logical block address out of range
[40492.203593] sd 0:0:0:0: [sda] tag#3 CDB: Write(16) 8a 00 00 00 00 02 27 0c 4f 20 00 00 00 08 00 00
[40492.203594] blk_update_request: I/O error, dev sda, sector 9245052704 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40492.203633] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=4733465935872 size=4096 flags=1808aa
[40492.203640] ata1: EH complete
[40542.638349] ata1.00: exception Emask 0x0 SAct 0x80001 SErr 0x0 action 0x0
[40542.638370] ata1.00: irq_stat 0x40000008
[40542.638379] ata1.00: failed command: WRITE FPDMA QUEUED
[40542.638390] ata1.00: cmd 61/08:00:e8:d9:93/00:00:a5:01:00/40 tag 0 ncq dma 4096 out
res 41/10:00:e8:d9:93/00:00:a5:01:00/00 Emask 0x481 (invalid argument) <F>
[40542.638416] ata1.00: status: { DRDY ERR }
[40542.638424] ata1.00: error: { IDNF }
[40542.640071] ata1.00: configured for UDMA/133
[40542.640076] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40542.640077] sd 0:0:0:0: [sda] tag#0 Sense Key : Illegal Request [current]
[40542.640078] sd 0:0:0:0: [sda] tag#0 Add. Sense: Logical block address out of range
[40542.640079] sd 0:0:0:0: [sda] tag#0 CDB: Write(16) 8a 00 00 00 00 01 a5 93 d9 e8 00 00 00 08 00 00
[40542.640081] blk_update_request: I/O error, dev sda, sector 7072897512 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40542.640104] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=3621322477568 size=4096 flags=180880
[40542.640111] ata1: EH complete
[40570.925538] ata1.00: exception Emask 0x0 SAct 0x80400000 SErr 0x0 action 0x0
[40570.925557] ata1.00: irq_stat 0x40000008
[40570.925566] ata1.00: failed command: WRITE FPDMA QUEUED
[40570.925578] ata1.00: cmd 61/40:f8:b0:01:c5/00:00:e0:00:00/40 tag 31 ncq dma 32768 out
res 41/10:00:b0:01:c5/00:00:e0:00:00/00 Emask 0x481 (invalid argument) <F>
[40570.925604] ata1.00: status: { DRDY ERR }
[40570.925612] ata1.00: error: { IDNF }
[40570.928167] ata1.00: configured for UDMA/133
[40570.928177] sd 0:0:0:0: [sda] tag#31 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40570.928179] sd 0:0:0:0: [sda] tag#31 Sense Key : Illegal Request [current]
[40570.928179] sd 0:0:0:0: [sda] tag#31 Add. Sense: Logical block address out of range
[40570.928181] sd 0:0:0:0: [sda] tag#31 CDB: Write(16) 8a 00 00 00 00 00 e0 c5 01 b0 00 00 00 40 00 00
[40570.928182] blk_update_request: I/O error, dev sda, sector 3771007408 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40570.928206] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=1930754744320 size=32768 flags=1808aa
[40570.928212] ata1: EH complete
[40615.979475] ata1.00: exception Emask 0x0 SAct 0xa0800000 SErr 0x0 action 0x0
[40615.979495] ata1.00: irq_stat 0x40000008
[40615.979504] ata1.00: failed command: WRITE FPDMA QUEUED
[40615.979515] ata1.00: cmd 61/08:b8:10:0b:00/00:00:00:00:00/40 tag 23 ncq dma 4096 out
res 41/10:00:10:0b:00/00:00:00:00:00/00 Emask 0x481 (invalid argument) <F>
[40615.979541] ata1.00: status: { DRDY ERR }
[40615.979549] ata1.00: error: { IDNF }
[40615.980842] ata1.00: configured for UDMA/133
[40615.980850] sd 0:0:0:0: [sda] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40615.980851] sd 0:0:0:0: [sda] tag#23 Sense Key : Illegal Request [current]
[40615.980852] sd 0:0:0:0: [sda] tag#23 Add. Sense: Logical block address out of range
[40615.980853] sd 0:0:0:0: [sda] tag#23 CDB: Write(16) 8a 00 00 00 00 00 00 00 0b 10 00 00 00 08 00 00
[40615.980855] blk_update_request: I/O error, dev sda, sector 2832 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40615.980880] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=401408 size=4096 flags=180ac0
[40615.980887] ata1: EH complete
[40623.324804] ata1.00: exception Emask 0x0 SAct 0x10700000 SErr 0x0 action 0x0
[40623.324820] ata1.00: irq_stat 0x40000008
[40623.324829] ata1.00: failed command: WRITE FPDMA QUEUED
[40623.324841] ata1.00: cmd 61/08:e0:10:ad:a0/00:00:ba:02:00/40 tag 28 ncq dma 4096 out
res 41/10:00:10:ad:a0/00:00:ba:02:00/00 Emask 0x481 (invalid argument) <F>
[40623.324867] ata1.00: status: { DRDY ERR }
[40623.324875] ata1.00: error: { IDNF }
[40623.325974] ata1.00: configured for UDMA/133
 
More dmesg
[40623.325981] sd 0:0:0:0: [sda] tag#28 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40623.325982] sd 0:0:0:0: [sda] tag#28 Sense Key : Illegal Request [current]
[40623.325983] sd 0:0:0:0: [sda] tag#28 Add. Sense: Logical block address out of range
[40623.325984] sd 0:0:0:0: [sda] tag#28 CDB: Write(16) 8a 00 00 00 00 02 ba a0 ad 10 00 00 00 08 00 00
[40623.325986] blk_update_request: I/O error, dev sda, sector 11721026832 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40623.326009] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=6001164689408 size=4096 flags=180ac0
[40623.326013] ata1: EH complete
[40645.409352] ata1.00: exception Emask 0x0 SAct 0xc00000 SErr 0x0 action 0x0
[40645.409373] ata1.00: irq_stat 0x40000008
[40645.409382] ata1.00: failed command: WRITE FPDMA QUEUED
[40645.409393] ata1.00: cmd 61/08:b0:38:da:f8/00:00:15:01:00/40 tag 22 ncq dma 4096 out
res 41/10:00:38:da:f8/00:00:15:01:00/00 Emask 0x481 (invalid argument) <F>
[40645.409419] ata1.00: status: { DRDY ERR }
[40645.409427] ata1.00: error: { IDNF }
[40645.410866] ata1.00: configured for UDMA/133
[40645.410873] sd 0:0:0:0: [sda] tag#22 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40645.410874] sd 0:0:0:0: [sda] tag#22 Sense Key : Illegal Request [current]
[40645.410875] sd 0:0:0:0: [sda] tag#22 Add. Sense: Logical block address out of range
[40645.410876] sd 0:0:0:0: [sda] tag#22 CDB: Write(16) 8a 00 00 00 00 01 15 f8 da 38 00 00 00 08 00 00
[40645.410877] blk_update_request: I/O error, dev sda, sector 4663597624 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40645.410902] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=2387760934912 size=4096 flags=1808aa
[40645.410910] ata1: EH complete
[40683.023986] ata1.00: exception Emask 0x0 SAct 0x4002000 SErr 0x0 action 0x0
[40683.024056] ata1.00: irq_stat 0x40000008
[40683.024065] ata1.00: failed command: WRITE FPDMA QUEUED
[40683.024076] ata1.00: cmd 61/08:68:28:04:3e/00:00:16:01:00/40 tag 13 ncq dma 4096 out
res 41/10:00:28:04:3e/00:00:16:01:00/00 Emask 0x481 (invalid argument) <F>
[40683.024102] ata1.00: status: { DRDY ERR }
[40683.024110] ata1.00: error: { IDNF }
[40683.025270] ata1.00: configured for UDMA/133
[40683.025276] sd 0:0:0:0: [sda] tag#13 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[40683.025277] sd 0:0:0:0: [sda] tag#13 Sense Key : Illegal Request [current]
[40683.025278] sd 0:0:0:0: [sda] tag#13 Add. Sense: Logical block address out of range
[40683.025279] sd 0:0:0:0: [sda] tag#13 CDB: Write(16) 8a 00 00 00 00 01 16 3e 04 28 00 00 00 08 00 00
[40683.025281] blk_update_request: I/O error, dev sda, sector 4668130344 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[40683.025303] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=2390081687552 size=4096 flags=1808aa
[40683.025311] ata1: EH complete
[41007.591237] ata1.00: exception Emask 0x0 SAct 0x40020000 SErr 0x0 action 0x0
[41007.591256] ata1.00: irq_stat 0x40000008
[41007.591265] ata1.00: failed command: WRITE FPDMA QUEUED
[41007.591277] ata1.00: cmd 61/30:88:d8:aa:69/00:00:e5:00:00/40 tag 17 ncq dma 24576 out
res 41/10:00:d8:aa:69/00:00:e5:00:00/00 Emask 0x481 (invalid argument) <F>
[41007.591303] ata1.00: status: { DRDY ERR }
[41007.591311] ata1.00: error: { IDNF }
[41007.592663] ata1.00: configured for UDMA/133
[41007.592671] sd 0:0:0:0: [sda] tag#17 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[41007.592672] sd 0:0:0:0: [sda] tag#17 Sense Key : Illegal Request [current]
[41007.592673] sd 0:0:0:0: [sda] tag#17 Add. Sense: Logical block address out of range
[41007.592674] sd 0:0:0:0: [sda] tag#17 CDB: Write(16) 8a 00 00 00 00 00 e5 69 aa d8 00 00 00 30 00 00
[41007.592675] blk_update_request: I/O error, dev sda, sector 3848907480 op 0x1:(WRITE) flags 0x700 phys_seg 6 prio class 0
[41007.592703] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=1970639581184 size=24576 flags=1808aa
[41007.592711] ata1: EH complete
[41093.591077] ata1.00: exception Emask 0x0 SAct 0x4020 SErr 0x0 action 0x0
[41093.591096] ata1.00: irq_stat 0x40000008
[41093.591105] ata1.00: failed command: WRITE FPDMA QUEUED
[41093.591116] ata1.00: cmd 61/00:28:b0:6c:9a/08:00:e5:00:00/40 tag 5 ncq dma 1048576 ou
res 41/10:00:b0:6c:9a/00:00:e5:00:00/00 Emask 0x481 (invalid argument) <F>
[41093.591143] ata1.00: status: { DRDY ERR }
[41093.591151] ata1.00: error: { IDNF }
[41093.592667] ata1.00: configured for UDMA/133
[41093.592675] sd 0:0:0:0: [sda] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[41093.592676] sd 0:0:0:0: [sda] tag#5 Sense Key : Illegal Request [current]
[41093.592678] sd 0:0:0:0: [sda] tag#5 Add. Sense: Logical block address out of range
[41093.592679] sd 0:0:0:0: [sda] tag#5 CDB: Write(16) 8a 00 00 00 00 00 e5 9a 6c b0 00 00 08 00 00 00
[41093.592681] blk_update_request: I/O error, dev sda, sector 3852102832 op 0x1:(WRITE) flags 0x700 phys_seg 16 prio class 0
[41093.592706] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40E576K-part1 error=5 type=2 offset=1972275601408 size=1048576 flags=40080caa
[41093.592714] ata1: EH complete
[50299.070473] ata5.00: exception Emask 0x0 SAct 0x4008000 SErr 0x0 action 0x0
[50299.070494] ata5.00: irq_stat 0x40000008
[50299.070503] ata5.00: failed command: WRITE FPDMA QUEUED
[50299.070514] ata5.00: cmd 61/08:d0:c8:dc:5e/00:00:80:01:00/40 tag 26 ncq dma 4096 out
res 41/10:00:c8:dc:5e/00:00:80:01:00/00 Emask 0x481 (invalid argument) <F>
[50299.070540] ata5.00: status: { DRDY ERR }
[50299.070548] ata5.00: error: { IDNF }
[50299.075132] ata5.00: configured for UDMA/133
[50299.075141] sd 4:0:0:0: [sde] tag#26 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[50299.075142] sd 4:0:0:0: [sde] tag#26 Sense Key : Illegal Request [current]
[50299.075143] sd 4:0:0:0: [sde] tag#26 Add. Sense: Logical block address out of range
[50299.075145] sd 4:0:0:0: [sde] tag#26 CDB: Write(16) 8a 00 00 00 00 01 80 5e dc c8 00 00 00 08 00 00
[50299.075146] blk_update_request: I/O error, dev sde, sector 6448667848 op 0x1:(WRITE) flags 0x700 phys_seg 1 prio class 0
[50299.075170] zio pool=rpool vdev=/dev/disk/by-id/ata-WDC_WD60EFAX-68JH4N0_WD-WX12D40ED9P9-part1 error=5 type=2 offset=3301716889600 size=4096 flags=1808aa
[50299.075177] ata5: EH complete
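The kernel lines above can be cross-checked against each other. The following Python sketch decodes the WRITE(16) CDB from the first failure and compares it with the sector and zio offset reported for the same request. The helper name `decode_write16` is made up for this note. The 512-byte logical sector size and the partition start at sector 2048 are assumptions; the latter is inferred from the 1 MiB gap between the two reported offsets, not read from the partition table.

```python
SECTOR_SIZE = 512        # assumed logical sector size used by the kernel's sector numbers
PARTITION_START = 2048   # assumed first sector of the ZFS partition (1 MiB alignment)

def decode_write16(cdb_hex):
    """Decode a SCSI WRITE(16) CDB given as hex bytes into (lba, sector_count)."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2..9: logical block address
    count = int.from_bytes(cdb[10:14], "big")   # bytes 10..13: transfer length in sectors
    return lba, count

# Values copied from the first ata1.00 failure in the log.
lba, count = decode_write16("8a 00 00 00 00 02 27 0c 4f 20 00 00 00 08 00 00")
zio_offset = 4733465935872

print(lba, count * SECTOR_SIZE)   # 9245052704 4096
# The whole-disk byte offset minus the zio offset should equal the partition start.
print(lba * SECTOR_SIZE - zio_offset == PARTITION_START * SECTOR_SIZE)   # True
print(lba * SECTOR_SIZE / 1e12)   # byte offset on the disk, in TB
```

The decoded LBA matches the sector in the `blk_update_request` line and sits at roughly 4.73 TB, below the drive's nominal 6 TB. So the "Logical block address out of range" sense data does not seem to mean that ZFS wrote past the end of the disk; it looks more like the drive itself rejecting a valid address, although that is only a reading of the log and not a confirmed diagnosis.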
 
I ordered new WD Red Pro drives (with CMR) and will update this post as I get them.

@aaron Any ideas how to fix the metadata? There are 4 errors, but I can't tell which files they belong to. Do I simply scrub a few times? I read that this works for some people.
 
Do I simply scrub a few times?

Just once should be fine; I don't see how multiple runs would fix more problems.
If you have a "hole punch" or the ZFS equivalent of one, you will have a corrupt file (and ZFS knows that), and you need to restore the file from backup (or, if it is not "user data", you may just copy it back from another install: e.g. cmd.exe or some Linux binary/library).
 
@LnxBil the errors have indeed disappeared after the resilver process. I'm now running a scrub.

The 4 errors were all metadata errors, so that's why I wasn't sure how to fix them. I guess ZFS must've taken care of it?
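To confirm this once the scrub finishes, the `errors:` line of `zpool status -v` is the thing to look at. Below is a small Python sketch of that check. The function names are hypothetical, and it assumes the usual wording `errors: No known data errors` that `zpool status` prints for a pool without recorded errors.

```python
import subprocess

def pool_is_clean(status_text):
    """Return True if the 'errors:' line reports no known data errors."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            return stripped == "errors: No known data errors"
    return False  # no 'errors:' line found, so do not assume the pool is clean

def read_status(pool="rpool"):
    """Fetch the live status text; needs the zpool binary and enough privileges."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Offline demonstration with the two kinds of 'errors:' line.
print(pool_is_clean("errors: No known data errors"))
print(pool_is_clean("errors: Permanent errors have been detected in the following files:"))
```

On a live system, `pool_is_clean(read_status("rpool"))` would be called after the scrub has completed; `read_status` is not run here because it depends on the machine.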
 
