Hi All,
I have received a ZFS fault notification twice in the last two weeks, where ZFS reports an I/O fault:
"The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.
impact: Fault tolerance of the pool may be compromised.
eid: 42
class: statechange
state: FAULTED"
This Sunday it ran a scrub by itself and I got the following email:
"
ZFS has finished a scrub:
eid: 26
class: scrub_finish
host: new-stan
time: 2022-05-08 00:32:04+0200
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:08:03 with 0 errors on Sun May 8 00:32:04 2022
config:
NAME                                                 STATE     READ WRITE CKSUM
rpool                                                ONLINE       0     0     0
  mirror-0                                           ONLINE       0     0     0
    nvme-eui.00000000000000008ce38ee20dba8601-part3  ONLINE       0    16     0
    nvme-eui.00000000000000008ce38ee20dba8801-part3  ONLINE       0     1     0
errors: No known data errors"
After that, we tested both drives and they appear to be in perfect shape (they are a month old). After clearing the errors on the pool, I ran a manual scrub today and it finished without errors.
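For reference, this is roughly what I ran, with the pool name rpool as shown in the status output above (exact commands from memory, so take them as a sketch):

# reset the error counters on the pool
zpool clear rpool
# start a manual scrub and then watch the result
zpool scrub rpool
zpool status -v rpool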
The other weird thing is that the usage reported by the nvme list command differs between the two drives: one reports a usage of 1.88 TB and the other 1.95 TB.
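In case it helps, this is roughly how I compared the two drives; the device names /dev/nvme0n1 and /dev/nvme1n1 are just how they happen to show up on my system:

# list both controllers/namespaces with model, capacity and reported usage
nvme list
# check the SMART / health log of each drive for media and error-log entries
nvme smart-log /dev/nvme0n1
nvme smart-log /dev/nvme1n1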
Do you guys have some tips on how to further diagnose this issue?