Hi, I have been using PBS for more than 6 months now, with 4x2TB HDDs in an 80% full ZFS pool (not RAID, I know, it's bad!) and one drive is starting to fail.
SMART shows an increasing "197 Current_Pending_Sector" count, and ZFS is starting to complain with a degraded state.
If I understand correctly, the PBS mechanism is that the verify job renames bad chunks (Input/output error (os error 5)) by appending .0.bad to the file name.
This is fine: it lets me identify the bad chunks, restore them from my S3 backup, and re-run verify on the affected snapshots, which go green again.
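For reference, this is roughly how I locate the renamed chunks. The datastore path below is simulated with a temp directory just to illustrate; on a real setup you would point `find` at your datastore's .chunks directory instead:

```shell
#!/bin/sh
# Simulate a datastore chunk directory (stand-in for the real .chunks path)
store=$(mktemp -d)
mkdir -p "$store/.chunks/0001"
touch "$store/.chunks/0001/abc123"        # healthy chunk, untouched by verify
touch "$store/.chunks/0001/def456.0.bad"  # chunk renamed by the verify job

# List every chunk that verify has flagged as bad
find "$store/.chunks" -type f -name '*.bad'
```

After restoring the listed chunks from the S3 copy, re-running verify on the snapshots clears the failed state.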
The problem IMHO is the GC job, which starts removing the bad chunks (log: Removed bad chunks: 1), freeing the unreliable HDD sectors, which then get re-used by the next snapshot and fail again in the verify job.
Wouldn't it be a good idea to have a checkbox for GC to not remove bad chunks, for this particular case?
My 2 cents.
Best regards,
(P.S. does anyone know how to force the HDD to remap sectors?)