High `Raw_Read_Error_Rate` and `Seek_Error_Rate`

user73937393

New Member
Feb 2, 2024
Hi Everyone,

A few weeks ago we had a hard drive failure in our ZFS array. We replaced the hard drive with a new one, and all has been working OK except that I am noticing the `Raw_Read_Error_Rate` and `Seek_Error_Rate` are climbing on all of the disks in the ZFS pool. We're currently at 193251856 errors.

Under ZFS the Health of the 4-disk pool shows ONLINE. The "Scan" message says:
```
resilvered 246G in 03:05:30 with 0 errors on Thu Nov 21 02:36:08 2024
```
And there are `No known data errors`.

I'm noticing that the VMs on the `rust02` ZFS pool are running a little slow and sometimes jittery, so I am raising this forum post in case the climbing `Raw_Read_Error_Rate` and `Seek_Error_Rate` counts are indicative of a ticking time bomb.

Our Proxmox server is pretty heavily used. We have 30+ VMs and containers running, some of them quite busy, processing 2 million database transactions per day (test system). So it's not clear to me whether `Raw_Read_Error_Rate` and `Seek_Error_Rate` are normal, and something that's going to climb given how much we tax our platform. Any feedback would be much appreciated.

I am preparing to introduce SSD cache disks to `rust02`, but before doing so I wanted to raise this with the community.
 
Neither of those stats means much for disk health on its own. Run a SMART long test; if it passes, you should be good.


https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/


> For the last few years we've used the following five SMART stats as a means of helping determine if a drive is going to fail.
>
> | Attribute | Description |
> | --- | --- |
> | SMART 5 | Reallocated Sectors Count |
> | SMART 187 | Reported Uncorrectable Errors |
> | SMART 188 | Command Timeout |
> | SMART 197 | Current Pending Sector Count |
> | SMART 198 | Uncorrectable Sector Count |
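Those five attributes can be checked with smartmontools on each pool member. A minimal sketch; the device name is a placeholder, and the sample table below is illustrative, not output from the poster's drives:

```shell
# On a real system, dump the attribute table for each pool member, e.g.:
#   smartctl -A /dev/sda
# and start the long self-test with:
#   smartctl -t long /dev/sda
#
# The five Backblaze failure indicators can be filtered out of the table by
# attribute ID. Demonstrated here on an illustrative sample table:
sample='  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       193251856
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0'
echo "$sample" | grep -E '^ *(5|187|188|197|198) '
# Keeps the SMART 5 and 197 rows and drops Raw_Read_Error_Rate.
```

Non-zero raw values in those five rows are the ones worth worrying about, per the Backblaze article above.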
Hi,

What manufacturer are these drives from?
I know Seagate stores some other information along with the actual error rate in these attributes. There are online converters to find out whether there are actually any errors (https://www.disktuna.com/seagate-raw-smart-attributes-to-error-convertertest/), and the manufacturers usually have some information on this too; google is your friend.
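If these are Seagate drives, the raw value can also be decoded locally. A minimal sketch, assuming the packing described by the converter linked above (error count in the bits above the low 32, total operation count in the low 32 bits), using the 193251856 figure from the original post:

```shell
# Assumed Seagate packing for Raw_Read_Error_Rate / Seek_Error_Rate:
#   bits above 32 -> actual error count
#   low 32 bits   -> total number of reads/seeks performed
raw=193251856                 # value reported in the original post
errors=$(( raw >> 32 ))       # 193251856 < 2^32, so this is 0
ops=$(( raw & 0xFFFFFFFF ))
echo "errors=$errors operations=$ops"
# -> errors=0 operations=193251856
```

If that decoding applies here, the alarming-looking number contains zero actual errors, which would be consistent with the clean resilver and `No known data errors` in the original post.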

So long