High `Raw_Read_Error_Rate` and `Seek_Error_Rate`

user73937393

New Member
Feb 2, 2024
Hi Everyone,

A few weeks ago we had a hard drive failure in our ZFS array. We replaced the hard drive with a new one, and all has been working OK, except that I am noticing that `Raw_Read_Error_Rate` and `Seek_Error_Rate` are climbing on all of our disks in the ZFS pool. We're currently at 193251856 errors.

Under ZFS the Health of the 4-disk pools shows ONLINE. The "Scan" message says:
```
resilvered 246G in 03:05:30 with 0 errors on Thu Nov 21 02:36:08 2024
```
And there are `No known data errors`.

I'm noticing that the VMs on the `rust02` ZFS pools are running a little slow and are sometimes jittery, so I am raising this forum post in case the climbing `Raw_Read_Error_Rate` and `Seek_Error_Rate` counts are indicative of a ticking time bomb.

Our Proxmox server is pretty heavily used. We have 30+ VMs and containers running, some of them under heavy load, processing 2 million database transactions per day (test system). So it's not clear to me whether climbing `Raw_Read_Error_Rate` and `Seek_Error_Rate` values are normal and simply something to expect given how much we tax our platform. Any feedback would be much appreciated.

I am preparing to introduce SSD cache disks to `rust02`, but before doing so I wanted to raise this with the community.
 
Neither of those stats means much for disk health. Run a SMART long test; if it passes, you should be good.
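For reference, a long self-test is normally started and checked with `smartctl` from smartmontools. The Python sketch below only builds and prints the commands; the device path `/dev/sda` and the helper name `long_test_commands` are placeholders for illustration, not something taken from this thread.

```python
import shlex


def long_test_commands(device: str) -> list[list[str]]:
    """Return the smartctl invocations for a long self-test on one disk."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log afterwards
        ["smartctl", "-H", device],              # overall health assessment
    ]


# Hypothetical device path; substitute each disk in the pool.
for cmd in long_test_commands("/dev/sda"):
    print(shlex.join(cmd))
```

The long test runs in the background on the drive and can take several hours on large disks, so the self-test log is only meaningful once it has finished.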


https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/


[[

For the last few years we've used the following five SMART stats as a means of helping determine if a drive is going to fail.


| Attribute | Description |
|---|---|
| SMART 5 | Reallocated Sectors Count |
| SMART 187 | Reported Uncorrectable Errors |
| SMART 188 | Command Timeout |
| SMART 197 | Current Pending Sector Count |
| SMART 198 | Uncorrectable Sector Count |
]]
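As a rough way to check those five attributes, the sketch below scans the attribute table printed by `smartctl -A` and reports any of them with a non-zero raw value. It assumes the usual ten-column table layout with the raw value in the last column; the `SAMPLE` text is made-up illustration data, not output from the drives in this thread, and `nonzero_watched` is a hypothetical helper name.

```python
# SMART IDs from the Backblaze list quoted above.
WATCHED = {5, 187, 188, 197, 198}


def nonzero_watched(smart_output: str) -> dict[int, int]:
    """Return {attribute_id: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[attr_id] = int(raw)
    return flagged


# Invented sample in the style of `smartctl -A` output (assumption).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       193251856
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

print(nonzero_watched(SAMPLE))  # {197: 8}
```

Note that attribute 1 is ignored here even though its raw value is large, which matches the point that it is not one of the five stats on the list.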
 
Hi,

What manufacturer are these drives from?
I know Seagate stores some other information in these raw values along with the actual error count. There are online converters that show whether there are actually any errors (https://www.disktuna.com/seagate-raw-smart-attributes-to-error-convertertest/), and the manufacturers usually publish some information on this as well. Google is your friend.
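To illustrate: a commonly cited convention for Seagate drives is that the 48-bit raw value of `Raw_Read_Error_Rate` and `Seek_Error_Rate` holds the error count in the upper 16 bits and the total number of operations in the lower 32 bits. Assuming that convention applies to these drives (the manufacturer has not been confirmed in this thread), a minimal Python sketch of the conversion could look like this. `decode_seagate_raw` is a hypothetical helper name.

```python
def decode_seagate_raw(raw: int) -> tuple[int, int]:
    """Split a 48-bit raw SMART value into (errors, operations).

    Assumption: Seagate-style packing, with the error count in the
    upper 16 bits and the operation count in the lower 32 bits.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value must fit in 48 bits")
    errors = raw >> 32              # upper 16 bits
    operations = raw & 0xFFFFFFFF   # lower 32 bits
    return errors, operations


# Raw value quoted in the first post.
errors, operations = decode_seagate_raw(193251856)
print(f"errors={errors}, operations={operations}")
# errors=0, operations=193251856
```

Under that assumption, 193251856 is below 2^32, so it would decode to zero actual errors and roughly 193 million operations.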

So long
 
