Hi Everyone,
A few weeks ago we had a hard drive failure in our ZFS array. We replaced the failed drive with a new one, and everything has been working OK, except that I am noticing that `Raw_Read_Error_Rate` and `Seek_Error_Rate` are climbing on all of the disks in the ZFS pool. We're currently at 193251856 errors.
Under ZFS the Health of the 4-disk pools shows ONLINE. The "Scan" message says:
```
resilvered 246G in 03:05:30 with 0 errors on Thu Nov 21 02:36:08 2024
```
And there are `No known data errors`.
I'm noticing that the VMs on the `rust02` ZFS pools are running a little slow and are sometimes jittery, so I am raising this forum post in case the climbing `Raw_Read_Error_Rate` and `Seek_Error_Rate` counts are indicative of a ticking time bomb.
Our Proxmox server is pretty heavily used. We have 30+ VMs and containers running. Some of them are worked pretty hard, processing 2 million database transactions per day (test system). So it's not clear to me whether climbing `Raw_Read_Error_Rate` and `Seek_Error_Rate` values are normal and simply something that will keep rising given how much we tax our platform. Any feedback would be much appreciated.
I am preparing to introduce SSD cache disks to `rust02`, but before doing so I wanted to raise this with the community.
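In case it helps anyone compare numbers, here is a minimal Python sketch of how the two counters could be pulled out of SMART attribute output and tracked over time. It assumes the attribute table layout printed by `smartctl -A`; the sample text, its values other than the raw count quoted above, and the helper names `parse_attributes` and `split_raw` are all hypothetical. The `split_raw` helper rests on a further assumption: some drive vendors are reported to pack an error count into the upper 16 bits and an operation count into the lower 32 bits of a 48-bit raw value. Whether that applies to the disks in `rust02` is not established here.

```python
# Hypothetical excerpt in the column layout printed by `smartctl -A`.
# Only the raw count 193251856 comes from this post; the rest is illustrative.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   083   064   044    Pre-fail  Always       -       193251856
  7 Seek_Error_Rate         0x000f   090   060   045    Pre-fail  Always       -       1001234567
"""

WATCHED = {"Raw_Read_Error_Rate", "Seek_Error_Rate"}


def parse_attributes(text):
    """Return {name: (normalized value, threshold, raw value)} for watched rows."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if fields[1] in WATCHED and fields[9].isdigit():
            result[fields[1]] = (int(fields[3]), int(fields[5]), int(fields[9]))
    return result


def split_raw(raw):
    """Split a raw value into (upper 16 bits, lower 32 bits) of a 48-bit field.

    ASSUMPTION: only meaningful on drives that pack the raw value this way;
    on other drives the raw value should be read as a plain number.
    """
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


if __name__ == "__main__":
    for name, (value, thresh, raw) in parse_attributes(SAMPLE).items():
        high, low = split_raw(raw)
        print(f"{name}: value={value} thresh={thresh} raw={raw} "
              f"upper16={high} lower32={low}")
```

Running something like this against each disk on a schedule and comparing the normalized value with its threshold, rather than watching the raw number alone, would show whether any attribute is actually approaching the point the drive itself treats as failing.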