Hi,
I have a Proxmox instance at home that I use for storage (a TrueNAS VM) and a couple of other VMs.
I have several SATA HDDs and SSDs. Some are used directly by Proxmox, others are mapped to the TrueNAS VM.
All disks randomly hit read/write I/O errors under heavy use. This happens when I download torrents at high speed (~8-10 MBps) or when I stream from my local Plex instance at resolutions above 1080p.
Once a drive fails, the only fix is to reboot the Proxmox host to restore access to that drive.
So far I have tried the following:
- replacing the HDDs and SSDs > no change, the new ones are failing too
- adding a SATA PCIe controller and moving the disks to it instead of the mainboard SATA ports > no change, drives still failing
- replacing the PSU with a more powerful one > no change, drives still failing
- replacing the mainboard > no change, drives still failing
I think I have eliminated all likely hardware causes, so I believe the hypervisor software is the problem.
  pool: rpool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub in progress since Fri Apr 7 11:36:31 2023
        269G scanned at 380M/s, 136G issued at 191M/s, 453G total
        0B repaired, 29.93% done, 00:28:19 to go
config:

        NAME                                                   STATE     READ WRITE CKSUM
        rpool                                                  DEGRADED     0     0     0
          mirror-0                                             DEGRADED     0     0     0
            ata-Samsung_SSD_870_QVO_1TB_S5SVNF0NB05808R-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T409531B-part3  FAULTED      6     0     0  too many errors
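In case it helps anyone triaging the same thing, here is a minimal Python sketch that scans the `config:` table of `zpool status` output like the one above and lists every device that is not ONLINE or has nonzero READ/WRITE/CKSUM counters. The helper name `problem_vdevs` is hypothetical, and the parsing assumes the column layout shown above with plain integer counters (ZFS can abbreviate large counts, e.g. `1.2K`, and this sketch does not match such rows).

```python
import re
from typing import List, Tuple

# One row of the "config:" table printed by `zpool status`:
# NAME  STATE  READ  WRITE  CKSUM  [optional note, e.g. "too many errors"]
# Assumption: counters are plain integers; abbreviated counts like "1.2K"
# are not matched by this sketch.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*(.*)$"
)

Row = Tuple[str, str, int, int, int, str]


def problem_vdevs(status_text: str) -> List[Row]:
    """Hypothetical helper: return config rows that are not ONLINE or that
    have nonzero READ/WRITE/CKSUM counters."""
    problems: List[Row] = []
    in_config = False
    for line in status_text.splitlines():
        if line.strip() == "config:":
            in_config = True  # the device table starts after this line
            continue
        if not in_config:
            continue
        m = ROW.match(line)
        if m is None:
            continue  # header row, blank line, "errors:" line, etc.
        name, state, read, write, cksum, note = m.groups()
        r, w, c = int(read), int(write), int(cksum)
        if state != "ONLINE" or r or w or c:
            problems.append((name, state, r, w, c, note.strip()))
    return problems


if __name__ == "__main__":
    # Rows taken from the zpool status output in this post.
    SAMPLE = """config:
NAME STATE READ WRITE CKSUM
rpool DEGRADED 0 0 0
mirror-0 DEGRADED 0 0 0
ata-Samsung_SSD_870_QVO_1TB_S5SVNF0NB05808R-part3 ONLINE 0 0 0
ata-Samsung_SSD_870_QVO_1TB_S5RRNF0T409531B-part3 FAULTED 6 0 0 too many errors
"""
    for row in problem_vdevs(SAMPLE):
        print(row)
```

On a live host the text could come from `subprocess.run(["zpool", "status", "rpool"], capture_output=True, text=True).stdout` instead of the pasted sample. Parent vdevs (`rpool`, `mirror-0`) show up too because their state is DEGRADED, which makes it easy to see which mirror the faulted disk belongs to.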
Attached are the SATA errors from the Proxmox dmesg output.