Server timeout for a few seconds and ZFS error log

openaspace

Hello, after the server timed out for a few seconds I checked the ZFS pool status and found this:
What can I do?
Thanks.

Code:
root@prx1:~# zpool status rpool -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Aug 14 10:37:14 2023
        73.0G scanned at 0B/s, 51.3G issued at 303M/s, 73.0G total
        0B repaired, 70.18% done, 00:01:13 to go
config:

        NAME                                                      STATE     READ WRITE CKSUM
        rpool                                                     ONLINE       0     0     0
          mirror-0                                                ONLINE       0     0     0
            ata-SanDisk_SSD_PLUS_240GB_191730800683-part3         ONLINE       0     0     2
            ata-SPCC_Solid_State_Disk_B57B079917B300189124-part3  ONLINE       0     0     3

errors: Permanent errors have been detected in the following files:

        //var/log/journal/36e4da022d77476baf47226c55fea384/system@000602cefb361c60-ab8cb24833722ad7.journal~
 
I would not worry too much about the corrupt log file, but you might need to delete or overwrite it to clear the permanent error message.
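A rough sketch of that cleanup, assuming the pool is `rpool` and the file is the journal file listed in the status output. The helper names `run` and `clear_corrupt_file` and the `DRY_RUN` variable are made up for illustration; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: prints each command by default (DRY_RUN=1).
# Set DRY_RUN=0 to really execute them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: remove the damaged file, then reset and re-check the pool.
clear_corrupt_file() {
    pool=$1
    file=$2
    run rm -- "$file"            # delete the damaged file
    run zpool clear "$pool"      # reset the READ/WRITE/CKSUM counters
    run zpool scrub "$pool"      # re-verify all data in the pool
    run zpool status -v "$pool"  # see whether the error list is empty now
}

# Example (dry run, only prints the commands):
#   clear_corrupt_file rpool \
#     "/var/log/journal/36e4da022d77476baf47226c55fea384/system@000602cefb361c60-ab8cb24833722ad7.journal~"
```

Wait for the scrub that is currently running to finish before starting a new one. The entry under "errors:" may only go away after a later scrub has completed, so check `zpool status -v` again afterwards.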
Did the scrub find any more errors once it finished (it had a little over a minute to go)? Do the logs (journalctl) show more information from around the time of the incident? Hopefully that part of the log is not the part that got corrupted.
A checksum error means that data was read but came back corrupted (for example a drive failing silently or a cable issue), or that the original write silently went wrong. It could also be a memory error (during the read or the original write), so test the RAM with memtest86. Check the drives themselves with a long SMART self-test, and maybe disconnect and reconnect them a few times to clean the contacts and rule out a loose cable. Whether the drives can still be trusted depends on what you find in the logs and in the SMART data.
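To keep an eye on the error counters while you work through these checks, something like the following could help. It is only a sketch: `devices_with_errors` is a made-up helper that reads `zpool status` output on stdin and prints every device row with a non-zero READ, WRITE or CKSUM count. The log time window and `/dev/sdX` in the comments are placeholders, not values taken from your system.

```shell
#!/bin/sh
# Hypothetical helper: list devices with non-zero error counters.
# Reads the output of "zpool status" on stdin.
# Note: only plain integer counters are matched; abbreviated values
# such as "1.2K" are skipped by this simple pattern.
devices_with_errors() {
    awk '
        # Device rows have five fields: NAME STATE READ WRITE CKSUM
        NF == 5 && $3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            if ($3 + $4 + $5 > 0)
                printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage (as root):
#   zpool status rpool | devices_with_errors
#
# Related checks mentioned above (time window and device are placeholders):
#   journalctl --since "2023-08-14 10:00" --until "2023-08-14 11:00" -p warning
#   journalctl -k --since "2023-08-14 10:00"   # kernel messages only
#   smartctl -t long /dev/sdX                  # start a long self-test
#   smartctl -a /dev/sdX                       # view results and attributes afterwards
```

For the pool in the first post this would list both mirror members, one with cksum=2 and one with cksum=3. Since both disks show checksum errors at the same time, something they share (RAM, controller, cabling, power) is worth checking as well as the disks themselves.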
 