Checksum errors, need help diagnosing.

dairou

New Member
Aug 22, 2022
2
0
1
Hi guys, I ran out of ideas, maybe both my drives are bad (they are new)... But I'd like your opinion. I'm using this as a home NAS and media server, with no real critical data, hence the use of consumer hardware that I mostly got from upgrading my main PCs.

I'm getting a lot of checksum errors on my actual NAS drives, I first tried using ZFS on proxmox, and got lots of errors (sorry I didn't save logs of that) so then I tried on TrueNAS as a VM, where I'm still getting them.

I tried swapping the data and power cables between the pool with no issues and the one with issues, and the behavior persists.

Ran memtest for 11 hours with 0 errors.

memtest.jpg


What else can I try?
Specs:
  • Proxmox Virtual Environment 7.2-7 (bare-metal)
  • TrueNAS-13.0-U1.1 (VM under proxmox)
  • CPU: Ryzen 5 2600X
  • GPU: Gigabyte GeForce 210 1 GB (because there are no integrated graphics on the CPU)
  • Memory: Corsair Vengeance LPX 32GB DDR4 3600MHz C18 at 2933 MT/s(CMK32GX4M2D3600C18)
  • Motherboard: ASRock B450M Steel Legend BIOS version p3.60
  • Boot drive: Kingston A2000 250 GB M.2-2280 NVME

Disks:

ZFS on Proxmox (cheapo 2,5" drives):
Code:
NAME                                    STATE     READ WRITE CKSUM
local-madmen                            ONLINE       0     0     0
  mirror-0                              ONLINE       0     0     0
    ata-ST1000LM035-1RK172_(redacted)   ONLINE       0     0     0
    ata-TOSHIBA_MQ04ABF100_(redacted)   ONLINE       0     0     0

ZFS on TrueNAS using Seagate IronWolf 4TB, Proxmox passthrough of ST4000VN008-2DR166
Code:
NAME                                            STATE     READ WRITE CKSUM
sidekick                                        ONLINE       0     0 0
  mirror-0                                      ONLINE       0     0 0
    gptid/685769b9-1e62-11ed-b64f-9f55033a5f72  ONLINE       0     0 1.75K
    gptid/6865c552-1e62-11ed-b64f-9f55033a5f72  ONLINE       0     0 1.75K

errors: Permanent errors have been detected in the following files:

        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-iocage/df_complex-reserved.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/load/load.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-active.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-cache.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-free.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-inactive.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-laundry.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/memory/memory-wired.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-iocage/df_complex-used.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-iocage-images/df_complex-reserved.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-iocage-log/df_complex-free.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-vault/df_complex-used.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-vault/df_complex-reserved.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-root/df_complex-free.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-root/df_complex-reserved.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-root/df_complex-used.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick/df_complex-free.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick/df_complex-reserved.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick/df_complex-used.rrd
        /var/db/system/rrd-81963fc7279b4cf49c43e5a8cbe36cdb/localhost/df-mnt-sidekick-iocage-log/df_complex-reserved.rrd
 
Last edited:
Hey :)

Did you done yours firmware BIOS update ? ^^ Had the same problem with older BIOS ;)

Cordially,
 
  • Like
Reactions: dairou
Hey :)

Did you done yours firmware BIOS update ? ^^ Had the same problem with older BIOS ;)

Cordially,
I had a recent version (3.60 from 2020) but not the latest, as I did not see anything in the updates about storage. But I just updated it to the latest version (4.30). Thanks for the suggestion.

Also, as per suggestions from the TrueNAS forum, I ran badblocks destructive test on all my drives and SMART long test. They all passed.

Hopefully, it was the BIOS, but I also ordered an HBA card to rule out SATA controller and cable issues.
 
Did you get this resolved? I'm having a very similar issue.

Running for months with zero errors on Truenas 13 on proxmox 6.2. Upgraded a couple days ago to proxmox 7.2 and started getting hundreds (500-1000) of errors on all 6 drives in a raidz2 setup (all cksum errors, zero read or write errors).

Thanks in advance for any advice / what did or didn't work for you.
 
PS. Resolved my issue, which was bad ram. The new memtest86+ is so much faster than old versions. It found errors within 1-2 minutes, and completed a full pass run on some good ram in around 20 minutes.
 
  • Like
Reactions: leesteken