Persistent ZFS Pool Errors and Data Corruption Issues – Assistance Needed

Quintas-air4

New Member
Nov 2, 2024
I am experiencing issues with a ZFS pool (named "DISK") running on a Proxmox VE server with TrueNAS as a VM. The pool consists of two 20 TB disks in a mirrored configuration (RAID 1). The main problem is that the pool reports data errors and permanent data loss for certain files. I have run multiple zpool scrub operations; they report nothing repaired and no new errors, yet the permanent data errors remain. Attempts to import the pool with various flags (-f, -o readonly=on) often result in I/O errors and segmentation faults.
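
For reference, the read-only import attempt described above looks roughly like this (a sketch combining the flags mentioned in the post):

# Force-import the pool read-only so nothing further is written to it
zpool import -f -o readonly=on DISK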

SMART tests have been run on the disks, showing no critical errors, yet TrueNAS encounters access issues with specific blocks, indicating underlying read problems. Additionally, zdb commands reveal block errors and leaked space.
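
The leak report mentioned above comes from zdb's block traversal; a sketch of the invocation (read-only, but it can take many hours on a 20 TB mirror):

# Walk every block in the pool and report leaked/unreferenced space
zdb -b DISK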

I am seeking assistance with:

  • Understanding the root causes of these data errors.
  • Recommendations on potential repair options to recover data, given that I do not have a separate backup for this data.
Any insights or advice on how to proceed to minimize further data loss and potentially recover the data would be greatly appreciated.
 
If it is possible:

1. Import the ZFS pool read-only.
2. Copy off whatever you can.
3. To copy corrupted files (if needed), try the zfs_send_corrupt_data tunable (see the sketch after this list).
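
A minimal sketch of steps 2 and 3, assuming the pool has been imported read-only as shown earlier in the thread (the destination path /mnt/rescue and the snapshot name @old are hypothetical; the module-parameter path applies to Linux/TrueNAS SCALE):

# Copy intact files off first
rsync -a /mnt/DISK/ /mnt/rescue/

# Let zfs send substitute unreadable blocks instead of aborting
echo 1 > /sys/module/zfs/parameters/zfs_send_corrupt_data

# A read-only pool cannot take new snapshots, so only a pre-existing
# snapshot can be sent
zfs send -R DISK@old > /mnt/rescue/DISK.zfs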

The problem is most likely hardware.

Can you post the output of # zpool status -x -v?
 
root@truenas[~]# zpool status -x -v
  pool: DISK
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 09:01:26 with 0 errors on Sat Nov 2 22:10:37 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        DISK                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/589e2493-9380-11ef-89ea-bc2411e60803  ONLINE       0     0   456
            gptid/58ac781b-9380-11ef-89ea-bc2411e60803  ONLINE       0     0   456

errors: Permanent errors have been detected in the following files:

        DISK/.system/services:<0x0>
        DISK/iocage:<0x0>
 

Attachments

  • Skärmbild 2024-11-03 160334.png (45.1 KB)
  • Skärmbild 2024-11-03 160355.png (72.7 KB)
Try # zpool clear DISK (a clear-and-rescrub sketch follows at the end of this post).

Those <0x0> entries may indicate a problem from the past: if you delete a corrupted file, zpool still keeps a record of it.

Sometimes the whole catalog can be corrupted.

.system/services and iocage - can you recreate them?
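
A minimal sketch of that clear-and-rescrub sequence (pool name DISK as above); clearing resets the error counters, and stale entries normally drop off the list only after a subsequent scrub completes cleanly:

# Reset error counters and the list of affected files
zpool clear DISK

# Re-scan the pool; stale <0x0> entries should disappear if the
# underlying blocks are no longer damaged
zpool scrub DISK

# Check the result once the scrub finishes
zpool status -x -v DISK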
 
Thank you for your input! I have already run zpool clear DISK and have been managing the issues that persist afterward. I understand that these <0x0> entries might indicate past problems, possibly related to deleted corrupted files that ZFS still retains metadata for. I've also noted that the catalog may be partially corrupted, and I can now see the .system/services and iocage directories.

Do you have further advice on how to recreate these specific directories or manage the corruption without risking data loss?
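
A rough sketch of how that recreation could be approached (the grep pattern is illustrative, and relocating the system dataset via the TrueNAS UI first is an assumption; both destroys are irreversible, so copy everything needed off beforehand):

# Confirm .system and iocage are datasets rather than plain directories
zfs list -r DISK | grep -E '\.system|iocage'

# After moving the system dataset to another pool in the TrueNAS UI
# (System Settings > Advanced > System Dataset), the damaged copies
# can be destroyed and recreated cleanly. DESTRUCTIVE:
zfs destroy -r DISK/.system
zfs destroy -r DISK/iocage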
 
