ZFS root pool on PVE works, but reports errors

bobzer

Well-Known Member
Nov 14, 2017
Hi,

I have a bare-metal Proxmox install with rpool on a two-disk mirror (mirror-0) plus a cache device.
The pool reports errors. I ran a scrub, but they are still there.
Can I fix this, or do I have to reinstall everything?

Code:
# zpool status rpool -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 03:48:38 with 1504 errors on Wed May 31 16:22:37 2023
config:

        NAME                                        STATE     READ WRITE CKSUM
        rpool                                       ONLINE       0     0     0
          mirror-0                                  ONLINE       0     0     0
            ata-ST8000NE001-2M7101_WSD39YC4-part3   ONLINE       0     0 3.04K
            ata-ST8000VN0022-2EL112_ZA1E0Q42-part3  ONLINE       0     0 3.04K
        cache
          sdl1                                      ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        rpool/data/vm-620-disk-0:<0x1>
        rpool/data/vm-622-disk-0:<0x1>
        rpool/ROOT/pve-1:<0x7f725>
        //var/log/journal/dd6a19a94159411c9971e99cde8dfa9d/system@bd3ac7844efb489887661d9d330efa17-000000000004b8c5-0005fcb1a6858ba2.journal
        rpool/ROOT/pve-1:<0x7fe76>
        rpool/data/vm-110-disk-0:<0x1>
        rpool/data/subvol-101-disk-1:<0x0>
        rpool/data/subvol-101-disk-1:<0x78314>

Code:
# zdb rpool/ROOT/pve-1:0x7f725
failed to hold dataset 'rpool/ROOT/pve-1:0x7f725': No such file or directory
zdb: can't open 'rpool/ROOT/pve-1:0x7f725': No such file or directory

ZFS_DBGMSG(zdb) START:
spa.c:5181:spa_open_common(): spa_open_common: opening rpool/ROOT/pve-1:0x7f725
spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): LOADING
vdev.c:160:vdev_dbgmsg(): disk vdev '/dev/disk/by-id/ata-ST8000NE001-2M7101_WSD39YC4-part3': best uberblock found for spa rpool. txg 6544389
spa_misc.c:418:spa_load_note(): spa_load(rpool, config untrusted): using uberblock with txg=6544389
spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): spa_load_verify found 0 metadata errors and 4 data errors
spa.c:8360:spa_async_request(): spa=rpool async request task=2048
spa_misc.c:418:spa_load_note(): spa_load(rpool, config trusted): LOADED
ZFS_DBGMSG(zdb) END

Code:
# zdb -R rpool ROOT/pve-1:0x7f725
Invalid block specifier: ROOT/pve-1:0x7f725  - offset must be a multiple of sector size

I hope you can help.
Have a great day.
 
I don't have a recent backup of the root filesystem (I do for all LXC containers and VMs).
I would really like to avoid reinstalling everything (I have vGPU unlock set up).
Can I get more information about the objects that have errors? I realise the affected data might be a temp file, a log, or something else that is not important.
The situation with the drives is a bit strange.
I had 4x 4 TB and 6x 8 TB drives. I replaced the 4x 4 TB with 4x 16 TB, and after that I got random read, write, and checksum errors on most of the 8 TB drives,
but each time a clear and a scrub fixed them.
dmesg showed that it was a hardware problem, with link resets and communication failures.
So I replaced 2 SATA cables and added 2 PSU lines to spread the power draw.
Since then I have had no read or write errors at all, but I still get checksum errors.
I don't understand well enough what causes checksum errors. I thought they shouldn't occur if there are no read/write errors?
 
Is it possible to get more information about the block or object that has the problem?
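
For reference, a minimal sketch of what such a lookup might look like. It assumes that the `<0x...>` values in the `zpool status -v` output are object numbers inside the named dataset, and that `zdb -dddd <dataset> <object>` is the right way to dump one (zdb takes the dataset and the object number as separate arguments, not joined with a colon as in the attempts above). The helper name `zdb_cmd_for` is made up for illustration. It only prints the command to run and does not call zdb itself.

```shell
#!/bin/sh
# Hypothetical helper: turn a "dataset:<0xHEX>" entry from `zpool status -v`
# into a zdb command line. It only prints the command; it does not run zdb.
zdb_cmd_for() {
    entry=$1
    dataset=${entry%%:*}           # text before the first colon
    hex=${entry#*':<'}             # drop everything up to and including ":<"
    hex=${hex%'>'}                 # drop the trailing ">"
    obj=$(printf '%d' "$hex")      # hex object number -> decimal
    printf 'zdb -dddd %s %s\n' "$dataset" "$obj"
}

# The two root-filesystem entries from the error list above
zdb_cmd_for 'rpool/ROOT/pve-1:<0x7f725>'
zdb_cmd_for 'rpool/ROOT/pve-1:<0x7fe76>'
```

If the object still exists, the zdb dump for a plain file should include its path, which would show whether it is only a log or temp file. Whether that holds for an object that has already been deleted is not something this sketch can tell.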