Trap error ZFS?

frankz

Hello everyone, I often see kernel errors in PBS logs:





Code:
task:txg_sync        state:D stack:    0 pid: 1001 ppid:     2 flags:0x00004000
[322861.638666] Call Trace:
[322861.638669]  <TASK>
[322861.638673]  __schedule+0x33d/0x1750
[322861.638682]  ? lock_timer_base+0x3b/0xd0
[322861.638689]  ? __mod_timer+0x271/0x440
[322861.638693]  schedule+0x4e/0xc0
[322861.638695]  schedule_timeout+0x87/0x140
[322861.638699]  ? __bpf_trace_tick_stop+0x20/0x20
[322861.638702]  io_schedule_timeout+0x51/0x80
[322861.638706]  __cv_timedwait_common+0x135/0x170 [spl]
[322861.638719]  ? wait_woken+0x70/0x70
[322861.638723]  __cv_timedwait_io+0x19/0x20 [spl]
[322861.638735]  zio_wait+0x137/0x300 [zfs]
[322861.638907]  ? __cond_resched+0x1a/0x50
[322861.638910]  dsl_pool_sync+0xcc/0x4f0 [zfs]
[322861.639039]  ? spa_suspend_async_destroy+0x60/0x60 [zfs]
[322861.639175]  ? add_timer+0x20/0x30
[322861.639177]  spa_sync+0x55a/0x1020 [zfs]
[322861.639311]  ? spa_txg_history_init_io+0x10a/0x120 [zfs]
[322861.639446]  txg_sync_thread+0x2d3/0x460 [zfs]
[322861.639579]  ? txg_init+0x2c0/0x2c0 [zfs]
[322861.639712]  thread_generic_wrapper+0x61/0x80 [spl]
[322861.639723]  ? __thread_exit+0x20/0x20 [spl]
[322861.639734]  kthread+0x127/0x150
[322861.639738]  ? set_kthread_struct+0x50/0x50
[322861.639741]  ret_from_fork+0x1f/0x30
[322861.639747]  </TASK>


I think the error is caused by a timeout:
Code:
INFO: task txg_sync:1001 blocked for more than 120 seconds.
[323828.285384]       Tainted: P           O      5.15.39-2-pve #1
[323828.285388] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
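To see how often this warning shows up, the hung-task lines can be pulled out of a saved log with a small script. This is only a minimal sketch: the function name `find_hung_tasks` and the regular expression are my own, written against the message format shown above, and are not part of any PBS or ZFS tooling.

```python
import re

# Matches the hung-task warning in the format quoted above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) tuples found in log_text."""
    hits = []
    for line in log_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append(
                (m.group("name"), int(m.group("pid")), int(m.group("secs")))
            )
    return hits


# Sample line copied from the log above.
sample = "INFO: task txg_sync:1001 blocked for more than 120 seconds."
print(find_hung_tasks(sample))  # prints [('txg_sync', 1001, 120)]
```

Counting the tuples per task name gives a rough idea of whether it is always `txg_sync` that stalls or whether other tasks are affected too.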
 
That just tells you that a kernel task (ZFS syncing to disk in this case) didn't make any progress for 2 minutes. Usually that is the result of some kernel bug (like a deadlock somewhere). Could you check the logs for the first occurrence of a kernel trace (for this boot)? Is there anything special about your ZFS setup?
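One way to find the first occurrence of a kernel trace for this boot is to scan a saved kernel log for the earliest marker line. The sketch below assumes dmesg-style lines with a bracketed seconds-since-boot timestamp, as in the trace quoted above. The function name `first_trace` and the two marker strings are illustrative choices, not an official tool.

```python
import re

# "[322861.638666] message" -> timestamp in seconds since boot, then the message.
TS_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s*(?P<msg>.*)$")

# Strings that indicate a trace or a hung-task warning (assumed markers).
MARKERS = ("Call Trace:", "blocked for more than")


def first_trace(log_text):
    """Return (timestamp, message) of the earliest marker line, or None."""
    earliest = None
    for line in log_text.splitlines():
        m = TS_RE.match(line.strip())
        if not m:
            continue  # skip lines without a bracketed timestamp
        if any(marker in m.group("msg") for marker in MARKERS):
            ts = float(m.group("ts"))
            if earliest is None or ts < earliest[0]:
                earliest = (ts, m.group("msg"))
    return earliest


# Sample lines copied from the trace above.
sample = """\
[322861.638666] Call Trace:
[322861.638669]  <TASK>
[323828.285384]       Tainted: P           O      5.15.39-2-pve #1
"""
print(first_trace(sample))
```

Because the timestamp counts seconds since boot, the earliest match is by construction the first trace of the current boot, provided the log has not wrapped.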
 
Hi Fabian, and thank you for replying. I don't think I have any special configuration. I do know that the backups are written to USB disks, and I suspect this causes high latency.
 
Is the zpool on one or more USB disk(s)? That is a rather brittle setup, as ZFS really doesn't like disks disappearing (which can easily happen with USB!).
 
Code:
root@pbs:~# zpool status
  pool: ZFS_STORAGE
 state: ONLINE
  scan: scrub repaired 0B in 15:18:23 with 0 errors on Sun Aug 14 15:42:24 2022
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS_STORAGE               ONLINE       0     0     0
          wwn-0x50014ee65d2002a6  ONLINE       0     0     0

errors: No known data errors

  pool: ZFS_USB500
 state: ONLINE
  scan: scrub repaired 0B in 01:45:28 with 0 errors on Sun Aug 14 02:09:51 2022
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS_USB500                ONLINE       0     0     0
          wwn-0x50024e9201cf8165  ONLINE       0     0     0

errors: No known data errors

  pool: ZFS_WD
 state: ONLINE
  scan: scrub repaired 0B in 19:42:46 with 0 errors on Sun Aug 14 20:07:16 2022
config:

        NAME                      STATE     READ WRITE CKSUM
        ZFS_WD                    ONLINE       0     0     0
          wwn-0x50014ee604e5c46b  ONLINE       0     0     0

errors: No known data errors
root@pbs:~#
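The status output does not say which of these disks are attached over USB. A rough way to check on Linux is to resolve each by-id name to its sysfs path and look for a USB bus component. This is a sketch under assumptions: the helper names are made up, it relies on the usual `/dev/disk/by-id` and `/sys/class/block` layout, and it has not been run against this machine.

```python
import os
import re


def is_usb_sysfs_path(path):
    """True if a resolved sysfs device path passes through a usbN bus node."""
    return any(re.fullmatch(r"usb\d+", part) for part in path.split("/"))


def disk_is_usb(by_id_name):
    """Resolve /dev/disk/by-id/<name> and inspect its sysfs path (Linux only)."""
    dev = os.path.realpath(os.path.join("/dev/disk/by-id", by_id_name))
    sysfs = os.path.realpath(
        os.path.join("/sys/class/block", os.path.basename(dev))
    )
    return is_usb_sysfs_path(sysfs)


if __name__ == "__main__":
    # Device names copied from the zpool status output above.
    for name in (
        "wwn-0x50014ee65d2002a6",
        "wwn-0x50024e9201cf8165",
        "wwn-0x50014ee604e5c46b",
    ):
        print(name, "USB" if disk_is_usb(name) else "not USB (or not found)")
```

A disk that is missing from `/dev/disk/by-id` is reported as "not USB (or not found)", so a negative result should be double-checked by hand.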
 

Attachments: zpool.png