Hi, couple days ago a node from cluster have error from dmesg:
Now the IO Wait are 30-50% and make this node are very heavy. The zpool status said:
Finding answer from the internet, I didn't find the current zfs version by invoke this command:
Any hints to solve without reboot? Or just reboot the node to solve this problem?
Any feedbacks are appreciated <3
Bash:
[Sat Jun 24 22:24:34 2023] INFO: task txg_sync:7463 blocked for more than 120 seconds.
[Sat Jun 24 22:24:34 2023] Tainted: P O 5.15.53-1-pve #1
[Sat Jun 24 22:24:34 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Jun 24 22:24:34 2023] task:txg_sync state:D stack: 0 pid: 7463 ppid: 2 flags:0x00004000
[Sat Jun 24 22:24:34 2023] Call Trace:
[Sat Jun 24 22:24:34 2023] <TASK>
[Sat Jun 24 22:24:34 2023] __schedule+0x33d/0x1750
[Sat Jun 24 22:24:34 2023] ? lock_timer_base+0x3b/0xd0
[Sat Jun 24 22:24:34 2023] ? __mod_timer+0x271/0x440
[Sat Jun 24 22:24:34 2023] schedule+0x4e/0xc0
[Sat Jun 24 22:24:34 2023] schedule_timeout+0x87/0x140
[Sat Jun 24 22:24:34 2023] ? __bpf_trace_tick_stop+0x20/0x20
[Sat Jun 24 22:24:34 2023] io_schedule_timeout+0x51/0x80
[Sat Jun 24 22:24:34 2023] __cv_timedwait_common+0x135/0x170 [spl]
[Sat Jun 24 22:24:34 2023] ? wait_woken+0x70/0x70
[Sat Jun 24 22:24:34 2023] __cv_timedwait_io+0x19/0x20 [spl]
[Sat Jun 24 22:24:34 2023] zio_wait+0x137/0x300 [zfs]
[Sat Jun 24 22:24:34 2023] ? __cond_resched+0x1a/0x50
[Sat Jun 24 22:24:34 2023] dsl_pool_sync+0xcc/0x4f0 [zfs]
[Sat Jun 24 22:24:34 2023] ? spa_suspend_async_destroy+0x60/0x60 [zfs]
[Sat Jun 24 22:24:34 2023] ? add_timer+0x20/0x30
[Sat Jun 24 22:24:34 2023] spa_sync+0x55a/0x1020 [zfs]
[Sat Jun 24 22:24:34 2023] ? spa_txg_history_init_io+0x10a/0x120 [zfs]
[Sat Jun 24 22:24:34 2023] txg_sync_thread+0x2d3/0x460 [zfs]
[Sat Jun 24 22:24:34 2023] ? txg_init+0x2c0/0x2c0 [zfs]
[Sat Jun 24 22:24:34 2023] thread_generic_wrapper+0x64/0x80 [spl]
[Sat Jun 24 22:24:34 2023] ? __thread_exit+0x20/0x20 [spl]
[Sat Jun 24 22:24:34 2023] kthread+0x12a/0x150
[Sat Jun 24 22:24:34 2023] ? set_kthread_struct+0x50/0x50
[Sat Jun 24 22:24:34 2023] ret_from_fork+0x22/0x30
[Sat Jun 24 22:24:34 2023] </TASK>
[Sat Jun 24 22:26:35 2023] INFO: task txg_sync:7463 blocked for more than 241 seconds.
[Sat Jun 24 22:26:35 2023] Tainted: P O 5.15.53-1-pve #1
[Sat Jun 24 22:26:35 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Jun 24 22:26:35 2023] task:txg_sync state:D stack: 0 pid: 7463 ppid: 2 flags:0x00004000
[Sat Jun 24 22:26:35 2023] Call Trace:
[Sat Jun 24 22:26:35 2023] <TASK>
[Sat Jun 24 22:26:35 2023] __schedule+0x33d/0x1750
[Sat Jun 24 22:26:35 2023] ? lock_timer_base+0x3b/0xd0
[Sat Jun 24 22:26:35 2023] ? __mod_timer+0x271/0x440
[Sat Jun 24 22:26:35 2023] schedule+0x4e/0xc0
[Sat Jun 24 22:26:35 2023] schedule_timeout+0x87/0x140
[Sat Jun 24 22:26:35 2023] ? __bpf_trace_tick_stop+0x20/0x20
[Sat Jun 24 22:26:35 2023] io_schedule_timeout+0x51/0x80
[Sat Jun 24 22:26:35 2023] __cv_timedwait_common+0x135/0x170 [spl]
[Sat Jun 24 22:26:35 2023] ? wait_woken+0x70/0x70
[Sat Jun 24 22:26:35 2023] __cv_timedwait_io+0x19/0x20 [spl]
[Sat Jun 24 22:26:35 2023] zio_wait+0x137/0x300 [zfs]
[Sat Jun 24 22:26:35 2023] ? __cond_resched+0x1a/0x50
[Sat Jun 24 22:26:35 2023] dsl_pool_sync+0xcc/0x4f0 [zfs]
[Sat Jun 24 22:26:35 2023] ? spa_suspend_async_destroy+0x60/0x60 [zfs]
[Sat Jun 24 22:26:35 2023] ? add_timer+0x20/0x30
[Sat Jun 24 22:26:35 2023] spa_sync+0x55a/0x1020 [zfs]
[Sat Jun 24 22:26:35 2023] ? spa_txg_history_init_io+0x10a/0x120 [zfs]
[Sat Jun 24 22:26:35 2023] txg_sync_thread+0x2d3/0x460 [zfs]
[Sat Jun 24 22:26:35 2023] ? txg_init+0x2c0/0x2c0 [zfs]
[Sat Jun 24 22:26:35 2023] thread_generic_wrapper+0x64/0x80 [spl]
[Sat Jun 24 22:26:35 2023] ? __thread_exit+0x20/0x20 [spl]
[Sat Jun 24 22:26:35 2023] kthread+0x12a/0x150
[Sat Jun 24 22:26:35 2023] ? set_kthread_struct+0x50/0x50
[Sat Jun 24 22:26:35 2023] ret_from_fork+0x22/0x30
[Sat Jun 24 22:26:35 2023] </TASK>
[Sat Jun 24 22:30:37 2023] INFO: task txg_sync:7463 blocked for more than 120 seconds.
[Sat Jun 24 22:30:37 2023] Tainted: P O 5.15.53-1-pve #1
[Sat Jun 24 22:30:37 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Jun 24 22:30:37 2023] task:txg_sync state:D stack: 0 pid: 7463 ppid: 2 flags:0x00004000
[Sat Jun 24 22:30:37 2023] Call Trace:
[Sat Jun 24 22:30:37 2023] <TASK>
[Sat Jun 24 22:30:37 2023] __schedule+0x33d/0x1750
[Sat Jun 24 22:30:37 2023] ? lock_timer_base+0x3b/0xd0
[Sat Jun 24 22:30:37 2023] ? __mod_timer+0x271/0x440
[Sat Jun 24 22:30:37 2023] schedule+0x4e/0xc0
[Sat Jun 24 22:30:37 2023] schedule_timeout+0x87/0x140
[Sat Jun 24 22:30:37 2023] ? __bpf_trace_tick_stop+0x20/0x20
[Sat Jun 24 22:30:37 2023] io_schedule_timeout+0x51/0x80
[Sat Jun 24 22:30:37 2023] __cv_timedwait_common+0x135/0x170 [spl]
[Sat Jun 24 22:30:37 2023] ? wait_woken+0x70/0x70
[Sat Jun 24 22:30:37 2023] __cv_timedwait_io+0x19/0x20 [spl]
[Sat Jun 24 22:30:37 2023] zio_wait+0x137/0x300 [zfs]
[Sat Jun 24 22:30:37 2023] ? __cond_resched+0x1a/0x50
[Sat Jun 24 22:30:37 2023] dsl_pool_sync+0xcc/0x4f0 [zfs]
[Sat Jun 24 22:30:37 2023] ? spa_suspend_async_destroy+0x60/0x60 [zfs]
[Sat Jun 24 22:30:37 2023] ? add_timer+0x20/0x30
[Sat Jun 24 22:30:37 2023] spa_sync+0x55a/0x1020 [zfs]
[Sat Jun 24 22:30:37 2023] ? spa_txg_history_init_io+0x10a/0x120 [zfs]
[Sat Jun 24 22:30:37 2023] txg_sync_thread+0x2d3/0x460 [zfs]
[Sat Jun 24 22:30:37 2023] ? txg_init+0x2c0/0x2c0 [zfs]
[Sat Jun 24 22:30:37 2023] thread_generic_wrapper+0x64/0x80 [spl]
[Sat Jun 24 22:30:37 2023] ? __thread_exit+0x20/0x20 [spl]
[Sat Jun 24 22:30:37 2023] kthread+0x12a/0x150
[Sat Jun 24 22:30:37 2023] ? set_kthread_struct+0x50/0x50
[Sat Jun 24 22:30:37 2023] ret_from_fork+0x22/0x30
[Sat Jun 24 22:30:37 2023] </TASK>
Now the IO Wait are 30-50% and make this node are very heavy. The zpool status said:
Bash:
zpool status
pool: vmdata
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: scrub repaired 0B in 03:31:01 with 0 errors on Sun Jun 11 03:55:02 2023
config:
NAME STATE READ WRITE CKSUM
vmdata ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
sdb ONLINE 0 0 0
sdc ONLINE 0 0 0
sdd ONLINE 0 0 0
sde ONLINE 0 0 0
sdf ONLINE 0 0 0
sdg ONLINE 0 0 0
sdh ONLINE 0 0 0
sdi ONLINE 0 0 0
errors: No known data errors
Bash:
# zpool get version vmdata
NAME PROPERTY VALUE SOURCE
vmdata version - default
Any hints to solve without reboot? Or just reboot the node to solve this problem?
Any feedbacks are appreciated <3