Server node Cluster Problem: INFO: task z_zvol:779 blocked for more than 120 seconds.

supermicro_server

Renowned Member
Sep 13, 2017
116
8
83
45
Dear All,
we have a cluster, 3 nodes, with local storage (Intel SSD with Raid ZFS-2)
We use Proxmox VE 5.1
the servers are new, CPU dual Xeon E5-2630v4, 256 GB RAM, 8 SSD Intel, ... so the servers are strong!
the server run Well! very low iodelay!
However, sometimes, without any apparent reasons, we see strange (alarming) messages on dmege:

[Sat Mar 31 12:06:04 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:06:04 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:06:04 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:06:04 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:06:04 2018] Call Trace:
[Sat Mar 31 12:06:04 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:06:04 2018] schedule+0x36/0x80
[Sat Mar 31 12:06:04 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:06:04 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:06:04 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:06:04 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:06:04 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:06:04 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:06:04 2018] kthread+0x10c/0x140
[Sat Mar 31 12:06:04 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:06:04 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:06:04 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:08:05 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:08:05 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:08:05 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:08:05 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:08:05 2018] Call Trace:
[Sat Mar 31 12:08:05 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:08:05 2018] schedule+0x36/0x80
[Sat Mar 31 12:08:05 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:08:05 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:08:05 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:08:05 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:08:05 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:08:05 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:08:05 2018] kthread+0x10c/0x140
[Sat Mar 31 12:08:05 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:08:05 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:08:05 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:10:06 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:10:06 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:10:06 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:10:06 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:10:06 2018] Call Trace:
[Sat Mar 31 12:10:06 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:10:06 2018] schedule+0x36/0x80
[Sat Mar 31 12:10:06 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:10:06 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:10:06 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:10:06 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:10:06 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:10:06 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:10:06 2018] kthread+0x10c/0x140
[Sat Mar 31 12:10:06 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:10:06 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:10:06 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:12:07 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:12:07 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:12:07 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:12:07 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:12:07 2018] Call Trace:
[Sat Mar 31 12:12:07 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:12:07 2018] schedule+0x36/0x80
[Sat Mar 31 12:12:07 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:12:07 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:12:07 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:12:07 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:12:07 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:12:07 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:12:07 2018] kthread+0x10c/0x140
[Sat Mar 31 12:12:07 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:12:07 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:12:07 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:14:08 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:14:08 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:14:08 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:14:08 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:14:08 2018] Call Trace:
[Sat Mar 31 12:14:08 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:14:08 2018] schedule+0x36/0x80
[Sat Mar 31 12:14:08 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:14:08 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:14:08 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:14:08 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:14:08 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:14:08 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:14:08 2018] kthread+0x10c/0x140
[Sat Mar 31 12:14:08 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:14:08 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:14:08 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:16:09 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:16:09 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:16:09 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:16:09 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:16:09 2018] Call Trace:
[Sat Mar 31 12:16:09 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:16:09 2018] schedule+0x36/0x80
[Sat Mar 31 12:16:09 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:16:09 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:16:09 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:16:09 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:16:09 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:16:09 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:16:09 2018] kthread+0x10c/0x140
[Sat Mar 31 12:16:09 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:16:09 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:16:09 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:18:09 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:18:09 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:18:09 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:18:09 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:18:09 2018] Call Trace:
[Sat Mar 31 12:18:09 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:18:09 2018] schedule+0x36/0x80
[Sat Mar 31 12:18:09 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:18:09 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:18:09 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:18:09 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:18:09 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:18:09 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:18:09 2018] kthread+0x10c/0x140
[Sat Mar 31 12:18:09 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:18:09 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:18:09 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:20:10 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:20:10 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:20:10 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:20:10 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:20:10 2018] Call Trace:
[Sat Mar 31 12:20:10 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:20:10 2018] schedule+0x36/0x80
[Sat Mar 31 12:20:10 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:20:10 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:20:10 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:20:10 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:20:10 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:20:10 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:20:10 2018] kthread+0x10c/0x140
[Sat Mar 31 12:20:10 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:20:10 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:20:10 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:22:11 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:22:11 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:22:11 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:22:11 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:22:11 2018] Call Trace:
[Sat Mar 31 12:22:11 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:22:11 2018] schedule+0x36/0x80
[Sat Mar 31 12:22:11 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:22:11 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:22:11 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:22:11 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:22:11 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:22:11 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:22:11 2018] kthread+0x10c/0x140
[Sat Mar 31 12:22:11 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:22:11 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:22:11 2018] ret_from_fork+0x35/0x40
[Sat Mar 31 12:24:12 2018] INFO: task z_zvol:902 blocked for more than 120 seconds.
[Sat Mar 31 12:24:12 2018] Tainted: P O 4.13.13-6-pve #1
[Sat Mar 31 12:24:12 2018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Mar 31 12:24:12 2018] z_zvol D 0 902 2 0x00000000
[Sat Mar 31 12:24:12 2018] Call Trace:
[Sat Mar 31 12:24:12 2018] __schedule+0x3e0/0x870
[Sat Mar 31 12:24:12 2018] schedule+0x36/0x80
[Sat Mar 31 12:24:12 2018] taskq_wait_outstanding+0x8c/0xd0 [spl]
[Sat Mar 31 12:24:12 2018] ? wait_woken+0x80/0x80
[Sat Mar 31 12:24:12 2018] zvol_task_cb+0x1f9/0x5b0 [zfs]
[Sat Mar 31 12:24:12 2018] ? __schedule+0x3e8/0x870
[Sat Mar 31 12:24:12 2018] taskq_thread+0x2ae/0x4d0 [spl]
[Sat Mar 31 12:24:12 2018] ? wake_up_q+0x80/0x80
[Sat Mar 31 12:24:12 2018] kthread+0x10c/0x140
[Sat Mar 31 12:24:12 2018] ? taskq_thread_should_stop+0x70/0x70 [spl]
[Sat Mar 31 12:24:12 2018] ? kthread_create_on_node+0x70/0x70
[Sat Mar 31 12:24:12 2018] ret_from_fork+0x35/0x40

it seems that the server (storage) is busy and not answar...
somebody know what happen?
How can I study in deep the problem?
 
Hi fabian,
thanks for you answer.
about: github.com/zfsonlinux/zfs/issues/6330
it can be my case!

studied in deep the problem: the messages appares in the log (dmesg) 5 or 10 minutes later the write intensive operation. (move disk from the nfs storage to the server).
In my case, the server is not completly blocked.
but the bug ware closed a few days ago...

below my "as is" situation:

pveversion -v
proxmox-ve: 5.1-40 (running kernel: 4.13.13-6-pve)
pve-manager: 5.1-46 (running version: 5.1-46/ae8241d4)
pve-kernel-4.13.13-6-pve: 4.13.13-40
pve-kernel-4.13.4-1-pve: 4.13.4-26
corosync: 2.4.2-pve3
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-common-perl: 5.0-27
libpve-guest-common-perl: 2.0-14
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-17
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-2
novnc-pve: 0.6-4
proxmox-widget-toolkit: 1.0-11
pve-cluster: 5.0-20
pve-container: 2.0-19
pve-docs: 5.1-16
pve-firewall: 3.0-5
pve-firmware: 2.0-3
pve-ha-manager: 2.0-5
pve-i18n: 1.0-4
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.9.1-8
pve-xtermjs: 1.0-2
qemu-server: 5.0-21
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.4-pve2~bpo9

do you know if the new packages of proxmox can fix the problem?
do you know if I had better to update my system to fix the problem?

thanks!
 
the fix for that specific issue was slated for 0.7.8 (but in the meantime there has been a regression fix released as 0.7.8, so it will probably be part of 0.7.9).
 
This looks suspiciously like the issue I'm running into on 4.4. Will 0.7.9 be coming in an update to 4.4?