pvestatd took 506 seconds and caused node to hang

shadowq · Oct 8, 2016

Hi there,

I've been running a new node on Proxmox VE 4.3 for a couple of weeks now, and everything has been running well.

Just now, the server became unresponsive (even to pings) for several minutes. Looking in syslog, I found a few hung-task messages from the kernel:

Code:

Oct 08 12:12:16 s1 kernel: INFO: task pvestatd:5982 blocked for more than 120 seconds.
Oct 08 12:12:16 s1 kernel:       Tainted: P           O    4.4.16-1-pve #1
Oct 08 12:12:16 s1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 08 12:12:16 s1 kernel: pvestatd        D ffff8809422dfc18     0  5982   4453 0x00000004
Oct 08 12:12:16 s1 kernel:  ffff8809422dfc18 00000000c1965920 ffff881ff27d6e00 ffff881fd9888000
Oct 08 12:12:16 s1 kernel:  ffff8809422e0000 ffff8809422dfc58 ffff881faf31ebc0 ffff881ccdf4e050
Oct 08 12:12:16 s1 kernel:  fffffffffffffe00 ffff8809422dfc30 ffffffff8184d835 ffff881faf31eaf0
Oct 08 12:12:16 s1 kernel: Call Trace:
Oct 08 12:12:16 s1 kernel:  [<ffffffff8184d835>] schedule+0x35/0x80
Oct 08 12:12:16 s1 kernel:  [<ffffffff8131871f>] request_wait_answer+0x12f/0x280
Oct 08 12:12:16 s1 kernel:  [<ffffffff810c3fe0>] ? wait_woken+0x90/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff813188d9>] __fuse_request_send+0x69/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff81318927>] fuse_request_send+0x27/0x30
Oct 08 12:12:16 s1 kernel:  [<ffffffff8131b40b>] fuse_simple_request+0xcb/0x1a0
Oct 08 12:12:16 s1 kernel:  [<ffffffff813252b2>] fuse_statfs+0xe2/0x160
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242a7f>] statfs_by_dentry+0x6f/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242abb>] vfs_statfs+0x1b/0xb0
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242ba8>] user_statfs+0x58/0xa0
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242c17>] SYSC_statfs+0x27/0x60
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242dfe>] SyS_statfs+0xe/0x10
Oct 08 12:12:16 s1 kernel:  [<ffffffff81851936>] entry_SYSCALL_64_fastpath+0x16/0x75

The same trace was logged three times.

After the last of them, pvestatd logged:

Code:

Oct 08 12:17:07 s1 pvestatd[4453]: status update time (506.590 seconds)

However, about 15 minutes earlier, pvestatd had completed its update in a far shorter time:

Code:

Oct 08 12:02:31 s1 pvestatd[4453]: status update time (18.277 seconds)

Looking at /etc/pve/.rrd, the results from the status query seem fine. I do have one remote machine mounted via sshfs; however, the earlier update (only 15 minutes before) completed fine, and that remote share has been mounted for a week or so now.
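
Since the call trace above ends in fuse_statfs and sshfs is a FUSE filesystem, one way to check whether a given mount answers statfs promptly is to run the call in a child process with a timeout, so the checking script itself does not block. This is only a minimal sketch: the function name statfs_responds, the 5-second default and the /mnt/remote path in the usage note are assumptions for illustration, and this is not how pvestatd does its own checks.

```python
import subprocess
import sys

# Child snippet: perform a single statvfs() call on the given path.
# It runs in a separate process so a stalled mount cannot block the caller.
_CHILD = "import os, sys; os.statvfs(sys.argv[1])"


def statfs_responds(path, timeout=5.0):
    """Return True if statvfs(path) succeeds within `timeout` seconds.

    Returns False if the call fails or does not return in time.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", _CHILD, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a task stuck in uninterruptible sleep may
        # linger, so we deliberately do not wait for the child here.
        proc.kill()
        return False
```

For example, statfs_responds("/mnt/remote") (a hypothetical mount point) would return False if the sshfs share stops answering, without hanging the script that calls it.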

Any suggestions as to why this happened, so I can make sure it doesn't happen again?

Thank you in advance,
Jarrod.
