pvestatd took 506 seconds and caused node to hang

shadowq

Hi there,

I've been running a new node with Proxmox VE 4.3 for a couple of weeks now, and everything has been pretty great.

Just now, the server became unresponsive (even to pings) for several minutes. Looking in syslog I found a few kernel errors:

Code:
Oct 08 12:12:16 s1 kernel: INFO: task pvestatd:5982 blocked for more than 120 seconds.
Oct 08 12:12:16 s1 kernel:       Tainted: P           O    4.4.16-1-pve #1
Oct 08 12:12:16 s1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 08 12:12:16 s1 kernel: pvestatd        D ffff8809422dfc18     0  5982   4453 0x00000004
Oct 08 12:12:16 s1 kernel:  ffff8809422dfc18 00000000c1965920 ffff881ff27d6e00 ffff881fd9888000
Oct 08 12:12:16 s1 kernel:  ffff8809422e0000 ffff8809422dfc58 ffff881faf31ebc0 ffff881ccdf4e050
Oct 08 12:12:16 s1 kernel:  fffffffffffffe00 ffff8809422dfc30 ffffffff8184d835 ffff881faf31eaf0
Oct 08 12:12:16 s1 kernel: Call Trace:
Oct 08 12:12:16 s1 kernel:  [<ffffffff8184d835>] schedule+0x35/0x80
Oct 08 12:12:16 s1 kernel:  [<ffffffff8131871f>] request_wait_answer+0x12f/0x280
Oct 08 12:12:16 s1 kernel:  [<ffffffff810c3fe0>] ? wait_woken+0x90/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff813188d9>] __fuse_request_send+0x69/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff81318927>] fuse_request_send+0x27/0x30
Oct 08 12:12:16 s1 kernel:  [<ffffffff8131b40b>] fuse_simple_request+0xcb/0x1a0
Oct 08 12:12:16 s1 kernel:  [<ffffffff813252b2>] fuse_statfs+0xe2/0x160
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242a7f>] statfs_by_dentry+0x6f/0x90
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242abb>] vfs_statfs+0x1b/0xb0
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242ba8>] user_statfs+0x58/0xa0
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242c17>] SYSC_statfs+0x27/0x60
Oct 08 12:12:16 s1 kernel:  [<ffffffff81242dfe>] SyS_statfs+0xe/0x10
Oct 08 12:12:16 s1 kernel:  [<ffffffff81851936>] entry_SYSCALL_64_fastpath+0x16/0x75

There were three traces like the one above.

When the stall ended, pvestatd logged:

Code:
Oct 08 12:17:07 s1 pvestatd[4453]: status update time (506.590 seconds)

However, not long before that, a pvestatd status update had completed in a normal amount of time:

Code:
Oct 08 12:02:31 s1 pvestatd[4453]: status update time (18.277 seconds)
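To spot slow updates like the 506-second one before they turn into a full stall, the pvestatd lines above can be filtered mechanically. The following is a minimal sketch in Python, assuming log lines in exactly the format shown above; the `slow_updates` helper and its 60-second threshold are illustrative choices, not part of Proxmox.

```python
import re

# Matches pvestatd lines such as:
#   Oct 08 12:17:07 s1 pvestatd[4453]: status update time (506.590 seconds)
STATUS_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ pvestatd\[\d+\]: "
    r"status update time \((?P<secs>[\d.]+) seconds\)"
)


def slow_updates(lines, threshold=60.0):
    """Return (timestamp, seconds) for each status update slower than threshold."""
    result = []
    for line in lines:
        m = STATUS_RE.match(line)
        if m and float(m.group("secs")) > threshold:
            result.append((m.group("ts"), float(m.group("secs"))))
    return result


if __name__ == "__main__":
    sample = [
        "Oct 08 12:02:31 s1 pvestatd[4453]: status update time (18.277 seconds)",
        "Oct 08 12:17:07 s1 pvestatd[4453]: status update time (506.590 seconds)",
    ]
    print(slow_updates(sample))  # [('Oct 08 12:17:07', 506.59)]
```

In practice the lines would come from the host's syslog (for example `/var/log/syslog`, assuming a default Debian rsyslog setup); lines from other daemons simply fail to match and are skipped.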

Looking at /etc/pve/.rrd, the results from the status queries seem fine. I do have one remote machine mounted via sshfs; however, the earlier update (only 15 minutes before) completed normally, and that remote share has been mounted for about a week now.
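The call trace shows pvestatd blocked inside the `statfs()` system call while waiting for a FUSE request to be answered (`fuse_statfs` → `request_wait_answer`). sshfs is a FUSE filesystem, so a stalled SSH connection behind the remote share is a plausible suspect, although `/etc/pve` itself (pmxcfs) is also FUSE-backed. The sketch below, assuming Python 3 on the host, checks whether a given mount point answers `statfs()` within a timeout by issuing the call from a helper thread; the `probe_mount` helper and the `/mnt/remote` path are hypothetical and only for illustration.

```python
import os
import threading


def probe_mount(path, timeout=5.0):
    """Call statvfs(path) in a helper thread.

    Returns ("ok", free_bytes), ("error", message) or ("timeout", None).
    """
    result = {}

    def worker():
        try:
            # os.statvfs() issues the same statfs() system call that
            # pvestatd was blocked in; on a hung FUSE mount it may not return.
            st = os.statvfs(path)
            result["value"] = ("ok", st.f_bavail * st.f_frsize)
        except OSError as exc:
            result["value"] = ("error", str(exc))

    # daemon=True: a thread stuck on a hung mount is not joined at
    # interpreter shutdown, so the caller is not blocked by it.
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout)
    if t.is_alive():
        return ("timeout", None)
    return result.get("value", ("error", "no result"))


if __name__ == "__main__":
    # "/mnt/remote" is a hypothetical sshfs mount point; substitute your own.
    for mount in ("/", "/mnt/remote"):
        print(mount, probe_mount(mount))
```

A thread is used rather than a child process because a process stuck in an uninterruptible wait may not exit when killed, and waiting on it would block the caller in the same way pvestatd was blocked.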

Any suggestions as to why this happened, so I can make sure it doesn't happen again?

Thank you in advance,
Jarrod.