pvestatd task hung for > 120 secs since latest (v6.4-13, Aug 2021) updates

ssenator

Since applying the most recent Debian and Proxmox updates, from the official Debian (deb http://ftp.us.debian.org/debian) and Proxmox (deb https://enterprise.proxmox.com/debian/pve) repositories, I have been seeing pvestatd task hangs. When this occurs, the immediate symptoms are very long delays or timeouts when logging in to the Proxmox web interface, and if I ssh to the Proxmox host, many commands hang indefinitely. This may be hung access to the pve filesystem (/etc/pve).

I have seen this on two different Proxmox nodes. It appears to track with the migration and startup of a VM running FreeBSD 13 as a guest.
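
For anyone who wants to check whether they are hitting the same state, these rough checks (the 5-second timeout is just an arbitrary value I picked) show whether pvestatd is stuck in uninterruptible sleep (D state) and whether the pve filesystem still responds:

# list tasks stuck in uninterruptible sleep (D state) and where in the kernel they block
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
# recent hung-task reports from the kernel log
dmesg -T | grep -A 15 'blocked for more than'
# probe the pve filesystem; a hang here means /etc/pve is wedged
timeout 5 ls /etc/pve >/dev/null || echo '/etc/pve not responding'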

I am writing to:
1) see if anyone else is experiencing this;
2) if so, find out whether there are any known workarounds or configuration changes that mitigate it;
3) offer access to, or information from, one of the affected nodes so that this can be diagnosed.

The stack trace is as follows (from /var/log/kern.log):
Aug 30 10:06:12 n2 kernel: [2246610.864709] INFO: task pvestatd:2284 blocked for more than 120 seconds.
Aug 30 10:06:12 n2 kernel: [2246610.865743] Tainted: P IO 5.4.124-1-pve #1
Aug 30 10:06:12 n2 kernel: [2246610.866652] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 30 10:06:12 n2 kernel: [2246610.867561] pvestatd D 0 2284 1 0x00000004
Aug 30 10:06:12 n2 kernel: [2246610.868469] Call Trace:
Aug 30 10:06:12 n2 kernel: [2246610.869433] __schedule+0x2e6/0x6f0
Aug 30 10:06:12 n2 kernel: [2246610.870316] ? filename_parentat.isra.55.part.56+0xf7/0x180
Aug 30 10:06:12 n2 kernel: [2246610.871202] schedule+0x33/0xa0
Aug 30 10:06:12 n2 kernel: [2246610.872072] rwsem_down_write_slowpath+0x2ed/0x4a0
Aug 30 10:06:12 n2 kernel: [2246610.873025] ? enqueue_hrtimer+0x3c/0x90
Aug 30 10:06:12 n2 kernel: [2246610.873890] down_write+0x3d/0x40
Aug 30 10:06:12 n2 kernel: [2246610.874748] filename_create+0x8e/0x180
Aug 30 10:06:12 n2 kernel: [2246610.875600] do_mkdirat+0x59/0x110
Aug 30 10:06:12 n2 kernel: [2246610.876441] __x64_sys_mkdir+0x1b/0x20
Aug 30 10:06:12 n2 kernel: [2246610.877352] do_syscall_64+0x57/0x190
Aug 30 10:06:12 n2 kernel: [2246610.878179] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 30 10:06:12 n2 kernel: [2246610.879004] RIP: 0033:0x7f8c82e0c0d7
Aug 30 10:06:12 n2 kernel: [2246610.879820] Code: Bad RIP value.
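
Reading the trace: the mkdir is blocked in down_write() taking the parent directory's lock; if that directory lives under /etc/pve, this would be consistent with a hang in the pmxcfs FUSE mount, though I can't confirm the path from the trace alone. The next time it recurs, backtraces of all blocked tasks can be dumped to the kernel log with the standard sysrq facility (as root):

# enable sysrq if it is not already enabled
echo 1 > /proc/sys/kernel/sysrq
# 'w' dumps stack traces of all uninterruptible (blocked) tasks
echo w > /proc/sysrq-trigger
# the traces land in the kernel log
dmesg -T | tail -n 300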

node n2:
% uname -a
Linux n2 5.4.128-1-pve #1 SMP PVE 5.4.128-1 (Wed, 21 Jul 2021 18:32:02 +0200) x86_64 GNU/Linux

% apt list --installed | grep pve-
libpve-access-control/stable,now 6.4-3 all [installed]
libpve-apiclient-perl/stable,now 3.1-3 all [installed]
libpve-cluster-api-perl/stable,now 6.4-1 all [installed]
libpve-cluster-perl/stable,now 6.4-1 all [installed]
libpve-common-perl/stable,now 6.4-3 all [installed]
libpve-guest-common-perl/stable,now 3.1-5 all [installed]
libpve-http-server-perl/stable,now 3.2-3 all [installed]
libpve-storage-perl/stable,now 6.4-1 all [installed]
libpve-u2f-server-perl/stable,now 1.1-1 amd64 [installed]
pve-cluster/stable,now 6.4-1 amd64 [installed]
pve-container/stable,now 3.3-6 all [installed]
pve-docs/stable,now 6.4-2 all [installed]
pve-edk2-firmware/stable,now 2.20200531-1 all [installed]
pve-firewall/stable,now 4.1-4 amd64 [installed]
pve-firmware/stable,now 3.2-4 all [installed]
pve-ha-manager/stable,now 3.1-1 amd64 [installed]
pve-i18n/stable,now 2.3-1 all [installed]
pve-kernel-5.4.124-1-pve/stable,now 5.4.124-2 amd64 [installed,automatic]
pve-kernel-5.4.128-1-pve/stable,now 5.4.128-1 amd64 [installed,automatic]
pve-kernel-5.4.34-1-pve/stable,now 5.4.34-2 amd64 [installed]
pve-kernel-5.4/stable,now 6.4-5 all [installed]
pve-kernel-helper/stable,now 6.4-5 all [installed]
pve-lxc-syscalld/stable,now 0.9.1-1 amd64 [installed]
pve-manager/stable,now 6.4-13 amd64 [installed]
pve-qemu-kvm/stable,now 5.2.0-6 amd64 [installed]
pve-xtermjs/stable,now 4.7.0-3 amd64 [installed]


Thank you,
-Steven Senator
 
Hm - how is the system set up? Which filesystem is on root?
If possible, could you try installing the pve-kernel-5.11 meta-package and see if the issue also occurs with the 5.11 kernel?
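
Roughly like this (assumes the node can be rebooted into the new kernel):

apt update
apt install pve-kernel-5.11
reboot
# after the reboot, verify the running kernel:
uname -r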
 
The root fs is ZFS on a pool of striped mirrors, as shown below. I will move some resources off of this machine and can then run the experiment you suggest.

---
% zpool status rpool
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:13:07 with 0 errors on Sun Aug 8 00:37:08 2021
config:

  NAME                                               STATE     READ WRITE CKSUM
  rpool                                              ONLINE       0     0     0
    mirror-0                                         ONLINE       0     0     0
      scsi-35000c5002908447f-part3                   ONLINE       0     0     0
      scsi-35000c50029084adf-part3                   ONLINE       0     0     0
    mirror-2                                         ONLINE       0     0     0
      scsi-35000c50028b9f823-part3                   ONLINE       0     0     0
      scsi-35000c50028ee7f93-part3                   ONLINE       0     0     0
  logs
    mirror-1                                         ONLINE       0     0     0
      ata-OCZ-REVODRIVE3_OCZ-M18DGOS072K9663R-part8  ONLINE       0     0     0
      ata-OCZ-REVODRIVE3_OCZ-TAIN9M3G24S566VS-part8  ONLINE       0     0     0
  cache
    sdb16                                            ONLINE       0     0     0
    sdc16                                            ONLINE       0     0     0
---
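
For reference, moving resources off will just be live migration; the VM ID and target node below are placeholders for the actual ones:

# 101 and n1 stand in for the real VM ID and target node
qm migrate 101 n1 --online

Once the node is empty I'll install pve-kernel-5.11 as suggested and report back.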