pvestatd not reaping properly? Process table full -- system slowdown

tarball

Hi, it looks like there's a potential issue with pvestatd. On one of our systems we noticed that the process table was full (62K+ processes, almost all of them defunct pvestatd children).
The system slows down to a crawl at that point.
After a stop/start of the daemon, everything's fine again.
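For reference, this is the stop/start that clears things up (PVE 3.x still ships an init script for pvestatd):

# service pvestatd restart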

Running pve-manager/3.4-6/102d4547 (running kernel: 2.6.32-39-pve)


root 977213 0.5 0.2 211732 38648 ? Ss 15:54 0:00 pvestatd
root 977239 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977241 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977270 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977272 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977304 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977306 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977324 0.0 0.0 4052 516 ? S 15:54 0:00 sleep 60
root 977340 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977342 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977372 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977374 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977402 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977404 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977433 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977435 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977462 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977464 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977474 0.0 0.0 16792 1284 pts/0 R+ 15:55 0:00 ps auxwww


├─pvestatd,978703
│ ├─(pvestatd,978740)
│ ├─(pvestatd,978742)
│ ├─(pvestatd,978769)
│ ├─(pvestatd,978771)
│ ├─(pvestatd,978799)
│ ├─(pvestatd,978801)
│ ├─(pvestatd,978835)
│ ├─(pvestatd,978837)
│ ├─(pvestatd,978870)
....
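A quick way to see how many of these defunct children have piled up (a sketch; the fixed-string pattern matches the ps auxwww output above):

# ps auxwww | grep -cF '[pvestatd] <defunct>'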
 
What kind of storage do you use? Please check whether all storages are online and accessible with:

# pvesm status
 
Hi Dietmar,

pvesm status
zfs error: open3: exec of zpool list -o name -H failed at /usr/share/perl5/PVE/Tools.pm line 328
zfs error: open3: exec of zpool list -o name -H failed at /usr/share/perl5/PVE/Tools.pm line 328
local dir 1 1031992064 272851220 706712044 28.35%
ovz3-bk nfs 1 7546520832 4298153984 3248366848 57.46%
zfs zfspool 0 0 0 0 100.00%
zfs1 zfspool 0 0 0 0 100.00%

root@ovz3:~# lsmod|grep zfs
root@ovz3:~#


root@ovz3:~# zpool list -o name -H
-bash: zpool: command not found
root@ovz3:~# which zpool
root@ovz3:~# whereis zpool
zpool:
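In case it helps with debugging: a quick check whether the ZFS userland is installed at all on this node (package names vary between releases, e.g. zfsutils vs. zfsutils-linux):

# dpkg -l | grep -i zfs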


We use regular software RAID:
root@ovz3:~# cat /proc/mdstat
Personalities : [raid10] [raid1]
md2 : active raid1 sde2[2] sdc2[3]
9715584 blocks super 1.2 [2/2] [UU]


md1 : active raid1 sde1[2] sdc1[3]
224574272 blocks super 1.2 [2/2] [UU]


md0 : active raid10 sda1[0] sdf1[3] sdd1[2] sdb1[1]
1953258496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]


root@ovz3:~# pveversion -v
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-39-pve: 2.6.32-156
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
I think a ZFS pool has been defined at the (non-HA) cluster level, but this specific host (ovz3) does not have a ZFS pool; I don't think it even has the ZFS tools installed.

I just checked, and a few older PVE hosts that were upgraded to the latest (or at least to a ZFS-enabled) PVE release are exhibiting the same behavior.
 
I think a ZFS pool has been defined at the (non-HA) cluster level, but this specific host (ovz3) does not have a ZFS pool; I don't think it even has the ZFS tools installed.

You should define the nodes on which a storage is available (if it is not available on all nodes).
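For example, in /etc/pve/storage.cfg (a sketch; the pool and node names here are placeholders, adjust them to your setup):

zfspool: zfs
        pool tank
        content images
        nodes ovz1,ovz2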
 
OK, I see the problem, will fix it.
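For completeness, the node restriction can also be set from the CLI instead of editing storage.cfg by hand (node names are placeholders; the storage IDs are the ones from the pvesm status output above):

# pvesm set zfs --nodes ovz1,ovz2
# pvesm set zfs1 --nodes ovz1,ovz2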