pvestatd not reaping properly? Process table full -- system slowdown

tarball

Hi, it looks like there's a potential issue with pvestatd. On one of our systems we noticed that the process table was full (62K+ processes, almost all of them defunct pvestatd children).
The system slows down to a crawl at that point.
After a stop/start of the daemon, everything's fine again.
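For reference, this is the stop/start that clears things up (PVE 3.x still ships an init script for pvestatd):

# service pvestatd restart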

Running pve-manager/3.4-6/102d4547 (running kernel: 2.6.32-39-pve)


root 977213 0.5 0.2 211732 38648 ? Ss 15:54 0:00 pvestatd
root 977239 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977241 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977270 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977272 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977304 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977306 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977324 0.0 0.0 4052 516 ? S 15:54 0:00 sleep 60
root 977340 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977342 0.0 0.0 0 0 ? Z 15:54 0:00 [pvestatd] <defunct>
root 977372 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977374 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977402 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977404 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977433 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977435 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977462 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977464 0.0 0.0 0 0 ? Z 15:55 0:00 [pvestatd] <defunct>
root 977474 0.0 0.0 16792 1284 pts/0 R+ 15:55 0:00 ps auxwww


├─pvestatd,978703
│ ├─(pvestatd,978740)
│ ├─(pvestatd,978742)
│ ├─(pvestatd,978769)
│ ├─(pvestatd,978771)
│ ├─(pvestatd,978799)
│ ├─(pvestatd,978801)
│ ├─(pvestatd,978835)
│ ├─(pvestatd,978837)
│ ├─(pvestatd,978870)
....
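A quick way to see how many of these defunct children have piled up (a sketch; the fixed-string pattern matches the ps auxwww output above):

# ps auxwww | grep -cF '[pvestatd] <defunct>'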
 
What kind of storage do you use? Please check whether all storages are online and accessible with:

# pvesm status
 
Hi Dietmar,

pvesm status
zfs error: open3: exec of zpool list -o name -H failed at /usr/share/perl5/PVE/Tools.pm line 328
zfs error: open3: exec of zpool list -o name -H failed at /usr/share/perl5/PVE/Tools.pm line 328
local dir 1 1031992064 272851220 706712044 28.35%
ovz3-bk nfs 1 7546520832 4298153984 3248366848 57.46%
zfs zfspool 0 0 0 0 100.00%
zfs1 zfspool 0 0 0 0 100.00%

root@ovz3:~# lsmod|grep zfs
root@ovz3:~#


root@ovz3:~# zpool list -o name -H
-bash: zpool: command not found
root@ovz3:~# which zpool
root@ovz3:~# whereis zpool
zpool:
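In case it helps with debugging: a quick check whether the ZFS userland is installed at all on this node (package names vary between releases, e.g. zfsutils vs. zfsutils-linux):

# dpkg -l | grep -i zfs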


We use regular software RAID:
root@ovz3:~# cat /proc/mdstat
Personalities : [raid10] [raid1]
md2 : active raid1 sde2[2] sdc2[3]
9715584 blocks super 1.2 [2/2] [UU]


md1 : active raid1 sde1[2] sdc1[3]
224574272 blocks super 1.2 [2/2] [UU]


md0 : active raid10 sda1[0] sdf1[3] sdd1[2] sdb1[1]
1953258496 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]


root@ovz3:~# pveversion -v
proxmox-ve-2.6.32: 3.4-156 (running kernel: 2.6.32-39-pve)
pve-manager: 3.4-6 (running version: 3.4-6/102d4547)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-39-pve: 2.6.32-156
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-17
qemu-server: 3.4-6
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-33
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
I think a ZFS pool has been defined at the (non-HA) cluster level, but this specific host (ovz3) does not have a ZFS pool; I don't think it even has the ZFS tools installed.

I just checked, and a few older PVE hosts that were upgraded to the latest (or at least to a ZFS-enabled) PVE release are exhibiting the same behavior.
 
I think a ZFS pool has been defined at the (non-HA) cluster level, but this specific host (ovz3) does not have a ZFS pool; I don't think it even has the ZFS tools installed.

You should define the nodes on which a storage is available (if it is not available on all nodes).
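For example, in /etc/pve/storage.cfg (a sketch; the pool and node names here are placeholders, adjust them to your setup):

zfspool: zfs
        pool tank
        content images
        nodes ovz1,ovz2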
 
OK, I see the problem, will fix it.
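For completeness, the node restriction can also be set from the CLI instead of editing storage.cfg by hand (node names are placeholders; the storage IDs are the ones from the pvesm status output above):

# pvesm set zfs --nodes ovz1,ovz2
# pvesm set zfs1 --nodes ovz1,ovz2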