pvestatd causing unresponsive status in web GUI

Coolguy3289 · Jun 29, 2020

Hello,
I've just added a fourth node to my existing Proxmox cluster, and I'm running into an intermittent issue: the status of the node and of the VMs on it turns to the gray question mark, and storage stats don't load. When I run `systemctl status pvestatd`, the service shows as running. When I try to restart the service, it hangs on the following:

Code:

Jun 29 17:37:19 vm4 pvestatd[33691]: received signal TERM
Jun 29 17:37:19 vm4 pvestatd[33691]: server closing
Jun 29 17:37:19 vm4 pvestatd[33691]: server stopped
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: Killing process 6339 (lvs) with signal SIGKILL.
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: Killing process 33803 (vgs) with signal SIGKILL.
Jun 29 17:40:20 vm4 systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.

These processes cannot be killed, which seems to be what hangs the pvestatd restart.
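
A process that survives SIGKILL is usually stuck in uninterruptible sleep (state `D` in `ps`), typically waiting on I/O that never completes, which would fit `lvs` and `vgs` blocking on a storage device. A minimal sketch for listing such processes, assuming procps `ps` and `awk` are available; the function name `filter_dstate` is made up for this example:

```shell
#!/bin/sh
# Hypothetical helper: keep only rows whose STAT column (field 2) starts
# with "D", i.e. processes in uninterruptible sleep. Reads `ps`-style rows
# on stdin and skips the header line.
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# List PID, state, kernel wait channel and command name for every process,
# then keep only the D-state ones.
ps -eo pid,stat,wchan:32,comm | filter_dstate
```

If `lvs` or `vgs` show up in this list, the wait channel column may hint at what they are blocked on, and `dmesg` may contain matching hung-task warnings.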

System info:

Code:

root@vm4:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

willybong · Jun 30, 2020

If you want a good Ceph video tutorial, have a look here: https://www.youtube.com/channel/UC0IeRJNbVkui8A8f48Gmceg

Coolguy3289 · Jun 30, 2020

I've considered Ceph, and I definitely appreciate the video, since it will help me when I decide to switch. At the moment I don't have the time to do so. Do you think that has something to do with this issue?

Stik · Sep 6, 2021

Hello, any news about this problem?

I've read several threads about the gray question mark and pvestatd but haven't found a solution.
I've experienced these issues on several nodes (not new ones, by the way). The problems first appeared about a month or two ago; it wasn't the case before.

Same symptoms: gray question mark, pvestatd shows as active, but restarting it hangs and querying its status hangs too.

I've tried upgrading our PVE to the latest version (before 7.0), but nothing changed:

Bash:

root@vu202adm:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

Bash:

Sep 06 16:14:15 vu202adm systemd[1]: pvestatd.service: Found left-over process 3649 (pvestatd) in control group while starting unit. Ignoring.
Sep 06 16:14:15 vu202adm systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 06 16:14:15 vu202adm systemd[1]: pvestatd.service: Found left-over process 4619 (pvestatd) in control group while starting unit. Ignoring.
Sep 06 16:14:15 vu202adm systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 06 16:14:15 vu202adm systemd[1]: Starting PVE Status Daemon...
-- Subject: A start job for unit pvestatd.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvestatd.service has begun execution.
--
-- The job identifier is 671108.
Sep 06 16:14:43 vu202adm fusioninventory-agent[5966]: forking process 0 to handle task Maintenance
Sep 06 16:15:45 vu202adm systemd[1]: pvestatd.service: Start operation timed out. Terminating.
Sep 06 16:15:49 vu202adm pmxcfs[2074]: [status] notice: received log
Sep 06 16:16:35 vu202adm fusioninventory-agent[6181]: forking process 0 to handle task Maintenance
Sep 06 16:17:01 vu202adm CRON[6241]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 06 16:17:01 vu202adm CRON[6242]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 06 16:17:01 vu202adm CRON[6241]: pam_unix(cron:session): session closed for user root
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 5908 (pvestatd) with signal SIGKILL.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 3649 (pvestatd) with signal SIGKILL.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 4619 (pvestatd) with signal SIGKILL.
Sep 06 16:18:32 vu202adm fusioninventory-agent[6421]: forking process 0 to handle task Maintenance
Sep 06 16:18:46 vu202adm systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.
Sep 06 16:20:12 vu202adm fusioninventory-agent[6615]: forking process 0 to handle task Maintenance
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: State 'stop-final-sigterm' timed out. Killing.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 3649 (pvestatd) with signal SIGKILL.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 4619 (pvestatd) with signal SIGKILL.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 5908 (pvestatd) with signal SIGKILL.
Sep 06 16:20:27 vu202adm pmxcfs[2074]: [status] notice: received log
Sep 06 16:21:46 vu202adm systemd[1]: pvestatd.service: Processes still around after final SIGKILL. Entering failed mode.
Sep 06 16:21:46 vu202adm systemd[1]: pvestatd.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvestatd.service has entered the 'failed' state with result 'timeout'.
Sep 06 16:21:46 vu202adm systemd[1]: Failed to start PVE Status Daemon.
-- Subject: A start job for unit pvestatd.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvestatd.service has finished with a failure.
--
-- The job identifier is 671108 and the job result is failed.

The only way to restart the service and clear the gray mark is a hard reset of the node. Even a soft reset doesn't work, because the node cannot stop its VMs.
Can anyone help?
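
To see at a glance which processes systemd could not kill, the "Killing process" lines in a journal excerpt like the one above can be reduced to a PID/name list. A rough sketch, assuming the message format shown in that log; `killed_procs` is a name invented here:

```shell
#!/bin/sh
# Hypothetical helper: read journal lines on stdin and print "PID NAME" for
# each process systemd reported sending SIGKILL to, without duplicates.
killed_procs() {
    sed -n 's/.*Killing process \([0-9][0-9]*\) (\([^)]*\)) with signal SIGKILL.*/\1 \2/p' | sort -u
}

# Feed it the unit's journal for the current boot.
journalctl -u pvestatd.service -b --no-pager | killed_procs
```

The resulting PIDs can then be checked with `ps -o pid,stat,wchan,comm -p <PID>`; a state starting with `D` would suggest the process is blocked in the kernel rather than ignoring the signal.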
