pvestatd causing unresponsive status in web GUI

Hello,
I've just added a 4th node to my existing Proxmox cluster, and I'm running into an intermittent issue: the status of the node and of the VMs on that node turns to the gray question mark, and storage stats do not load. When I run `systemctl status pvestatd`, the service shows as running. When I try to restart the service, it hangs with the following:
Code:
Jun 29 17:37:19 vm4 pvestatd[33691]: received signal TERM
Jun 29 17:37:19 vm4 pvestatd[33691]: server closing
Jun 29 17:37:19 vm4 pvestatd[33691]: server stopped
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: Killing process 6339 (lvs) with signal SIGKILL.
Jun 29 17:38:50 vm4 systemd[1]: pvestatd.service: Killing process 33803 (vgs) with signal SIGKILL.
Jun 29 17:40:20 vm4 systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.
The fact that these processes (lvs and vgs) cannot be killed seems to be what hangs the pvestatd restart.
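A process that survives SIGKILL is often in uninterruptible sleep (state `D`), typically waiting on I/O that never completes. Whether that is the case here is an assumption, not something the log confirms. A minimal sketch to check, where `list_dstate` is a hypothetical helper name that filters `ps` output for state `D`:

```shell
# list_dstate: read `ps -eo pid,stat,wchan:32,comm` style output on stdin and
# print the header line plus every process whose STAT column starts with "D"
# (uninterruptible sleep). Such processes ignore SIGKILL until the kernel
# call they are blocked in returns.
list_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}
```

On the affected node it could be run as `ps -eo pid,stat,wchan:32,comm | list_dstate`. If lvs and vgs show up there, the WCHAN column gives a hint about what they are blocked on.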

System info:
Code:
root@vm4:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
pve-kernel-5.4: 6.2-3
pve-kernel-helper: 6.2-3
pve-kernel-5.0: 6.0-11
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-3
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-8
pve-cluster: 6.1-8
pve-container: 3.1-8
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-4
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-3
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
 
Hello, any news about this problem?

I've read several threads about the gray question mark and pvestatd but haven't found a solution.
I've experienced these issues on several nodes (not new ones, by the way). I first noticed them about a month or two ago; it wasn't the case before.

Same symptoms: gray question mark, pvestatd shows as active, but restarting it hangs and querying its status hangs too.
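Since a plain status query can hang, it may help to probe the storage commands one at a time without letting the shell block on them. The following is only a sketch: `probe` is a hypothetical helper, and the idea that one of the storage commands is the culprit is an assumption based on the stuck lvs/vgs processes reported earlier in the thread.

```shell
# probe LIMIT CMD [ARGS...]
# Runs CMD in the background and reports whether it finished within LIMIT
# seconds. The command is never waited on directly, so a stuck command
# cannot hang the caller; a command that never returns is left behind.
probe() {
    probe_limit=$1
    shift
    "$@" >/dev/null 2>&1 &
    probe_pid=$!
    probe_waited=0
    # kill -0 only checks that the process still exists; it sends no signal.
    while kill -0 "$probe_pid" 2>/dev/null; do
        if [ "$probe_waited" -ge "$probe_limit" ]; then
            echo "HUNG after ${probe_limit}s: $*"
            return 1
        fi
        sleep 1
        probe_waited=$((probe_waited + 1))
    done
    echo "finished: $*"
}
```

On an affected node it might be called as `probe 10 vgs`, `probe 10 lvs`, and `probe 10 pvesm status`. Note that "finished" only means the command returned in time, not that it succeeded.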

I've tried upgrading our PVE to the latest version (before 7.0), but nothing changed:
Bash:
root@vu202adm:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.4.128-1-pve: 5.4.128-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.1.0-1
libpve-access-control: 6.4-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-3
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.13-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.6-1
pve-cluster: 6.4-1
pve-container: 3.3-6
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-4
pve-firmware: 3.2-4
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1

Bash:
Sep 06 16:14:15 vu202adm systemd[1]: pvestatd.service: Found left-over process 3649 (pvestatd) in control group while starting unit. Ignoring.
Sep 06 16:14:15 vu202adm systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 06 16:14:15 vu202adm systemd[1]: pvestatd.service: Found left-over process 4619 (pvestatd) in control group while starting unit. Ignoring.
Sep 06 16:14:15 vu202adm systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 06 16:14:15 vu202adm systemd[1]: Starting PVE Status Daemon...
-- Subject: A start job for unit pvestatd.service has begun execution
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvestatd.service has begun execution.
--
-- The job identifier is 671108.
Sep 06 16:14:43 vu202adm fusioninventory-agent[5966]: forking process 0 to handle task Maintenance
Sep 06 16:15:45 vu202adm systemd[1]: pvestatd.service: Start operation timed out. Terminating.
Sep 06 16:15:49 vu202adm pmxcfs[2074]: [status] notice: received log
Sep 06 16:16:35 vu202adm fusioninventory-agent[6181]: forking process 0 to handle task Maintenance
Sep 06 16:17:01 vu202adm CRON[6241]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 06 16:17:01 vu202adm CRON[6242]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 06 16:17:01 vu202adm CRON[6241]: pam_unix(cron:session): session closed for user root
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: State 'stop-sigterm' timed out. Killing.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 5908 (pvestatd) with signal SIGKILL.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 3649 (pvestatd) with signal SIGKILL.
Sep 06 16:17:15 vu202adm systemd[1]: pvestatd.service: Killing process 4619 (pvestatd) with signal SIGKILL.
Sep 06 16:18:32 vu202adm fusioninventory-agent[6421]: forking process 0 to handle task Maintenance
Sep 06 16:18:46 vu202adm systemd[1]: pvestatd.service: Processes still around after SIGKILL. Ignoring.
Sep 06 16:20:12 vu202adm fusioninventory-agent[6615]: forking process 0 to handle task Maintenance
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: State 'stop-final-sigterm' timed out. Killing.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 3649 (pvestatd) with signal SIGKILL.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 4619 (pvestatd) with signal SIGKILL.
Sep 06 16:20:16 vu202adm systemd[1]: pvestatd.service: Killing process 5908 (pvestatd) with signal SIGKILL.
Sep 06 16:20:27 vu202adm pmxcfs[2074]: [status] notice: received log
Sep 06 16:21:46 vu202adm systemd[1]: pvestatd.service: Processes still around after final SIGKILL. Entering failed mode.
Sep 06 16:21:46 vu202adm systemd[1]: pvestatd.service: Failed with result 'timeout'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- The unit pvestatd.service has entered the 'failed' state with result 'timeout'.
Sep 06 16:21:46 vu202adm systemd[1]: Failed to start PVE Status Daemon.
-- Subject: A start job for unit pvestatd.service has failed
-- Defined-By: systemd
-- Support: https://www.debian.org/support
--
-- A start job for unit pvestatd.service has finished with a failure.
--
-- The job identifier is 671108 and the job result is failed.


The only way to restart the service and clear the gray question mark is a hard reset of the node. Even a soft reboot doesn't work, because the node cannot stop its VMs.
Can anyone help?
 
