pvestatd[2973]: got timeout errors

John Allison

I have 3 beefy nodes (each with 256 GB RAM, SSDs and 64 vCPUs) in a cluster, and recently reconfigured the NICs to support link aggregation. Everything works, and testing with iperf and copying various VMs from node to node is nice and speedy.

However, I've now started getting errors on all nodes along the lines of:
"pvestatd[2973]: got timeout errors"
I can't see any other errors, just these timeout errors.

I guess something needs adjusting, but what and where?
 
I assume that there are some other log entries before that one; can you please post them as well?
Did you already restart pvestatd?

pvestatd queries VMs, containers and storages, so my first guess would be that one of the storages isn't responding in time and the daemon runs into a timeout.
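
If you want to narrow that down yourself, something like the following could help (a rough check; the storage ID in the second command is a placeholder, use one of your own):
Code:
# List all configured storages with their status; an entry that hangs or
# shows up as inactive is usually the one pvestatd is waiting on
pvesm status

# Time the check for a single storage to see how long it takes
time pvesm status --storage <storage-id>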
 
Looking through the logs a bit closer (always a good idea!), I can see that pvestatd is at times unable to activate storage, saying it doesn't exist or is unreachable (see below). But this isn't always the case; sometimes the timeouts appear around some pmxcfs activity and the systemd "starting replication runner" messages. But it's all a bit random.
The storages ISOImages & ProxmoxBackups are both hosted on a NAS box, connected via NFS, so I guess the problem lies there?


Code:
Started Proxmox VE replication runner.
Jan 18 09:30:01 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:09 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:09 pvhost3 pvestatd[2335]: unable to activate storage 'ISOImages' - directory '/mnt/pve/ISOImages' does not exist or is unreachable
Jan 18 09:30:29 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:50 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:00 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 09:31:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 09:31:02 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:09 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:40 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:42 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:49 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:51 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:59 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:59 pvhost3 pvestatd[2335]: unable to activate storage 'ProxmoxBackups' - directory '/mnt/pve/ProxmoxBackups' does not exist or is unreachable
Jan 18 09:32:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 09:32:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 09:32:01 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:32:09 pvhost3 pvestatd[2335]: got timeout


Code:
Jan 18 11:29:53 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:29:55 pvhost3 pmxcfs[2110]: [status] notice: received log
Jan 18 11:29:56 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:30:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 11:30:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 11:30:03 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:30:05 pvhost3 pvestatd[2335]: got timeout
 
Yep, you have to make sure that your NFS shares are reachable.
Check your storage and network configuration, and check your NAS.

You can check your exports with:
Code:
# showmount -e <ip-address>
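
As a rough follow-up check (the paths are taken from your log, adjust as needed), you could also verify that the already-mounted shares actually respond from the node itself:
Code:
# Show the NFS mounts and their options as seen by this node
nfsstat -m

# 'timeout' keeps the check from hanging forever; if these listings stall,
# the problem is on the NAS/network side rather than in PVE itself
timeout 5 ls /mnt/pve/ISOImages
timeout 5 ls /mnt/pve/ProxmoxBackups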
 
Hello Tim,

I have the same problem, but I don't have any NFS shares or anything like that.
The "pvestatd got timeout" message only appears on one node - other nodes are not affected.

Log:
Code:
Dec 25 01:10:08 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:10:08 pve-n2 pvestatd[14119]: status update time (5.074 seconds)
Dec 25 01:10:38 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:10:38 pve-n2 pvestatd[14119]: status update time (5.068 seconds)
Dec 25 01:11:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:11:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:11:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:11:28 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:11:28 pve-n2 pvestatd[14119]: status update time (5.068 seconds)
Dec 25 01:12:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:12:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:12:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:12:08 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:12:08 pve-n2 pvestatd[14119]: status update time (5.060 seconds)
Dec 25 01:12:49 pve-n2 pmxcfs[2232]: [status] notice: received log
Dec 25 01:12:58 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:12:58 pve-n2 pvestatd[14119]: status update time (5.059 seconds)
Dec 25 01:13:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:13:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:13:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:13:07 pve-n2 pvedaemon[2551]: <root@pam> successful auth for user 'root@pam'
Dec 25 01:13:28 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:13:28 pve-n2 pvestatd[14119]: status update time (5.058 seconds)

do you have any idea for me?
 
Okay, I have an external RBD storage that is causing the problems.
BUT:
On pve-n1 (node 1) I have the "old" Ceph version running:
Code:
ceph-common/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
ceph-fuse/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
libcephfs2/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
and have NO errors.

On pve-n2 (node 2) I have the pve-ceph version running:
Code:
ceph-base/now 14.2.5-pve1 amd64 [installed,local]
ceph-common/now 14.2.5-pve1 amd64 [installed,local]
ceph-fuse/now 14.2.5-pve1 amd64 [installed,local]
ceph-mds/now 14.2.5-pve1 amd64 [installed,local]
ceph-mgr/now 14.2.5-pve1 amd64 [installed,local]
ceph-mon/now 14.2.5-pve1 amd64 [installed,local]
ceph-osd/now 14.2.5-pve1 amd64 [installed,local]
libcephfs2/now 14.2.5-pve1 amd64 [installed,local]
python-ceph-argparse/now 14.2.5-pve1 all [installed,local]
python-cephfs/now 14.2.5-pve1 amd64 [installed,local]
and there I have the errors.
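
I can compare the two clients by timing a plain rbd listing against the external cluster from each node (pool name, monitor address, user and storage ID below are placeholders; the keyring path assumes the usual Proxmox location for external RBD storages):
Code:
# Run on pve-n1 and pve-n2 and compare: if the Nautilus client on pve-n2
# stalls here, the delay is in the Ceph client/network path, not in pvestatd
time rbd ls --pool <pool> -m <mon-ip> --id <cephuser> --keyring /etc/pve/priv/ceph/<storage-id>.keyring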


Is there a way to switch back to the old Ceph version?
 
Please post the output of "pveversion -v" from both hosts.
 
pve-n1: (the good one)
Code:
root@pve-n1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

pve-n2 (the bad one):
Code:
root@pve-n2:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 14.2.5-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
