pvestatd[2973]: got timeout errors

John Allison

I have 3 beefy nodes (each with 256 GB RAM, SSDs and 64 vCPUs) in a cluster, and recently reconfigured the NICs to support link aggregation. Everything works, and testing with iperf and copying various VMs from node to node is nice and speedy.

However, I've now started getting errors on all nodes along the lines of:
"pvestatd[2973]: got timeout errors"
I can't see any other errors, just these timeout errors.

I guess something needs adjusting, but what and where?
 
I assume that there are some other log entries before that one; can you please post them as well?
Did you already restart pvestatd?

pvestatd queries VMs, containers and storages, so my first guess would be that one of the storages isn't responding in time and the daemon runs into a timeout.
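
If you want to narrow that down yourself, something like the following could help (a rough check; the storage ID in the second command is a placeholder, use one of your own):
Code:
# List all configured storages with their status; an entry that hangs or
# shows up as inactive is usually the one pvestatd is waiting on
pvesm status

# Time the check for a single storage to see how long it takes
time pvesm status --storage <storage-id>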
 
Looking through the logs a bit closer (always a good idea!), I can see that pvestatd is at times unable to activate storage, saying it doesn't exist or is unreachable (see below). But this isn't always the case; sometimes the timeouts appear around some pmxcfs activity and the systemd "starting replication runner" messages. But it's all a bit random.
The storages ISOImages & ProxmoxBackups are both hosted on a NAS box, connected via NFS, so I guess the problem lies there?


Code:
Started Proxmox VE replication runner.
Jan 18 09:30:01 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:09 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:09 pvhost3 pvestatd[2335]: unable to activate storage 'ISOImages' - directory '/mnt/pve/ISOImages' does not exist or is unreachable
Jan 18 09:30:29 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:30:50 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:00 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 09:31:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 09:31:02 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:09 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:40 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:42 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:49 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:51 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:59 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:31:59 pvhost3 pvestatd[2335]: unable to activate storage 'ProxmoxBackups' - directory '/mnt/pve/ProxmoxBackups' does not exist or is unreachable
Jan 18 09:32:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 09:32:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 09:32:01 pvhost3 pvestatd[2335]: got timeout
Jan 18 09:32:09 pvhost3 pvestatd[2335]: got timeout


Code:
Jan 18 11:29:53 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:29:55 pvhost3 pmxcfs[2110]: [status] notice: received log
Jan 18 11:29:56 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:30:00 pvhost3 systemd[1]: Starting Proxmox VE replication runner...
Jan 18 11:30:01 pvhost3 systemd[1]: Started Proxmox VE replication runner.
Jan 18 11:30:03 pvhost3 pvestatd[2335]: got timeout
Jan 18 11:30:05 pvhost3 pvestatd[2335]: got timeout
 
Yep, you have to make sure that your NFS shares are reachable.
Check your storage and network configuration, and check your NAS.

You can check your exports with:
Code:
# showmount -e <ip-address>
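
As a rough follow-up check (the paths are taken from your log, adjust as needed), you could also verify that the already-mounted shares actually respond from the node itself:
Code:
# Show the NFS mounts and their options as seen by this node
nfsstat -m

# 'timeout' keeps the check from hanging forever; if these listings stall,
# the problem is on the NAS/network side rather than in PVE itself
timeout 5 ls /mnt/pve/ISOImages
timeout 5 ls /mnt/pve/ProxmoxBackups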
 
Hello Tim,

I have the same problem, but I don't have any NFS shares or anything like that.
The "pvestatd got timeout" message only appears on one node - other nodes are not affected.

Log:
Code:
Dec 25 01:10:08 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:10:08 pve-n2 pvestatd[14119]: status update time (5.074 seconds)
Dec 25 01:10:38 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:10:38 pve-n2 pvestatd[14119]: status update time (5.068 seconds)
Dec 25 01:11:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:11:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:11:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:11:28 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:11:28 pve-n2 pvestatd[14119]: status update time (5.068 seconds)
Dec 25 01:12:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:12:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:12:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:12:08 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:12:08 pve-n2 pvestatd[14119]: status update time (5.060 seconds)
Dec 25 01:12:49 pve-n2 pmxcfs[2232]: [status] notice: received log
Dec 25 01:12:58 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:12:58 pve-n2 pvestatd[14119]: status update time (5.059 seconds)
Dec 25 01:13:00 pve-n2 systemd[1]: Starting Proxmox VE replication runner...
Dec 25 01:13:01 pve-n2 systemd[1]: pvesr.service: Succeeded.
Dec 25 01:13:01 pve-n2 systemd[1]: Started Proxmox VE replication runner.
Dec 25 01:13:07 pve-n2 pvedaemon[2551]: <root@pam> successful auth for user 'root@pam'
Dec 25 01:13:28 pve-n2 pvestatd[14119]: got timeout
Dec 25 01:13:28 pve-n2 pvestatd[14119]: status update time (5.058 seconds)

do you have any idea for me?
 
Okay, I have an external RBD storage that is causing the problems.
BUT:
On pve-n1 (node 1) I have the "old" Ceph version running:
Code:
ceph-common/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
ceph-fuse/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
libcephfs2/stable,now 12.2.11+dfsg1-2.1+b1 amd64 [installed]
and have NO errors.

On pve-n2 (node 2) I have the pve-ceph version running:
Code:
ceph-base/now 14.2.5-pve1 amd64 [installed,local]
ceph-common/now 14.2.5-pve1 amd64 [installed,local]
ceph-fuse/now 14.2.5-pve1 amd64 [installed,local]
ceph-mds/now 14.2.5-pve1 amd64 [installed,local]
ceph-mgr/now 14.2.5-pve1 amd64 [installed,local]
ceph-mon/now 14.2.5-pve1 amd64 [installed,local]
ceph-osd/now 14.2.5-pve1 amd64 [installed,local]
libcephfs2/now 14.2.5-pve1 amd64 [installed,local]
python-ceph-argparse/now 14.2.5-pve1 all [installed,local]
python-cephfs/now 14.2.5-pve1 amd64 [installed,local]
and there I have the errors.
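
I can compare the two clients by timing a plain rbd listing against the external cluster from each node (pool name, monitor address, user and storage ID below are placeholders; the keyring path assumes the usual Proxmox location for external RBD storages):
Code:
# Run on pve-n1 and pve-n2 and compare: if the Nautilus client on pve-n2
# stalls here, the delay is in the Ceph client/network path, not in pvestatd
time rbd ls --pool <pool> -m <mon-ip> --id <cephuser> --keyring /etc/pve/priv/ceph/<storage-id>.keyring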


Is there a way to switch back to the old Ceph version?
 
Please post the output of "pveversion -v" from both hosts.
 
pve-n1: (the good one)
Code:
root@pve-n1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
openvswitch-switch: 2.10.0+2018.08.28+git.8ca7c82b7d+ds1-12+deb10u1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2

pve-n2 (the bad one):
Code:
root@pve-n2:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-1
pve-kernel-helper: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-5.0.21-4-pve: 5.0.21-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.21-2-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 14.2.5-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 1.2.8-1+pve4
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-9
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-1
pve-cluster: 6.1-2
pve-container: 3.0-15
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 3.13.2-1
qemu-server: 6.1-4
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve2
 
