pvedaemon (API) becomes slow (optimization tips?)

encore

Hi there,

we are hosting thousands of CTs on Proxmox and manage them through your API.
From time to time, pvedaemon worker processes seem to get stuck: I see three pvedaemon workers pinned at 100% CPU each, permanently. When I kill them and restart the pvedaemon service, they run smoothly again at 10-40% CPU usage.

Whenever this 100% issue occurs, the API responds very slowly. A kill and restart fixes it for a while.

Any ideas for optimizing pvedaemon? Maybe tell it to use more workers? Is there a log where I might find the reason for these stuck processes?
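The only places I have looked so far are the journal and a quick strace of a busy worker; I am not sure these are the right tools, and the PID below is just an example from my system:

Code:
# recent pvedaemon log messages
journalctl -u pvedaemon --since "1 hour ago"

# syscall summary of a 100%-CPU worker (12345 = example PID; Ctrl-C to detach)
strace -c -p 12345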

Thank you,
Marvin
 
I am having this problem as well. On some clusters API responses are reasonable; on others they can be really slow (as in 20 seconds for a reply). Each node has between 80 and 100 CTs. I tried sending the request to different nodes, but the end result is the same:

Code:
# time pvesh get /nodes/node15/lxc
...
real    0m19.848s
user    0m1.148s
sys     0m0.150s
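
To narrow it down, the same timing can be run against a cluster-wide endpoint, to see whether only the LXC listing is slow or the API in general (a plain pvesh call against the standard /cluster/resources endpoint):

Code:
# time a cluster-wide resource query for comparison
time pvesh get /cluster/resources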

Code:
# pveversion -v
proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
pve-kernel-4.15: 5.2-2
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-2-pve: 4.13.13-33
ceph: 12.2.5-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-4
libpve-common-perl: 5.0-31
libpve-guest-common-perl: 2.0-16
libpve-http-server-perl: 2.0-8
libpve-storage-perl: 5.0-23
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.0-3
lxcfs: 3.0.0-1
novnc-pve: 0.6-4
openvswitch-switch: 2.7.0-2
proxmox-widget-toolkit: 1.0-18
pve-cluster: 5.0-27
pve-container: 2.0-23
pve-docs: 5.2-4
pve-firewall: 3.0-9
pve-firmware: 2.0-4
pve-ha-manager: 2.0-5
pve-i18n: 1.0-5
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.1-5
pve-xtermjs: 1.0-5
qemu-server: 5.0-26
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.9-pve1~bpo9

Code:
# pvecm status
Quorum information
------------------
Date:             Fri Aug 24 11:16:35 2018
Quorum provider:  corosync_votequorum
Nodes:            9
Node ID:          0x00000008
Ring ID:          1/752
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   9
Highest expected: 9
Total votes:      9
Quorum:           5
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.19.1.8
0x00000002          1 10.19.1.9
0x00000003          1 10.19.1.10
0x00000004          1 10.19.1.11
0x00000005          1 10.19.1.12
0x00000006          1 10.19.1.13
0x00000007          1 10.19.1.14
0x00000008          1 10.19.1.15 (local)
0x00000009          1 10.19.1.16

sysctl fs variables:
fs.aio-max-nr = 1048576
fs.aio-nr = 62896
fs.binfmt_misc.status = enabled
fs.dentry-state = 2829612 2404098 45 0 0 0
fs.dir-notify-enable = 1
fs.epoll.max_user_watches = 40549478
fs.file-max = 19769117
fs.file-nr = 73680 0 19769117
fs.inode-nr = 1748660 130375
fs.inode-state = 1748660 130375 0 0 0 0 0
fs.inotify.max_queued_events = 16384
fs.inotify.max_user_instances = 131072
fs.inotify.max_user_watches = 524288
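
One more thing worth ruling out is pmxcfs itself, since the API reads guest configs from /etc/pve and a slow cluster filesystem would slow every call. A crude check (assuming the default /etc/pve/lxc layout):

Code:
# time reading all local container configs straight from pmxcfs
time cat /etc/pve/lxc/*.conf > /dev/null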
 
Were you able to fix your slow API response?
 
No. I just live with it and treat 100-120 CTs/node as the operational ceiling. Mind you, this isn't the only limit; the number of nodes in the cluster also makes a difference. Does anyone have experience running nodes/clusters more densely?
 
We have a cluster with 5 hypervisors and 50-60 VMs per node, but the API response is very slow: it takes 5-20 seconds before all data is loaded. The cluster is healthy, yet the API response is too slow. We are curious whether there is a way to make this faster.
 
We are still having this problem, and it is very annoying. We implemented a workaround that caches information from Proxmox, but it is a really dirty solution.
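
Roughly, it looks like this (a simplified sketch; the script path, cache location, and endpoint are just examples from our setup, and --output-format json needs a reasonably recent pvesh):

Code:
#!/bin/sh
# /usr/local/bin/pve-api-cache.sh -- example: refresh a JSON cache of the
# local container list so our tooling reads the file instead of the API.
# Run from cron, e.g.: * * * * * root /usr/local/bin/pve-api-cache.sh
CACHE=/var/cache/pve-api/lxc.json
mkdir -p "$(dirname "$CACHE")"
# write to a temp file first so readers never see a half-written cache
pvesh get "/nodes/$(hostname)/lxc" --output-format json > "$CACHE.tmp" \
    && mv "$CACHE.tmp" "$CACHE"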
 
Could you say a bit more about how that caching workaround behaves in practice and whether it really resolves the problem for now? Is there nobody who can help us identify our problem, or someone who ran into it before and managed to resolve it?
 
