pveproxy eats available ram

Corwin

Renowned Member
Jan 26, 2016
37
0
71
FR
Hello !
I encounter slow system and out of memory situations on my server running latest Proxmox version. I do use ZFS with 16 Go ram, first installed with PVE5 then upgraded, with a few running CT's only. I tried to cap the memory used with "options zfs zfs_arc_max" with no luck
proxmox-ve: 6.2-2 (running kernel: 5.4.73-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
It seems the pveproxy processes use 1 GB more every few hours... There is a lot of stuff in the log (see below), but searching in the forum was not helpful.
Can you help ?

# arcstat
time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
11:13:30 13 0 0 0 0 0 0 0 0 3.1G 8.0G

# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 1,81T 1,26T 566G - - 37% 69% 1.03x ONLINE -

# service pveproxy status
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-11-24 19:13:57 CET; 15h ago
Process: 30167 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 30176 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 8408 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 30184 (pveproxy)
Tasks: 600 (limit: 4915)
Memory: 3.9G
CGroup: /system.slice/pveproxy.service
├─ 327 pveproxy worker (shutdown)
(...)
├─ 2415 pveproxy worker (shutdown)
├─ 2416 pveproxy worker
├─ 2450 pveproxy worker (shutdown)
(...)
lines 1-80

# journalctl -u pveproxy
nov. 24 09:55:41 x pveproxy[2731]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
nov. 24 09:55:41 x pveproxy[2729]: worker 2731 finished
nov. 24 09:55:41 x pveproxy[2729]: starting 1 worker(s)
nov. 24 09:55:41 x pveproxy[2729]: worker 6367 started
nov. 24 09:55:44 x pveproxy[6356]: got inotify poll request in wrong process - disabling inotify
nov. 24 09:56:43 x pveproxy[2730]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
nov. 24 09:56:43 x pveproxy[2729]: worker 2730 finished
nov. 24 09:56:43 x pveproxy[2729]: starting 1 worker(s)
nov. 24 09:56:43 x pveproxy[2729]: worker 19065 started
nov. 24 09:56:44 x pveproxy[19054]: got inotify poll request in wrong process - disabling inotify
nov. 24 09:58:44 x pveproxy[19065]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
 
Well, I saw this in another thread, but was not sure if I needed to restart afterwards. Just tried it with the following output:

(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts

then service pveproxy stop has simply killed the host...

Nov 25 13:43:50 x pveproxy[30184]: error stopping workers (will kill them now) - timeout
Nov 25 13:43:50 x pveproxy[30184]: kill worker 18137
Nov 25 13:43:50 x pveproxy[30184]: server stopped
Nov 25 13:50:55 x kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) ()
Nov 25 13:50:55 x systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Nov 25 13:50:55 x kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.73-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs rootd$
Nov 25 13:50:55 x kernel: [ 0.000000] KERNEL supported cpus:
 
anyway need reboot the node

1- after upgrade from PVE5 to PVE6 (that is really important)
2- after renew certs

after reboot see what say systemctl status pveproxy.service ?
 
After reboot, same story in the log (plenty of: got inotify poll request in wrong process - disabling inotify + Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757).
The memory consumption seems to be a bit better, but shouldn't it be much lower ? What are the expected figures, I would like to compare.
Just for info, upgrade from 5 to 6 was made long time ago.

# service pveproxy status
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-11-25 13:51:17 CET; 7h ago
Process: 18865 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 18869 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 18872 (pveproxy)
Tasks: 294 (limit: 4915)
Memory: 1.5G
CGroup: /system.slice/pveproxy.service
├─ 307 pveproxy worker (shutdown)
├─ 540 pveproxy worker (shutdown)
(...)
 
Thanks for sharing this... Still increasing on my side :(
Tasks: 744 (limit: 4915)
Memory: 4.2G

Here is the complete versioning output:

proxmox-ve: 6.2-2 (running kernel: 5.4.73-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-24-pve: 4.15.18-52
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-3
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-20
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Last edited:
I also have a second server with the same troubles, this one runs with 32 GB, here I'm able to stop and restart pveproxy service. I have also applied pvecm updatecerts -f and restarted pveproxy, but here again the tasks numbers are increasing continuously

pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-11-24 20:34:56 CET; 1 day 14h ago
Process: 3101 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 3103 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 31212 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 3106 (pveproxy)
Tasks: 981 (limit: 4915)
Memory: 5.7G
CGroup: /system.slice/pveproxy.service
├─ 318 pveproxy worker (shutdown)
├─ 335 pveproxy worker (shutdown)
├─ 490 pveproxy worker (shutdown)
 
you should be able to check what those tasks are (doing) by looking at the cgroup (there is a tasks file there which contains the PIDs, which you can then use to lookup their cmdlines):
Code:
# while read p; do echo "========="; echo "pid: $p"; echo "cmdline: $(cat /proc/$p/cmdline)"; echo "stack:"; cat /proc/$p/stack; done < /sys/fs/cgroup/systemd/system.slice/pveproxy.service/tasks
 
Yes, what the hell are they doing ? Here is the (partial) output, all other lines are similar

=========
pid: 26716
-bash: avertissement :substitution de commande: octet nul ignoré en entrée
cmdline: pveproxy worker
stack:
[<0>] poll_schedule_timeout.constprop.14+0x46/0x70
[<0>] do_select+0x6c0/0x780
[<0>] core_sys_select+0x1b7/0x3c0
[<0>] kern_select+0xb7/0x100
[<0>] __x64_sys_select+0x24/0x30
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
=========
pid: 28114
-bash: avertissement :substitution de commande: octet nul ignoré en entrée
cmdline: pveproxy worker (shutdown)
stack:
[<0>] poll_schedule_timeout.constprop.14+0x46/0x70
[<0>] do_select+0x6c0/0x780
[<0>] core_sys_select+0x1b7/0x3c0
[<0>] kern_select+0xb7/0x100
[<0>] __x64_sys_select+0x24/0x30
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
=========
 
that looks like something is blocking the old workers from exiting.. could you paste the complete output somewhere?
 
thanks! could you also post the output of lsof -p 470?
 
On a hunch - could you please post /etc/default/pveproxy (if it exists)?
 
Here is the output of lsof, and the content of /etc/defaut/pveproxy :
ALLOW_FROM="X.Y.Z.0/24,A.B.C.D/32"
DENY_FROM="all"
POLICY="allow"
 

Attachments

Could you please:
* try removing the access restrictions ins /etc/default/pveproxy, restart pveproxy and check if anything changes (the hanging processes, or at least the warnings about binip not being printed)? (only temporary)
* paste the output of `ss -tanp`

Do you have any kind of proxy in front of pveproxy, or are you directly connecting to the host at port 8006?

Thanks!
 
There is only a direct connection to the host. So I removed completely /etc/default/pveproxy and restarted, and it provides instant improvement ! Sounds good... Let's see after a few hours.
What do you recommend next ?


Tasks: 4 (limit: 4915)
Memory: 158.4M
CGroup: /system.slice/pveproxy.service
├─26102 pveproxy
├─26103 pveproxy worker
├─26104 pveproxy worker
└─26105 pveproxy worker
 
There is only a direct connection to the host. So I removed completely /etc/default/pveproxy and restarted, and it provides instant improvement ! Sounds good... Let's see after a few hours.
What do you recommend next ?
glad that helped for now - if this indeed fixes the problem we at least know where to search ....

the `ss -tanp` output would help in reproducing this - do you connect via IPv6 or IPv4?

Thanks!
 
Should be only IPv4. Yes, here is the output of ss -tanp with existing /etc/default/pveproxy (edited as it contains public IP's)
 

Attachments

Last edited:
So it keeps only 4 tasks after removing /etc/default/pveproxy
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-11-27 10:54:56 CET; 5h 0min ago
Process: 21535 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 21547 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 21557 (pveproxy)
Tasks: 4 (limit: 4915)
Memory: 134.9M
CGroup: /system.slice/pveproxy.service
├─21557 pveproxy
├─21558 pveproxy worker
├─21559 pveproxy worker
└─21560 pveproxy worker