pveproxy eats available ram

Corwin

Renowned Member
Jan 26, 2016
37
0
71
FR
Hello !
I encounter slow system and out of memory situations on my server running latest Proxmox version. I do use ZFS with 16 Go ram, first installed with PVE5 then upgraded, with a few running CT's only. I tried to cap the memory used with "options zfs zfs_arc_max" with no luck
proxmox-ve: 6.2-2 (running kernel: 5.4.73-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
It seems the pveproxy processes use 1 GB more every few hours... There is a lot of stuff in the log (see below), but searching in the forum was not helpful.
Can you help ?

# arcstat
time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
11:13:30 13 0 0 0 0 0 0 0 0 3.1G 8.0G

# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
rpool 1,81T 1,26T 566G - - 37% 69% 1.03x ONLINE -

# service pveproxy status
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-11-24 19:13:57 CET; 15h ago
Process: 30167 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 30176 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 8408 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 30184 (pveproxy)
Tasks: 600 (limit: 4915)
Memory: 3.9G
CGroup: /system.slice/pveproxy.service
├─ 327 pveproxy worker (shutdown)
(...)
├─ 2415 pveproxy worker (shutdown)
├─ 2416 pveproxy worker
├─ 2450 pveproxy worker (shutdown)
(...)
lines 1-80

# journalctl -u pveproxy
nov. 24 09:55:41 x pveproxy[2731]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
nov. 24 09:55:41 x pveproxy[2729]: worker 2731 finished
nov. 24 09:55:41 x pveproxy[2729]: starting 1 worker(s)
nov. 24 09:55:41 x pveproxy[2729]: worker 6367 started
nov. 24 09:55:44 x pveproxy[6356]: got inotify poll request in wrong process - disabling inotify
nov. 24 09:56:43 x pveproxy[2730]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
nov. 24 09:56:43 x pveproxy[2729]: worker 2730 finished
nov. 24 09:56:43 x pveproxy[2729]: starting 1 worker(s)
nov. 24 09:56:43 x pveproxy[2729]: worker 19065 started
nov. 24 09:56:44 x pveproxy[19054]: got inotify poll request in wrong process - disabling inotify
nov. 24 09:58:44 x pveproxy[19065]: Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757.
 
Well, I saw this in another thread, but was not sure if I needed to restart afterwards. Just tried it with the following output:

(re)generate node files
generate new node certificate
merge authorized SSH keys and known hosts

then service pveproxy stop has simply killed the host...

Nov 25 13:43:50 x pveproxy[30184]: error stopping workers (will kill them now) - timeout
Nov 25 13:43:50 x pveproxy[30184]: kill worker 18137
Nov 25 13:43:50 x pveproxy[30184]: server stopped
Nov 25 13:50:55 x kernel: [ 0.000000] Linux version 5.4.73-1-pve (build@pve) (gcc version 8.3.0 (Debian 8.3.0-6)) #1 SMP PVE 5.4.73-1 (Mon, 16 Nov 2020 10:52:16 +0100) ()
Nov 25 13:50:55 x systemd[1]: Started Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
Nov 25 13:50:55 x kernel: [ 0.000000] Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-5.4.73-1-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs rootd$
Nov 25 13:50:55 x kernel: [ 0.000000] KERNEL supported cpus:
 
anyway need reboot the node

1- after upgrade from PVE5 to PVE6 (that is really important)
2- after renew certs

after reboot see what say systemctl status pveproxy.service ?
 
After reboot, same story in the log (plenty of: got inotify poll request in wrong process - disabling inotify + Can't call method "binip" on an undefined value at /usr/share/perl5/Net/IP.pm line 757).
The memory consumption seems to be a bit better, but shouldn't it be much lower ? What are the expected figures, I would like to compare.
Just for info, upgrade from 5 to 6 was made long time ago.

# service pveproxy status
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-11-25 13:51:17 CET; 7h ago
Process: 18865 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 18869 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 18872 (pveproxy)
Tasks: 294 (limit: 4915)
Memory: 1.5G
CGroup: /system.slice/pveproxy.service
├─ 307 pveproxy worker (shutdown)
├─ 540 pveproxy worker (shutdown)
(...)
 
Thanks for sharing this... Still increasing on my side :(
Tasks: 744 (limit: 4915)
Memory: 4.2G

Here is the complete versioning output:

proxmox-ve: 6.2-2 (running kernel: 5.4.73-1-pve)
pve-manager: 6.2-15 (running version: 6.2-15/48bd51b6)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
pve-kernel-5.4.60-1-pve: 5.4.60-2
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-4.15.18-24-pve: 4.15.18-52
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-4
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-10
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.1-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.3-10
pve-cluster: 6.2-1
pve-container: 3.2-3
pve-docs: 6.2-6
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-6
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-20
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
 
Last edited:
I also have a second server with the same troubles, this one runs with 32 GB, here I'm able to stop and restart pveproxy service. I have also applied pvecm updatecerts -f and restarted pveproxy, but here again the tasks numbers are increasing continuously

pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-11-24 20:34:56 CET; 1 day 14h ago
Process: 3101 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 3103 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Process: 31212 ExecReload=/usr/bin/pveproxy restart (code=exited, status=0/SUCCESS)
Main PID: 3106 (pveproxy)
Tasks: 981 (limit: 4915)
Memory: 5.7G
CGroup: /system.slice/pveproxy.service
├─ 318 pveproxy worker (shutdown)
├─ 335 pveproxy worker (shutdown)
├─ 490 pveproxy worker (shutdown)
 
you should be able to check what those tasks are (doing) by looking at the cgroup (there is a tasks file there which contains the PIDs, which you can then use to lookup their cmdlines):
Code:
# while read p; do echo "========="; echo "pid: $p"; echo "cmdline: $(cat /proc/$p/cmdline)"; echo "stack:"; cat /proc/$p/stack; done < /sys/fs/cgroup/systemd/system.slice/pveproxy.service/tasks
 
Yes, what the hell are they doing ? Here is the (partial) output, all other lines are similar

=========
pid: 26716
-bash: avertissement :substitution de commande: octet nul ignoré en entrée
cmdline: pveproxy worker
stack:
[<0>] poll_schedule_timeout.constprop.14+0x46/0x70
[<0>] do_select+0x6c0/0x780
[<0>] core_sys_select+0x1b7/0x3c0
[<0>] kern_select+0xb7/0x100
[<0>] __x64_sys_select+0x24/0x30
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
=========
pid: 28114
-bash: avertissement :substitution de commande: octet nul ignoré en entrée
cmdline: pveproxy worker (shutdown)
stack:
[<0>] poll_schedule_timeout.constprop.14+0x46/0x70
[<0>] do_select+0x6c0/0x780
[<0>] core_sys_select+0x1b7/0x3c0
[<0>] kern_select+0xb7/0x100
[<0>] __x64_sys_select+0x24/0x30
[<0>] do_syscall_64+0x57/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
=========
 
that looks like something is blocking the old workers from exiting.. could you paste the complete output somewhere?
 
thanks! could you also post the output of lsof -p 470?
 
On a hunch - could you please post /etc/default/pveproxy (if it exists)?
 
Here is the output of lsof, and the content of /etc/defaut/pveproxy :
ALLOW_FROM="X.Y.Z.0/24,A.B.C.D/32"
DENY_FROM="all"
POLICY="allow"
 

Attachments

  • lsof.txt
    17.5 KB · Views: 5
Could you please:
* try removing the access restrictions ins /etc/default/pveproxy, restart pveproxy and check if anything changes (the hanging processes, or at least the warnings about binip not being printed)? (only temporary)
* paste the output of `ss -tanp`

Do you have any kind of proxy in front of pveproxy, or are you directly connecting to the host at port 8006?

Thanks!
 
There is only a direct connection to the host. So I removed completely /etc/default/pveproxy and restarted, and it provides instant improvement ! Sounds good... Let's see after a few hours.
What do you recommend next ?


Tasks: 4 (limit: 4915)
Memory: 158.4M
CGroup: /system.slice/pveproxy.service
├─26102 pveproxy
├─26103 pveproxy worker
├─26104 pveproxy worker
└─26105 pveproxy worker
 
There is only a direct connection to the host. So I removed completely /etc/default/pveproxy and restarted, and it provides instant improvement ! Sounds good... Let's see after a few hours.
What do you recommend next ?
glad that helped for now - if this indeed fixes the problem we at least know where to search ....

the `ss -tanp` output would help in reproducing this - do you connect via IPv6 or IPv4?

Thanks!
 
Should be only IPv4. Yes, here is the output of ss -tanp with existing /etc/default/pveproxy (edited as it contains public IP's)
 

Attachments

  • ss_after_10h.txt
    23.1 KB · Views: 5
Last edited:
So it keeps only 4 tasks after removing /etc/default/pveproxy
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2020-11-27 10:54:56 CET; 5h 0min ago
Process: 21535 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=0/SUCCESS)
Process: 21547 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 21557 (pveproxy)
Tasks: 4 (limit: 4915)
Memory: 134.9M
CGroup: /system.slice/pveproxy.service
├─21557 pveproxy
├─21558 pveproxy worker
├─21559 pveproxy worker
└─21560 pveproxy worker
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!