Unexplained slow Memory Leak / Growth problem

brb8two

Member
Dec 17, 2019
Hi All,

I would like to find a solution to a slow memory leak / growth problem I have with some PVE nodes I manage.
I have PVE 6.2-6 installed (the issue occurred with previous versions as well).
Almost all nodes use the same QNAP storage pool for VMs via the NFSv4 protocol.
Almost all guest machines are Microsoft Windows (disks and network use VirtIO, guest agents installed, display configured as SPICE, i440fx machine type).

I can reclaim the leaked RAM by migrating a guest VM to another node and back (no reboot required for the guest or for PVE).
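In case it helps, this is roughly what I do to reclaim the RAM (node names and the VMID are placeholders; adjust to your cluster):

Code:
# live-migrate the guest away; the freshly started kvm process on the target has a much smaller RSS
qm migrate <vmid> <other-node> --online
# once it has finished, migrate it back (run from <other-node>)
qm migrate <vmid> <original-node> --online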

Using the past week as an example:
PVENode1 is slowly affected (4 weeks ago 44GB used, 2 weeks ago 48GB used, now at 53GB used) (4 guest VMs)
PVENode2 increased by 10GB of RAM in 2 weeks without any guest VM changes / restarts (7 guest VMs)
PVENode3 had grown to 170GB of RAM; half a week ago I used the migration trick to reclaim the RAM. Usage came down to 108GB and is growing again, currently 123GB (17 guest VMs)
PVENode4, similar to PVENode1, had 43GB and grew to 50GB in the past week (5 guest VMs)
PVENode5, similar to PVENode1 (guest VMs changed in this period) (3 guest VMs)
Attached are example memory plots.

If you need more information, I may need to be guided through the steps to collect it.
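If it helps, these are the kinds of host-side snapshots I can collect right away (standard commands plus pveversion; nothing exotic):

Code:
# PVE package versions on the node
pveversion -v
# overall host memory picture
free -h
# largest kernel slab caches (kernel-side memory that top does not attribute to any process)
slabtop -o | head -n 20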
 

Attachments

  • PVENode1.png (memory usage plot)
  • pveNode2.png (memory usage plot)
  • PVENode3.png (memory usage plot)
PVENode2 (7 guest VMs, configured as 6 x 2GB RAM, 1 x 512MB), top output:

Code:
Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 30.7 us, 9.5 sy, 0.0 ni, 55.0 id, 0.0 wa, 0.0 hi, 4.8 si, 0.0 st
MiB Mem : 28120.2 total, 1881.9 free, 25401.3 used, 837.0 buff/cache
MiB Swap: 8192.0 total, 4397.7 free, 3794.3 used. 2174.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16277 root 20 0 9402804 3.9g 5536 S 157.3 14.2 10812:00 kvm
16678 root 20 0 9493368 4.0g 5556 S 43.4 14.7 12606:32 kvm
17052 root 20 0 9521520 4.8g 5528 S 30.1 17.6 12302:13 kvm
17479 root 20 0 8678740 4.2g 5608 S 28.1 15.2 10638:18 kvm
70 root 25 5 0 0 0 S 27.5 0.0 18224:12 ksmd
17723 root 20 0 9349532 4.6g 5508 S 27.2 16.8 11597:55 kvm
15564 root 20 0 3029620 1.2g 4336 S 9.9 4.4 11192:21 kvm
1183 root rt 0 573668 178284 51452 S 3.6 0.6 4407:57 corosync
1205 root 20 0 297396 36220 7988 S 3.6 0.1 1598:52 pvestatd
22543 root 20 0 5841956 3.1g 10768 S 2.6 11.2 502:08.40 kvm
16324 root 20 0 0 0 0 S 1.7 0.0 16:27.69 vhost-16277

It's clearly coming from the kvm processes, but why?
PID 15564 is the VM configured with 2GB RAM, running openSUSE on local-lvm storage (up 76 days).
PID 22543 is the VM with 512MB RAM, migrated 7 days ago.
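To map a kvm PID back to its VMID (and hence its config), I read the pidfiles qemu-server writes for each running guest; a small sketch using the default pidfile path:

Code:
# print VMID -> QEMU PID for every guest running on this node
for f in /var/run/qemu-server/*.pid; do
    echo "VMID $(basename "$f" .pid) -> PID $(cat "$f")"
done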
 
A simple Bash one-liner helps indicate which KVM processes are using the most RAM:

Code:
for PID in $(pgrep kvm); do ps -p "$PID" --no-headers -o pid,rss,sz,command; done


13212 2158180 719145 /usr/bin/kvm -id 910 -name ...
13225 0 0 [kvm-nx-lpage-re]
13269 0 0 [kvm-pit/13212]
15564 1059496 757405 /usr/bin/kvm -id 919 -name ...
15572 0 0 [kvm-nx-lpage-re]
15597 0 0 [kvm-pit/15564]
16678 4297960 2641546 /usr/bin/kvm -id 911 -name ...
16683 0 0 [kvm-nx-lpage-re]
16727 0 0 [kvm-pit/16678]
17479 4752012 2491965 /usr/bin/kvm -id 914 -name ...
17484 0 0 [kvm-nx-lpage-re]
17535 0 0 [kvm-pit/17479]
22543 3812188 1682441 /usr/bin/kvm -id 909 -name ...
22548 0 0 [kvm-nx-lpage-re]
22593 0 0 [kvm-pit/22543]
28337 2213776 721031 /usr/bin/kvm -id 915 -name ...
28342 0 0 [kvm-nx-lpage-re]
28396 0 0 [kvm-pit/28337]
29918 2261996 785354 /usr/bin/kvm -id 912 -name ...
29923 0 0 [kvm-nx-lpage-re]
29983 0 0 [kvm-pit/29918]
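A variation of the same idea that skips the kernel threads and sorts the guests by resident set size (RSS, second column, in KiB) - just a sketch:

Code:
# match only full /usr/bin/kvm command lines, sorted by RSS descending
for PID in $(pgrep -f '^/usr/bin/kvm'); do
    ps -p "$PID" --no-headers -o pid,rss,command
done | sort -k2 -nr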
 
Same problem here (Proxmox 6.2).
Only the VMs on NFS (v4) storage have the issue; there is no problem with local storage or iSCSI.
The memory grows fast when there is heavy I/O on the disk.
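To confirm which storages are mounted over NFS and which protocol version was actually negotiated, something like this should be enough (standard tools, nothing PVE-specific):

Code:
# show each NFS mount with its negotiated options (look for vers=4.x)
nfsstat -m
# or simply list the NFS mounts
mount -t nfs,nfs4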
 
There is an update to nfs-common (changelog below); I'm upgrading to Proxmox 6.2-14 (from 6.2-6) to test whether this resolves the issue.

Code:
nfs-utils (1:1.3.4-2.5+deb10u1) buster; urgency=medium

  * statd: take user-id from /var/lib/nfs/sm (CVE-2019-3689) (Closes: #940848)
  * Don't make /var/lib/nfs owned by statd.
    Only sm and sm.bak need to be accessible by statd or sm-notify after
    they drop privileges.
  * debian/control: Point Vcs URLs to kernel-team namespace repository

 -- Salvatore Bonaccorso <carnil@debian.org>  Wed, 24 Jun 2020 09:54:47 +0200

nfs-utils (1:1.3.4-2.5) unstable; urgency=medium
 -- Bernd Zeimetz <bzed@debian.org>  Sat, 06 Apr 2019 18:30:39 +0200
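The upgrade itself is just the usual apt procedure; checking the nfs-common version before and after confirms the new package landed (a sketch, assuming the standard PVE repositories are configured):

Code:
# installed / candidate version of nfs-common
apt-cache policy nfs-common
# pull in the new PVE and Debian packages
apt update && apt dist-upgrade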
 
[Attachment: PVE-MemoryResolved.PNG – memory usage plot]
Still looking good.

Growth is slower than before (15.83GB --> 16.59GB over ~12 days).
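To keep an eye on the growth rate without watching the graphs, I log a timestamped memory sample periodically (log path and interval are just examples; run it from cron or a loop):

Code:
# append a timestamped "used memory" sample to a log file
echo "$(date '+%F %T') $(free -m | awk '/^Mem:/ {print $3 " MiB used"}')" >> /root/mem-growth.log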
 
