Unexplained slow Memory Leak / Growth problem

brb8two
Hi All,

I would like to find a solution to a slow memory leak/growth problem I have with some PVE nodes I manage.
I have PVE 6.2-6 installed (the issue also occurred with previous versions).
Almost all nodes use the same QNAP storage pool for VMs via the NFSv4 protocol.
Almost all guest machines are Microsoft Windows (drives use VirtIO, network uses VirtIO, guest agents installed, display configured as SPICE, i440fx machine type).
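In case the exact NFS mount options matter, they can be double-checked on each node with standard tools (commands only; the output differs per setup):

Code:
# show how the QNAP pool is actually mounted (negotiated NFS version and options)
nfsstat -m
# or list only the NFSv4 mounts
mount -t nfs4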

I can reclaim the leaked RAM by migrating a guest VM to another node and back (no reboot required for the guest or PVE).
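In case anyone wants to try the same workaround, the round trip from the CLI looks roughly like this (VM ID 104 and the node names are placeholders for illustration):

Code:
# live-migrate the guest to another node; the replacement kvm process starts
# with only the guest's working set resident, which frees the leaked RAM
qm migrate 104 pvenode2 --online
# then, on pvenode2, bring it back home
qm migrate 104 pvenode1 --online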

Using the past week as an example:
PVENode1 is slowly affected (44 GB used 4 weeks ago, 48 GB 2 weeks ago, now at 53 GB) (4 guest VMs)
PVENode2 increased by 10 GB of RAM in 2 weeks without any guest VM changes/restarts (7 guest VMs)
PVENode3 had grown to 170 GB of RAM; half a week ago I used the migration trick to reclaim the RAM. Usage came down to 108 GB and is growing again, currently 123 GB (17 guest VMs)
PVENode4, similar to PVENode1, had 43 GB and grew to 50 GB in the past week (5 guest VMs)
PVENode5 is similar to PVENode1 (its guest VMs changed in this period) (3 guest VMs)
Attached are example memory plots.

If you need more information, I may need some guidance on how to collect it.
 

Attachments

  • PVENode1.png (memory plot, 18.3 KB)
  • pveNode2.png (memory plot, 18 KB)
  • PVENode3.png (memory plot, 19.1 KB)
PVENode2: (7 guest VMs, configured with 6 x 2 GB RAM, 1 x 512 MB)

Tasks: 257 total, 1 running, 256 sleeping, 0 stopped, 0 zombie
%Cpu(s): 30.7 us, 9.5 sy, 0.0 ni, 55.0 id, 0.0 wa, 0.0 hi, 4.8 si, 0.0 st
MiB Mem : 28120.2 total, 1881.9 free, 25401.3 used, 837.0 buff/cache
MiB Swap: 8192.0 total, 4397.7 free, 3794.3 used. 2174.6 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16277 root 20 0 9402804 3.9g 5536 S 157.3 14.2 10812:00 kvm
16678 root 20 0 9493368 4.0g 5556 S 43.4 14.7 12606:32 kvm
17052 root 20 0 9521520 4.8g 5528 S 30.1 17.6 12302:13 kvm
17479 root 20 0 8678740 4.2g 5608 S 28.1 15.2 10638:18 kvm
70 root 25 5 0 0 0 S 27.5 0.0 18224:12 ksmd
17723 root 20 0 9349532 4.6g 5508 S 27.2 16.8 11597:55 kvm
15564 root 20 0 3029620 1.2g 4336 S 9.9 4.4 11192:21 kvm
1183 root rt 0 573668 178284 51452 S 3.6 0.6 4407:57 corosync
1205 root 20 0 297396 36220 7988 S 3.6 0.1 1598:52 pvestatd
22543 root 20 0 5841956 3.1g 10768 S 2.6 11.2 502:08.40 kvm
16324 root 20 0 0 0 0 S 1.7 0.0 16:27.69 vhost-16277

It's clearly coming from the kvm processes, but why?
PID 15564 is the VM with 2 GB RAM configured, running openSUSE on local-lvm storage (up for 76 days).
PID 22543 is the VM with 512 MB RAM, migrated 7 days ago.
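To see what kind of memory makes up that oversized RSS, the kernel's per-process summary can be read directly (using PID 22543 from the top output above as the example):

Code:
# per-process memory summary: Rss, Pss, anonymous vs. file-backed and shared mappings
cat /proc/22543/smaps_rollup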
 
A simple Bash one-liner helps to indicate which kvm processes are using the most RAM:

Code:
# for every process matching "kvm" (including kernel threads), print PID, resident memory (RSS, KiB), size (SZ, pages) and the command
for OUTPUT in $(pgrep kvm); do ps p $OUTPUT --no-headers o pid,rss,sz,command; done


13212 2158180 719145 /usr/bin/kvm -id 910 -name ...
13225 0 0 [kvm-nx-lpage-re]
13269 0 0 [kvm-pit/13212]
15564 1059496 757405 /usr/bin/kvm -id 919 -name ...
15572 0 0 [kvm-nx-lpage-re]
15597 0 0 [kvm-pit/15564]
16678 4297960 2641546 /usr/bin/kvm -id 911 -name ...
16683 0 0 [kvm-nx-lpage-re]
16727 0 0 [kvm-pit/16678]
17479 4752012 2491965 /usr/bin/kvm -id 914 -name ...
17484 0 0 [kvm-nx-lpage-re]
17535 0 0 [kvm-pit/17479]
22543 3812188 1682441 /usr/bin/kvm -id 909 -name ...
22548 0 0 [kvm-nx-lpage-re]
22593 0 0 [kvm-pit/22543]
28337 2213776 721031 /usr/bin/kvm -id 915 -name ...
28342 0 0 [kvm-nx-lpage-re]
28396 0 0 [kvm-pit/28337]
29918 2261996 785354 /usr/bin/kvm -id 912 -name ...
29923 0 0 [kvm-nx-lpage-re]
29983 0 0 [kvm-pit/29918]
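Extending the same idea, each kvm process's resident memory can be put next to the VM's configured RAM to make the overhead obvious. A rough sketch (it assumes GNU grep and an explicit memory: line in every VM config, which is the normal case):

Code:
for pid in $(pgrep -x kvm); do
    vmid=$(ps -p "$pid" -o args= | grep -oP '(?<=-id )\d+')    # VMID taken from the kvm command line
    rss_mb=$(( $(ps -p "$pid" -o rss=) / 1024 ))                # resident set size in MiB
    conf_mb=$(qm config "$vmid" | awk '/^memory:/ {print $2}')  # configured RAM in MiB
    echo "VM ${vmid}: ${rss_mb} MiB resident / ${conf_mb} MiB configured"
done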
 
Same problem here (Proxmox 6.2).
Only the VMs on NFS (v4) storage have the issue; no problem with local storage or iSCSI.
The memory grows fast when there is heavy I/O on the disk.
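If anyone wants to reproduce the growth deliberately, sustained I/O inside a guest whose disk sits on the NFS storage should trigger it, for example with fio (file name, size and runtime are arbitrary):

Code:
# ten minutes of mixed random I/O against a file on the guest's NFS-backed disk
fio --name=leaktest --filename=/root/fio-leaktest --size=4G \
    --rw=randrw --bs=64k --direct=1 --runtime=600 --time_based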
 
There is an update to nfs-common; I'm upgrading to Proxmox 6.2-14 (from 6.2-6) to test whether this resolves the issue.

Code:
nfs-utils (1:1.3.4-2.5+deb10u1) buster; urgency=medium

  * statd: take user-id from /var/lib/nfs/sm (CVE-2019-3689) (Closes: #940848)
  * Don't make /var/lib/nfs owned by statd.
    Only sm and sm.bak need to be accessible by statd or sm-notify after
    they drop privileges.
  * debian/control: Point Vcs URLs to kernel-team namespace repository

 -- Salvatore Bonaccorso <carnil@debian.org>  Wed, 24 Jun 2020 09:54:47 +0200

nfs-utils (1:1.3.4-2.5) unstable; urgency=medium
 -- Bernd Zeimetz <bzed@debian.org>  Sat, 06 Apr 2019 18:30:39 +0200
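
For completeness, checking the installed version and pulling in the update is the usual apt routine on each node (assuming the package repositories are already configured):

Code:
# show which nfs-common build is installed vs. available
apt-cache policy nfs-common
# pull in the updated nfs-common together with the PVE 6.2-14 packages
apt update && apt dist-upgrade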
 
Attachment: PVE-MemoryResolved.PNG (memory plot)
Still looking good.

Growth is slower than before (15.83 GB -> 16.59 GB over ~12 days).
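A small snippet run hourly from cron is enough to log the trend between screenshots (the log path is arbitrary):

Code:
# append a timestamped sample of used memory, e.g. from /etc/cron.hourly/
echo "$(date -Is) $(free -m | awk '/^Mem:/ {print $3}') MiB used" >> /var/log/mem-trend.log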
 
