[SOLVED] Memory leak after update from PVE 5 to 6

The experiment with the VM clone seems useless - without networking and an application workload, the cloned VM barely uses any memory.
 
This isn't exactly useless - it can at least point to an (indirect) trigger of the issue. You could run fio storage tests and stress-ng CPU/memory tests, and, if you can enable networking with another MAC/IP, some network tests, to see whether any of those causes the memory to grow.
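A minimal sketch of such stress tests inside the guest (the sizes, durations, and the iperf3 server address are placeholders, not values from this thread):

    # storage: random read/write load with fio
    fio --name=randrw --directory=/tmp --rw=randrw --bs=4k \
        --size=1G --numjobs=4 --time_based --runtime=120 --group_reporting

    # CPU and memory pressure with stress-ng
    stress-ng --cpu 4 --vm 2 --vm-bytes 1G --timeout 120s --metrics-brief

    # network throughput, e.g. with iperf3 against some other host
    iperf3 -c 192.0.2.10 -t 60

Watching the kvm process RSS on the host while each test runs should show which load correlates with the growth.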
 
I believe we have the same problem.

  • pveversion -v
    proxmox-ve: 6.2-1 (running kernel: 5.4.44-1-pve)
    pve-manager: 6.2-6 (running version: 6.2-6/ee1d7754)
    pve-kernel-5.4: 6.2-3
    pve-kernel-helper: 6.2-3
    pve-kernel-5.3: 6.1-6
    pve-kernel-5.4.44-1-pve: 5.4.44-1
    pve-kernel-4.15: 5.4-16
    pve-kernel-5.3.18-3-pve: 5.3.18-3
    pve-kernel-4.13: 5.2-2
    pve-kernel-4.15.18-27-pve: 4.15.18-55
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    pve-kernel-4.15.18-1-pve: 4.15.18-19
    pve-kernel-4.13.16-4-pve: 4.13.16-51
    pve-kernel-4.13.16-1-pve: 4.13.16-46
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    ceph-fuse: 12.2.11+dfsg1-2.1+b1
    corosync: 3.0.3-pve1
    criu: 3.11-3
    glusterfs-client: 5.5-3
    ifupdown: 0.8.35+pve1
    ksm-control-daemon: 1.3-1
    libjs-extjs: 6.0.1-10
    libknet1: 1.15-pve1
    libproxmox-acme-perl: 1.0.4
    libpve-access-control: 6.1-1
    libpve-apiclient-perl: 3.0-3
    libpve-common-perl: 6.1-3
    libpve-guest-common-perl: 3.0-10
    libpve-http-server-perl: 3.0-5
    libpve-storage-perl: 6.1-8
    libqb0: 1.0.5-1
    libspice-server1: 0.14.2-4~pve6+1
    lvm2: 2.03.02-pve4
    lxc-pve: 4.0.2-1
    lxcfs: 4.0.3-pve3
    novnc-pve: 1.1.0-1
    proxmox-mini-journalreader: 1.1-1
    proxmox-widget-toolkit: 2.2-8
    pve-cluster: 6.1-8
    pve-container: 3.1-8
    pve-docs: 6.2-4
    pve-edk2-firmware: 2.20200531-1
    pve-firewall: 4.1-2
    pve-firmware: 3.1-1
    pve-ha-manager: 3.0-9
    pve-i18n: 2.1-3
    pve-qemu-kvm: 5.0.0-4
    pve-xtermjs: 4.3.0-1
    qemu-server: 6.2-3
    smartmontools: 7.1-pve2
    spiceterm: 3.1-1
    vncterm: 1.6-1
    zfsutils-linux: 0.8.4-pve1


  • qm config 131
    balloon: 0
    bootdisk: virtio0
    cores: 2
    ide2: none,media=cdrom
    memory: 2048
    name: Mailcleaner
    net0: virtio=4A:94:4E:4D:7F:C7,bridge=vmbr1,tag=33
    numa: 0
    ostype: l26
    scsihw: virtio-scsi-pci
    smbios1: uuid=b2ac31c5-4549-4f7b-842a-7449a8cfa909
    sockets: 1
    startup: order=300
    virtio0: storage2nfs2:131/vm-131-disk-0.qcow2,size=60G
    vmgenid: e04beee7-7eaf-48e0-ae23-a42a02c2dd8f

  • root@pve1:~# top -bn1 -p $(cat /run/qemu-server/131.pid)
    top - 13:23:56 up 19 days, 23:29, 1 user, load average: 12.53, 7.01, 5.72
    Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 14.3 us, 3.4 sy, 0.0 ni, 81.0 id, 0.0 wa, 0.0 hi, 1.3 si, 0.0 st
    MiB Mem : 96661.4 total, 452.4 free, 69751.1 used, 26457.9 buff/cache
    MiB Swap: 8192.0 total, 4139.7 free, 4052.2 used. 25926.6 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    13078 root 20 0 19.5g 9.6g 6892 S 0.0 10.2 626:09.25 kvm
Proxmox itself runs stable; the machine gets killed by the OOM killer at an RES level of about 20G. RES has grown roughly 2G in the last 48 hours.
The kill looks like this (this was on version 6.1 and the first kill detected):
  • May 27 04:02:29 pve1 kernel: [2162347.843393] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0-1,global_oom,task_memcg=/qemu.slice/131.scope,task=kvm,pid=9111,uid=0
  • May 27 04:02:29 pve1 kernel: [2162347.843436] Out of memory: Killed process 9111 (kvm) total-vm:72656248kB, anon-rss:32005536kB, file-rss:252kB, shmem-rss:4kB
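For anyone hunting the same issue: such kills can be found after the fact in the host's kernel log, e.g. with a generic search like this (not a command from this thread):

    # list OOM killer activity from the kernel log
    journalctl -k | grep -iE 'out of memory|oom-kill'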

This machine is running the appliance from Mailcleaner. One other, older Linux machine seems to have a similar issue, but not as extreme; three Windows 2000 servers also seem to grow by roughly 100MB a day.

This is a Proxmox cluster of four machines (one is shut down); another machine hosts the storage, which is accessed via NFS.

I have already disabled ballooning (it is off at the moment), but I have no idea what the problem is. Migrating the machine between cluster nodes instantly reduces the resources used by it.
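For reference, both of those steps can be done from the host CLI; a minimal sketch (the target node name pve2 is a placeholder, not from this thread):

    # turn ballooning off for VM 131
    qm set 131 --balloon 0

    # live-migrate VM 131 to another cluster node
    qm migrate 131 pve2 --online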

I have also upgraded Proxmox from 6.1 to the version above. With Proxmox version 5, the same virtual machine had no problems.
 
I tested machine type q35 without any improvement. We also have a script running that monitors the RSS value of the kvm process (a sketch of such a loop follows the log below); there are just a few 'jumps' in the memory usage.
Here are a few examples - it jumps by 240MB of RAM usage in one minute at 18:00. The problem is new since Proxmox 6, and only very few machines show this kind of behaviour.
2020-07-22 18:13:00 3014352896
2020-07-22 18:12:00 3014352896
2020-07-22 18:11:00 3014352896
2020-07-22 18:10:00 3014352896
2020-07-22 18:09:00 3011690496
2020-07-22 18:08:00 3011690496
2020-07-22 18:07:00 3011690496
2020-07-22 18:06:00 3011690496
2020-07-22 18:05:00 3011690496
2020-07-22 18:04:00 2979483648
2020-07-22 18:03:00 2979483648
2020-07-22 18:02:00 2979483648
2020-07-22 18:01:00 2979483648
2020-07-22 18:00:00 2979483648
2020-07-22 17:59:00 2502713344
2020-07-22 17:58:00 2502713344
2020-07-22 17:57:00 2502713344
2020-07-22 17:56:00 2502713344
2020-07-22 17:55:00 2502713344
2020-07-22 17:54:00 2502979584
2020-07-22 17:53:00 2502979584
2020-07-22 17:52:00 2502979584
2020-07-22 17:51:00 2502979584
2020-07-22 17:50:00 2502979584
2020-07-23 00:08:00 3402567680
2020-07-23 00:07:00 3402567680
2020-07-23 00:06:00 3402567680
2020-07-23 00:05:00 3402567680
2020-07-23 00:04:00 3374108672
2020-07-23 00:03:00 3374108672
2020-07-23 00:02:00 3374108672
2020-07-23 00:01:00 3374108672
2020-07-23 00:00:00 3374108672
2020-07-22 23:59:00 3200454656
2020-07-22 23:58:00 3200454656
2020-07-22 23:57:00 3200454656
2020-07-22 23:56:00 3200454656
2020-07-22 23:55:00 3200454656
2020-07-22 23:54:00 3199623168
2020-07-22 23:53:00 3199623168
2020-07-22 23:52:00 3199623168
2020-07-22 23:51:00 3199623168
2020-07-22 23:50:00 3199623168
2020-07-22 23:49:00 3197206528
2020-07-22 23:48:00 3197206528
2020-07-22 23:47:00 3197206528
2020-07-22 23:46:00 3197206528
2020-07-22 23:45:00 3197206528
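As mentioned above, these values come from a script that logs the RSS of the kvm process once a minute. A hypothetical reconstruction of such a loop (the VMID, log path, and interval are assumptions, not the poster's actual script):

    #!/bin/bash
    # log the RSS of VM 131's kvm process once per minute, in bytes
    VMID=131
    LOG=/var/log/kvm-rss-$VMID.log
    while true; do
        PID=$(cat /run/qemu-server/$VMID.pid)
        # VmRSS in /proc/<pid>/status is given in kB; convert to bytes
        RSS_KB=$(awk '/^VmRSS:/ {print $2}' /proc/$PID/status)
        echo "$(date '+%Y-%m-%d %H:%M:%S') $((RSS_KB * 1024))" >> "$LOG"
        sleep 60
    done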
 
Nothing helped - changing the machine type to q35, disabling the QEMU guest agent, changing the disk bus/device type to SCSI. Any more suggestions?
 
I've noticed a similar problem with the cluster I look after.
I see slow, unexplainable memory growth of 10GB to 50GB above the VM allocation over 2 to 4 weeks, which disappears when migrating the VM between hosts.
I'm also using NFS for the storage, but with the RAW format; however, most of my guest machines are Windows.

I've seen comments in the forums about NFS not being well supported, or something like that.
I tried switching to CIFS, but that was much worse than NFS: it still leaked, and it also had frequent timeouts.
I think switching from qcow2 to raw helped slow down the memory growth, but I have not found that this change eliminated it.

Are you able to try local storage for the VM disk?

You can use the Move disk option (while the VM is still running), select storage local to the server, and see if the problem persists.
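The same move can be done from the CLI while the VM keeps running; a sketch, assuming the disk name from the config earlier in the thread and a directory storage called local (the storage name is a placeholder):

    # move virtio0 of VM 131 to local storage, converting qcow2 to raw on the way
    qm move_disk 131 virtio0 local --format raw

The old image is kept on the NFS storage unless you also pass --delete 1.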

If your host machine has more than one CPU socket installed, your Linux VM may benefit from numa: 1 (see the sketch after this paragraph).
The Windows 2000 boxes may benefit from the latest VirtIO drivers (NUMA is not supported on Windows 2000, only from Windows 7 and 2008 R2 onwards).
I found a node with 4x 2008 R2 VMs didn't have the same memory growth as other nodes with mixed Windows OS types.
I'm experimenting with another node by migrating five 2008 R2 VMs to it.
I'm also trying the latest VirtIO drivers for the 10, 2012, 2016, and 2019 VMs to see if that helps the other nodes' memory growth.
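To try the NUMA suggestion above from the CLI, a minimal sketch for the VM from earlier in the thread (the setting takes effect after a full stop and start of the guest):

    # enable NUMA for VM 131
    qm set 131 --numa 1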
 
