Memory leak after backup jobs

mishki

Active Member
May 1, 2020
Memory leak, not freeing memory
83 threads of proxmox-backup-server
since host reboot, not a single VM has been started
only backup jobs
PVE and PBS are installed on btrfs RAID1 (clean install of 7.0 and 2.0, then updated to 7.1 and 2.1)

Bash:
root@omega:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-network-perl: 0.6.2
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3
 

Attachments: Selection_059.png (133.7 KB), image_2021-11-27_001216.png (945.5 KB)
Memory leak, not freeing memory
where do you see that? proxmox-backup-proxy uses ~900 MB of RSS memory, which does not seem like much to me
83 threads of proxmox-backup-server
the number of threads mostly depends on what is running and how many CPU cores you have, but if they are not doing anything this should not really hurt...

since host reboot, not a single VM has been started
seems unrelated? did you set the VMs to autostart?

only backup jobs
what do you mean here?

i can see that there is 50% memory usage reported... do you use ZFS? if yes, its ARC cache can take up to 50% of memory by default... to lower this see here: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
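
For readers who want to try the ARC limit mentioned above, here is a minimal sketch. It assumes the usual `zfs_arc_max` module parameter (value in bytes) and the `/etc/modprobe.d/zfs.conf` file; the 8 GiB figure and the helper names `gib_to_bytes` and `arc_max_option` are only illustrations, not something from this thread.

```shell
#!/bin/sh
# Convert a size in GiB to bytes, the unit zfs_arc_max expects.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the modprobe option line for an ARC cap given in GiB.
arc_max_option() {
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$1")"
}

# Example: show the line for an 8 GiB cap (8 is an arbitrary value).
arc_max_option 8

# To persist the limit (as root), put that line into /etc/modprobe.d/zfs.conf
# (check first whether the file already holds other options), then run
#   update-initramfs -u
# if the root filesystem is on ZFS.
# To apply it right away without a reboot (as root):
#   gib_to_bytes 8 > /sys/module/zfs/parameters/zfs_arc_max
```

Pick a value that leaves enough memory for the guests and for PBS itself; a very small ARC can hurt read performance.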
 
No, no VMs are set to autostart.

I completely forgot..

PVE is installed on btrfs.
PVE storages: NFS, LVM, iSCSI (no ZFS).

BUT

the PBS storage (on the 'omega' host) is ZFS with iSCSI (there is no other storage option for iSCSI yet).
When the backup tasks start, memory usage jumps by 40-50%, yes.

I am sorry to bother you.
 
pbs 2.1-2
There is a memory leak in the proxmox-backup-proxy process. While a client backup runs (total size about 3 TB), its memory consumption grows by 5 GB and is not released when the backup completes. I first noticed this when I moved the storage to ZFS and the monitoring system's alerts began to bother me. So I abandoned ZFS in favor of mdadm and added a workaround: restarting the process at 12 noon. That solved the alerting problem, but it is still not a good solution, because restarting the daemon stops running backups, and one day that could hit a manually launched backup. The storage is mdadm RAID6 with ext4.
 
can you add a bit more info? which process exactly grows in memory (can you post the output of free/ps or a screenshot of top/htop)?
was any other operation running at that time (gc/verify/etc.)?
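
A rough sketch of how the requested numbers could be collected. It reads the `VmRSS` and `Threads` fields from `/proc/<pid>/status`; the helper name `proc_mem_summary` is made up for this example, and the unit name `proxmox-backup-proxy.service` in the comments is the one used elsewhere in this thread.

```shell
#!/bin/sh
# Print resident memory (KiB) and thread count for a PID, read from /proc.
proc_mem_summary() {
    awk '/^VmRSS:/ {rss=$2} /^Threads:/ {thr=$2}
         END {print "rss_kib=" rss " threads=" thr}' "/proc/$1/status"
}

# Demonstration on the current shell process.
proc_mem_summary $$

# On the PBS host, the same for the proxy daemon (needs systemd):
#   pid=$(systemctl show -p MainPID --value proxmox-backup-proxy.service)
#   proc_mem_summary "$pid"
#   ps -o pid,rss,vsz,nlwp,args -p "$pid"
#   free -h
```

Running it before a backup, right after it, and again after gc/prune have finished shows whether the resident size really stays high.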
 
htop after the backup: 1642147355887.png
prometheus node exporter: 1642147603406.png
The other tasks (daily gc and prune) have already completed, but the process has not released the memory.
 
ok thanks. is that a vm or a ct backup? does it rise again when you start another backup?
 
This is a backup of files from the host, not a VM or CT. If I do not reset the memory by reloading the process, consumption grows by a few percent the next day, and so on until it takes up all the free memory.
 
hi,

just fyi, we did some investigation and it seems the memory allocator is at fault here.
basically the program 'frees' the memory as it should, but the allocator (from glibc) does not return the memory to the OS as expected.
we'll investigate further to find a solution, but no idea what the timeframe is here..

in the meantime, if you use your workaround of restarting the daemon, you could use 'reload' instead.
this way the old daemons are kept as long as there is a task running (so it won't cancel your running backups, etc.)

edit: i forgot to mention, this memory is not actually lost. the daemon can and will reuse it for other things, it is just that the releasing does not really behave as one would probably expect
 
we recently applied a fix[0] that should improve memory allocation/release behaviour, this will be included with proxmox-backup-server >= 2.1.5-1

Dominik, I just installed PBS, configured it, and scheduled a complete backup of all the CTs/VMs on one of my nodes. The backup completed OK (apart from one CT that is a Docker system running 12 containers), but afterwards I noticed that the memory usage of my node was much higher than usual. You can see from this graph that the script started at midnight and completed after 20 minutes, but the memory was never released.

What is the workaround for this while waiting for the patch? Does the patch affect only PBS, or PVE too?

1643499994783.png
 
the patch was only for pbs. the workaround is to 'reload' the proxmox-backup-proxy with 'systemctl reload proxmox-backup-proxy'
note that some tasks will be canceled by this (i think garbage collection or verify)
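
A possible way to script the reload workaround, as a sketch only. Instead of reloading at a fixed time, it reloads when the proxy's resident memory exceeds a threshold. The 4 GiB limit, the function names, and the `--run` switch are assumptions made for this example; `systemctl show -p MainPID --value` and `systemctl reload` are standard systemd calls.

```shell
#!/bin/sh
UNIT="proxmox-backup-proxy.service"
LIMIT_KIB=$((4 * 1024 * 1024))   # hypothetical threshold: 4 GiB in KiB

# Succeeds when the first value (KiB) is greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Resident memory (KiB) of the unit's main process.
unit_rss_kib() {
    pid=$(systemctl show -p MainPID --value "$1")
    awk '/^VmRSS:/ {print $2}' "/proc/$pid/status"
}

# Reload the unit only when its memory use is above the limit.
reload_if_over_limit() {
    rss=$(unit_rss_kib "$UNIT")
    if over_limit "$rss" "$LIMIT_KIB"; then
        systemctl reload "$UNIT"
    fi
}

# Act only when called with --run, e.g. from a cron job.
if [ "${1:-}" = "--run" ]; then
    reload_if_over_limit
fi
```

Since some tasks may still be canceled by a reload, it is probably safest to schedule this outside the garbage collection and verify windows.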
 

Thanks for the answer Dominik, the problem is that I'm having the RAM issue on the PVE node, not on the PBS one. Would the reload also affect the PVE node? Right now the only way I found to release that memory is restarting the node, but I'd like to avoid it obviously.

I just issued the reload command on the PBS node, and the ram usage on the PVE node is still the same. I'm restarting the node now, but it's starting to get annoying, since I do a backup every 6 hours.
 

Are you using ZFS for your VM/CT storage (the one being backed up)?
If yes, it is the ARC which fills up while the backup is running and only frees up memory when some other process needs it.

See dcsapak's first answer (last paragraph) in this thread.
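
To check whether the ARC accounts for the memory on a PVE node, something like the following could be used. It is a sketch that assumes the OpenZFS kstat file `/proc/spl/kstat/zfs/arcstats`, where each line has the form `name type value` and the `size` entry is the current ARC size in bytes; the helper name `arc_size_mib` is made up here.

```shell
#!/bin/sh
ARCSTATS=/proc/spl/kstat/zfs/arcstats

# Read arcstats-formatted text on stdin and print the ARC size in MiB.
arc_size_mib() {
    awk '$1 == "size" {printf "%d\n", $3 / 1024 / 1024}'
}

# Print the current ARC size when the ZFS module is loaded.
if [ -r "$ARCSTATS" ]; then
    arc_size_mib < "$ARCSTATS"
fi
```

If the reported size roughly matches the memory that "never gets released" after a backup, it is cache that the kernel gives back under memory pressure, not a leak.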
 

Thank you for the answer, I guess that's the "problem" (it actually isn't one, from what I read), since I'm using ZFS. I'm still at the beginning of the learning curve and haven't optimized things yet, as I'm still reading the docs.
 
