Memory leak after backup jobs

mishki · Nov 26, 2021

Memory leak, not freeing memory
83 threads of proxmox-backup-server
since host reboot, not a single VM has been started
only backup jobs
pve+pbs installed on btrfs raid1 (clean install 7.0 & 2.0 and updated to 7.1 & 2.1)

Bash:

root@omega:~# pveversion -v
proxmox-ve: 7.1-1 (running kernel: 5.13.19-1-pve)
pve-manager: 7.1-7 (running version: 7.1-7/df5740ad)
pve-kernel-5.13: 7.1-4
pve-kernel-helper: 7.1-4
pve-kernel-5.13.19-1-pve: 5.13.19-3
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-5
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-14
libpve-guest-common-perl: 4.0-3
libpve-http-server-perl: 4.0-4
libpve-network-perl: 0.6.2
libpve-storage-perl: 7.0-15
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.1.2-1
proxmox-backup-file-restore: 2.1.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-4
pve-cluster: 7.1-2
pve-container: 4.1-2
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-3
pve-ha-manager: 3.3-1
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.1-4
smartmontools: 7.2-1
spiceterm: 3.2-2
swtpm: 0.7.0~rc1+2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve3

dcsapak · Nov 29, 2021

mishki said:
Memory leak, not freeing memory

where do you see that? the proxmox-backup-proxy uses ~900MB RSS memory, which does not seem like much to me

mishki said:
83 threads of proxmox-backup-server

the number of threads mostly depends on what is running and how many cpu cores you use, but if they are not doing anything this should not really hurt...

mishki said:
since host reboot, not a single VM has been started

seems unrelated? did you set the vms to autostart?

mishki said:
only backup jobs

what do you mean here?

i can see that there is 50% memory usage reported... do you use zfs? if yes, this can take up to 50% of memory by default... to lower this see here: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
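
A minimal sketch of the ARC limit mentioned above, assuming a cap of 8 GiB (an arbitrary value picked for illustration, not a recommendation from this thread). `zfs_arc_max` is the ZFS module parameter and takes a value in bytes; the helper function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: build the modprobe option line that caps the ZFS ARC.

# Convert GiB to bytes (zfs_arc_max is given in bytes).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Print the option line intended for /etc/modprobe.d/zfs.conf.
arc_limit_line() {
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$1")"
}

# 8 GiB is an assumed example value.
arc_limit_line 8
```

The printed line would go into /etc/modprobe.d/zfs.conf. As far as the linked wiki page describes, the initramfs has to be refreshed when the root filesystem is on ZFS, and the setting takes effect after a reboot. Writing the byte value to /sys/module/zfs/parameters/zfs_arc_max as root should change the limit at runtime.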

mishki · Nov 29, 2021

No, no VMs are set to autostart.

I completely forgot..

PVE is installed on btrfs.
PVE storages: NFS, LVM, iSCSI (no ZFS).

BUT

the PBS storage (on the 'omega' host) is ZFS with iSCSI (there is no other storage option for iSCSI yet).
When the backup tasks start, memory usage jumps by 40-50%, yes.

I am sorry to bother you.

SaymonDzen · Jan 13, 2022

PBS 2.1-2.
There is a memory leak in the proxmox-backup-proxy process. While a client backup runs (total size about 3 TB), its memory consumption grows by 5 GB and is not released when the backup completes. I first noticed this when I moved the storage to ZFS and the monitoring system alerts began to bother me. I therefore abandoned ZFS in favor of mdadm and added a workaround: restarting the process at 12 noon. That solved the alerting problem, but it is not a good solution, because restarting the daemon stops any running backup, which could one day backfire during a manually launched backup. The storage is now mdadm RAID6 with ext4.

dcsapak · Jan 14, 2022

can you add a bit more info? which process exactly grows in memory (can you post output of free/ps or a screenshot of top/htop) ?
was any other operation running at that time (gc/verify/etc.) ?
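
A minimal sketch of how the requested per-process numbers could be collected, assuming a Linux host with /proc mounted. It reads the resident set size and thread count from /proc/<pid>/status for every process whose command line mentions the daemon name; the function names are hypothetical.

```shell
#!/bin/sh
# Resident set size in KiB of a PID, taken from /proc/<pid>/status.
rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Number of threads of a PID.
thread_count() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Print one line per process whose command line contains the given string.
report() {
    for dir in /proc/[0-9]*; do
        pid=${dir#/proc/}
        [ -r "$dir/cmdline" ] || continue
        # cmdline is NUL-separated; make it a plain space-separated string.
        cmd=$(tr '\0' ' ' < "$dir/cmdline") || continue
        case $cmd in
            *"$1"*) echo "pid=$pid rss_kib=$(rss_kib "$pid") threads=$(thread_count "$pid")" ;;
        esac
    done
}

report proxmox-backup-proxy
```

Running it before and after a backup, together with `free -m` for the system-wide view, would show whether the daemon itself is the process that grows.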

SaymonDzen · Jan 14, 2022

dcsapak said:
can you add a bit more info? which process exactly grows in memory (can you post output of free/ps or a screenshot of top/htop) ?
was any other operation running at that time (gc/verify/etc.) ?

after backup htop

prometheus node exporter

The other tasks (daily GC and prune) have already completed, but the process has not released the memory.

dcsapak · Jan 14, 2022

ok thanks. is that a vm or a ct backup? does it rise again when you start another backup?

SaymonDzen · Jan 14, 2022

dcsapak said:
ok thanks. is that a vm or a ct backup? does it rise again when you start another backup?

This is a backup of files from the host, not a VM or CT. If I do not reset the memory by restarting the process, consumption increases by a few percent the next day, and so on until it takes up all the free memory.

dcsapak · Jan 18, 2022

hi,

just fyi, we did some investigation and it seems the memory allocator is at fault here.
basically the program 'frees' the memory as it should, but the allocator (from glibc) does not return the memory to the os again like expected.
we'll investigate further to find a solution, but no idea what the timeframe is here..

in the meantime if you use your workaround of restarting the daemon, you could use 'reload' instead,
this way the old daemons are kept as long as there is a task running (so it won't cancel your running backups, etc)

edit: i forgot to mention, this memory is not actually lost. the daemon can and will reuse that for other things, just the releasing does not really behave as one would probably expect
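
A minimal sketch of the 'reload' workaround, assuming it runs as root on the PBS host from cron or a systemd timer. The 4 GiB threshold and the function names are hypothetical; `systemctl show -p MainPID --value` is used to find the daemon's main process.

```shell
#!/bin/sh
# Sketch: reload proxmox-backup-proxy once its RSS exceeds a limit.
# LIMIT_KIB is an assumed threshold (4 GiB), not a value from this thread.
LIMIT_KIB=$((4 * 1024 * 1024))

# Succeeds when the first number is greater than the second.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Reload the service if its main process is over the limit.
reload_if_needed() {
    pid=$(systemctl show -p MainPID --value proxmox-backup-proxy) || return 1
    [ "$pid" -gt 0 ] || return 1    # MainPID=0 means the service is not running
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
    if over_limit "$rss" "$LIMIT_KIB"; then
        systemctl reload proxmox-backup-proxy
    fi
}

# Nothing is executed here; call reload_if_needed from a scheduled job.
```

Reloading only above a threshold keeps the daemon untouched on days when memory use stays low, instead of reloading at a fixed time regardless.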

SaymonDzen · Jan 18, 2022

hi, @dcsapak!
Thanks, "reload" is a much better workaround.

dcsapak · Jan 28, 2022

just fyi, we recently applied a fix[0] that should improve memory allocation/release behaviour, this will be included with proxmox-backup-server >= 2.1.5-1

0: https://git.proxmox.com/?p=proxmox-...ff;h=d91a0f9fc90aecabc4f359d968f716a14562ce78
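
A minimal sketch for checking whether an installed proxmox-backup-server is at least 2.1.5-1, assuming a Debian-based host with dpkg available. The function name is hypothetical.

```shell
#!/bin/sh
# Succeeds when the given Debian version string is at least 2.1.5-1.
has_allocator_fix() {
    dpkg --compare-versions "$1" ge 2.1.5-1
}

# Installed version; stays empty when the package is not installed.
installed=$(dpkg-query -W -f='${Version}' proxmox-backup-server 2>/dev/null) || true

if [ -n "$installed" ] && has_allocator_fix "$installed"; then
    echo "proxmox-backup-server $installed includes the fix"
else
    echo "fix not present (installed: ${installed:-none})"
fi
```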

alexdelprete · Jan 30, 2022

dcsapak said:
we recently applied a fix[0] that should improve memory allocation/release behaviour, this will be included with proxmox-backup-server >= 2.1.5-1

Dominik, I just installed PBS, configured it, and scheduled a complete backup of all the CTs/VMs on one of my nodes. The backup completed fine (apart from one CT, a Docker system running 12 containers), but afterwards I noticed that the node's memory usage was much higher than usual. You can see from this graph that the job started at midnight and completed after 20 minutes, but the memory was never released.

What is the workaround for this while waiting for the patch? Does the patch affect only PBS, or PVE too?

dcsapak · Jan 31, 2022

alexdelprete said:
What is the workaround for this while waiting for the patch? Does the patch affect only PBS, or PVE too?

the patch was only for pbs. the workaround is to 'reload' the proxmox-backup-proxy with 'systemctl reload proxmox-backup-proxy'
note that some tasks will be canceled by this (i think garbage collection or verify)

alexdelprete · Jan 31, 2022

dcsapak said:
the patch was only for pbs. the workaround is to 'reload' the proxmox-backup-proxy with 'systemctl reload proxmox-backup-proxy'

Thanks for the answer Dominik, the problem is that I'm having the RAM issue on the PVE node, not on the PBS one. Would the reload also affect the PVE node? Right now the only way I found to release that memory is restarting the node, but I'd like to avoid it obviously.

I just issued the reload command on the PBS node, and the ram usage on the PVE node is still the same. I'm restarting the node now, but it's starting to get annoying, since I do a backup every 6 hours.

Neobin · Jan 31, 2022

alexdelprete said:
the problem is that I'm having the RAM issue on the PVE node

Are you using ZFS for your VM/CT storage (the one that gets backed up)?
If yes, it is the ARC which fills up while the backup is running and only frees up memory when some other process needs it.

See dcsapak's first answer (last paragraph) in this thread.
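
A minimal sketch for checking how much memory the ARC currently holds, assuming ZFS on Linux exposes its statistics at /proc/spl/kstat/zfs/arcstats with one `name type data` triple per line. The function name is hypothetical.

```shell
#!/bin/sh
# Print one field from the ARC statistics (lines look like: name type data).
# An optional second argument overrides the statistics file path.
arc_stat() {
    awk -v key="$1" '$1 == key {print $3}' "${2:-/proc/spl/kstat/zfs/arcstats}"
}

if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    echo "ARC size: $(arc_stat size) bytes"
    echo "ARC max:  $(arc_stat c_max) bytes"
else
    echo "no ARC statistics found (ZFS module not loaded?)"
fi
```

If the reported size accounts for most of the memory that appears used after a backup, the ARC is the likely explanation rather than a leak.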

alexdelprete · Jan 31, 2022

Neobin said:
If yes, it is the ARC which fills up while the backup is running and only frees up memory when some other process needs it.

Thank you for the answer, I guess that's the "problem" (it actually isn't from what I read), since I'm using ZFS. I'm still at the beginning of the learning curve, didn't optimize things yet as I'm still reading docs.
