High Memory During Backup

jjtech

New Member
Mar 10, 2024
Hello, I'm fairly new to Proxmox and running into some trouble with backups. I configured a VM backup job writing to an NFS share with mostly default options (Snapshot mode, ZSTD compression). Within a minute or two of starting, the system and all VMs became unresponsive. After rebooting I tried again while watching system resources. My Proxmox host has 16GB of memory, about half of which is generally free. During the next backup attempt, I watched the memory fill up until everything became unresponsive again.

The answer might be as simple as, "you just need more memory or it won't work", but I'm not sure if I'm missing any critical configuration. Any help appreciated.
 
While a backup runs, new write actions need to be buffered to ensure consistency. Maybe that's what takes up all your memory if the backup takes a while? Do you have the same problem when the VM is not running during the backup?
 
Can you back up to local storage and then move things to NFS?
A bit more hassle, but I can try that and see if it makes a difference, in case there is some bottleneck with the NFS, I guess. Thanks for the idea.
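If I go that route, I guess it would look roughly like this (the VMID and paths are just placeholders for my setup):

Bash:
# back up one VM to a local directory first, then copy the archive over to the NFS share
vzdump 100 --dumpdir /var/lib/vz/dump --mode snapshot --compress zstd
rsync -av /var/lib/vz/dump/ /mnt/pve/nfs-backup/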

While a backup runs, new write actions need to be buffered to ensure consistency. Maybe that's what takes up all your memory if the backup takes a while? Do you have the same problem when the VM is not running during the backup?
That makes sense, I suppose. I tried pausing the VMs first. It went better, but still failed. With them running, it usually failed about halfway through the first VM. This time it finished the first and got to 97% on the second, hovering around 300MB of free memory the whole time, but then the system became unresponsive again.
 
How much swap do you have allocated?
I don't think I've ever explicitly changed anything there. Looks like 8 gigs.

Bash:
free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       6.4Gi       8.7Gi        32Mi       562Mi       8.9Gi
Swap:          8.0Gi          0B       8.0Gi
 
Ok, so I hate to open new threads when similar ones already exist, but none of them have a real solution.
I have the problem described above too.
My server has 128GB RAM and a few HDDs and SSDs in it.
My PBS runs in a container, writing to an SMB share on a TrueNAS box, which has 4 spinning disks in RAIDZ1.
My containers and VMs run on SSD.
Usually I sit at 60-80GB of memory usage.

Now when I start a backup, or when it starts on schedule, memory usage climbs and climbs until everything becomes unresponsive. Then I have to wait for 1-2 hours until it all finishes and things go back to mostly normal, with some VMs or containers suddenly stopped.

I installed netdata because I didn't see the memory being used in the PVE overview. There I could see the real memory usage going up to 100%, and that's it, no more response.
It doesn't need to be a backup of every VM or container. A single container is all it takes to fill everything up and make the system unresponsive again.

So what is happening here? Why is everything stored in RAM first before being written to the final destination?
The temporary backup location is a dedicated folder on my backup drive, used for nothing else.
 
So what is happening here? Why is everything stored in RAM first before being written to the final destination?
If your VMs/containers are still running during the backup, the new data written by the VM/container needs to be saved somewhere. This can add up to a lot when heavy I/O is going on.
Try enabling fleecing so that new writes go through immediately (after the old data has been saved to the fleecing storage). Maybe search the forum a bit for fleecing.
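If you want to try it: as far as I know, fleecing only applies to VM backups and can be enabled per backup job in the GUI (Datacenter -> Backup -> edit the job -> Advanced) on PVE 8.2 or newer, or on the command line, roughly like this (check man vzdump for the exact syntax on your version; the VMID and storage names are placeholders):

Bash:
# one-off VM backup with fleecing enabled, using local-lvm as the fleecing storage
vzdump 100 --storage PBS --mode snapshot --fleecing enabled=1,storage=local-lvm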

EDIT: Turns out I was completely wrong about how backups work, sorry.
 
Last edited:
Hi,
Ok, so I hate to open new threads when similar ones already exist, but none of them have a real solution.
I have the problem described above too.
My server has 128GB RAM and a few HDDs and SSDs in it.
My PBS runs in a container, writing to an SMB share on a TrueNAS box, which has 4 spinning disks in RAIDZ1.
My containers and VMs run on SSD.
Usually I sit at 60-80GB of memory usage.

Now when I start a backup, or when it starts on schedule, memory usage climbs and climbs until everything becomes unresponsive. Then I have to wait for 1-2 hours until it all finishes and things go back to mostly normal, with some VMs or containers suddenly stopped.

I installed netdata because I didn't see the memory being used in the PVE overview. There I could see the real memory usage going up to 100%, and that's it, no more response.
It doesn't need to be a backup of every VM or container. A single container is all it takes to fill everything up and make the system unresponsive again.

So what is happening here? Why is everything stored in RAM first before being written to the final destination?
The temporary backup location is a dedicated folder on my backup drive, used for nothing else.
Please share the output of pveversion -v. What task is taking up the memory? Note that ZFS can use up to 50% of host memory for its ARC by default, so if you already use 60-80 GiB regularly, that might be too much: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_zfs_limit_memory_usage
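For example, capping the ARC at 8 GiB would look roughly like this (the value is in bytes; pick a limit that fits your workload, see the linked docs for details):

Bash:
# apply immediately (runtime only)
echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
# make it persistent across reboots
echo "options zfs zfs_arc_max=8589934592" > /etc/modprobe.d/zfs.conf
update-initramfs -u -k all    # needed when the root filesystem is on ZFS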

If your VMs/containers are still running during the backup, the new data written by the VM/container needs to be saved somewhere. This can add up to a lot when heavy I/O is going on.
Try enabling fleecing so that new writes go through immediately (after the old data has been saved to the fleecing storage). Maybe search the forum a bit for fleecing.
Without fleecing, the newly written data is immediately written to the backup storage, so this shouldn't happen.
 
Without fleecing, the newly written data is immediately written to the backup storage, so this shouldn't happen.
Sorry if this is off-topic, but I hope you can spare the time to explain it to me.
If the VM (over)writes data, then the old data needs to be preserved somewhere (if it was not backed up yet), right? Maybe it's different between block and file storage?
I thought fleecing stores the old data (on the fleecing disk, from which it can be discarded after the backup), so the new data can be written to the virtual disks while the backup runs.
Without fleecing, the new data must be preserved (in memory?) while waiting for the backup to finish (and could corrupt the VM if the backup fails in some way that prevents writing the new data)?
 
Sorry if this is off-topic, but I hope you can spare the time to explain it to me.
If the VM (over)writes data, then the old data needs to be preserved somewhere (if it was not backed up yet), right? Maybe it's different between block and file storage?
No, there is no difference; a VM backup is always taken at the block level. If a new guest write comes in, the following happens:
1. if that block was already backed up, the write just happens
2. if the block was not yet backed up, the old/current data is backed up to the backup target first, then the write happens, overwriting the old data.
I thought fleecing stores the old data (on the fleecing disk, from which it can be discarded after the backup), so the new data can be written to the virtual disks while the backup runs.
Without fleecing, the new data must be preserved (in memory?) while waiting for the backup to finish (and could corrupt the VM if the backup fails in some way that prevents writing the new data)?
It's not preserved in memory; the old data is written directly to the backup target rather than to a fleecing disk.
 
No, there is no difference; a VM backup is always taken at the block level. If a new guest write comes in, the following happens:
1. if that block was already backed up, the write just happens
2. if the block was not yet backed up, the old/current data is backed up to the backup target first, then the write happens, overwriting the old data.
So I did a bit of testing now, and I found something quite interesting.

I reverted the vzdump config back to its default, which means the temp files go to their default location, /var/tmp, and I can see that. I can see the files getting created there and the SSD filling up quickly. No RAM is really used at all. I watched it until my local partition was almost full before aborting. Since this is the local disk, it doesn't have a lot of space.
So in the default configuration, it behaves as expected.

Then I did it again, this time pointing the tmpdir in the config at my SSD. Turns out the same thing happened, where it fills up the RAM until the system is unresponsive.
I'm not quite sure why it is happening. Is it maybe the path? I simply gave it the path /dev/SSD/vzdump (which exists, with (experimental) 777 permissions).
I saw a folder being created, but then the RAM filled up again and I aborted it.

So I'm not sure why it is happening. The default seems to be OK. If I change the directory, it's not working. Is it the path? The permissions maybe? Is the heavy RAM usage a fallback option?
I wish there were a GUI option to change the temporary file storage for vzdump easily.
 
So I did a bit of testing now, and I found something quite interesting.

I reverted the vzdump config back to its default, which means the temp files go to their default location, /var/tmp, and I can see that. I can see the files getting created there and the SSD filling up quickly. No RAM is really used at all. I watched it until my local partition was almost full before aborting. Since this is the local disk, it doesn't have a lot of space.
So in the default configuration, it behaves as expected.
But that is a container backup, right? What I wrote above only applies to VM backups.
Then I did it again, this time pointing the tmpdir in the config at my SSD. Turns out the same thing happened, where it fills up the RAM until the system is unresponsive.
I'm not quite sure why it is happening. Is it maybe the path? I simply gave it the path /dev/SSD/vzdump (which exists, with (experimental) 777 permissions).
Is that your actual path? You shouldn't use mount points below /dev. If /dev/SSD/vzdump is just a directory you created there (rather than a filesystem mounted on top of it), keep in mind that /dev is a RAM-backed devtmpfs, so any temp files written there end up in RAM.
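If /dev/SSD/vzdump really is a logical volume (an LV named vzdump in your SSD volume group), a rough sketch would be to put a filesystem on it, mount it outside of /dev, and point vzdump there. The names below are only examples, and mkfs wipes the volume:

Bash:
# put a filesystem on the LV and mount it at a normal path (NOT below /dev)
mkfs.ext4 /dev/SSD/vzdump            # WARNING: destroys existing data on that LV
mkdir -p /mnt/vzdump-tmp
mount /dev/SSD/vzdump /mnt/vzdump-tmp
echo '/dev/SSD/vzdump /mnt/vzdump-tmp ext4 defaults 0 2' >> /etc/fstab
# then set the temporary directory in /etc/vzdump.conf:
#   tmpdir: /mnt/vzdump-tmp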
I saw a folder being created, but then the RAM filled up again and I aborted it.
So I'm not sure why it is happening. The default seems to be OK. If I change the directory, it's not working. Is it the path? The permissions maybe? Is the heavy RAM usage a fallback option?
I wish there were a GUI option to change the temporary file storage for vzdump easily.
Please share your storage configuration (cat /etc/pve/storage.cfg), the container configuration (pct config <ID>), and the full backup task log.

Did you already try limiting ZFS ARC usage as suggested above? Also check what task is consuming the RAM during the backup.
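For example, something along these lines, run while the backup is going, shows the biggest consumers by resident memory:

Bash:
# refresh every 5 seconds: overall memory plus the ten largest processes by RSS
watch -n 5 'free -h; echo; ps axo pid,rss,comm --sort=-rss | head -n 11'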
 
So my storage.cfg is as follows:
dir: local
    path /var/lib/vz
    content iso,vztmpl
    shared 0

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images

lvm: SSD
    vgname SSD
    content rootdir,images
    nodes pve
    shared 0

lvm: BackupHDD
    vgname BackupHDD
    content rootdir,images
    nodes pve
    shared 0

pbs: PBS
    datastore Truenas
    server 192.168.178.24
    content backup
    fingerprint ";D"
    prune-backups keep-all=1
    username root@pam

I'm not quite sure which container config you want, but I guess the one I have the problems with, or at least one that is big enough to make my server stumble.

arch: amd64
cores: 8
description:
features: keyctl=1,nesting=1
hostname: openwebui
memory: 16384
net0: name=eth0,bridge=vmbr0,gw=192.168.178.1,hwaddr=BC:24:11:5D:03:31,ip=192.168.178.249/24,type=veth
onboot: 1
ostype: debian
rootfs: SSD:vm-119-disk-0,size=136G
startup: up=30
swap: 2048
tags: ai;community-script;interface
unprivileged: 1
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 235:* rwm
lxc.cgroup2.devices.allow: c 236:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

The thing is, I would have this with every VM/container I try to back up that is big enough.

What I'm currently doing is putting these images onto a TrueNAS datastore, while my whole TrueNAS data gets pushed onto tape every 2 weeks (yes, tape).
And as to the question whether that is currently my mount point for the temporary vzdump files: yes.
Only because I haven't really figured out how to mount a directory I could create on the PBS storage yet. I'm very experimental.