Swap usage while running backups

pvginkel

May 19, 2024
I'm doing backups from fast NVMe drives to a slow SATA drive (5400 RPM, about 130 MB/s write). Presumably as a result of this, swap space fills up when I run backups, and it's mostly VM memory that gets swapped out. I have 8 GB of swap configured and it fills up completely, even though the server's RAM never actually comes under pressure.

The machine has 96 GB of RAM, 84 GB of which is allocated (not overallocated; this is the sum of the RAM assigned to all running VMs), and KSM sharing shows 8 GB. There should be plenty of RAM available. I'm not using Proxmox Backup Server. I'm running Proxmox 8.4.1; I realize there's a more recent version, but I'm not in a position to upgrade right now. If this is a known issue that has since been fixed, I'll upgrade to the latest version and retest all of this.
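
For anyone who wants to check the same numbers from the shell, something like this should do it (as far as I understand, the GUI's "KSM sharing" value corresponds to pages_sharing multiplied by the page size):

Code:
# overall memory and swap usage, in GiB
free -g
# approximate KSM sharing in MiB: pages_sharing * page size
echo $(( $(cat /sys/kernel/mm/ksm/pages_sharing) * $(getconf PAGESIZE) / 1024 / 1024 )) MiB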

I've already configured sysctl like this:

Code:
# cap dirty page cache at 256 MiB, start background writeback at 128 MiB
vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 134217728
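
For completeness, a drop-in like the following keeps those settings across reboots (the file name is just an example):

Code:
root@pve:~# cat /etc/sysctl.d/90-dirty-limits.conf
vm.dirty_bytes = 268435456
vm.dirty_background_bytes = 134217728
root@pve:~# sysctl --system   # re-apply without a reboot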

Neither of those dirty limits helped. I also configured vm.swappiness to 0, 1 and 10; none of that helped either. The only thing that did help was running zstd through nocache using this wrapper:

Code:
root@pve:~# cat /usr/bin/zstd
#!/bin/bash
# wrapper: run the real zstd through nocache so backup data doesn't pollute the page cache
exec nocache /usr/bin/zstd.real "$@"
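
In case anyone wants to reproduce this: the real binary has to be moved out of the way first. Something along these lines should work (a sketch using dpkg-divert so a package upgrade doesn't put the original back; not literally what I ran):

Code:
# keep the packaged binary available as zstd.real, even across upgrades
dpkg-divert --divert /usr/bin/zstd.real --rename /usr/bin/zstd
# install the wrapper shown above
cat > /usr/bin/zstd <<'EOF'
#!/bin/bash
exec nocache /usr/bin/zstd.real "$@"
EOF
chmod 755 /usr/bin/zstd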

I'm running with the default vm.swappiness (60) now and swap isn't being used anymore. I do see the buff/cache number from free grow while a machine is being backed up, but it drops back to a reasonable level once that machine finishes backing up.

For now my issue is solved (or at least mitigated), but obviously I don't like the solution. I'm open to suggestions. I don't want to move the backup storage to a fast disk, but I'm happy to try things, build custom binaries, and provide a patch if that could be helpful. Other suggestions are of course also very welcome.
 
Why do you care? Is there an actual, persistent slowdown of the VM, or is this more of an aesthetic issue where you just don't like seeing swap in use? From a system performance perspective it can actually be a good idea to put some lesser-used things in swap in order to have more room for cache when you're writing to a slow device.
 
There's an actual slowdown. I have a dev machine that I use during the day; of the 16 GB of RAM allocated to it, 4 GB gets swapped out. It's a regular victim. When I use the machine it feels sluggish and sometimes outright slow. When I flush the swap (swapoff -a && swapon -a), the machine behaves normally again. If I look at vmstat 1 output, there's a constant trickle of pages being swapped in (say 20 a second). It's noticeable.
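
For completeness, the commands in question:

Code:
# watch the si/so columns (swap-in/swap-out per second, in KiB by default)
vmstat 1
# flush swap back into RAM (only sensible when there's enough free RAM to hold it)
swapoff -a && swapon -a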

Because I have enough RAM available (free reports 13 GB free at the moment), I'd like the issue resolved. I'd also like to allocate more of that RAM to my VMs, but I'm hesitant to do so because of this issue.
 
You could set up ZRAM. Screenshots of node > Summary and your actual command outputs would be more helpful than descriptions alone, so we can see the memory usage/pressure ourselves.
If KSM is being used then there was likely some memory pressure and ballooning going on. Enough for swap to be absolutely needed? I can't tell. You can also monitor what actually gets swapped. I wonder whether limiting the backup's IO bandwidth would have much of an effect here. Maybe also try whether PBS works better.
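
To see which processes the swapped-out memory actually belongs to, a quick sketch like this lists it per process from /proc (values are in kB):

Code:
# per-process swap usage, largest first
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {print $2, name}' "$f"
done | sort -rn | head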
 
Apologies, but I can't reliably reproduce the issue anymore, and I'm not sure why. Earlier today swap usage would grow quickly every time I started a backup; it no longer does. I've tried various things and I can still get swap usage to grow, but not in a reproducible way, so I can't pinpoint the cause. If I manage to reproduce it again, I'll post back here.

As for limiting the bandwidth: I did try this. It helped, but I don't like it as a solution. The bandwidth limit applies to the read side, not the write side: I limited the bandwidth to 130 MB/s, but because of compression the actual write rate ended up a lot lower than that. I just keep coming back to the fact that there is plenty of RAM available in the system, and I'd like it to be usable. Maybe I'm completely misunderstanding something about the Linux memory architecture, but I don't see why it wouldn't be available to the VMs, or why the backup process would push out VM pages instead of just dropping file cache.
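
For reference, this is the kind of setting I mean: vzdump's global bwlimit in /etc/vzdump.conf (the value is in KiB/s, so the number below is only roughly 130 MB/s):

Code:
root@pve:~# cat /etc/vzdump.conf
# global I/O bandwidth limit for backups, in KiB/s
bwlimit: 130000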

Btw I'm not using ZFS or Ceph. Just local disks.
 

Attachments

  • 1768498536386.png (73.9 KB)