I have 3x Proxmox VE nodes in a cluster, and the issue I'm about to describe has been a problem for me for years now. I've been trying to find the "proper" solution, but alas, I find more criticism of the method I use than real solutions.
A few years ago I learned about the performance impact of content sitting in swap (the effect/cost is much the same on Windows and Linux systems). Firstly, the more content is in swap, the more disk I/O is spent on a function that, in effect, is meant to extend RAM; if this happens across many hosts and VMs, the disk impact compounds. Secondly, it also increases CPU usage to manage all the swap pages (and that cost naturally grows as more swap is used).
As such, I set up my monitoring to alert me when any of my monitored systems hits 10% swap usage or higher, and now several of my systems alert me every few days. Clearly there is a pattern and a repeating cause here. When I check each of these systems, they aren't even close to using all their RAM, often 50% or less is used by applications. But in every single case, the Linux kernel has used up all the remaining RAM for cache.
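(For reference, the 10% figure is simply used swap over total swap. My monitoring computes it for me, but it's the same number you can pull straight from /proc/meminfo:)

```bash
# Percentage of swap in use, the figure my alerting thresholds against
awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {if (t) printf "swap used: %.1f%%\n", (t-f)/t*100}' /proc/meminfo
```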
Initially, I just flushed swap with "sudo swapoff -a && sudo swapon -a && htop", so that the swapped-out content is pulled back into RAM, with htop left running so I can see when the task is done. But as I kept working on this, I observed that swap was refilling very frequently, and that dropping the caches bought me several days before it refilled. So I have also been manually issuing "sync; echo 3 > /proc/sys/vm/drop_caches". After some more recent research into this topic, I am now switching that to "sync; echo 1 > /proc/sys/vm/drop_caches" (page cache only, leaving dentries and inodes alone).
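For clarity, this is the full manual sequence I've been running, which is just the commands above strung together (run as root on an affected node):

```bash
#!/bin/sh
# Manual workaround I currently run on an affected node.

# Write dirty pages out first, then drop the page cache
# (echo 3 would also drop dentries/inodes; 1 is page cache only).
sync
echo 1 > /proc/sys/vm/drop_caches

# Force everything currently in swap back into RAM, then re-enable swap.
swapoff -a
swapon -a
```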
Up until this point I have been doing this manually while I try to identify not only what I should actually be doing, but the cause. I believe I have now identified the cause, but I am still leaning on a practice (dropping caches) that is widely criticised as "cargo cult systems administration", i.e. a reckless "solution" applied without properly understanding the cause or the problem.
Yesterday I took a deeper look at my monitoring history, and I think I have identified the primary "cause" of the cache usage spikes, but it does not make sense to me why the cache never releases itself.
Now, I aspire to be as good a sysadmin as I possibly can, always looking to improve. Naturally, I have a daily backup job that takes a full VM disk image backup of each of the VMs I care about in my cluster. The job starts at 1am every day; the duration varies from one node to the next, but it usually takes about 3 hours to complete.
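(It's a standard Datacenter backup job; on each node it amounts to roughly the following vzdump run. The storage name here is just a placeholder, not my actual configuration:)

```bash
# Roughly what the nightly job does on each node (storage name is a placeholder)
vzdump --all --mode snapshot --compress zstd --storage backup-store
```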
So what did I notice when I looked at my monitoring patterns yesterday? The cache jumps line up with the backup tasks, and the caches _never_ clear themselves. Each day's backup task increases the Linux kernel cache on each node, and after a few days the node starts dipping into swap. Let me reiterate: there is PLENTY of unused RAM on each of the nodes; I keep them at 50% usage or less, and I do not have ballooning enabled on any VM. I'm running about 30 VMs: one is Windows, one is Debian, one is pfSense and the rest are Ubuntu. Additionally, all nodes in the cluster were fully updated as of a few days ago, and this effect has been happening for years, so as far as I can tell it is not localised to a specific version.
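(When I say the RAM is mostly cache rather than application pressure, this is what I'm checking on each node:)

```bash
# What I check on each node: overall usage vs cache, and what the kernel considers reclaimable
free -h
grep -E 'MemTotal|MemFree|MemAvailable|^Cached|SwapTotal|SwapFree' /proc/meminfo
```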
So at this point I'm thinking of just setting up a daily cron job on each node at, say, 5am to run "sync; echo 1 > /proc/sys/vm/drop_caches", but I keep thinking there must be a better way to do this.
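(Concretely, it would be something like this on each node, e.g. in /etc/cron.d/:)

```bash
# /etc/cron.d/drop-caches (sketch): drop the page cache at 05:00, after the backups have finished
0 5 * * * root sync; echo 1 > /proc/sys/vm/drop_caches
```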
What can you folks tell me here? I'm really not finding any other actual solutions anywhere on the internet.