Backup Caused Complete Swap Consumption

Hi All,

So we've been running Proxmox for a while here at work. As we phase out the legacy equipment in our rack, we've temporarily cloned some of those machines onto our cluster (currently sitting at 5 nodes). The issue: when we ran a full backup yesterday and it reached the largest temporary VM we have (a 5+ TB virtual HDD), our 100GB of swap filled after about 4 hours and the whole system slowed almost to a halt. We stopped the backup and did what we could to bring swap usage back down, although `swapoff && swapon` locked the host up completely once it started flushing swap, forcing a hard reboot.

My questions are: Why is this happening? Can I prevent it? Should I be doing something different with how my cluster is set up? My supervisor is now worried that Proxmox won't scale well enough to handle 4+ TB VMs; since our main file server now runs on Proxmox, we can't have these systems locking up and forcing a reboot over a single backup.

Let me know what info/screenshots are needed and I'll happily provide them!


System Info:

Code:
root@vhost1:~# pveversion -v
proxmox-ve: 6.4-1 (running kernel: 5.4.114-1-pve)
pve-manager: 6.4-6 (running version: 6.4-6/be2fa32c)
pve-kernel-5.4: 6.4-2
pve-kernel-helper: 6.4-2
pve-kernel-5.4.114-1-pve: 5.4.114-1
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.78-2-pve: 5.4.78-2
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.2-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.4-1
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.4-3
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.2-2
libpve-storage-perl: 6.4-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.1.6-2
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.5-5
pve-cluster: 6.4-1
pve-container: 3.3-5
pve-docs: 6.4-2
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-3
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-6
pve-xtermjs: 4.7.0-3
qemu-server: 6.4-2
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
Code:
root@vhost1:~# pvecm status
Cluster information
-------------------
Name:             cluster
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue May 25 11:09:45 2021
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000003
Ring ID:          1.a1b
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.60.4.123
0x00000002          1 10.60.4.124
0x00000003          1 10.60.4.121 (local)
0x00000004          1 10.60.4.122
0x00000005          1 10.60.4.132
 
Hi,

If you have performance issues, you can try disabling swap to see if it helps.

Disable swap

Bash:
swapoff -a

and then disable the following line in /etc/fstab

/dev/pve/swap none swap sw 0 0

by commenting it out, so swap is not re-enabled on the next reboot:

# /dev/pve/swap none swap sw 0 0

We know of customers where this resolved similar performance issues.
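To verify that swap is really off after these changes, the standard tools are enough; for example:

Bash:
# No devices should be listed once swap is disabled
swapon --show
# The Swap line should report 0 total
free -h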
 
Well, I understand the importance of swap, so I would preferably like to avoid disabling it outright (it seems like a bit of a cop-out solution). I'm more interested in the why of this scenario: what caused a live backup to push swap usage all the way to 100GB?
 
IMHO it is a bad idea to configure such a large amount of swap. The kernel is allowed to use that swap, and of course it will use it as memory pressure increases. What it swaps out depends on the workload, and I have no information about that.

I suggest you start with a smaller amount of swap (e.g. 4GB), unless you have a reason why you need 100GB of swap?

But you already have 500GB of RAM, so I doubt swap makes much sense at all (unless you have very fast disks).
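For reference, shrinking the swap volume on a default LVM install could look roughly like this. This is only a sketch assuming the stock /dev/pve/swap logical volume; verify your own layout first, and expect lvreduce to warn before it proceeds:

Bash:
# Disable swap, shrink the logical volume to 4G, re-create the swap signature, re-enable
swapoff /dev/pve/swap
lvreduce -L 4G /dev/pve/swap
mkswap /dev/pve/swap
swapon /dev/pve/swap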
 
We're running 10K spinning drives: 2 in RAID 1 for the OS, and the rest in RAID 10 for storing our virtual disks. I added that much swap because in previous, smaller installations where I left the default 8GB of swap, it always maxed out once a certain number of VMs were running. So I took it as a lesson to always add a fair bit, at least 50GB during install. That said, I never really found any "best practices" for Proxmox on this; it has all been based on past experience.

That being said, this environment sits at the high end of a medium use case, bordering on large. Our primary file servers run as VMs on Proxmox, and this issue cropped up during a backup of one of them. With the amount of resources we have, would decreasing or disabling swap increase performance and avoid these issues? Is it a "recommended" step for a setup of this scale? (Note we run 6 instances in our cluster: 2 of the same spec as the one shown above, 2 of another similar spec, and 2 of a slightly different spec.)
 
With the amount of resources that we have, would decreasing/disabling swap increase performance and avoid these issues?
Yes, that is what was suggested above.
Is it a "recommended" step for a setup of this scale?
See my answer above.

Also, I really hope you do not over-commit memory (i.e. trying to assign 600GB of RAM to VMs when the host only has 500GB)?

Side note: I would never use spinning drives when planning a high-end system (it makes no sense).
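A quick way to sanity-check over-commit on a node is to sum the configured VM memory against physical RAM. A rough sketch, assuming all guests are QEMU VMs with configs under /etc/pve/qemu-server/:

Bash:
# Sum the 'memory:' setting of every VM config on this node (MiB)
grep -h '^memory:' /etc/pve/qemu-server/*.conf | awk '{sum += $2} END {print "Configured VM RAM:", sum, "MiB"}'
# Compare against physical RAM on the host (MiB)
free -m | awk '/^Mem:/ {print "Physical RAM:   ", $2, "MiB"}'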
 
We don't over-commit on RAM. However, at least until the next refresh cycle we're unlikely to be able to move to SSD/NVMe drives. I'll look into the recommendations above and report back with results for anyone who runs into similar issues.
 
Another issue comes to mind: do you use ZFS for RAID? If so, are you aware that ZFS by default uses up to half of the host's RAM for its ARC cache? For machines like yours it is highly recommended to lower that default.
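For completeness, this is roughly how the ARC cap is usually lowered; the 16 GiB value here is only an example, size it to your workload:

Bash:
# Cap the ZFS ARC at 16 GiB via the module option (persists across reboots)
echo "options zfs zfs_arc_max=$((16 * 1024 * 1024 * 1024))" > /etc/modprobe.d/zfs.conf
update-initramfs -u
# Can also be applied at runtime without a reboot
echo $((16 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_arc_max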
 
An update:
When initially researching this issue, one suggestion that came up was lowering the sysctl `vm.swappiness` from its default of 60. I set ours to 20 and had actually forgotten to turn off the backup task (the one we ran manually that prompted this post). The big VM is halfway through its backup and we're only consuming 12GB of swap, most of which comes from other machines. So that may be a solution for those not wanting to turn swap off completely.
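For anyone wanting to replicate this, the change was roughly the following (the file name under /etc/sysctl.d/ is just my choice):

Bash:
# Apply immediately
sysctl vm.swappiness=20
# Persist across reboots
echo "vm.swappiness = 20" > /etc/sysctl.d/90-swappiness.conf
sysctl --system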

On a similar note, @dietmar and @Moayad, I ask this because I want to learn and understand above anything else: why is disabling swap a recommended action? Based on the responses above it must be relatively common in bigger setups, but I haven't had much experience with it, so I would appreciate a detailed answer explaining why this is recommended, or whether there is something Proxmox does that makes it necessary.
 
Why is disabling swap a recommended action?

Hi,

IMO, I do not think that disabling swap is a solution. A swappiness of 10-20 is OK for almost any server. But you also need to take care with the amount of swap (110% of RAM can be reasonable on a server with 32-64 GB). Swap is a very good sensor for whether you need more RAM or not, and it can absorb short bursts when needed. And if you HAVE swap, then you can hibernate. Without it... no hibernation ;)

The main disadvantage of running with NO swap is that the OOM killer may get you, and that is a very serious problem.
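If you do run without swap, it is worth watching the kernel log for OOM events; for example:

Bash:
# Check whether the OOM killer has fired since boot
dmesg -T | grep -iE 'oom|out of memory'
# Or via the journal
journalctl -k | grep -i oom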

So, like I said (IMO), I think it is much better (and safer) with swap on.

http://www.alexonlinux.com/swap-vs-no-swap


Good luck /Bafta!
 
Note: I did not say we recommend not using swap.
My apologies for the misunderstanding. As my other responses suggested, I wasn't keen on that option either, hence my asking for clarification.

In regards to the original reason for this post, the change of swappiness seems to have been the ticket, I think. Otherwise, the question still remains: why was ALL 100GB of swap being used? That is abnormal behavior when RAM wasn't fully utilized (according to the PVE UI). Looking at htop in retrospect, the cached (available) portion of RAM may have been counted as used and triggered the OS to swap, based on what I've read (I think?). If that is the case, shouldn't the default swappiness of the PVE install be reduced to something lower than 60 (confirmed as the default on 6 different machines I run in another cluster)? Because I still hold the opinion that the machine should not have used that much swap for a single VM's backup, causing the host and all VMs to slow to a crawl, unless there is another issue or explanation that I'm missing here.
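One way to dig into this the next time it happens is to see which processes actually hold the swap, rather than relying on the aggregate number. A small sketch using /proc:

Bash:
# List the top 20 swap consumers by VmSwap (kB), largest first
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {name=$2} /^VmSwap:/ {if ($2 > 0) print $2, "kB", name}' "$f"
done | sort -rn | head -20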
 
I have seen this when doing a snapshot backup of running VMs with writeback caching enabled: swap would just fill. The sync writes to swap, combined with a sync-write performance issue in ZFS, slowed everything down. I have not seen this behavior since I started doing snapshot backups with PBS running inside a CT on the same host, using the same storage zpool. Before that, I had to set the VM disk cache to 'none' as a work-around; otherwise swap (a few GB in my case) filled completely while there was 15GB of free memory.
 
We aren't using ZFS on any of the Proxmox host machines, but writeback is enabled on the VM disks; I didn't think that would cause the issue. We're going to look into converting our current backup server (currently vanilla Ubuntu) into a PBS.

I have not seen this behavior since I started doing snapshot backups with PBS running inside a CT on the same host using the same storage zpool.
Thanks for the possible solution/workaround, will test it out!
 
I also don't see how the writeback cache triggers the swap/slowness issue, but by undoing my recent changes step by step (at the time), I found that the writeback setting caused the issue for me (when not using PBS). It was quite reproducible but very weird and annoying, so I simply stopped using writeback caching.
Just tested it: a snapshot backup to the local virtual PBS works fine, while a snapshot backup to a local storage directory (on the same zpool as the PBS datastore) instantly starts filling swap. A snapshot backup to that same local storage directory does not fill swap when the VM disk uses no caching.
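For anyone wanting to try the same work-around, a disk's cache mode can be switched with `qm set`. A hypothetical example where the VM ID 100 and the volume name are placeholders; re-specify your existing disk options and note the change applies after the VM is restarted:

Bash:
# Check the current disk line, e.g. scsi0: local-lvm:vm-100-disk-0,cache=writeback,size=5T
qm config 100 | grep '^scsi0'
# Re-set the same volume with cache=none
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=none,size=5T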
 
