PVE LXC Swap is incredibly suboptimal, causes freezing

harvie

Well-Known Member
Apr 5, 2017
Hello,
I have a PVE server with 170GB of RAM assigned to LXC CTs.
Typical CT settings look like this: 2GB RAM, 256MB swap.

But when you assign a CT 2GB of RAM and 256MB of swap, it actually gets 2GB of RAM and 2.25GB of swap.
If I set the swap to 0GB, it still gets 2GB of swap.
This is just coconuts.

It means I can't have less than 170GB of total (theoretical) swap enabled in the CTs on such a machine.

It doesn't make sense to have 170GB of swap, as even SSDs are too slow to handle that much swapping and it would just cause IO overload.
Having less swap than that (like 10GB or so) causes the swap to be permanently full (it only takes a few CTs running out of RAM to fill it).
In both cases this leads to performance problems for the whole PVE machine and even to the server freezing (and me having to go to work during the weekend).

I would really prefer being able to set a small swap for each CT, so that if some CT's RAM gets full, the OOM killer kills that CT rather than swapping huge volumes, which undermines the server's stability and ruins the day for all CTs. Even no swap in a CT would be better than this. Unfortunately I can't set zero swap in PVE CTs right now.

Please make it possible to set CT swap to zero or to less than the RAM. This is a HUGE problem.
 
There were some rumors that this can't be fixed until Proxmox has support for cgroup v2.
But LXC has had cgroup v2 support since version 3.0.0, so I guess this can now be fixed in Proxmox VE.
 
huh? you can set the container swap to something less than the host RAM? it's just that the limits are a bit strange. if you configure a container with 2GB memory and 2GB swap, it means the container can use
- 2GB memory, no swap
- 2GB memory, 2GB swap
- 1GB memory, 3GB swap
- XGB of memory and YGB of swap, where X <= memory limit, and X+Y <= memory limit + swap limit
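
To make the list above concrete, here is a rough sketch of how those limits map onto the cgroup v1 memory controller files (the VMID 101 and the exact path are only examples; this assumes the v1 memory controller is mounted at /sys/fs/cgroup/memory, as on a default PVE 6.x host):

$ cat /sys/fs/cgroup/memory/lxc/101/memory.limit_in_bytes
2147483648    (= the 2GB RAM limit)
$ cat /sys/fs/cgroup/memory/lxc/101/memory.memsw.limit_in_bytes
4294967296    (= 2GB RAM + 2GB swap, a single combined limit)

There is no swap-only limit in v1: memory.memsw.limit_in_bytes caps RAM and swap together, so a container can always swap up to (memory + swap) minus its current RAM usage.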

cgroupv2 support is still not that close unfortunately (you can't run v1 and v2 "in parallel", and not everything is switched over/supports v2 yet..)
 
So there is no way to completely prevent CT from swapping, while having swap at the host?
 
So there is no way to completely prevent CT from swapping, while having swap at the host?

yes. that's just how the kernel interfaces are at the moment.
 
And is it possible when using cgroupv2?

What is the problem with cgroupv2? It seems to have been implemented in LXC for some time already...
 
And is it possible when using cgroupv2?

What is the problem with cgroupv2? It seems to have been implemented in LXC for some time already...
cgroupv2 support is still not that close unfortunately (you can't run v1 and v2 "in parallel", and not everything is switched over/supports v2 yet..)
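
For contrast, here is a hedged sketch of what the cgroup v2 memory interface looks like (this is the upstream kernel interface, not something PVE exposed at this point, and the path is purely illustrative):

$ echo 2147483648 > /sys/fs/cgroup/lxc/101/memory.max        (2GB RAM limit)
$ echo 0 > /sys/fs/cgroup/lxc/101/memory.swap.max            (no swap at all for this group)

Because memory.swap.max is separate from memory.max, a group's swap can be capped (or set to zero) independently of its RAM limit, which is exactly the behaviour being asked for in this thread.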
 
cgroupv2 support is still not that close unfortunately

And what pieces are missing? Is the problem in the Linux kernel? Missing cgroup2 features in LXC?
Or is it just Proxmox lacking support for the latest lxc/linux features?
 
And what pieces are missing? Is the problem in the Linux kernel? Missing cgroup2 features in LXC?
Or is it just Proxmox lacking support for the latest lxc/linux features?

like I said (twice already) - you can't (meaningfully) use v1 and v2 in parallel, and most software does not yet support v2, so we can't switch PVE to v2 yet even if LXC had perfect support (note that full support for all needed features on the kernel side has only been available since 5.2! we just switched from 5.0 to 5.3 very very recently).
 
like I said (twice already) - you can't (meaningfully) use v1 and v2 in parallel, and most software does not yet support v2, so we can't switch PVE to v2 yet even if LXC had perfect support.

Aah. I see. I misunderstood this statement. My understanding was that you can't use v1 and v2 in parallel in the Proxmox infrastructure. I didn't realize this also affects other processes on the host (unrelated to Proxmox) and possibly even processes running in LXC containers...
I've just checked and systemd seems to support cgroupv2. OTOH Docker does not support it. However, Docker is half-broken in PVE LXC anyway...
 
Btw I've just noticed I have the following mountpoint on my Proxmox nodes:

cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)

There are even some LXC containers listed:

$ ls /sys/fs/cgroup/unified/lxc
102 108 114 201 207 213 219 227 cgroup.freeze cgroup.stat cpu.pressure
104 110 115 202 208 214 220 228 cgroup.max.depth cgroup.subtree_control cpu.stat
105 112 125 203 209 216 221 cgroup.controllers cgroup.max.descendants cgroup.threads io.pressure
106 113 126 204 212 217 226 cgroup.events cgroup.procs cgroup.type memory.pressure


is this some kind of test config made by proxmox or is it just inherited from cgroup v1?
 
Btw I've just noticed I have the following mountpoint on my Proxmox nodes:

cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)

There are even some LXC containers listed:

$ ls /sys/fs/cgroup/unified/lxc
102 108 114 201 207 213 219 227 cgroup.freeze cgroup.stat cpu.pressure
104 110 115 202 208 214 220 228 cgroup.max.depth cgroup.subtree_control cpu.stat
105 112 125 203 209 216 221 cgroup.controllers cgroup.max.descendants cgroup.threads io.pressure
106 113 126 204 212 217 226 cgroup.events cgroup.procs cgroup.type memory.pressure


is this some kind of test config made by proxmox or is it just inherited from cgroup v1?

you can mount both, but for any single controller (like memory, ...) you can only use either v1 or v2. so you can see that some things (e.g., new cpu/memory/io features) are visible via cgroup v2, but most are still in v1.
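
A quick way to check which controllers actually sit on the v2 (unified) side versus the v1 mounts, against the default hybrid layout shown above (a sketch; output varies per host):

$ cat /sys/fs/cgroup/unified/cgroup.controllers    (controllers enabled on the v2 hierarchy; often empty in hybrid mode)
$ ls /sys/fs/cgroup                                (the v1 controller mounts, plus the "unified" v2 mount itself)
$ cat /proc/self/cgroup                            (per-controller membership; the "0::" line is the v2 hierarchy)

On such a hybrid setup the memory controller normally still lives under /sys/fs/cgroup/memory (v1), so the combined memory+swap semantics described earlier keep applying even though a cgroup2 mount exists.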
 
Any news on this? It seems that there was very interesting progress now that LXC 4.0 is included in Proxmox.
Do you think it is now safe to boot into cgroupv2 mode in production (given that I run reasonably recent guest distros in the CTs)? I really need to be able to limit the swap per CT (or even disable swapping entirely for a CT, so that a single CT gets OOM-killed instead of ruining swap for everyone else).
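
For reference, switching a node to the pure (unified) cgroup v2 layout is done via a kernel command-line option rather than a PVE setting; whether that is safe on a given PVE release is exactly the open question here, so treat this only as a sketch of the mechanism (paths assume a GRUB-booted host):

(append to GRUB_CMDLINE_LINUX in /etc/default/grub, then run update-grub and reboot)
systemd.unified_cgroup_hierarchy=1
(after the reboot, /sys/fs/cgroup should be a single cgroup2 mount)
$ mount | grep cgroup2

The usual blocker is containers whose distro ships a systemd too old to run on a v2-only host, hence the "reasonably recent guest distros" caveat above.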
 
Recently there was the release of PVE 6.4 with improved cgroupv2 support; I wonder if that means the swap limit now works properly and independently of the RAM limit.
 
Recently there was the release of PVE 6.4 with improved cgroupv2 support; I wonder if that means the swap limit now works properly and independently of the RAM limit.
I just checked: available swap inside the container is still the memory setting plus the swap setting.
EDIT: I just used the default settings:
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
 
I just checked: available swap inside the container is still the memory setting plus the swap setting.
even with cgroupv2 enabled?

This swap=mem+swap thing has been absolutely messing with my setups for years... And this whole time I have had a very hard time defending this behaviour in our company. I like Proxmox very much, but people keep pushing Hyper-V, and I will probably die inside a little if I'm forced to migrate, because I don't like its core concepts.
 