[SOLVED] Swappiness question

gallew

Hi

It seems like the latest Proxmox kernel does not honor the swappiness parameter.
Does anybody have the same problem, or am I just missing something here?
(I'm not using Linux containers.)
ZFS ARC is restricted to 8 GB.

Code:
root@cd02 ~ # sysctl vm.swappiness
vm.swappiness = 0
root@cd02 ~ # cat /proc/sys/vm/swappiness
0
root@cd02 ~ # free -m
              total        used        free      shared  buff/cache   available
Mem:          31962       11957       19528         214         476       19395
Swap:          1022         666         356
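
For context, an ARC limit like the 8 GB mentioned above is typically configured through the zfs_arc_max module parameter. A minimal sketch (assuming a stock OpenZFS setup; 8 GiB = 8589934592 bytes):
Code:
# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=8589934592
Note that on a ZFS root this file also gets baked into the initramfs, so the limit only fully takes effect after update-initramfs -u and a reboot; that detail becomes relevant at the end of this thread.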
 

Swappiness now sets a weight for how likely the kernel is to swap. So 0 is not off, it is the lowest weight. Only swapoff disables swap completely.
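
For reference, the tunable can be changed at runtime and persisted across reboots with standard sysctl tooling (a sketch; the drop-in file name is arbitrary):
Code:
# runtime change, lost on reboot
sysctl -w vm.swappiness=1
# persistent: drop-in file, then reload all sysctl settings
echo 'vm.swappiness = 1' > /etc/sysctl.d/99-swappiness.conf
sysctl --system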
 
Swappiness now sets a weight for how likely the kernel is to swap. So 0 is not off, it is the lowest weight. Only swapoff disables swap completely.

That was true previously, when swappiness 0 meant that swapping would occur only when physical memory starts to run out.
Now, apparently, things have changed: according to several sources (including the kernel code), this is no longer so (kernel 3.5 and above).
https://dzone.com/articles/OOM-relation-to-swappiness
In my situation, however, there was plenty of free memory (by free I mean truly free: not allocated by the operating system for file-system buffering).
So with regard to the Proxmox kernel, neither option (no swapping, or swapping only when physical memory runs out) works properly.
 
The kernel docs for v4.15 state the following.
swappiness
This control is used to define how aggressive the kernel will swap
memory pages. Higher values will increase aggressiveness, lower values
decrease the amount of swap. A value of 0 instructs the kernel not to
initiate swap until the amount of free and file-backed pages is less
than the high water mark in a zone.
The default value is 60.
https://github.com/torvalds/linux/blob/v4.15/Documentation/sysctl/vm.txt

5.3 swappiness
Overrides /proc/sys/vm/swappiness for the particular group. The tunable
in the root cgroup corresponds to the global swappiness setting.
Please note that unlike during the global reclaim, limit reclaim
enforces that 0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.
https://github.com/torvalds/linux/blob/v4.15/Documentation/cgroup-v1/memory.txt
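
Since the passage above describes the per-cgroup tunable, it can be worth comparing the global value against the root cgroup's value (a sketch; the path assumes the cgroup v1 memory controller used by kernels of this era):
Code:
# global setting
cat /proc/sys/vm/swappiness
# root-cgroup setting, corresponds to the global one
cat /sys/fs/cgroup/memory/memory.swappiness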
 
The default value is 60

I think this definition is "less logical". I'd expect swappiness 0 to swap only when it is absolutely (!) necessary. Why should I keep 60 (%/kB/MB/whatever) free?
 
This is not a %-watermark; it defines how aggressively the kernel should swap.
 
Please note that unlike during the global reclaim, limit reclaim
enforces that 0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.
So, does this mean that the kernel will not swap at all if vm.swappiness is set to 0?
What I'm trying to achieve is that the kernel does not swap unless memory runs out,
be that with swappiness 0 or 1.

Yesterday I checked again that the parameter is set (vm.swappiness = 0),
then disabled swap with "swapoff" and re-enabled it with "swapon".
At that point, swap usage was 0 B, as expected.
This morning I see 880 MB of swap in use.
Code:
root@cd02 ~ # free -m
              total        used        free      shared  buff/cache   available
Mem:          31962       15011       16347         239         602       16311
Swap:          1022         880         142
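
For reference, the reset described above amounts to the following (note that swapoff needs enough free RAM to absorb everything currently in swap):
Code:
swapoff -a   # move all swapped pages back into RAM
swapon -a    # re-enable swap; "free -m" should now show 0 used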

Since this machine is monitored with Zabbix, I checked the memory graph, and used memory was pretty much the same during the last 24 h.
Mystery?

 
you probably have containers with swap configured. even if the system itself still has plenty of free memory, your container will use swap if it hits its own memory limit.
 
As I mentioned in my first post: I'm not using containers.
I mentioned that specifically to make clear that the swapping is not related to containers, but to the Proxmox kernel.

I also tested with swappiness = 1; the result is the same, still swapping.
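
One thing worth ruling out at this point is another configuration file overriding the value at boot. A quick check over the standard sysctl configuration locations:
Code:
grep -r swappiness /etc/sysctl.conf /etc/sysctl.d/ /usr/lib/sysctl.d/ /run/sysctl.d/ 2>/dev/null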
 
then the next step would be to check which processes are using swap (e.g. by checking /proc/**/status).
 
Code:
(echo "COMM PID SWAP"; for file in /proc/*/status ; do awk '/^Pid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | grep -wv "0 kB" | sort -k 3 -n -r)

This is what I use for it. The processes that are using swap are not always the same for me; sometimes I do not see any process using swap, but top tells me there is some swap being used.
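
For readability, here is the same idea broken out with comments (a functionally equivalent sketch; it keeps only the first word of the process name, which also avoids the fused "pveproxy worker28507" column visible in the output below):
Code:
echo "COMM PID SWAP"
for f in /proc/[0-9]*/status; do
    # Pull name, PID and swap usage out of each status file;
    # kernel threads have no VmSwap line and are skipped.
    awk '/^Name:/   {name=$2}
         /^Pid:/    {pid=$2}
         /^VmSwap:/ {swap=$2}
         END {if (swap+0 > 0) print name, pid, swap " kB"}' "$f"
done | sort -k3 -rn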
 
Code:
root@cd02 ~ # (echo "COMM PID SWAP"; for file in /proc/*/status ; do awk '/^Pid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | grep -wv "0 kB" | sort -k 3 -n -r)
COMM PID SWAP
kvm 21360 121284 kB
kvm 17363 72344 kB
kvm 32624 58280 kB
kvm 20502 33040 kB
kvm 27101 29484 kB
kvm 1632 29040 kB
pve-ha-crm 1901 15144 kB
kvm 13898 12568 kB
pveproxy 2473 6936 kB
pve-ha-lrm 1942 6248 kB
pveproxy worker28507 6096 kB
pveproxy worker28509 6060 kB
pveproxy worker28508 6060 kB
systemd-journal 389 116 kB
pvedaemon worke14329 36 kB
pmxcfs 2368 4 kB
root@cd02 ~ # free -m
              total        used        free      shared  buff/cache   available
Mem:          31962       15800       15647         264         514       15509
Swap:          1022         489         533

As I see it, most swap is used by the kvm VM process with PID 21360.
This VM is mostly idle; it has a fixed memory amount of 2 GB and no ballooning.
Code:
root     21360  0.4  3.0 2949288 1009396 ?     Sl   Mar18  23:24 /usr/bin/kvm -id 121 -chardev socket,id=qmp,path=/var/run/qemu-server/121.qmp,server,nowait -mon chardev=qmp,mode=control -pidfile /var/run/qemu-server/121.pid -daemonize -smbios type=1,uuid=4ac51408-2884-4e71-833a-98ced0acb51a -name XXXXXXXXX-srv-dev -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vga std -vnc unix:/var/run/qemu-server/121.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -k sv -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -iscsi initiator-name=iqn.1993-08.org.debian:01:4567db5649d0 -drive if=none,id=drive-ide2,media=cdrom,aio=threads -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/dev/zvol/data/vm-121-disk-1,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap121i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=16:7A:AB:93:C1:00,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
The summary page for this VM in the Proxmox web UI says:
Memory usage 34.05% (697.36 MiB of 2.00 GiB)
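
The same numbers can also be read straight out of procfs for a single process (using the PID from the listing above):
Code:
# resident and swapped memory of the kvm process for this VM
grep -E 'VmRSS|VmSwap' /proc/21360/status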

The server itself (the host) has 32 GB of memory. ZFS ARC is restricted to 8 GB, and the provisioned memory for all VMs is 16.5 GB.
So, assuming Proxmox itself occupies 1-2 GB of memory, memory is not overcommitted: the total is ~26.5 GB.
The Proxmox web UI reports memory usage for this machine as: RAM usage 49.61% (15.48 GiB of 31.21 GiB).
This is less than the calculated maximum of ~26.5 GB because the VMs do not all use the full memory allocated to them.
 
In your situation it is likely that those pages were swapped out at some point (in a high-memory-pressure situation) and just never swapped back in (the kernel will avoid swapping stuff back in that is not needed, as swapping in is expensive as well). The current memory situation does not tell you why the pages were originally swapped out. Unless you are constantly swapping in and out, this is nothing to worry about (if it bothers you, you can just disable swap altogether and live with the occasional OOM kill when something runs amok).
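
A quick way to tell the harmless "stale pages parked in swap" case described above apart from active thrashing is to watch the swap-in/swap-out rates (standard procps vmstat):
Code:
# si/so columns show KiB swapped in/out per second; sustained
# non-zero values mean active swapping, a static swpd figure does not
vmstat 5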
 
Code:
(echo "COMM PID SWAP"; for file in /proc/*/status ; do awk '/^Pid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | grep -wv "0 kB" | sort -k 3 -n -r)

This is what I use for it. The processes that are using swap are not always the same for me; sometimes I do not see any process using swap, but top tells me there is some swap being used.

what numbers are we talking about in the latter case? what does /proc/meminfo say?
 
Code:
root@cd02 ~ # (echo "COMM PID SWAP"; for file in /proc/**/status ; do awk '/^Pid|VmSwap|Name/{printf $2 " " $3}END{ print ""}' $file; done | grep kB | grep -wv "0 kB" | sort -k 3 -n -r)
COMM PID SWAP
kvm 21360 141480 kB
kvm 17363 78912 kB
kvm 32624 65768 kB
kvm 20502 38660 kB
kvm 27101 33580 kB
kvm 1632 33512 kB
pve-ha-crm 1901 15144 kB
kvm 13898 13616 kB
pveproxy 2473 8872 kB
pveproxy worker28509 7672 kB
pveproxy worker28507 7620 kB
pveproxy worker28508 7608 kB
pve-ha-lrm 1942 6248 kB
systemd-journal 389 124 kB
pmxcfs 2368 4 kB
Same result with globstar.

7 h ago I read the swap back into memory.
Since then free memory fluctuates a little, nothing to worry about, plenty of free memory left, but at the same time the system uses more and more swap over time.
See the attached graphs.


We have a second Proxmox cluster with older software but the same setup (ZFS, no containers, etc.), and the older version works as expected: with swappiness = 0 there is no swap usage, since there is plenty of free memory on the nodes.
The older version runs:
Kernel Version: Linux 4.13.8-3-pve #1 SMP PVE 4.13.8-30 (Tue, 5 Dec 2017 13:06:48 +0100)
PVE Manager Version: pve-manager/5.1-36/131401db
 

Attachments

  • mem.png (memory usage graph)
  • swap.png (swap usage graph)
Hmm, I did some testing, and with the 4.13.4 kernel (which I use at the moment on this machine) I cannot reproduce the swap usage.

Nevertheless:
Code:
MemTotal:       49403976 kB
MemFree:         5780856 kB
MemAvailable:   10460368 kB
Buffers:         4460672 kB
Cached:           731920 kB
SwapCached:            0 kB
Active:         29810996 kB
Inactive:        4102856 kB
Active(anon):   28748612 kB
Inactive(anon):   139976 kB
Active(file):    1062384 kB
Inactive(file):  3962880 kB
Unevictable:      121888 kB
Mlocked:          121892 kB
SwapTotal:       8388604 kB
SwapFree:        8388604 kB
Dirty:               156 kB
Writeback:             0 kB
AnonPages:      28843256 kB
Mapped:           366648 kB
Shmem:            162056 kB
Slab:            1920384 kB
SReclaimable:     251372 kB
SUnreclaim:      1669012 kB
KernelStack:       22160 kB
PageTables:       122756 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    33090592 kB
Committed_AS:   41551068 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:  20869120 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      382648 kB
DirectMap2M:    20535296 kB
DirectMap1G:    29360128 kB

Edit:
Now it happened on 4.13 too:
Code:
COMM PID SWAP
pvedaemon 11370 92712 kB
pvedaemon worke11375 81624 kB
pvedaemon worke11371 79964 kB
pvedaemon worke11374 79676 kB
pve-ha-crm 11624 57384 kB
pve-ha-lrm 11655 36036 kB
pve-firewall 11284 14472 kB
pvestatd 11308 13524 kB
cron 11217 204 kB
atopacctd 8973 64 kB

MemTotal:       49403976 kB
MemFree:         1279868 kB
MemAvailable:   16730452 kB
Buffers:        15284268 kB
Cached:           392544 kB
SwapCached:        15696 kB
Active:         24437040 kB
Inactive:       17363908 kB
Active(anon):   23025628 kB
Inactive(anon):  3255084 kB
Active(file):    1411412 kB
Inactive(file): 14108824 kB
Unevictable:       12484 kB
Mlocked:           12484 kB
SwapTotal:       8388604 kB
SwapFree:        8163580 kB
Dirty:               240 kB
Writeback:             0 kB
AnonPages:      26130268 kB
Mapped:           243788 kB
Shmem:            151292 kB
Slab:            2430208 kB
SReclaimable:     527472 kB
SUnreclaim:      1902736 kB
KernelStack:       21504 kB
PageTables:       121256 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    33090592 kB
Committed_AS:   41391932 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:  17393664 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      405176 kB
DirectMap2M:    23658496 kB
DirectMap1G:    26214400 kB
 
Update: in the end it did not matter what /proc/sys/vm/swappiness said.
What helped was running update-initramfs -u and rebooting afterwards.
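
For anyone landing on this thread, the resolution boils down to regenerating the initramfs so the boot-time settings match the running system, then rebooting:
Code:
update-initramfs -u
reboot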
 
