Memory allocation failure

Xarion

Hello

I would like to ask you for some help. I have a weird situation: every few hours I can't restart my VM because of:

Code:
root@s0:/home/xarion# qm stop 128
root@s0:/home/xarion# qm start 128
ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
failed to initialize KVM: Cannot allocate memory
start failed: command '/usr/bin/kvm -id 128 -chardev 'socket,id=qmp,path=/var/run/qemu-server/128.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/128.pid -daemonize -smbios 'type=1,uuid=a2913cd7-b246-455c-b291-769abf8c8819' -name util.xirit.pl -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/128.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -object 'memory-backend-ram,id=ram-node0,size=2048M' -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -k pl -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2ebee71e25e' -drive 'file=/var/lib/vz/template/iso/CentOS-7-x86_64-Minimal-1708.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/zvol/data1/vm-128-disk-1,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap128i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=B2:7D:24:3F:CA:1C,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1

Of course I know that this is not a proper way to restart a machine, but I have done it to illustrate the problem. In dmesg I see this every time:

Code:
[1833903.962832] kvm: page allocation failure: order:6, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null) 
[1833903.962839] kvm cpuset=/ mems_allowed=0-1 
[1833903.962847] CPU: 15 PID: 1153 Comm: kvm Tainted: P IO 4.13.13-3-pve #1 
[1833903.962848] Hardware name: HP ProLiant SE316M1 , BIOS R02 05/05/2011 
[1833903.962849] Call Trace: 
[1833903.962857] dump_stack+0x63/0x8b 
[1833903.962861] warn_alloc+0x114/0x1c0 
[1833903.962863] ? __alloc_pages_direct_compact+0x51/0x100 
[1833903.962865] __alloc_pages_slowpath+0xe6e/0xe80 
[1833903.962868] ? mntput+0x24/0x40 
[1833903.962871] ? terminate_walk+0x8e/0xf0 
[1833903.962873] __alloc_pages_nodemask+0x251/0x270 
[1833903.962876] alloc_pages_current+0x6a/0xe0 
[1833903.962879] kmalloc_order+0x18/0x40 
[1833903.962880] kmalloc_order_trace+0x24/0xa0 
[1833903.962914] kvm_dev_ioctl+0xb5/0x6b0 [kvm] 
[1833903.962916] do_vfs_ioctl+0xa3/0x610 
[1833903.962918] ? putname+0x54/0x60 
[1833903.962920] ? do_sys_open+0x1bc/0x280 
[1833903.962921] SyS_ioctl+0x79/0x90 
[1833903.962924] entry_SYSCALL_64_fastpath+0x1e/0x81 
[1833903.962926] RIP: 0033:0x7f450551ae07 
[1833903.962927] RSP: 002b:00007ffc4415d6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 
[1833903.962928] RAX: ffffffffffffffda RBX: 00005633c1a462a9 RCX: 00007f450551ae07 
[1833903.962929] RDX: 0000000000000000 RSI: 000000000000ae01 RDI: 000000000000000e 
[1833903.962930] RBP: 00007f44f9d92000 R08: 00005633c1a24cb8 R09: 0000000000000000 
[1833903.962931] R10: 000000000000026c R11: 0000000000000246 R12: 00005633c1a4622a 
[1833903.962932] R13: 0000000000000042 R14: 00005633c1a46300 R15: 00007f44f9c61670 
[1833903.962933] Mem-Info: 
[1833903.962939] active_anon:4393516 inactive_anon:715932 isolated_anon:0 
active_file:117766 inactive_file:2596227 isolated_file:0 
unevictable:3038 dirty:488 writeback:349 unstable:0 
slab_reclaimable:107326 slab_unreclaimable:123514 
mapped:26743 shmem:145528 pagetables:15665 bounce:0 
free:78435 free_pcp:0 free_cma:0

This is weird because I've plenty of free memory:

Code:
top - 15:28:07 up 21 days, 5:24, 1 user, load average: 36.48, 28.53, 27.11 
Tasks: 562 total, 1 running, 561 sleeping, 0 stopped, 0 zombie 
%Cpu(s): 4.1 us, 1.8 sy, 0.0 ni, 32.1 id, 61.8 wa, 0.0 hi, 0.2 si, 0.0 st 
KiB Mem : 49440780 total, 287824 free, 37140896 used, 12012060 buff/cache 
KiB Swap: 0 total, 0 free, 0 used. 11151908 avail Mem

There is about 10 GB of available memory, so why doesn't a VM that needs only 2 GB fit? I can make it work again by... dropping caches.

Code:
echo 3 > /proc/sys/vm/drop_caches

But when I do that and wait a few hours, the problem occurs again. It's repeatable. Can someone explain to me why the buff/cache is not reclaimed automatically when the system is out of free memory?

Kernel on the server:
Linux s0 4.13.13-3-pve #1 SMP PVE 4.13.13-34 (Sun, 7 Jan 2018 13:19:58 +0100) x86_64 GNU/Linux

Uptime:
15:09:04 up 22 days, 5:05, 2 users, load average: 1.24, 1.37, 1.06
 
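There is not enough contiguous free RAM to satisfy the allocation the kernel attempted. The total amount of free or cached memory is not the issue; the free memory is too fragmented.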

This will display how many free contiguous blocks of each 'order' are available:
Code:
cat /proc/buddyinfo

From left to right, each column is the count of free blocks available for one order, starting with order 0.
The size of a block of a given order in bytes, assuming 4096-byte pages, can be calculated as: (2 ^ order) * 4096
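
To avoid counting columns by eye, here is a rough sketch that sums the free blocks at or above a given order for each zone. The function name free_blocks_at_or_above is made up for this example, and it assumes the usual /proc/buddyinfo layout of "Node N, zone NAME" followed by one count per order:

```shell
# free_blocks_at_or_above ORDER [FILE]
# Hypothetical helper: prints, per zone, how many free blocks of at least
# ORDER exist. FILE defaults to /proc/buddyinfo.
free_blocks_at_or_above() {
    order=$1
    file=${2:-/proc/buddyinfo}
    awk -v order="$order" '
        # Assumed line format: "Node 0, zone Normal c0 c1 c2 ..."
        # Fields 1-4 are labels, so the count for order N is field N + 5.
        {
            total = 0
            for (i = order + 5; i <= NF; i++) total += $i
            node = $2
            sub(/,$/, "", node)
            printf "node %s zone %s: %d free blocks of order >= %d\n", node, $4, total, order
        }
    ' "$file"
}
```

Calling `free_blocks_at_or_above 6` on the affected host would show whether any zone still has blocks large enough for an order 6 request. A zero in every zone is consistent with the failure above, although the kernel also applies watermarks, so a small non-zero count is not a guarantee of success.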

In your case you had an order 6 allocation failure:
Code:
kvm: page allocation failure: order:6

(2 ^ 6) * 4096 = 262144
So you were lacking a contiguous free block of 262144 bytes (256 KiB).
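
The same arithmetic as a small shell sketch, in case you want to check other orders. The function name order_bytes is just for illustration, and the 4096-byte default page size is an assumption that holds on typical x86_64 systems:

```shell
# order_bytes ORDER [PAGE_SIZE]
# Hypothetical helper: size in bytes of one block of the given order.
# PAGE_SIZE defaults to 4096 (an assumption; pass the real value if unsure).
order_bytes() {
    page_size=${2:-4096}
    echo $(( (1 << $1) * page_size ))
}
```

For example, `order_bytes 6` prints 262144. To use the page size the system actually reports, call it as `order_bytes 6 "$(getconf PAGESIZE)"`.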

I'm not sure how safe this is to run, but I've never had issues doing it.
To 'defrag' the RAM, run this command:
Code:
echo 1 > /proc/sys/vm/compact_memory

If you want to prevent this issue in the future I have a few suggestions.

Increase vm.min_free_kbytes. Making this too high can cause issues, so please read some documentation about it first.
Code:
root@aaaa:~# cat /etc/sysctl.d/99-vm.min_free_kbytes.conf
vm.min_free_kbytes = 524288
NOTE: this value should be a multiple of 4096.
I usually start at about 5% of total RAM, divided by the number of CPU cores.
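
A rough sketch of that starting-point rule, reading it as (5% of total RAM) / cores and rounding down to a multiple of 4096. The function name suggest_min_free_kbytes is made up, and the result is only a first guess to tune from, not a recommended value:

```shell
# suggest_min_free_kbytes TOTAL_KIB CORES
# Hypothetical helper: 5% of total RAM (in KiB) divided by the core count,
# rounded down to a multiple of 4096.
suggest_min_free_kbytes() {
    total_kib=$1
    cores=$2
    per_core=$(( total_kib * 5 / 100 / cores ))
    echo $(( per_core / 4096 * 4096 ))
}

# Possible usage on a live system (nproc counts logical CPUs, which may be
# higher than the number of physical cores):
#   total_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
#   suggest_min_free_kbytes "$total_kib" "$(nproc)"
```

With the 49440780 KiB shown in the top output above and an assumed 16 cores (the real core count is not stated in the thread), `suggest_min_free_kbytes 49440780 16` prints 151552.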

If you have multiple CPU sockets, tell KSM not to merge pages across NUMA nodes.
This will require installing the sysfsutils package:
Code:
root@vm1:~# cat /etc/sysfs.conf              
#merge_across_nodes                          
kernel/mm/ksm/merge_across_nodes=0
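
To check what KSM is currently doing before and after the change, a sketch like the following can help. The function name ksm_state is made up; the three sysfs files it reads are the standard KSM ones under /sys/kernel/mm/ksm. As far as I know from the kernel KSM documentation, merge_across_nodes can only be changed while no pages are shared, which is why setting it at boot through sysfs.conf is convenient:

```shell
# ksm_state [SYSFS_DIR]
# Hypothetical helper: prints the current KSM settings relevant here.
# SYSFS_DIR defaults to /sys/kernel/mm/ksm.
ksm_state() {
    dir=${1:-/sys/kernel/mm/ksm}
    for name in merge_across_nodes run pages_shared; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # File missing or unreadable (for example, KSM not built in).
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```

Running `ksm_state` after a reboot should show merge_across_nodes=0 if the sysfs.conf entry was applied.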

You might also want to look into tuning:
vm.dirty_ratio
vm.dirty_background_ratio