Memory allocation failure

Xarion

Active Member
Oct 4, 2016
Hello

I would like to ask for some help with a weird situation. Every few hours I can't restart my VM; it fails with:

Code:
root@s0:/home/xarion# qm stop 128
root@s0:/home/xarion# qm start 128
ioctl(KVM_CREATE_VM) failed: 12 Cannot allocate memory
failed to initialize KVM: Cannot allocate memory
start failed: command '/usr/bin/kvm -id 128 -chardev 'socket,id=qmp,path=/var/run/qemu-server/128.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/128.pid -daemonize -smbios 'type=1,uuid=a2913cd7-b246-455c-b291-769abf8c8819' -name util.xirit.pl -smp '2,sockets=1,cores=2,maxcpus=2' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' -vga std -vnc unix:/var/run/qemu-server/128.vnc,x509,password -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 2048 -object 'memory-backend-ram,id=ram-node0,size=2048M' -numa 'node,nodeid=0,cpus=0-1,memdev=ram-node0' -k pl -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:2ebee71e25e' -drive 'file=/var/lib/vz/template/iso/CentOS-7-x86_64-Minimal-1708.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -drive 'file=/dev/zvol/data1/vm-128-disk-1,if=none,id=drive-virtio0,cache=writethrough,format=raw,aio=threads,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -netdev 'type=tap,id=net0,ifname=tap128i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=B2:7D:24:3F:CA:1C,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1

Of course I know this is not the proper way to restart a machine, but I did it to illustrate the problem. Every time it happens I see this in dmesg:

Code:
[1833903.962832] kvm: page allocation failure: order:6, mode:0x140c0c0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null) 
[1833903.962839] kvm cpuset=/ mems_allowed=0-1 
[1833903.962847] CPU: 15 PID: 1153 Comm: kvm Tainted: P IO 4.13.13-3-pve #1 
[1833903.962848] Hardware name: HP ProLiant SE316M1 , BIOS R02 05/05/2011 
[1833903.962849] Call Trace: 
[1833903.962857] dump_stack+0x63/0x8b 
[1833903.962861] warn_alloc+0x114/0x1c0 
[1833903.962863] ? __alloc_pages_direct_compact+0x51/0x100 
[1833903.962865] __alloc_pages_slowpath+0xe6e/0xe80 
[1833903.962868] ? mntput+0x24/0x40 
[1833903.962871] ? terminate_walk+0x8e/0xf0 
[1833903.962873] __alloc_pages_nodemask+0x251/0x270 
[1833903.962876] alloc_pages_current+0x6a/0xe0 
[1833903.962879] kmalloc_order+0x18/0x40 
[1833903.962880] kmalloc_order_trace+0x24/0xa0 
[1833903.962914] kvm_dev_ioctl+0xb5/0x6b0 [kvm] 
[1833903.962916] do_vfs_ioctl+0xa3/0x610 
[1833903.962918] ? putname+0x54/0x60 
[1833903.962920] ? do_sys_open+0x1bc/0x280 
[1833903.962921] SyS_ioctl+0x79/0x90 
[1833903.962924] entry_SYSCALL_64_fastpath+0x1e/0x81 
[1833903.962926] RIP: 0033:0x7f450551ae07 
[1833903.962927] RSP: 002b:00007ffc4415d6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 
[1833903.962928] RAX: ffffffffffffffda RBX: 00005633c1a462a9 RCX: 00007f450551ae07 
[1833903.962929] RDX: 0000000000000000 RSI: 000000000000ae01 RDI: 000000000000000e 
[1833903.962930] RBP: 00007f44f9d92000 R08: 00005633c1a24cb8 R09: 0000000000000000 
[1833903.962931] R10: 000000000000026c R11: 0000000000000246 R12: 00005633c1a4622a 
[1833903.962932] R13: 0000000000000042 R14: 00005633c1a46300 R15: 00007f44f9c61670 
[1833903.962933] Mem-Info: 
[1833903.962939] active_anon:4393516 inactive_anon:715932 isolated_anon:0 
active_file:117766 inactive_file:2596227 isolated_file:0 
unevictable:3038 dirty:488 writeback:349 unstable:0 
slab_reclaimable:107326 slab_unreclaimable:123514 
mapped:26743 shmem:145528 pagetables:15665 bounce:0 
free:78435 free_pcp:0 free_cma:0

This is weird because I have plenty of memory available:

Code:
top - 15:28:07 up 21 days, 5:24, 1 user, load average: 36.48, 28.53, 27.11 
Tasks: 562 total, 1 running, 561 sleeping, 0 stopped, 0 zombie 
%Cpu(s): 4.1 us, 1.8 sy, 0.0 ni, 32.1 id, 61.8 wa, 0.0 hi, 0.2 si, 0.0 st 
KiB Mem : 49440780 total, 287824 free, 37140896 used, 12012060 buff/cache 
KiB Swap: 0 total, 0 free, 0 used. 11151908 avail Mem

There is about 10 GB of available memory (the "avail Mem" figure), so why doesn't a VM that needs only 2 GB fit? I can make it work again by... dropping caches.

Code:
echo 3 > /proc/sys/vm/drop_caches

But when I do that and wait a few hours, the problem occurs again. It's repeatable. Can someone explain to me why the buff/cache is not reclaimed automatically when the system is out of free memory?

Kernel on the server:
Linux s0 4.13.13-3-pve #1 SMP PVE 4.13.13-34 (Sun, 7 Jan 2018 13:19:58 +0100) x86_64 GNU/Linux

Uptime:
15:09:04 up 22 days, 5:05, 2 users, load average: 1.24, 1.37, 1.06
 
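There is not enough contiguous free RAM to satisfy the allocation. Note that the request that fails is not the VM's 2 GB of guest RAM: the trace shows a kernel allocation (kmalloc_order called from kvm_dev_ioctl) during KVM_CREATE_VM, and that needs physically contiguous pages.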

This will display how many contiguous allocations of each 'order' are available:
Code:
cat /proc/buddyinfo

From left to right, each column is the number of free blocks available for one order, starting with order 0.
The size of a block of a given order in bytes is: (2 ^ order) * 4096, where 4096 is the page size.
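
To see at a glance whether any large blocks are left, the columns can be summed from a given order upwards. This is only a sketch: the function name free_blocks_at_least is made up for illustration, and it assumes the usual buddyinfo layout ("Node N, zone NAME" followed by one count per order), so the count for order k sits in field 5 + k.

```shell
# Sum the free blocks of at least a given order for every node/zone.
# Reads buddyinfo-format text on stdin (assumed layout:
# "Node 0, zone Normal c0 c1 c2 ...", so order k is awk field 5 + k).
free_blocks_at_least() {
    awk -v min="$1" 'NF >= 5 {
        total = 0
        for (i = 5 + min; i <= NF; i++) total += $i
        printf "%s %s zone %s: %d free blocks of order >= %d\n", $1, $2, $4, total, min
    }'
}
```

On a live host it would be called as: free_blocks_at_least 6 < /proc/buddyinfo. A zone that reports 0 cannot currently satisfy a request of that order without compaction or reclaim.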

In your case you had an order 6 allocation failure:
Code:
kvm: page allocation failure: order:6

(2 ^ 6) * 4096 = 262144
So you were lacking a contiguous free block of 262144 bytes (256 KiB).
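
The same arithmetic as a small shell helper, in case you want to translate other order values from dmesg. The name order_bytes is just for illustration, and the 4096-byte default is an assumption that matches the formula above; getconf PAGESIZE shows the real value on your host.

```shell
# Size in bytes of one block of a given order.
# Second argument is the page size; 4096 is assumed if it is omitted.
order_bytes() {
    order=$1
    page_size=${2:-4096}
    echo $(( (1 << $order) * $page_size ))
}

order_bytes 6    # the order from the dmesg line; prints 262144
```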

I'm not sure how safe this is to run, but I've never had issues doing it.
To 'defrag' the RAM, run this command:
Code:
echo 1 > /proc/sys/vm/compact_memory

If you want to prevent this issue in the future I have a few suggestions.

Increase vm.min_free_kbytes. Setting it too high can cause issues, so please read the documentation on it first.
Code:
root@aaaa:~# cat /etc/sysctl.d/99-vm.min_free_kbytes.conf
vm.min_free_kbytes = 524288
Note: this value should be a multiple of 4096.
I usually start at about 5% of total RAM divided by the number of CPU cores.
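
A rough sketch of that starting point, reading the rule of thumb as (5% of total RAM in KiB) / cores, rounded down to a multiple of 4096. The function name suggest_min_free_kbytes is made up, the total comes from the top output in the first post, and the 16 cores are an assumption (the dmesg trace mentions CPU 15, so there are at least 16 logical CPUs).

```shell
# Starting value for vm.min_free_kbytes using the rule of thumb above:
# 5% of total RAM (in KiB) divided by the core count, rounded down
# to a multiple of 4096. Treat the result as a first guess, not a recommendation.
suggest_min_free_kbytes() {
    total_kib=$1
    cores=$2
    raw=$(( $total_kib * 5 / 100 / $cores ))
    echo $(( $raw / 4096 * 4096 ))
}

# 49440780 KiB total from the top output; 16 cores is an assumption.
suggest_min_free_kbytes 49440780 16    # prints 151552
```

On a live host the inputs could come from /proc/meminfo and nproc, for example: suggest_min_free_kbytes "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)" "$(nproc)".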

If you have multiple CPU sockets, tell KSM not to merge across nodes.
Setting this through /etc/sysfs.conf requires installing the sysfsutils package:
Code:
root@vm1:~# cat /etc/sysfs.conf              
#merge_across_nodes                          
kernel/mm/ksm/merge_across_nodes=0

You might also want to look into tuning:
vm.dirty_ratio
vm.dirty_background_ratio
 
