Host Ran out of memory

adamb

Hey guys, we have a host with a single VM. The host has 1.48TiB of RAM and we allocated 1.44TiB to the VM. We typically leave 25-50G of memory free for the host. Based on the numbers above, the host should always have a bit over 40G of memory available, no matter what.
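
The headroom arithmetic above can be checked quickly. This is only a sketch: it assumes the 1.48TiB and 1.44TiB figures are exact, while they are rounded as displayed, so the real headroom may differ by a few GiB. `host_headroom_gib` is a made-up helper name.

```python
# Rough check of how much memory is left for the host after the VM allocation.
# Assumption: both sizes are in TiB and exact; in practice they are rounded.

GIB_PER_TIB = 1024


def host_headroom_gib(total_tib, vm_tib):
    """Return the memory not allocated to the VM, in GiB."""
    return (total_tib - vm_tib) * GIB_PER_TIB


headroom = host_headroom_gib(1.48, 1.44)
print(f"Host headroom: {headroom:.2f} GiB")
```

That comes out to roughly 41 GiB, which matches "a bit over 40G".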

Last night the server started logging these errors.

Code:
Oct 31 01:36:12 concordprox1 pvestatd[4058]: command '/sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count' failed: open3: fork failed: Cannot allocate memory at /usr/share/perl5/PVE/Tools.pm line 429.
Oct 31 01:36:12 concordprox1 pvestatd[4058]: command '/sbin/vgscan --ignorelockingfailure --mknodes' failed: open3: fork failed: Cannot allocate memory at /usr/share/perl5/PVE/Tools.pm line 429.
Oct 31 01:36:12 concordprox1 pvestatd[4058]: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,lv_name,lv_size,lv_attr,pool_lv,data_percent,metadata_percent,snap_percent,uuid,tags,metadata_size' failed: open3: fork failed: Cannot allocate memory at /usr/share/perl5/PVE/Tools.pm line 429.

At around 1:41am the VM went dead. No traces of the OOM killer, panics, or anything else.

We are on the latest Proxmox 5.4 packages.

pve-manager/5.4-13/aee6f0ec (running kernel: 4.15.18-21-pve)

Do you guys think I should make sure more than 40G is available for the host? That just seems like a lot for a host which is simply running one KVM process, but maybe I am missing something.
 
With memory, there are so many problems that may arise, and with so much memory I suspect that 40 GB is not enough for the page tables alone. Could you please check /proc/meminfo and /proc/buddyinfo? The memory also has to be contiguous, otherwise programs will fail as well, so memory fragmentation is a big, big problem if you do not use hugepages. Do you use hugepages? If not, please do!
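
To make the /proc/meminfo check concrete, here is a minimal sketch that pulls out the fields relevant to this question (page tables and hugepages). The sample text uses invented numbers, not values from the host in this thread, and `parse_meminfo` and `hugepage_pool_kib` are hypothetical helper names.

```python
# Parse /proc/meminfo-style text and report page-table and hugepage usage.
# SAMPLE_MEMINFO holds invented values for illustration only.

SAMPLE_MEMINFO = """\
MemTotal:        1000000 kB
MemFree:           50000 kB
PageTables:         2048 kB
HugePages_Total:     100
HugePages_Free:       10
Hugepagesize:       2048 kB
"""


def parse_meminfo(text):
    """Map each field name to its integer value (kB, or a count for HugePages_*)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def hugepage_pool_kib(info):
    """Size of the reserved hugepage pool in KiB."""
    return info["HugePages_Total"] * info["Hugepagesize"]


info = parse_meminfo(SAMPLE_MEMINFO)
print("PageTables (kB):", info["PageTables"])
print("Hugepage pool (KiB):", hugepage_pool_kib(info))
# On a live host: parse_meminfo(open("/proc/meminfo").read())
```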
 

We do use hugepages.

We also have 30 or so clusters in the field running 768G of RAM with a single VM each. No issues there, and we typically reserve a similar amount. Maybe we just need to leave a bit more.
 
So, do you monitor the information from buddyinfo? We had similar issues (not with PVE, but with Oracle) and it boiled down to not enough contiguous memory. In the end we needed more temporary room, so your idea of leaving more is just right. I still wonder how to efficiently and correctly monitor buddyinfo.
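
One possible way to watch buddyinfo is to parse it and track how much free memory sits in large contiguous blocks. This is a rough sketch, not a tested monitoring setup: it assumes 4 KiB base pages (typical on x86-64), the sample line is invented, and `parse_buddyinfo` and `free_kib_from_order` are hypothetical helper names. Each column in /proc/buddyinfo is the count of free blocks of 2^order pages, starting at order 0.

```python
import re

# Invented sample in /proc/buddyinfo format (orders 0 through 10).
SAMPLE_BUDDYINFO = "Node 0, zone   Normal     10      8      6      4      2      1      0      0      0      1      2\n"

LINE_RE = re.compile(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)")


def parse_buddyinfo(text):
    """Map (node, zone) to the list of free-block counts per order."""
    zones = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            node, zone, counts = match.groups()
            zones[(int(node), zone)] = [int(c) for c in counts.split()]
    return zones


def free_kib_from_order(counts, min_order, page_kib=4):
    """Free memory (KiB) held in blocks of at least 2**min_order pages."""
    return sum(
        n * (2 ** order) * page_kib
        for order, n in enumerate(counts)
        if order >= min_order
    )


zones = parse_buddyinfo(SAMPLE_BUDDYINFO)
counts = zones[(0, "Normal")]
print("Free in blocks >= 2 MiB (KiB):", free_kib_from_order(counts, 9))
print("Total free (KiB):", free_kib_from_order(counts, 0))
# On a live host: parse_buddyinfo(open("/proc/buddyinfo").read())
```

With 4 KiB pages, order 9 corresponds to 2 MiB blocks. A low share of free memory in the high orders is one sign of fragmentation, and what counts as "low" is something you would have to calibrate for your own hosts.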
 

We don't, but I will have to check it out. So far things have been solid with "vm.min_free_kbytes".
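
For reference, vm.min_free_kbytes takes a value in kilobytes. The thread does not say which value is in use, so the 1024 MiB below is purely illustrative and not a recommendation, and `min_free_kbytes_line` is a made-up helper that only formats a sysctl.conf-style line.

```python
# Format a sysctl.conf-style line for vm.min_free_kbytes from a size in MiB.
# The 1024 MiB value is a placeholder; the value used in this thread is not stated.


def min_free_kbytes_line(reserve_mib):
    """Return a sysctl.conf line reserving `reserve_mib` MiB."""
    kbytes = reserve_mib * 1024
    return f"vm.min_free_kbytes = {kbytes}"


print(min_free_kbytes_line(1024))
# The current value on a live host can be read from /proc/sys/vm/min_free_kbytes.
```

Be careful when raising it: the kernel documentation warns that setting this value too high can itself push the machine into OOM.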
 
