Out of memory: Killed process [...] on multiple VMs

acapoprox · Nov 11, 2024

Hi guys,
this is our situation. We migrated from VMware this year.
This is our hardware:
BladeCenter VNX5200
Storage Dell EMC C7000 (HBA)
7 Nodes on bladecenter: Blade Server HP BL460C Gen 9 (128GB ram each)
Storage was configured following similar instructions:

blog.mohsen.co/proxmox-shared-storage-with-fc-san-multipath-and-17a10e4edd8d

HA works smoothly. Backup is perfect.
We have approximately 50 virtual servers in the production environment (10 MS Windows, 40 Linux).

The virtual machines "migrated" from VMware were, for the most part, Oracle Linux 8-9.
We have some Ubuntu servers too.

After some days, on some of the Oracle Linux VMs only (at least 6-7 of them), we found the error described in the subject. It appears inside the VM; there are no errors on the Proxmox nodes.
Sometimes a "secondary" process is killed, but on some occasions a central service (mysql, asterisk) was killed, causing business problems.
We absolutely need to fix this, and we certainly don't want to migrate to other platforms (XCP-ng or similar).

All the VMs have the agent properly installed (QEMU guest agent).
All VMs are configured with memory ballooning.
Now, after all these problems, we are trying to disable ballooning, but we need help.
Have any of you had similar problems?
If so, how did you solve them?

Thanks in advance, and sorry for my bad English.

Emanuele.

bbgeek17 · Nov 11, 2024

Hi @acapoprox, welcome to the forum.

If the OOM event happens inside the VM, that usually means that you've run out of allocated memory in the VM. While you may have identical settings between the old ESXi setup and the new PVE, there are always differences between hypervisors and virtual machine interactions.

The easy "fix" is to increase the memory allocated to the VMs. The next step would be to implement monitoring via Grafana, Nagios, or a similar product.

You should search and read through the many articles on OOM Killer troubleshooting, i.e.: https://serverfault.com/questions/134669/how-to-diagnose-causes-of-oom-killer-killing-processes
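As a starting point for that troubleshooting, here is a minimal Python sketch that pulls the OOM kill records out of a guest's kernel log. It assumes the kill line has the usual shape on recent kernels (`Killed process <pid> (<name>) total-vm:...kB, anon-rss:...kB, ...`); older or vendor kernels may word it differently, so treat the regular expression as an assumption to adjust. The names `OomKill` and `parse_oom_kills` are made up for this example.

```python
import re
import sys
from typing import Iterable, List, NamedTuple


class OomKill(NamedTuple):
    """One OOM kill record (hypothetical helper type for this sketch)."""
    pid: int
    name: str
    total_vm_kb: int
    anon_rss_kb: int


# Assumed line shape; adjust if your kernel formats the message differently.
_KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(lines: Iterable[str]) -> List[OomKill]:
    """Return every OOM kill record found in the given log lines."""
    kills = []
    for line in lines:
        match = _KILL_RE.search(line)
        if match:
            kills.append(
                OomKill(
                    pid=int(match["pid"]),
                    name=match["name"],
                    total_vm_kb=int(match["total_vm"]),
                    anon_rss_kb=int(match["anon_rss"]),
                )
            )
    return kills


if __name__ == "__main__":
    # Reads kernel log lines from standard input.
    for kill in parse_oom_kills(sys.stdin):
        print(f"{kill.name} (pid {kill.pid}): "
              f"anon-rss {kill.anon_rss_kb / 1024:.0f} MiB")
```

Inside an affected VM it could be fed with something like `journalctl -k | python3 oom_kills.py` (the script name is arbitrary). Seeing which processes were killed, and how large they were at that moment, helps tell a genuinely oversized process from a guest whose available memory has shrunk.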


leesteken · Nov 11, 2024

acapoprox said:
After some days, only on some Oracle Linux VM (6-7 at least), we found out the error described in the subject (inside the vm - No error on proxmox nodes.).

acapoprox said:
All VM are configured with "ballon memory".

Please be aware that Proxmox will (forcefully) take memory away from the VM when it reaches 80% (on the host) when VMs are configured with ballooning: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_memory
Maybe the minimum memory you set the VMs to is too little for the software inside the VM? Or maybe you need to give the Oracle VM more "Shares" (so Proxmox will take less memory from them and more memory from other VMs)?
Maybe change the KSM setting to start looking for memory to share before the default 80% (so Proxmox might not have to use ballooning as much)? https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_memory
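One way to check from inside a guest whether ballooning has actually taken memory away is to compare `MemTotal` in `/proc/meminfo` with the memory configured for the VM. This is a rough sketch, assuming a Linux guest where an inflated balloon shows up as a lower `MemTotal`; the function names and the `configured_mib` parameter are made up for the example. `MemTotal` is always somewhat below the configured size because the kernel reserves some memory at boot, so only a large gap is meaningful.

```python
import sys
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo field names to their numeric values.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def balloon_shortfall_mib(meminfo_text: str, configured_mib: int) -> float:
    """MiB by which the guest's MemTotal is below the configured VM memory."""
    total_kb = parse_meminfo(meminfo_text)["MemTotal"]
    return configured_mib - total_kb / 1024


if __name__ == "__main__":
    # Usage: python3 balloon_check.py <configured memory in MiB>
    configured = int(sys.argv[1])
    with open("/proc/meminfo") as handle:
        shortfall = balloon_shortfall_mib(handle.read(), configured)
    print(f"MemTotal is {shortfall:.0f} MiB below the configured {configured} MiB")
```

Running this periodically, and logging the result next to the time of each OOM kill, would show whether the kills line up with moments when the guest had less memory than it was configured with.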

acapoprox · Nov 11, 2024

Hi guys, thanks for the quick answers.
This is the first thing we did: we increased the vRAM of the virtual machines where this problem occurred.
First +2GB, then another 2GB, and so on.
This did not solve our problems.
One thing I did not mention:
on the nodes we have a monitoring system that balances memory usage across all the nodes. If a node exceeds 65 percent, there is a recalculation and the virtual machines are redistributed across the remaining 6 nodes.
In any case, the nodes never exceed 60%; most of the time memory usage on each node is around 55%, while CPU usage (20 x Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz (1 Socket)) is around 20%.
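
To put these percentages next to the 80% host threshold mentioned earlier in the thread, here is a small arithmetic sketch in Python. It assumes identical nodes and a perfectly even redistribution of VMs when a node drops out, which a real cluster will only approximate; the function name is made up for the example.

```python
def usage_after_node_loss(per_node_pct: float, nodes: int, failed: int = 1) -> float:
    """Memory usage (%) per surviving node after `failed` nodes drop out.

    Assumes identical nodes and an even spread of the load (a simplification).
    """
    if not 0 <= failed < nodes:
        raise ValueError("failed must be between 0 and nodes - 1")
    return per_node_pct * nodes / (nodes - failed)


if __name__ == "__main__":
    for pct in (55, 60):
        after = usage_after_node_loss(pct, nodes=7)
        print(f"{pct}% on 7 nodes -> {after:.1f}% on 6 nodes")
```

With 7 nodes at 55-60%, losing one node lands the remaining six at roughly 64-70%, still below 80% under these assumptions.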

Another note: we already have a monitoring system that monitors our entire infrastructure (Centreon).
It monitors everything: nodes, virtual machines, UPS, etc.
It warns us about problems of any kind (disk space, memory above 80%, temperature, high CPU usage and CPU load, etc.) and, from what we have noticed, there are no memory spikes when the "Out of memory: Killed process" is triggered.
Below is an example:
At about 11 o'clock we received the notification of the kill, but the memory graph of the server looks completely normal.
We are speechless.

PS: This never happened in 6 years of VMware on the same hardware.

BIGMEL · Nov 11, 2024

Hi Emanuele,

Have you considered memory virtualization/pooling as a possibility? Take a look at kove.com

acapoprox · Nov 11, 2024

leesteken said:
Please be aware that Proxmox will (forcefully) take memory away from the VM when it reaches 80% (on the host) when VMs are configured with ballooning: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_memory
Maybe the minimum memory you set the VMs to is too little for the software inside the VM? Or maybe you need to give the Oracle VM more "Shares" (so Proxmox will take less memory from them and more memory from other VMs)?
Maybe change the KSM setting to start looking for memory to share before the default 80% (so Proxmox might not have to use ballooning as much)? https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_memory

I will try these solutions ASAP.

Thanks!
