[PVE 8.4.1] Ballooning doesn't work?

garfieldfr (Member, France)
Hello,

After updating to 8.4.1, I get errors on some VMs:

On a VM (Debian 12, up to date, just Apache/MariaDB/vscode - I use it for remote development), I get "Out of memory" when ballooning is enabled.
Initial setup:
Avec_balloning.png
I get this error:
OutOfMemory.png

When I uncheck "Ballooning Device" or set "Minimum memory" = "Memory", everything works.

I have the same problem on a Debian 11 VM (with GitLab CE on it).

Is this a bug, or is a new configuration required?
 
First things first, I'd make sure you're using the QEMU guest agent and that it's properly recognized in PVE. Look at the guest summary and ensure you can see the guest IPs and memory statistics.
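If it helps, one quick way to check from the PVE host is something like this (100 is just a placeholder VMID):

Bash:
# confirm the guest agent option is enabled in the VM config
qm config 100 | grep agent
# ask the agent to respond; exit status 0 means it is reachable
qm agent 100 ping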

Second, 2 GB is a fairly low minimum. Have you tried raising the minimum / maximum a bit and checking performance? Is there anything that could be interfering with it picking up additional RAM between minimum and maximum, such as having enough memory available to start the machine but not enough to ramp up to the maximum during startup or high load? How much memory does the guest OS recognize?

Try to get your machine to a stable point, then list processes by RAM usage and see if anything is hogging more than you'd expect - or at least make sure you're sitting comfortably within the maximum memory allocation.
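For instance, something along these lines inside the guest shows the overall picture and the biggest consumers:

Bash:
# overall memory picture inside the guest
free -h
# ten biggest memory consumers, sorted by resident memory
ps aux --sort=-%mem | head -n 11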

I recently did some experimenting with the memory ballooning configs. The trick I used was to run the VM with min = max and ballooning checked for a period of time, making sure to cover startup, shutdown, restart, idle time, heavy use time, etc. From there I reviewed the memory usage statistics and identified the maximum memory usage for startup, shutdown, and idle time, then set the ballooning minimum a *little* bit above that. Then I took the maximum across all states and made the maximum memory a *little* bit more than that. I will say, my approach showed that on Linux (primarily Ubuntu 24.04 servers) there was usually only a small gain in available memory across the whole node for each adjustment I made.
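If it's useful, here is a rough sketch of how that sampling could be done inside the guest (just a plain shell loop; the log path is arbitrary):

Bash:
# sample used memory (MiB) every 60 seconds into a log
while true; do
    echo "$(date '+%F %T') $(free -m | awk '/^Mem/ {print $3}')" >> /var/log/mem-sample.log
    sleep 60
done

# run separately once you have enough samples: highest observed usage in MiB
awk '{print $3}' /var/log/mem-sample.log | sort -n | tail -n 1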

Attached is the memory summary from an Ubuntu VM that I'm using ballooning on and haven't quite trimmed down all the way. I have it set to 4.0/6.0 GB RAM and could probably bring that down safely to around 3.0/5.0 GB RAM using this approach.
 

Attachments

  • memory usage summary.PNG (16.8 KB)
Sorry, I didn't approach this from the "after update to 8.4.1, I have some error on VMs" direction - I didn't realize this may have been stable prior to the update. But try setting min different from max, at higher-than-normal amounts, and see what the guest OS recognizes for RAM if you can get it to stabilize.
 
So, yes, it was stable before the update and had been for at least 3 years....

The VM is up to date and has the agent. I managed to run "top", and it turns out that qemu-ga uses a lot of CPU when I get the "out of memory" error, but the memory doesn't go up to the maximum possible.
 
What does the system see for memory? I ran "sudo lshw -c memory" on Ubuntu and got the attached image, which shows the maximum as a single DIMM with QEMU as the vendor.
 

Attachments

  • show memory ubuntu.PNG (16.2 KB)
If I run "lshw -c memory" I get this, with ballooning=off or ballooning=on:
Code:
  *-memory
       description: System Memory
       physical id: 1000
       size: 4GiB
       capabilities: ecc
       configuration: errordetection=multi-bit-ecc
     *-bank
          description: DIMM RAM
          vendor: QEMU
          physical id: 0
          slot: DIMM 0
          size: 4GiB
 
It appears that your system is recognizing 4 GB RAM. When you took that output, what was your minimum set to - still 4 GB/4 GB? Set 4 GB/6 GB with ballooning on and make sure that the system now recognizes 6 GB RAM. Also, when you're doing this, for efficiency's sake, use multiples of 1024 MiB - it's generally done in 2 GB increments, though I've seen systems run fine with odd amounts of RAM.
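If you prefer the CLI over the GUI, that change can be made with qm, something like this (100 is a placeholder VMID; values are in MiB):

Bash:
# max memory 6144 MiB, ballooning minimum 4096 MiB (a non-zero balloon value keeps the ballooning device enabled)
qm set 100 --memory 6144 --balloon 4096
# then verify inside the guest, e.g. with: free -h  and  lshw -c memory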
 
Here are some tests with 6G max; in all cases 6G is seen by the system.
I've added "free -h".

Bash:
# Memory=6144
# Min memory=2048
# Ballooning=on
root@dev:~# lshw -c memory
  *-firmware
       description: BIOS
       vendor: SeaBIOS
       physical id: 0
       version: rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
       date: 04/01/2014
       size: 96KiB
  *-memory
       description: System Memory
       physical id: 1000
       size: 6GiB
       capabilities: ecc
       configuration: errordetection=multi-bit-ecc
     *-bank
          description: DIMM RAM
          vendor: QEMU
          physical id: 0
          slot: DIMM 0
          size: 6GiB

root@dev:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           5.4Gi       836Mi       4.4Gi       145Mi       512Mi       4.6Gi
Swap:             0B          0B          0B
> total goes down over time

**************************************************************************************
# Memory=6144
# Min memory=2048
# Ballooning=off
root@dev:~# lshw -c memory
  *-firmware
       description: BIOS
       vendor: SeaBIOS
       physical id: 0
       version: rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
       date: 04/01/2014
       size: 96KiB
  *-memory
       description: System Memory
       physical id: 1000
       size: 6GiB
       capabilities: ecc
       configuration: errordetection=multi-bit-ecc
     *-bank
          description: DIMM RAM
          vendor: QEMU
          physical id: 0
          slot: DIMM 0
          size: 6GiB

root@dev:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           5.8Gi       819Mi       4.8Gi       145Mi       512Mi       5.0Gi
Swap:             0B          0B          0B

**************************************************************************************
# Memory=6144
# Min memory=6144
# Ballooning=on
root@dev:~# lshw -c memory
  *-firmware
       description: BIOS
       vendor: SeaBIOS
       physical id: 0
       version: rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org
       date: 04/01/2014
       size: 96KiB
  *-memory
       description: System Memory
       physical id: 1000
       size: 6GiB
       capabilities: ecc
       configuration: errordetection=multi-bit-ecc
     *-bank
          description: DIMM RAM
          vendor: QEMU
          physical id: 0
          slot: DIMM 0
          size: 6GiB

root@dev:~# free -h
               total        used        free      shared  buff/cache   available
Mem:           5.8Gi       817Mi       4.8Gi       145Mi       507Mi       5.0Gi
Swap:             0B          0B          0B
 
> total goes down over time

Are you saying that the total memory value, as seen by the guest operating system, shrinks as the machine's uptime goes on?

Have you been seeing the original out-of-memory error since increasing the maximum memory to 6 GB?
 
lshw always shows the same values.
But "free -h" shows the total RAM going down in the 6144/2048/ballooning=on case.... and that is normal, RAM not used by the guest goes back to the host.
In the two other cases the total doesn't move.
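For what it's worth, the host's view of the balloon can also be checked with the QEMU monitor, roughly like this (100 is a placeholder VMID):

Bash:
# open the QEMU monitor for the VM
qm monitor 100
# at the monitor prompt, query the current balloon size
info balloon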
 
... and that is normal, RAM not used by the guest goes back to the host


Check this out - it's the official PVE wiki on memory allocation. It notes that:
"When the host is running low on RAM, the VM will then release some memory back to the host, swapping running processes if needed and starting the oom killer in last resort. The passing around of memory between host and guest is done via a special balloon kernel driver running inside the guest, which will grab or release memory pages from the host."

This makes me think your host machine is low on memory resources, could have a memory issue, a balloon kernel driver issue in the guest (if no other nodes are affected), or one of the other issues described further down in that article.

Based on that, how do your host machine's memory and % allocated look? Are you over 80%, or is your memory usage growing significantly more than expected while running that guest? Do you think the OOM killer is running on the host at any point?
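If you want to check quickly, something along these lines should answer most of that (host commands first, then one inside the guest):

Bash:
# on the PVE host: overall memory usage and any OOM killer events in the kernel log
free -h
journalctl -k | grep -iE 'out of memory|oom-killer'
# inside the guest: confirm the balloon driver is actually loaded
lsmod | grep virtio_balloon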
 
... RAM not used by the guest goes back to the host
That's not how ballooning works! As soon as the Proxmox host memory usage gets above 80% (which is configurable in the latest version), Proxmox starts taking away memory from VMs based on their Shares values: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_ballooning . There is no negotiation with the (software inside the) VM. Memory is just taken away! Make sure (the software inside) your VM can run with the minimum memory setting (and can handle memory disappearing instantly).
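For reference, the Shares weight mentioned above can be adjusted per VM, e.g. (100 is a placeholder VMID; 1000 is the default, and the value only matters when auto-ballooning kicks in):

Bash:
# give this VM more weight so auto-ballooning reclaims proportionally less memory from it
qm set 100 --shares 2000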
 
That's not how ballooning works! [...] Memory is just taken away!
Exactly what I was trying to get at with the reference to the wiki and asking about host memory usage.
 
That's not how ballooning works! [...] Memory is just taken away!
Is this new in 8.4.1? Because before I updated PVE it worked fine. I'm at around 79.8% host RAM usage; I'll kill some VMs and test again.
 
Is this new in 8.4.1? Because before I updated PVE it worked fine. [...]
It being configurable is new, but the way ballooning works is not (and it has always been this way, IIRC). Maybe the update causes Proxmox to use a little more memory, making ballooning take away more memory and causing the problem inside the VM, because the minimum memory is set too low or the software cannot handle memory disappearing.
 
So, based on being close to 80%, it sounds like your problem is that when the host approaches or passes 80%, it takes memory pages away from that guest based on the Shares value. If you adjust the host memory usage threshold to 85 or 90%, or free up memory by reducing the number of guests, you should be able to continue without issue. Either way, if that guest operating system is not OK with the decrease in RAM, you'll need to configure min = max or disable ballooning.
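If it comes to that, both options are a one-liner with qm (100 is a placeholder VMID; values in MiB):

Bash:
# option 1: keep the ballooning device but pin min = max
qm set 100 --memory 4096 --balloon 4096
# option 2: disable ballooning entirely
qm set 100 --balloon 0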
 
It seems you are right. I stopped some VMs, host RAM usage is now 59%, then I started my VM with 4096/2048/Ballooning=on and launched my apps on it.... and it seems to work... :)

As leesteken said, I think the update causes PVE to use more memory.

I'll do more tests and send feedback here...
 
I'd recommend marking the thread as resolved, unless you identified additional issues that indicate a problem with memory ballooning or allocation.