VM memory not ballooning?

eduncan911

Member
Mar 12, 2021
I am running a TrueNAS VM (24.10) on the latest Proxmox 8.2 (and will upgrade to 8.3 soon).

I have ballooning set with a minimum of 8GB and a maximum of 64GB. However, when I do something memory intensive (e.g. changing all POSIX ACLs across 100TB on a ZFS pool within the VM), the VM quickly runs into out-of-memory kernel panics that fill up the logs.

Also, when monitoring with something like htop, as soon as memory usage gets past the ~8000M mark, OOM issues start.

qemu-guest-agent comes installed within TrueNAS, and is running:

Code:
# systemctl status qemu-guest-agent
● qemu-guest-agent.service - QEMU Guest Agent
     Loaded: loaded (/lib/systemd/system/qemu-guest-agent.service; static)
     Active: active (running) since Fri 2024-12-27 15:49:19 EST; 5h 28min ago
   Main PID: 3410 (qemu-ga)
      Tasks: 2 (limit: 76774)
     Memory: 1.2M
        CPU: 10.776s
     CGroup: /system.slice/qemu-guest-agent.service
             └─3410 /usr/sbin/qemu-ga

Dec 27 15:49:19 truenas-a systemd[1]: Started qemu-guest-agent.service - QEMU Guest Agent.
Dec 27 21:16:43 truenas-a qemu-ga[3410]: info: guest-ping called

The VM shows 64GB available to it though:

Code:
# free -mh
               total        used        free      shared  buff/cache   available
Mem:            62Gi        26Gi        36Gi        21Mi       736Mi        36Gi
Swap:             0B          0B          0B
(note: the output above is from after I increased the minimum to 32GB to get around the OOM errors for now)

Is there something else I can do to make this more automatic?
 
I guess you have enough RAM on the host to fulfill that RAM need? And that virtio_balloon is loaded on TrueNAS? (lsmod | grep virtio_balloon)
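For reference, the check inside the guest would look something like this (standard module name from the virtio drivers):

Code:
# confirm the balloon driver is loaded inside the guest
lsmod | grep virtio_balloon
# if nothing shows up, try loading it manually
modprobe virtio_balloon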

Also, I'd give the VM a bit of swap, in case balloon deflation doesn't happen fast enough for a process to get the RAM it needs... The "precise" number of 8G could be a symptom of a bug, though... I'd try to reproduce it on another VM by stressing it the same way. Does dmesg reveal anything besides the OOM?
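As a rough sketch of what I mean (the pool name tank and the 4G size are just placeholders; on ZFS a swap zvol is usually safer than a swap file):

Code:
# create a small zvol inside the guest to use as swap (tank/swap is a placeholder)
zfs create -V 4G -b $(getconf PAGESIZE) -o compression=off \
    -o primarycache=metadata -o sync=always tank/swap
mkswap /dev/zvol/tank/swap
swapon /dev/zvol/tank/swap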
 
Can you please show the VM configuration file (or qm config)? When the Proxmox host gets to 80% memory usage, it starts taking away memory from the VM regardless of whether the VM can give it: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#qm_memory . Maybe 8GB is simply not enough? Maybe the ZFS inside the VM expects to be able to use 50% of the VM memory, but less than 25% is available (when ballooning)?
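For comparison, the relevant lines on the host would look roughly like this (the VMID 100 and the values are placeholders matching what you described):

Code:
# on the Proxmox host; 100 is a placeholder VMID
qm config 100 | grep -E 'memory|balloon|agent'
# expected output along the lines of:
#   agent: 1
#   balloon: 8192
#   memory: 65536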
 
Maybe the ZFS inside the VM expects to be able to use 50% of the VM memory, but less than 25% is available (when ballooning)?

This is exactly my conclusion, as the OOMs only start after massive ZFS "work" (like setting ACLs recursively over a few million files, changing properties on a few hundred datasets, etc).

The only way I've gotten around this is to just give the VM all 64GB of RAM. I've had zero issues ever since.
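If I wanted to keep ballooning, I guess the alternative would be capping the in-guest ARC so it fits inside the ballooned-down minimum, roughly like this (the 4GiB value is just an example, and TrueNAS may manage this parameter itself, so treat it as a sketch):

Code:
# cap the guest's ZFS ARC at 4 GiB (4294967296 bytes); tune to your balloon minimum
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max
# make it persistent across reboots
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf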
 
However, when I do something memory intensive (e.g. changing all posix ACLs across 100TB on a ZFS pool within the VM), this quickly runs into out-of-memory kernel panics that fill up the logs.
However ... changing all POSIX ACLs across 100TB is *NOT* a memory-intensive action in itself: the metadata is modified directory by directory and file by file, so the tool doing the work (setfacl, probably called a few million times, likely from a find command run over the dataset) has a very small memory footprint!
Maybe the problem is in the routine you're using to roll out the new ACL rules (e.g. a find command, or a script that first collects all dir/file metadata into a list before processing it?). A streaming approach, as sketched below, keeps the footprint tiny.
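Something along these lines never holds the full file list in memory (the dataset path and the ACL entry are placeholders):

Code:
# walk the dataset and apply the ACL in batches; nothing buffers the full list
find /mnt/tank/dataset -print0 | xargs -0 -n 1000 setfacl -m u:someuser:rX
# or let find batch the setfacl calls itself
find /mnt/tank/dataset -exec setfacl -m u:someuser:rX {} +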