Ballooning memory and ZFS fighting for available RAM?

lubbdi

Member
Dec 18, 2016
5
0
21
Hi all,

I'm running a cluster of three Proxmox nodes with replicated ZFS RAID10 pools on identical machines. Each node has 64 GB RAM, all VMs have ballooning activated.

On one of the nodes, a VM has ballooning memory set to 6 GB (minimum) and 16 GB maximum, and I see a lot of these errors in dmesg inside the VM:

Code:
[1274631.052945] kworker/0:5: page allocation failure: order:0, mode:0x6310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null)
[1274631.052947] kworker/0:5 cpuset=/ mems_allowed=0
[1274631.052953] CPU: 0 PID: 29716 Comm: kworker/0:5 Not tainted 4.19.0-16-amd64 #1 Debian 4.19.181-1
[1274631.052955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[1274631.052962] Workqueue: events_freezable update_balloon_size_func [virtio_balloon]
[1274631.052964] Call Trace:
[1274631.052978]  dump_stack+0x66/0x81
[1274631.052982]  warn_alloc.cold.120+0x6c/0xec
[1274631.052987]  __alloc_pages_slowpath+0xc73/0xcb0
[1274631.052993]  ? vring_unmap_one+0x16/0x70 [virtio_ring]
[1274631.052995]  __alloc_pages_nodemask+0x28b/0x2b0
[1274631.052998]  update_balloon_size_func+0x109/0x2c0 [virtio_balloon]
[1274631.053002]  process_one_work+0x1a7/0x3a0
[1274631.053004]  worker_thread+0x30/0x390
[1274631.053006]  ? create_worker+0x1a0/0x1a0
[1274631.053008]  kthread+0x112/0x130
[1274631.053010]  ? kthread_bind+0x30/0x30
[1274631.053013]  ret_from_fork+0x35/0x40
[1274631.053038] virtio_balloon virtio0: Out of puff! Can't get 1 pages

Swap space is nearly full constantly. The VM is running Debian 10, CPU- and memory hotplugging is activated (as described in the wiki), QEMU guest agent is installed. Side note: There is one Windows 10 VM running, besides some other small VMs, on this node.

Edit: The hosts RAM is not overcommitted. Committed RAM of running VMs is currently 32 GB (ballooning max), or 20 GB (ballooning min), respectively.

Output of pveversion -v:
Code:
# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-storage-perl: 6.3-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.13-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.3-1
pve-qemu-kvm: 5.2.0-5
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-10
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1

My assumption is that ZFS is taking all RAM it can get, and the VM is constantly fighting against it, trying to get more pages. So far it did not seem to affect the reliability of either said VM, other VMs or the host.

Is this a known problem? Or no problem at all? I'm happy to provide more detail when necessary. Thanks!

Here's a visualization of the described phenomenon (see attachment).
 

Attachments

  • ramfight.png
    ramfight.png
    45.9 KB · Views: 7
Last edited:

aaron

Proxmox Staff Member
Staff member
Jun 3, 2019
2,056
269
83
ZFS will be taking up to 50% of the RAM if available as read cache. Swap being used is not necessarily a bad thing. See this blog entry that explains it well.

Thanks for the performance graph! Monitoring the ZFS ARC size would also be interesting, to see if the pattern matches.

You can try to limit the RAM ZFS will use. Don't make it too small as having a read cache is beneficial.
 

lubbdi

Member
Dec 18, 2016
5
0
21
Thank you for your reply, aaron!

I was not too concerned about swap being used, but primarily the stack traces. Luckily, I found some graphs for the time period in question. rpool is a mirror pool with two SSDs (for system and unrelated), preontank is the RAID10 pool with four 4 TB disks and a log device (Intel Optane P4801X, 16 GB partition), where the VM is running on.

Can you draw any conclusions from this?
 

Attachments

  • arcsize.png
    arcsize.png
    385.7 KB · Views: 4
Last edited:

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE and Proxmox Mail Gateway. We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get your own in 60 seconds.

Buy now!