High RAM usage in KVM processes & OOM errors

mfkrause

Dec 23, 2022
We are currently experiencing random VM shutdowns every few days due to OOM errors that seem to happen right during backups. Here's an excerpt from the syslog of one of those shutdowns (too long for the forums, so I put it in a pastebin): https://pastebin.com/BiFnGgPT

As you can see, a backup job was in progress for the VMs on this specific host when the VM was killed by the OOM killer at around 12:34.

Looking at the host summary, the memory usage does indeed look troubling:
[Screenshot: host summary showing memory usage (CleanShot 2023-04-02 at 15.16.47@2x.jpg)]

However, I have no idea where this high memory usage comes from. Sorting ps aux by memory usage on the host gives the following top five processes:
Code:
root      830748  0.6 14.0 9363860 4609776 ?     Sl   Jan22 646:48 /usr/bin/kvm -id 108 -name jitsi,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/108.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/108.pid -daemonize -smbios type=1,uuid=09c0d2a5-8a4b-42ec-81ff-7f7f1d3c3451 -smp 4,sockets=1,cores=4,maxcpus=4 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/108.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -object iothread,id=iothread-virtioscsi0 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=fdd63f13-dcb3-43c6-94f4-aeadf5bebba4 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -drive file=/var/lib/vz/template/iso/debian-11.6.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/var/lib/vz/images/108/vm-108-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap108i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=EE:E8:9B:9D:5F:7B,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=102 -machine type=pc+pve0
root       10781  0.6 13.8 6177792 4529348 ?     Sl    2022 909:44 /usr/bin/kvm -id 102 -name nginx,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/102.pid -daemonize -smbios type=1,uuid=f53132da-f4af-4f04-8d09-68514984433a -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/102.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 512 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=f1b59b6f-a2cc-4376-b52c-3aa2e51f562d -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/102/vm-102-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=D6:BD:2B:DB:88:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -netdev type=tap,id=net1,ifname=tap102i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=12:12:11:E6:C8:5E,netdev=net1,bus=pci.0,addr=0x13,id=net1,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
root     2498308  3.4 13.1 5155332 4321304 ?     Sl   Mar31 100:30 /usr/bin/kvm -id 105 -name webcuststag,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/105.pid -daemonize -smbios type=1,uuid=48af498a-9a27-42bf-b42c-ca6a16fec402 -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/105.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=de4b3139-1cb8-48de-928e-5b1a5a8b0f39 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/105/vm-105-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=A6:B8:4E:94:C0:04,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
root      395295  0.1 11.9 7019208 3929072 ?     Sl   Jan20 147:33 /usr/bin/kvm -id 107 -name frp,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/107.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/107.pid -daemonize -smbios type=1,uuid=956cf002-f8f8-4db0-b6f8-bda5ecd0d3a0 -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/107.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 -object iothread,id=iothread-virtioscsi0 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=b9616501-d055-42b3-b4f7-5b832828298a -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -drive file=/var/lib/vz/template/iso/debian-11.6.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/var/lib/vz/images/107/vm-107-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap107i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=F2:5F:40:2D:EE:E0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=102 -machine type=pc+pve0
root      802768  1.6 11.0 4508112 3636120 ?     Sl   Mar03 722:43 /usr/bin/kvm -id 103 -name webintlprod,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/103.pid -daemonize -smbios type=1,uuid=0dfeda2a-9ee4-4018-82e3-46a4e990e05f -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/103.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=884f47c6-9cd2-4875-8182-f2f0822d0e6e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/103/vm-103-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=A2:C2:A1:A5:06:AA,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
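To compare each kvm process's resident memory with the guest RAM it was started with, the relevant fields can be pulled out of ps aux lines like the ones above. This is a minimal Python sketch, assuming the standard ps aux column order (RSS in KiB in the sixth column) and that guest memory is passed as a plain MiB value via -m, as in these command lines; the function names are made up for illustration.

```python
import re

# " -id <vmid>" and " -m <MiB>" as they appear in the Proxmox kvm command lines above.
# Assumption: -m is a plain MiB number (e.g. "-m 512"), not "size=...,slots=...".
ID_RE = re.compile(r"\s-id\s+(\d+)")
MEM_RE = re.compile(r"\s-m\s+(\d+)(?:\s|$)")


def parse_kvm_line(line: str) -> dict:
    """Parse one `ps aux` line of a kvm process into RSS vs. configured memory."""
    # USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND (command may contain spaces)
    fields = line.split(None, 10)
    rss_kib = int(fields[5])  # ps reports RSS in KiB
    command = fields[10]
    vmid = int(ID_RE.search(command).group(1))
    configured_mib = int(MEM_RE.search(command).group(1))
    rss_mib = rss_kib / 1024
    return {
        "pid": int(fields[1]),
        "vmid": vmid,
        "rss_mib": round(rss_mib, 1),
        "configured_mib": configured_mib,
        "overhead_mib": round(rss_mib - configured_mib, 1),
    }


def overhead_report(ps_output: str) -> list[dict]:
    """Parse every kvm line of `ps aux` output, largest overhead first."""
    rows = [
        parse_kvm_line(line)
        for line in ps_output.splitlines()
        if "/usr/bin/kvm" in line and " -id " in line
    ]
    return sorted(rows, key=lambda r: r["overhead_mib"], reverse=True)
```

Passing it the output of subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout would list every VM on the host; for the VM 102 line above it reports about 4423 MiB resident against 512 MiB configured.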

And for visual reference:
[Screenshot: CleanShot 2023-04-02 at 15.43.58@2x.jpg]

However, the two VMs at the very top, for example, are configured with 4 GB and 512 MB of RAM respectively, yet their KVM processes both use around 14% of the host's RAM (~4.5 GB)! The other three VMs have at most 2 to 4 GB configured (-m 2048 or -m 4096 in the command lines above). A little virtualization overhead is expected, but a whopping 4 GB for a VM that is only assigned 0.5 GB? When I look at the summary of that VM, RAM usage inside the VM also looks normal:
[Screenshot: summary of VM 102 showing RAM usage inside the VM (CleanShot 2023-04-02 at 15.20.10@2x.jpg)]

I've tried dropping the caches (echo 3 > /proc/sys/vm/drop_caches), but that had no real effect. Ballooning is enabled on all VMs, but most of them don't effectively use it (min = max RAM); only some have ranges configured. The host only had 2 GB of swap configured, which I've just increased to 8 GB with an additional 6 GB swapfile (the swap was full, too, as seen in the syslog; could this have been a cause?). We are not using ZFS but directory-based storage (plus external NFS storage for backups).
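Writing 3 to /proc/sys/vm/drop_caches only frees the page cache and reclaimable slab objects, so it would not be expected to shrink the anonymous memory that holds guest RAM inside the kvm processes, which fits what was observed. To watch how close the host gets to running out of memory, and how full swap is, around backup time, a small Python sketch reading /proc/meminfo could look like this; the field names come from /proc/meminfo, while the summary keys are my own illustrative choice.

```python
import os


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo into {field: value}; most values are in KiB ("kB")."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])  # HugePages_* fields are page counts
    return info


def pressure_summary(info: dict[str, int]) -> dict[str, float]:
    """Summarize total/available memory and swap usage in GiB."""
    kib_per_gib = 1024 * 1024
    swap_used = info["SwapTotal"] - info["SwapFree"]
    swap_pct = 100 * swap_used / info["SwapTotal"] if info["SwapTotal"] else 0.0
    return {
        "mem_total_gib": round(info["MemTotal"] / kib_per_gib, 2),
        "mem_available_gib": round(info["MemAvailable"] / kib_per_gib, 2),
        "swap_used_gib": round(swap_used / kib_per_gib, 2),
        "swap_used_pct": round(swap_pct, 1),
    }


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(pressure_summary(parse_meminfo(f.read())))
```

Logging this every few minutes during the backup window would show whether MemAvailable actually approaches zero before the OOM kill.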
I've rebooted the VMs using the most RAM, which instantly relieved most of the memory pressure. However, I fear that usage will start to grow again over the next few days. Any ideas what the culprit could be?
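To find out whether, and how fast, the kvm processes grow again after the reboot, one option is to record their resident set size periodically and compare it over days. A minimal Python sketch, assuming it runs as root on the Proxmox host and that the VMID can be taken from the -id argument of the kvm command line, as in the ps output above; the CSV path you pass on the command line (e.g. from a cron job) is your own choice.

```python
import csv
import os
import sys
import time
from pathlib import Path


def parse_vmrss(status_text: str) -> int:
    """Return VmRSS (KiB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line")


def kvm_rss_by_vmid() -> dict[int, int]:
    """Map VMID -> VmRSS (KiB) for every running Proxmox kvm process."""
    usage = {}
    for proc in Path("/proc").glob("[0-9]*"):
        try:
            argv = (proc / "cmdline").read_bytes().split(b"\0")
            # Only processes started as .../kvm with an "-id <vmid>" argument
            if os.path.basename(argv[0]) != b"kvm" or b"-id" not in argv:
                continue
            vmid = int(argv[argv.index(b"-id") + 1])
            usage[vmid] = parse_vmrss((proc / "status").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or is not a VM
    return usage


def append_sample(csv_path: str) -> None:
    """Append one row per VM: unix timestamp, VMID, RSS in KiB."""
    now = int(time.time())
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        for vmid, rss_kib in sorted(kvm_rss_by_vmid().items()):
            writer.writerow([now, vmid, rss_kib])


if __name__ == "__main__" and len(sys.argv) > 1:
    append_sample(sys.argv[1])
```

Plotting RSS per VMID from the CSV would show whether the growth is gradual or happens in steps, for example each time a backup runs.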
 
Any help with this? We're still experiencing this, with host crashes every few weeks, and haven't been able to identify a cause yet.
 
Can you post the configs of the VMs (qm config {vmid}) alongside how much memory each of them is actually using?

Some memory overhead is to be expected. But it would be interesting to see how the one with the potentially huge overhead is configured.

How much memory have you assigned to all guests in total?
Do you use ZFS?
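To answer the total-assignment question without adding up qm config output by hand, the memory and balloon values can be summed from the VM config files. A minimal Python sketch, assuming the configs are read from /etc/pve/qemu-server/ (the local node's VMs, including stopped ones) and that a missing memory key means 512 MiB, which is an assumption about the default; the helper names are hypothetical.

```python
from pathlib import Path


def parse_vm_config(text: str) -> dict[str, str]:
    """Parse the main section of a Proxmox VM config ("key: value" lines).

    Snapshot and pending sections start with a "[name]" line and are ignored.
    """
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # start of a snapshot/pending section
        if not line or line.startswith("#"):
            continue  # blank line or description comment
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def assigned_memory_mib(conf: dict[str, str]) -> tuple[int, int]:
    """Return (max_mib, min_mib) for one VM.

    Assumption: missing "memory" means 512 MiB; missing "balloon" or
    "balloon: 0" (balloon device disabled) means min = max.
    """
    max_mib = int(conf.get("memory", 512))
    min_mib = int(conf.get("balloon", max_mib)) or max_mib
    return max_mib, min_mib


def total_assigned(config_dir: str = "/etc/pve/qemu-server") -> tuple[int, int]:
    """Sum (max, min) assigned memory in MiB over all VM configs in a directory."""
    total_max = total_min = 0
    for path in Path(config_dir).glob("*.conf"):
        max_mib, min_mib = assigned_memory_mib(parse_vm_config(path.read_text()))
        total_max += max_mib
        total_min += min_mib
    return total_max, total_min
```

For the VM 107 config posted below this yields (2048, 1024), and for VM 102 (512, 512).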
 
Thanks for the reply. Here's the config of the VM with the severe overhead (VM 102, which used 4.5 GB of RAM in the screenshot with only 0.5 GB allocated):
Code:
agent: 1
boot: order=scsi0
cores: 1
memory: 512
name: nginx
net0: virtio=D6:BD:2B:DB:88:F6,bridge=vmbr1,firewall=1
net1: virtio=12:12:11:E6:C8:5E,bridge=vmbr0,firewall=1
onboot: 1
scsi0: local:102/vm-102-disk-0.raw
smbios1: uuid=f53132da-f4af-4f04-8d09-68514984433a
vmgenid: f1b59b6f-a2cc-4376-b52c-3aa2e51f562d

However, I can't believe this VM alone is the culprit. Even if it uses 4 GB more than allocated, the host should still have enough memory left when all other VMs max out their allocated memory (plus some overhead). Currently, ~22.5 GB is allocated to VMs, with 32 GB of physical memory available on that node.
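As a rough cross-check of that reasoning, the overhead of all five processes in the first ps listing can be added up instead of looking at VM 102 alone. This Python sketch copies the RSS and -m values from that listing, treats "RSS minus configured RAM" as overhead, and assumes the 32 GB physical and ~22.5 GB assigned figures can be read as GiB; it also mixes the older ps snapshot with the current allocation figure, so it is only an approximation.

```python
KIB_PER_GIB = 1024 * 1024

# (vmid, RSS in KiB, configured -m in MiB), copied from the first ps aux listing
TOP_KVM = [
    (108, 4609776, 4096),
    (102, 4529348, 512),
    (105, 4321304, 4096),
    (107, 3929072, 2048),
    (103, 3636120, 2048),
]


def overhead_gib(rss_kib: int, configured_mib: int) -> float:
    """Resident memory beyond the configured guest RAM, in GiB."""
    return rss_kib / KIB_PER_GIB - configured_mib / 1024


def host_headroom_gib(physical_gib: float, assigned_gib: float, overheads: list[float]) -> float:
    """Memory left for the host if every VM uses its full allocation plus the observed overhead."""
    return physical_gib - assigned_gib - sum(overheads)


overheads = [overhead_gib(rss, mib) for _, rss, mib in TOP_KVM]
print(f"total overhead of the top five: {sum(overheads):.2f} GiB")  # 7.55 GiB
print(f"host headroom: {host_headroom_gib(32, 22.5, overheads):.2f} GiB")  # 1.95 GiB
```

If these numbers are representative, the excess is spread over several VMs (102, 107 and 103 in particular) rather than confined to VM 102, and the remaining ~2 GiB would leave little room for the host itself while a backup is running.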

Here's a current snapshot of ps aux with RAM usage in GB: https://pastebin.com/iMdx0RpL
VM 102 from above currently "only" uses around 1.7 GB (instead of the 4.5 GB it used a month ago). VM 107 uses 4.31 GB but is allocated 1.00/2.00 GB (balloon min/max). Config of that VM:
Code:
agent: 1
balloon: 1024
boot: order=scsi0;ide2;net0
cores: 1
ide2: local:iso/debian-11.6.0-amd64-netinst.iso,media=cdrom,size=388M
memory: 2048
meta: creation-qemu=7.1.0,ctime=1674244766
name: frp
net0: virtio=F2:5F:40:2D:EE:E0,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local:107/vm-107-disk-0.qcow2,iothread=1,size=16G
scsihw: virtio-scsi-single
smbios1: uuid=956cf002-f8f8-4db0-b6f8-bda5ecd0d3a0
sockets: 1
vmgenid: b9616501-d055-42b3-b4f7-5b832828298a

We don't use ZFS, only directory-based storage and some NFS backup storage.
 