We are currently experiencing random VM shutdowns every few days due to OOM errors that seem to happen right during backups. Here's an excerpt from the syslog of one of those shutdowns (too long for the forums, so I put it in a pastebin): https://pastebin.com/BiFnGgPT
As you can see, a backup job for the VMs on this specific host was in progress when the VM was hit by the OOM killer at around 12:34.
Looking at the host summary, the memory indeed seems troubling:

However, I have no idea where this high memory usage comes from. Running ps aux on the host, sorted by memory usage, gives the following top five processes:
Code:
root 830748 0.6 14.0 9363860 4609776 ? Sl Jan22 646:48 /usr/bin/kvm -id 108 -name jitsi,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/108.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/108.pid -daemonize -smbios type=1,uuid=09c0d2a5-8a4b-42ec-81ff-7f7f1d3c3451 -smp 4,sockets=1,cores=4,maxcpus=4 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/108.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -object iothread,id=iothread-virtioscsi0 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=fdd63f13-dcb3-43c6-94f4-aeadf5bebba4 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -drive file=/var/lib/vz/template/iso/debian-11.6.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/var/lib/vz/images/108/vm-108-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap108i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=EE:E8:9B:9D:5F:7B,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=102 -machine type=pc+pve0
root 10781 0.6 13.8 6177792 4529348 ? Sl 2022 909:44 /usr/bin/kvm -id 102 -name nginx,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/102.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/102.pid -daemonize -smbios type=1,uuid=f53132da-f4af-4f04-8d09-68514984433a -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/102.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 512 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=f1b59b6f-a2cc-4376-b52c-3aa2e51f562d -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/102/vm-102-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap102i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=D6:BD:2B:DB:88:F6,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -netdev type=tap,id=net1,ifname=tap102i1,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=12:12:11:E6:C8:5E,netdev=net1,bus=pci.0,addr=0x13,id=net1,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
root 2498308 3.4 13.1 5155332 4321304 ? Sl Mar31 100:30 /usr/bin/kvm -id 105 -name webcuststag,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/105.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/105.pid -daemonize -smbios type=1,uuid=48af498a-9a27-42bf-b42c-ca6a16fec402 -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/105.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 4096 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=de4b3139-1cb8-48de-928e-5b1a5a8b0f39 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/105/vm-105-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap105i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=A6:B8:4E:94:C0:04,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
root 395295 0.1 11.9 7019208 3929072 ? Sl Jan20 147:33 /usr/bin/kvm -id 107 -name frp,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/107.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/107.pid -daemonize -smbios type=1,uuid=956cf002-f8f8-4db0-b6f8-bda5ecd0d3a0 -smp 1,sockets=1,cores=1,maxcpus=1 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/107.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 -object iothread,id=iothread-virtioscsi0 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device pci-bridge,id=pci.3,chassis_nr=3,bus=pci.0,addr=0x5 -device vmgenid,guid=b9616501-d055-42b3-b4f7-5b832828298a -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -drive file=/var/lib/vz/template/iso/debian-11.6.0-amd64-netinst.iso,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101 -device virtio-scsi-pci,id=virtioscsi0,bus=pci.3,addr=0x1,iothread=iothread-virtioscsi0 -drive file=/var/lib/vz/images/107/vm-107-disk-0.qcow2,if=none,id=drive-scsi0,format=qcow2,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap107i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=F2:5F:40:2D:EE:E0,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024,bootindex=102 -machine type=pc+pve0
root 802768 1.6 11.0 4508112 3636120 ? Sl Mar03 722:43 /usr/bin/kvm -id 103 -name webintlprod,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/103.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/103.pid -daemonize -smbios type=1,uuid=0dfeda2a-9ee4-4018-82e3-46a4e990e05f -smp 2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/103.vnc,password=on -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=884f47c6-9cd2-4875-8182-f2f0822d0e6e -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -device VGA,id=vga,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:4c19ca5dc331 -device lsi,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/var/lib/vz/images/103/vm-103-disk-0.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=io_uring,detect-zeroes=on -device scsi-hd,bus=scsihw0.0,scsi-id=0,drive=drive-scsi0,id=scsi0,bootindex=100 -netdev type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=A2:C2:A1:A5:06:AA,netdev=net0,bus=pci.0,addr=0x12,id=net0,rx_queue_size=1024,tx_queue_size=1024 -machine type=pc+pve0
And for visual reference:

However, the two VMs at the very top, for example, are configured with 4 GB and 512 MB of RAM respectively, yet their KVM processes each use around 14% of the host's RAM (~4.5 GB)! The other three VMs also only have 1 to 2 GB of RAM configured. A little virtualization overhead is expected, but a whopping 4 GB of overhead for a VM that is only assigned 0.5 GB? When I look at the summary of that VM, RAM usage inside the VM also seems normal:

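To put numbers on the gap between configured and resident memory, here is a rough Python sketch that pulls the VM ID, name, -m value and RSS out of ps aux lines. The function names are my own, the sample lines are shortened versions of two entries from the listing above, and it assumes -m is given as a plain MiB integer (as in those command lines), so treat it as an illustration rather than a finished tool.

```python
def option_value(args, flag):
    """Return the token that follows `flag` in `args`, or None if it is absent."""
    try:
        return args[args.index(flag) + 1]
    except (ValueError, IndexError):
        return None


def parse_kvm_ps_line(line):
    """Extract VM id, name, configured MiB (-m) and resident MiB (RSS) from one `ps aux` line.

    Returns None for lines that are not kvm processes or that lack a plain integer -m value.
    """
    # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
    fields = line.split(None, 10)
    if len(fields) < 11 or not fields[10].startswith("/usr/bin/kvm"):
        return None
    args = fields[10].split()
    vmid = option_value(args, "-id")
    name = option_value(args, "-name")
    mem = option_value(args, "-m")
    # Assumption: -m is a plain MiB integer such as "-m 512".
    if vmid is None or mem is None or not mem.isdigit() or int(mem) == 0:
        return None
    configured_mib = int(mem)
    rss_mib = int(fields[5]) / 1024  # ps reports RSS in KiB
    return {
        "vmid": vmid,
        "name": name.split(",")[0] if name else "",
        "configured_mib": configured_mib,
        "rss_mib": rss_mib,
        "ratio": rss_mib / configured_mib,
    }


# Shortened versions of two lines from the listing above (most options omitted).
SAMPLE = [
    "root 830748 0.6 14.0 9363860 4609776 ? Sl Jan22 646:48 /usr/bin/kvm "
    "-id 108 -name jitsi,debug-threads=on -m 4096 -machine type=pc+pve0",
    "root 10781 0.6 13.8 6177792 4529348 ? Sl 2022 909:44 /usr/bin/kvm "
    "-id 102 -name nginx,debug-threads=on -m 512 -machine type=pc+pve0",
]

for line in SAMPLE:
    vm = parse_kvm_ps_line(line)
    if vm:
        print(f"VM {vm['vmid']} ({vm['name']}): configured {vm['configured_mib']} MiB, "
              f"resident {vm['rss_mib']:.0f} MiB, ratio {vm['ratio']:.2f}")
```

For these two lines, VM 108 comes out at roughly 1.1 times its configured memory, while VM 102 sits at roughly 8.6 times, which is the discrepancy I'm asking about.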
I've tried dropping the caches (echo 3 > /proc/sys/vm/drop_caches), but that didn't have any real effect. Ballooning is enabled on all VMs, but most of them don't effectively use it (min = max RAM); only some have ranges configured. The host only had 2 GB of swap configured, which I've just increased to 8 GB by adding a 6 GB swapfile (the swap was full, too, as seen in the syslog; could this have been a cause?). We are not using ZFS but directory-based storage (and external storage via NFS for backups).
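For keeping an eye on swap and cache over time, something like the following Python sketch could be run on the host. It reads /proc/meminfo and reports swap usage, the page cache size and MemAvailable (the kernel's own estimate of memory available without swapping). The function names and the choice of fields are my own, so this is only a starting point.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def summarize(info):
    """Return swap usage, page cache size and available memory, in MiB."""
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        "swap_used_mib": swap_used / 1024,
        "swap_used_pct": 100 * swap_used / swap_total if swap_total else 0.0,
        "mem_available_mib": info.get("MemAvailable", 0) / 1024,
        "page_cache_mib": info.get("Cached", 0) / 1024,
    }


# Only runs on Linux hosts where /proc/meminfo exists.
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        for key, value in summarize(parse_meminfo(f.read())).items():
            print(f"{key}: {value:.1f}")
```

If page_cache_mib is small while the kvm processes' RSS is large, that would fit with drop_caches not helping, since the memory would then be held by the processes themselves rather than by the page cache.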
I've rebooted the VMs using the most RAM, which instantly relieved most of the memory pressure. However, I fear that the usage will start to grow again over the next few days. Any ideas what the culprit could be?