OOM-kill of VM during node-to-node copy of large file

arturasb

New Member
Mar 16, 2021
Hi.

I tried to find useful info here and elsewhere, but nothing came close to my problem. So I'm trying here...

Setup

  1. PVE node1 on Dell T340 (32 GB of RAM, mirrored ZFS rpool on SSDs, mirrored ZFS datapool on HDDs)
    1. Windows Server 2019 VM (4 vCPU, 16 GB of virtual RAM, 1 TB storage on datapool)
  2. PVE node2 on Dell T30 (16 GB of RAM, mirrored ZFS rpool on SSDs, mirrored ZFS datapool on HDDs)
    1. Ubuntu Server 20.04 VM (2 vCPU, 6 GB of virtual RAM, 32 GB storage on rpool)
      1. Virtual disk added to the VM: a 1.6 TB raw-format disk image on datapool of node2
      2. Samba-based file share (ext4 partition created on the virtual 1.6 TB disk, directories on the ext4 filesystem exported as Samba shares).
Goal

  1. Add one of the Ubuntu Server Samba shares on node2 to node1 as CIFS storage.
  2. Add a backup job on node1 to back up the Windows Server 2019 VM to that CIFS storage (a Samba share on the Ubuntu Server VM on node2); a rough sketch of these two steps follows this list.
  3. The Windows Server 2019 backup files on that share are then replicated to an "offsite master backup".
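
For completeness, this is roughly how steps 1 and 2 look on node1; the storage ID, VMID, IP address and credentials below are only placeholders for my real values:

Code:
# add the Samba share exported by the Ubuntu VM as CIFS storage on node1
pvesm add cifs vm-backups --server 192.168.1.20 --share backups \
    --username backupuser --password 'secret' --content backup

# manual test run of the backup (the scheduled backup job does the same via vzdump)
vzdump 101 --storage vm-backups --mode snapshot --compress zstd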

The problem

Node2's PVE oom-kills the Ubuntu Server VM during the backup, which is ~80 GB of data. The KVM process takes almost all remaining RAM on the host, while inside the VM itself less than 1 GB of the 6 GB total is consumed. There are no clear signs that there is not enough memory for PVE or for the Ubuntu Server VM. I even tried copying the backup files between the PVE nodes directly, bypassing the VM layer on node2, and that worked with no problem. Then I tried sharing the backup directory via NFS instead - it failed the same way, node2's PVE oom-killed the Ubuntu Server VM. It is confusing, because a few months ago this backup method (ZFS pool -> Samba share) worked great.

Has anyone experienced similar issues with Ubuntu Server 20.04 or Samba? Any hints on what I should check next to debug this situation?
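
For context, these are the host-side checks I've found so far for confirming the OOM kill and the ARC size (just a sketch; arc_summary ships with the ZFS tools):

Code:
# look for the oom-killer events and which process was sacrificed
journalctl -k | grep -i -E 'out of memory|oom-kill'
dmesg | grep -i oom

# see how much RAM the ZFS ARC is holding on the host
arc_summary | head -n 40
cat /sys/module/zfs/parameters/zfs_arc_max

# overall host memory picture
free -h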

Regards
Artūras B.
 
ZFS uses half of the memory by default. Do you have any other things running on that node? Nonetheless, 8 GB (ZFS ARC) + 6 GB for the Ubuntu VM leaves only 2 GB for the host, which may work but is not ideal.
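
If you want to try capping the ARC to rule that out, something along these lines should do it (the 4 GiB here is only an example value):

Code:
# /etc/modprobe.d/zfs.conf - limit the ZFS ARC to 4 GiB
options zfs zfs_arc_max=4294967296

# apply to the running module without a reboot
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

# if you boot from ZFS, refresh the initramfs so the limit sticks after reboot
update-initramfs -u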
 

The HW machine has 16 GB of RAM, 6 GB of which is given to the Ubuntu Server VM, and utilization inside the VM is ~20-30% of those 6 GB. This VM hosts a Samba file server and a couple of Docker containers, nothing extraordinary. That leaves 10 GB of RAM to the PVE host.
I tried to see what's happening on the PVE host during the file copy with atop (a rough monitoring sketch follows the list below). The pattern is like this:
  • Larger files (dozens of MB and more) cause continuous RAM utilization spikes, leaving only a few hundred MB of free memory (even down to ~550-700 MB);
  • At the same time swap shows 0 (zero) capacity in use;
  • Nothing changes inside the VM from a RAM utilization perspective;
  • The KVM process on the host takes ~40% of RAM and will not release it after the file transfer is over. The same KVM process uses only ~10% of RAM after a reboot, before the file transfer test;
  • After the file transfer is finished, within a couple of minutes there are ~1.9-2.3 GB of free RAM, but the KVM process still holds ~30% of RAM.
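
For the record, this is roughly how I watched it; nothing fancy, just atop plus a small loop around free and the kvm processes:

Code:
# interactive view, 5-second samples
atop 5

# or log host memory and kvm process RSS every 10 seconds during the copy
while true; do
    date; free -m
    ps -C kvm -o pid,rss,vsz,args --sort=-rss | head -n 3
    sleep 10
done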
I'm now considering changing the way I share datapool with the VM. My plan is to get rid of the ext4 partition on top of the ZFS datapool, move the data directly to datapool, share it via NFS with the VM, and then share it from the VM using Samba. And of course, an additional 32 GB of RAM is on its way to my homelab :)
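
The rough idea for the NFS part would be something like this (dataset name, IPs and mountpoint are placeholders; it assumes nfs-kernel-server is installed on the host):

Code:
# on the node2 PVE host: dedicated dataset on datapool, exported only to the VM
zfs create datapool/shares
zfs set sharenfs="rw=@192.168.1.20/32,no_root_squash" datapool/shares

# inside the Ubuntu Server VM: mount the export and let Samba share that path
mkdir -p /srv/shares
echo "192.168.1.10:/datapool/shares /srv/shares nfs defaults 0 0" >> /etc/fstab
mount /srv/shares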

Are there any other good practices for sharing ZFS pools with VMs and then using Samba on the VM to share them with users and other systems for convenience?
I want PVE to manage the ZFS pool, and the VM to manage the data on it and provide a convenient file share (with user/group permissions) for people and other machines.
 
Following up on my OOM-kill story: I couldn't find out what exactly was tipping the host over the OOM point. My RAM upgrade arrived while I was still trying to figure out what was wrong. Now, with 48 GB of RAM in total, the problem with copying large files has disappeared. I guess that with the previous 16 GB I was on "thin ice" and the host could easily go into OOM.
 
