[SOLVED] Proxmox Shutting Ubuntu 22.04.1 VM Down due to OOM-Kill

rvby1

New Member
Mar 1, 2023
Hi, all,

I'm new to Proxmox. Forgive me if the answer/solution here is obvious!

Since mid-afternoon yesterday, my Ubuntu Server 22.04.1 VM has been randomly shutting down. I find no evidence of errors in the logs of the Ubuntu machine itself, but I am seeing an OOM-kill in Proxmox's logs.

I'm currently running Proxmox version 7.3.3 on an i7-6700 with 16GB of DDR4 RAM and a 256GB SSD. I have 2x 2TB HDDs and 2x 2TB SSDs passed through to the Ubuntu VM, which has them set up as two mirror vdevs in a zpool.

I have 14GB assigned to my Ubuntu VM, leaving what I thought was 2GB free for Proxmox. The VM seems to occupy 13 of the 14GB almost constantly after I do any writes, but I presume this is just the ZFS ARC.

I'm noticing that the Proxmox host shows 8GB for RAM and 8GB for swap. Is this the cause of my OOM issues? Is my VM assuming it can use 14GB when it actually only has access to 8GB, since 8GB is reserved for swap on the Proxmox host?

Here is the relevant error log:
Code:
Mar  1 00:52:28 proxmox-kitsune kernel: [659046.224786] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=pveproxy.service,mems_allowed=0,global_oom,task_memcg=/qemu.slice/101.scope,task=kvm,pid=167>
Mar  1 00:52:28 proxmox-kitsune kernel: [659046.224838] Out of memory: Killed process 1675037 (kvm) total-vm:17593624kB, anon-rss:6575196kB, file-rss:196kB, shmem-rss:0kB, UID:0 pgtables:30100kB oom_score_ad>
Mar  1 00:52:28 proxmox-kitsune systemd[1]: 101.scope: A process of this unit has been killed by the OOM killer.
Mar  1 00:52:28 proxmox-kitsune kernel: [659047.547013] fwbr101i0: port 2(tap101i0) entered disabled state

Any thoughts on what I could do to fix this?
 
The Out-Of-Memory killer usually kills the process using the most memory, which will be your VM.
If you are using PCI(e) passthrough, the VM will use all of its memory all the time, because the memory must be pinned in actual host RAM for device-initiated DMA.
ZFS uses up to 50% (8GB) of your host memory for the ARC unless you limit it (see the Proxmox manual). 2GB is not much for Proxmox and ZFS together.
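For reference, the limit is the zfs_arc_max module parameter on whichever system runs ZFS. A minimal sketch, assuming a 2GiB cap (the value is just an example, in bytes; size it to your workload):
Code:
# /etc/modprobe.d/zfs.conf -- limit the ZFS ARC to 2 GiB (value in bytes)
options zfs zfs_arc_max=2147483648
After editing that file, run update-initramfs -u -k all and reboot; the limit can also be applied at runtime by writing the value to /sys/module/zfs/parameters/zfs_arc_max.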

I don't understand everything you wrote, and you did not show the VM configuration, the Proxmox host memory usage (before the OOM), or the ZFS ARC size (arc_summary), but I hope my comments will help you balance the memory usage between the VM and Proxmox better.

PS: Why are you running a single VM with most of the resources of the host? Proxmox is designed for many VMs in an enterprise context and not optimized for a single one.
 
As far as I understand, he uses a single SSD for PVE (so probably no ZFS on the host) and the four other disks are used for a ZFS pool inside the VM.

Keep in mind that a VM with 14GB of RAM might use more than 14GB on the host. Those 14GB are virtual RAM, and on top of that the KVM process virtualizing the VM has its own overhead. And if you use a cache mode like writeback, that will also consume additional RAM on the host.
You can run top on the PVE host to see how much RAM the KVM process is using, shown in the "RES" column.
Running free -h on the host and inside the guest might also tell you a bit about how your RAM is used.
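For example, something like this on the PVE host (assuming the default qemu-server pidfile location for VM 101):
Code:
# resident memory (the RES column) of the kvm process backing VM 101
top -b -n 1 -p $(cat /var/run/qemu-server/101.pid)

# overall memory picture; compare the output on the host and inside the guest
free -h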
 
Yes, correct--just a single SSD for PVE. PVE isn't running ZFS itself. The drives are passed through to the Ubuntu VM, which then mounts them in a zpool for its own storage needs.

Here is the .conf for the Ubuntu VM:
Code:
boot: order=scsi0;ide2;net0;ide0
cipassword: CENSOR
ciuser: CENSOR
cores: 1
ide0: local-lvm:vm-101-cloudinit,media=cdrom,size=4M
ide2: none,media=cdrom
memory: 14336
meta: creation-qemu=7.1.0,ctime=1674174433
name: CENSOR
net0: virtio=CENSOR,bridge=vmbr0,firewall=1
numa: 0
onboot: 1
ostype: l26
scsi0: local-lvm:vm-101-disk-0,discard=on,iothread=1,size=128G
scsi1: /dev/disk/by-id/ata-Acer_SSD_SA100_1920GB_ASAA52420100120,size=1875374424K
scsi2: /dev/disk/by-id/ata-Acer_SSD_SA100_1920GB_ASAA52420100117,size=1875374424K
scsi3: /dev/disk/by-id/wwn-0x5000c500db5750ae,size=1953514584K
scsi4: /dev/disk/by-id/wwn-0x5000c500db56a56d,size=1953514584K
scsihw: virtio-scsi-single
smbios1: uuid=CENSOR
sockets: 4
vmgenid: CENSOR

It's a bit difficult to show the memory use at the time of the OOM-kill, as it's happening sporadically. For example, sometimes I can access web apps hosted on the VM, use Plex on my TV, etc. without any issues--only for the host to randomly run out of memory and kill the VM.

Here is a summary for the host while the VM is running, though:
[Attached screenshot: Proxmox host memory summary]

You can view my arc_summary from my Ubuntu VM here: https://pastebin.com/ThZbPXzP

As for this question...
PS: Why are you running a single VM with most of the resources of the host? Proxmox is designed for many VMs in an enterprise context and not optimized for a single one.
There are a couple of reasons. First, I've never used a Type 1 hypervisor, and I wanted to mess around with it in my home lab. Second, I wanted to have it and not need it rather than need it and not have it. This machine is meant to be my main server for the foreseeable future, so I wanted it to be capable of hosting more VMs later, even if I don't currently need them.
 
So you actually only have 8GB of RAM. Then I wouldn't allocate more than 5 or 6GB to the VM. And instead of 1 core + 4 sockets I would use 1 socket + 4 cores.

And 5-6GB of RAM is very low for a VM with 8TB of ZFS pools. You really should upgrade your RAM.
 
So you actually only have 8GB of RAM. Then I wouldn't allocate more than 5 or 6GB to the VM. And instead of 1 core + 4 sockets I would use 1 socket + 4 cores.

And 5-6GB of RAM is very low for a VM with 8TB of ZFS pools. You really should upgrade your RAM.
Are you saying the system is only seeing 8GB of RAM, or that the system is using 8GB for the swap out of the 16GB available, leaving 8GB for other purposes? I'm not 100% familiar with what the swap and KSM sharing are doing. Would love any resources that you could send over explaining it!

On the 1 core + 4 sockets versus 1 socket + 4 cores, any particular reason why? Again, fresh with Proxmox, so forgive me if the answers here are obvious. :p
 
Are you saying the system is only seeing 8GB of RAM, or that the system is using 8GB for the swap out of the 16GB available, leaving 8GB for other purposes? I'm not 100% familiar with what the swap and KSM sharing are doing. Would love any resources that you could send over explaining it!
Swap isn't RAM. Swap is space on your SSDs where data from RAM gets swapped out so the host doesn't need to kill VMs. So yes, according to PVE you only have 8GB of RAM and 8GB of SSD used as swap. So with 2GB for PVE, you can allocate at most 6GB to guests. Otherwise you are overprovisioning your RAM, and that easily kills VMs because of OOM.
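If the total looks wrong, you can cross-check what the firmware reports per DIMM slot (a quick sanity check on the host, assuming dmidecode is installed):
Code:
free -h                              # the "Mem:" total is what the kernel actually sees
dmidecode -t memory | grep -i size   # per-module sizes as reported by the BIOS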
On the 1 core + 4 sockets versus 1 socket + 4 cores, any particular reason why? Again, fresh with Proxmox, so forgive me if the answers here are obvious. :p
Your hardware only has 1 CPU with 4 cores. Why would you tell your VM to virtualize 4 CPUs with 1 core each?
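You can change that in the GUI, or with the qm CLI, e.g. for VM 101 as in the config above:
Code:
qm set 101 --sockets 1 --cores 4
The new topology takes effect the next time the VM is fully powered off and started again.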
 
Swap isn't RAM. Swap is space on your SSDs where data from RAM gets swapped out so the host doesn't need to kill VMs. So yes, according to PVE you only have 8GB of RAM and 8GB of SSD used as swap. So with 2GB for PVE, you can allocate at most 6GB to guests. Otherwise you are overprovisioning your RAM, and that easily kills VMs because of OOM.
Gosh dangit. I think this is my issue. I swapped in a known-good 16GB kit from another machine about a week ago, but it looks like one of the sticks died. I tried reseating it, but that got me a DRAM boot error. A new kit seems to be working fine, and Proxmox is now detecting all 16GB.

I'm sure there is some use case where you'd want to allocate more RAM to a VM than the host can possibly provide, but it'd be nice if there were some sort of mistake-proofing--maybe a warning triangle next to the RAM amount in the VM settings to indicate that it's more than the host has. ;P Then again, if you don't think swap is part of the RAM like I did, I guess you'd figure it out pretty quick!

Your hardware only has 1 CPU with 4 cores. Why would you tell your VM to virtualize 4 CPUs with 1 core each?
Well, when you put it like that... :p I'll make the change.

Thanks for the tip and the troubleshooting help!
 
