OOM Error For Ubuntu LXC Containers

Jared9922

Aug 11, 2021
Hello,

I am currently trying to set up some Ubuntu containers using Proxmox for our company. The containers download just fine, but when I try to run services I get an OOM ("out of memory") error. I have allocated 32 GB of memory to this container and know without a doubt that this is more than enough to run the service, for two reasons: 1. I can run the service on a laptop with at most 16 GB of memory. 2. It ran fine on a container with less memory before. My base system has 64 GB of RAM. Also, when I look at the OOM scores on the Proxmox screen that shows the error, they are all shown as 0, and the Proxmox web GUI shows that I have a ton of available memory that isn't being used.


I was wondering if the error could be because RAM is still being used by containers I have deleted? (I have made about three other containers while troubleshooting other problems; they are now removed from the Proxmox list.) Maybe something is using way more memory than it should?

Any troubleshooting advice would be greatly appreciated. I can also send any files that are needed to help solve this issue.

Thanks,
Jared
 
one thing to check is whether you have any tmpfs inside the container that might be filled (those are accounted to the container's memory usage, since they are memory-backed) - usually at least /run is set up as tmpfs, and systemd puts the journal there by default on most distros.

otherwise the full OOM output from the host journal might be helpful.
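A quick way to check this from inside the container (a sketch; /run as the journal location is the usual systemd default):

```shell
# List tmpfs mounts and how full they are. Space used on these
# filesystems counts against the container's memory limit,
# since tmpfs is memory-backed.
df -h -t tmpfs

# The systemd journal under /run is a common tmpfs consumer:
journalctl --disk-usage 2>/dev/null || echo "journalctl not available here"
```

If /run shows up near 100%, clearing or vacuuming the journal should free the corresponding memory.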
 
I'm pretty new to Proxmox. Could you explain what you mean by "check whether I have any tmpfs inside the container that might be filled"? Where can I check this, and how can I fix it?

Debugging has also been tough because I can't seem to figure out how to scroll in the Proxmox console. Everyone online says that to scroll you use Shift + Page Up/Down, but this isn't working for me. Any suggestions?

Also, am I checking my memory usage the best way? Is the Proxmox UI accurate? Maybe something obvious is using more memory but I just can't see it? What is the best way to check this?

Here is the journalctl output from inside the Ubuntu container:
Code:
-- The process' exit code is 'killed' and its exit status is 9.
Aug 13 15:05:11 Elasticsearch systemd[1]: elasticsearch.service: Failed with result 'signal'.
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit elasticsearch.service has entered the 'failed' state with result 'signal'.
Aug 13 15:05:11 Elasticsearch systemd[1]: Failed to start Elasticsearch.
-- Subject: A start job for unit elasticsearch.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit elasticsearch.service has finished with a failure.
--
-- The job identifier is 281 and the job result is failed.
Aug 13 15:05:11 Elasticsearch systemd[1]: elasticsearch.service: Consumed 7.120s CPU time.
-- Subject: Resources consumed by unit runtime
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit elasticsearch.service completed and consumed the indicated resources.

Here is the error output from the Proxmox host journal:
Code:
Aug 13 06:14:38 pve kernel: GC Thread#0 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Aug 13 06:14:38 pve kernel: CPU: 0 PID: 13753 Comm: GC Thread#0 Tainted: P           O      5.11.22-1-pve #1
Aug 13 06:14:38 pve kernel: Hardware name: ASUS System Product Name/ROG CROSSHAIR VIII DARK HERO, BIOS 3302 03/05/2021
Aug 13 06:14:38 pve kernel: Call Trace:
Aug 13 06:14:38 pve kernel:  dump_stack+0x70/0x8b
Aug 13 06:14:38 pve kernel:  dump_header+0x4f/0x1f6
Aug 13 06:14:38 pve kernel:  oom_kill_process.cold+0xb/0x10
Aug 13 06:14:38 pve kernel:  out_of_memory+0x1cf/0x520
Aug 13 06:14:38 pve kernel:  mem_cgroup_out_of_memory+0x139/0x150
Aug 13 06:14:38 pve kernel:  try_charge+0x750/0x7b0
Aug 13 06:14:38 pve kernel:  mem_cgroup_charge+0x8a/0x280
Aug 13 06:14:38 pve kernel:  __add_to_page_cache_locked+0x34b/0x3a0
Aug 13 06:14:38 pve kernel:  ? scan_shadow_nodes+0x30/0x30
Aug 13 06:14:38 pve kernel:  add_to_page_cache_lru+0x4d/0xd0
Aug 13 06:14:38 pve kernel:  pagecache_get_page+0x161/0x3b0
Aug 13 06:14:38 pve kernel:  filemap_fault+0x6da/0xa30
Aug 13 06:14:38 pve kernel:  ? xas_load+0x9/0x80
Aug 13 06:14:38 pve kernel:  ? xas_find+0x17a/0x1d0
Aug 13 06:14:38 pve kernel:  __do_fault+0x3c/0xe0
Aug 13 06:14:38 pve kernel:  handle_mm_fault+0x1516/0x1a70
Aug 13 06:14:38 pve kernel:  do_user_addr_fault+0x1a3/0x450
Aug 13 06:14:38 pve kernel:  ? exit_to_user_mode_prepare+0x75/0x190
Aug 13 06:14:38 pve kernel:  exc_page_fault+0x6c/0x150
Aug 13 06:14:38 pve kernel:  ? asm_exc_page_fault+0x8/0x30
Aug 13 06:14:38 pve kernel:  asm_exc_page_fault+0x1e/0x30
Aug 13 06:14:38 pve kernel: RIP: 0033:0x7f4c816db970
Aug 13 06:14:38 pve kernel: Code: Unable to access opcode bytes at RIP 0x7f4c816db946.
Aug 13 06:14:38 pve kernel: RSP: 002b:00007f4c1aaa7df0 EFLAGS: 00010206
Aug 13 06:14:38 pve kernel: RAX: 00007f4c39aa9000 RBX: 00007f4c8093e848 RCX: 00007f4c8229fa65
Aug 13 06:14:38 pve kernel: RDX: 0000000000001000 RSI: 00007f4c432a9000 RDI: 00007f4c43232000
Aug 13 06:14:38 pve kernel: RBP: 00007f4c1aaa7df0 R08: 0000000000000000 R09: 00000000ffffffff
Aug 13 06:14:38 pve kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007f4c8093e830
Aug 13 06:14:38 pve kernel: R13: 00007f4c1aaa7e2c R14: 00000000ffffffff R15: 00007f4c7c069880
Aug 13 06:14:38 pve kernel: memory: usage 32768000kB, limit 32768000kB, failcnt 16067
Aug 13 06:14:38 pve kernel: swap: usage 0kB, limit 2048000kB, failcnt 0
Aug 13 06:14:38 pve kernel: Memory cgroup stats for /lxc/100:
Aug 13 06:14:38 pve kernel: anon 33481383936
file 0
kernel_stack 933888
pagetables 67719168
percpu 2063872
sock 0
shmem 135168
file_mapped 0
file_dirty 270336
file_writeback 0
anon_thp 0
file_thp 0
shmem_thp 0
inactive_anon 33481383936
active_anon 0
inactive_file 188416
active_file 0
unevictable 0
slab_reclaimable 889024
slab_unreclaimable 4096936
slab 4985960
workingset_refault_anon 0
workingset_refault_file 20937
workingset_activate_anon 0
workingset_activate_file 5
workingset_restore_anon 0
workingset_restore_file 5
workingset_nodereclaim 429
pgfault 41367991
pgmajfault 49096
pgrefill 53507
pgscan 59764
pgsteal 49131
pgactivate 39883
pgdeactivate 40279
pglazyfree 0
pglazyfreed 0
thp_fault_alloc 0
thp_collapse_alloc 0
Aug 13 06:14:38 pve kernel: Tasks state (memory values in pages):
Aug 13 06:14:38 pve kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Aug 13 06:14:38 pve kernel: [   3728] 100000  3728    42171      638    90112        0             0 systemd
Aug 13 06:14:38 pve kernel: [   3969] 100000  3969     1455      202    49152        0             0 login
Aug 13 06:14:38 pve kernel: [   4744] 100000  4744     1227      334    45056        0             0 bash
Aug 13 06:14:38 pve kernel: [  13510] 100000 13510     1042      133    45056        0             0 elasticsearch
Aug 13 06:14:38 pve kernel: [  13521] 100000 13521     1575       66    53248        0             0 systemctl
Aug 13 06:14:38 pve kernel: [  13522] 100000 13522     4103      201    73728        0             0 systemd-tty-ask
Aug 13 06:14:38 pve kernel: [   3970] 100000  3970      660       28    40960        0             0 agetty
Aug 13 06:14:38 pve kernel: [   4162] 100000  4162     9509      120    69632        0             0 master
Aug 13 06:14:38 pve kernel: [   4164] 100102  4164     9574      121    61440        0             0 pickup
Aug 13 06:14:38 pve kernel: [   4165] 100102  4165     9587      122    65536        0             0 qmgr
Aug 13 06:14:38 pve kernel: [   3852] 100000  3852     8810      255   106496        0             0 systemd-journal
Aug 13 06:14:38 pve kernel: [   3881] 100105  3881     4638      252    81920        0             0 systemd-network
Aug 13 06:14:38 pve kernel: [   3904] 100106  3904     6029     1058    86016        0             0 systemd-resolve
Aug 13 06:14:38 pve kernel: [   3906] 100000  3906    58126      196    81920        0             0 accounts-daemon
Aug 13 06:14:38 pve kernel: [   3907] 100000  3907      956       61    40960        0             0 cron
Aug 13 06:14:38 pve kernel: [   3908] 100100  3908     1848      141    57344        0             0 dbus-daemon
Aug 13 06:14:38 pve kernel: [   3911] 100000  3911     6540     1898    86016        0             0 networkd-dispat
Aug 13 06:14:38 pve kernel: [   3912] 100101  3912    38667      164    69632        0             0 rsyslogd
Aug 13 06:14:38 pve kernel: [   3915] 100000  3915     4160      227    77824        0             0 systemd-logind
Aug 13 06:14:38 pve kernel: [   3968] 100000  3968      660       27    45056        0             0 agetty
Aug 13 06:14:38 pve kernel: [  13523] 100110 13523  8613386  8167864 65650688        0             0 java
Aug 13 06:14:38 pve kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0,oom_memcg=/lxc/100,task_memcg=/lxc/100/ns/system.slice/elasticsearch.service,task=java,pid=13523,uid=100110
Aug 13 06:14:38 pve kernel: Memory cgroup out of memory: Killed process 13523 (java) total-vm:34453544kB, anon-rss:32671456kB, file-rss:0kB, shmem-rss:0kB, UID:100110 pgtables:64112kB oom_score_adj:0
Aug 13 06:14:38 pve kernel: oom_reaper: reaped process 13523 (java), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
 
Good morning,

to see mounted tmpfs, use
Code:
~# mount | grep tmpfs

But of course not all mounted tmpfs are bad - far from it! As usual: "it depends"...

This is a small Linux machine, and it is absolutely fine:
Code:
~# mount | grep tmpfs
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1006548k,nr_inodes=251637,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=204292k,nr_inodes=255361,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,size=1021444k,nr_inodes=255361)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,nr_inodes=255361)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=1021444k,nr_inodes=255361,mode=755)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=99428k,nr_inodes=226689,mode=700)

Regarding the OOM: the last lines show that "java" is the bad guy. You need to limit the excessive memory hunger of Elasticsearch by means of some configuration...

Best regards
 
OP, I'm assuming you're using PVE 7 here. I had an existing Ubuntu 18.04 container running an up-to-date ELK stack where, after the upgrade, I had to set explicit JVM heap limits. For example:
Code:
# /etc/elasticsearch/jvm.options.d/heap.options
-Xms4g
-Xmx4g
Something changed with LXC between PVE 6.4 and 7.0.
 
@keeka This fixed my issue! Would you mind explaining why setting these values fixes the issue? Also, why doesn't Elasticsearch just configure this setting correctly by itself, like it is supposed to?
 
why doesn't Elasticsearch just configure this setting by itself correctly like it is supposed to?
I don't know. Perhaps the JVM is not seeing the memory allocated to the container, or is seeing a higher value - maybe total host memory? I did not attempt to debug the JVM further. In my ignorance, I put it down to running a container created from a 6.4 template under the new LXC environment in PVE 7. There are some changes to LXC in Proxmox 7.
Incidentally, did you start fresh from an ubuntu 20.04 container template from PVE7?
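One way to test that theory (a hypothetical check, assuming `java` is on the PATH inside the CT) is to ask the JVM what maximum heap it would pick by default. MaxHeapSize is normally around 1/4 of whatever physical memory the JVM thinks it has, so a value sized from the host's 64 GB rather than the container's 32 GB limit would explain the runaway heap:

```shell
# Print the JVM's computed default flags and pull out the heap sizing.
# If java is missing, bail out gracefully rather than erroring.
command -v java >/dev/null || { echo "java not found"; exit 0; }
java -XX:+PrintFlagsFinal -version 2>/dev/null | grep -Ei 'maxheapsize|maxram'
```

Setting -Xms/-Xmx explicitly, as above, sidesteps that detection entirely.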
 
In my case, I restored an Elasticsearch CT that had previously been running on PVE 6.4 to a newly installed PVE 7.1 node, and it got OOM-killed on start unless the manual heap limit was set (thanks @keeka for the fix!). Running the same CT on a 6.4 node works. I guess it's related to the cgroup v2 changes in PVE 7.
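For anyone hitting this later: you can check which memory limit the container actually exposes to processes. These are the standard cgroup locations (a sketch; on PVE 7 containers run on cgroup v2 by default, PVE 6.x used cgroup v1):

```shell
# cgroup v2 (PVE 7 default) - run inside the container:
# prints the byte limit, or "max" if unlimited
cat /sys/fs/cgroup/memory.max 2>/dev/null

# cgroup v1 (PVE 6.x) used a different path:
cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null || true
```

Software that only knows the v1 path may fall back to host memory when sizing itself, which matches the behavior seen here.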
 