I am running Proxmox in a cluster with HA enabled. From time to time (at 2-5 day intervals) every LXC container gets killed for no apparent reason. The containers seem to be restarted without anything being logged; I only notice the restarts from their uptimes.
Looking at the dmesg output, I found that some containers run out of memory. I can understand why that one container would get killed, but why does it kill all of them?
One container in particular runs out of memory most of the time. It is running the ELK stack and has 2G of memory, which should be plenty.
I've added my PVE version and the relevant dmesg logs. Any insight into what might be going wrong would be highly appreciated.
		Code:
	
	proxmox-ve: 4.2-56 (running kernel: 4.4.13-1-pve)
pve-manager: 4.2-15 (running version: 4.2-2/725d76f0)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-83
pve-firmware: 1.1-8
libpve-common-perl: 4.0-70
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-70
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5.7-pve10~bpo80
		Code:
	
	[1489174.926977] java invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=0
[1489174.926981] java cpuset=110 mems_allowed=0
[1489174.926987] CPU: 3 PID: 1749 Comm: java Tainted: P           O    4.4.13-1-pve #1
[1489174.926989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[1489174.926991]  0000000000000286 00000000fda67529 ffff88019b4cbc70 ffffffff813ed3f3
[1489174.926994]  ffff88019b4cbd48 ffff8801f088a800 ffff88019b4cbcd8 ffffffff8120942b
[1489174.926996]  ffff88019b4cbca8 ffffffff81190c2b ffff88008f0f0000 ffff88008f0f0000
[1489174.926999] Call Trace:
[1489174.927007]  [<ffffffff813ed3f3>] dump_stack+0x63/0x90
[1489174.927011]  [<ffffffff8120942b>] dump_header+0x67/0x1d5
[1489174.927015]  [<ffffffff81190c2b>] ? find_lock_task_mm+0x3b/0x80
[1489174.927017]  [<ffffffff811911f5>] oom_kill_process+0x205/0x3c0
[1489174.927021]  [<ffffffff811fd1a0>] ? mem_cgroup_iter+0x1d0/0x380
[1489174.927024]  [<ffffffff811ff158>] mem_cgroup_out_of_memory+0x2a8/0x2f0
[1489174.927027]  [<ffffffff811ffef7>] mem_cgroup_oom_synchronize+0x347/0x360
[1489174.927047]  [<ffffffff811fb230>] ? mem_cgroup_css_online+0x240/0x240
[1489174.927050]  [<ffffffff811918f4>] pagefault_out_of_memory+0x44/0xc0
[1489174.927054]  [<ffffffff8106af2f>] mm_fault_error+0x7f/0x160
[1489174.927056]  [<ffffffff8106b733>] __do_page_fault+0x3e3/0x410
[1489174.927058]  [<ffffffff8106b7c7>] trace_do_page_fault+0x37/0xe0
[1489174.927064]  [<ffffffff81063f49>] do_async_page_fault+0x19/0x70
[1489174.927069]  [<ffffffff8184d2a8>] async_page_fault+0x28/0x30
[1489174.927071] Task in /lxc/110 killed as a result of limit of /lxc/110
[1489174.927075] memory: usage 1046764kB, limit 1048576kB, failcnt 13595223
[1489174.927077] memory+swap: usage 1572864kB, limit 1572864kB, failcnt 132729882
[1489174.927078] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[1489174.927079] Memory cgroup stats for /lxc/110: cache:6464KB rss:1040300KB rss_huge:0KB mapped_file:2816KB dirty:0KB writeback:0KB swap:526100KB inactive_anon:521764KB active_anon:521360KB inactive_file:1852KB active_file:1576KB unevictable:0KB
[1489174.927090] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[1489174.927281] [ 4182]     0  4182     7078      271      19       3       91             0 systemd
[1489174.927284] [ 4539]     0  4539    14478      769      32       3       17             0 systemd-journal
[1489174.927287] [ 4847]     0  4847     6350       58      17       3     1672             0 dhclient
[1489174.927290] [ 5224]     0  5224     9270       15      23       3       84             0 rpcbind
[1489174.927292] [ 5333]     0  5333     4756        0      15       3       46             0 atd
[1489174.927295] [ 5350]     0  5350     6869       16      18       3       45             0 cron
[1489174.927297] [ 5372]     0  5372    13796       28      32       3      139         -1000 sshd
[1489174.927300] [ 5417]     0  5417     4964       23      15       3       38             0 systemd-logind
[1489174.927302] [ 5498]   102  5498    10558       53      25       3       61          -900 dbus-daemon
[1489174.927305] [ 5857]     0  5857    64668       47      29       3      159             0 rsyslogd
[1489174.927308] [ 5983]     0  5983     3559        1      12       3       36             0 agetty
[1489174.927310] [ 6003]     0  6003     3559        1      12       3       38             0 agetty
[1489174.927312] [ 6555]     0  6555     9042       21      23       3      121             0 master
[1489174.927315] [ 6580]   100  6580     9570       22      23       3      114             0 qmgr
[1489174.927444] [10071]     0 10071    54528        0      36       4      302             0 bacula-fd
[1489174.927490] [31528]   999 31528   525345   110987     952     860   118421             0 node
[1489174.927548] [14452]   100 14452     9558       18      24       3      116             0 pickup
[1489174.927563] [32337]     0 32337   151273     3237      42       6       83             0 filebeat
[1489174.927575] [ 1641]   107  1641  1016796    91057     414       8     6995             0 java
[1489174.927583] [20950]   998 20950   894071    52342     245       7        0             0 java
[1489174.927587] [25666]     0 25666    12229      157      27       3        0             0 sshd
[1489174.927591] [25980]     0 25980    12229      158      27       3        0             0 sshd
[1489174.927593] [26454]     0 26454    12199       56      26       3        0             0 sshd
[1489174.927595] Memory cgroup out of memory: Kill process 31528 (node) score 588 or sacrifice child
[1489174.928829] Killed process 31528 (node) total-vm:2101380kB, anon-rss:443948kB, file-rss:0kB 
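
To make the cgroup numbers in the log easier to read, here is a rough Python sketch that pulls the limit lines and the killed process out of dmesg-style output. It assumes the 4.4-kernel line format shown above; the function name `summarize_oom` and the regexes are only illustrative, not part of any Proxmox tool. Run over the log above, it reports a 1024 MiB memory limit and a 1536 MiB memory+swap limit for /lxc/110, which may be worth comparing against the container's configured memory.

```python
import re
import sys

# Matches "memory: usage 1046764kB, limit 1048576kB, failcnt 13595223"
# and the "memory+swap: ..." variant; "kmem: ..." lines are skipped on purpose.
LIMIT_RE = re.compile(
    r"(?P<kind>memory(?:\+swap)?): usage (?P<usage>\d+)kB, "
    r"limit (?P<limit>\d+)kB, failcnt (?P<failcnt>\d+)"
)
# Matches "Task in /lxc/110 killed as a result of limit of /lxc/110"
CGROUP_RE = re.compile(r"Task in \S+ killed as a result of limit of (?P<cgroup>\S+)")
# Matches "Killed process 31528 (node) total-vm:..."
KILLED_RE = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def summarize_oom(lines):
    """Return one dict per cgroup OOM event found in dmesg-style lines."""
    events = []
    current = None
    for line in lines:
        m = CGROUP_RE.search(line)
        if m:
            # A new OOM event starts at the "Task in ... killed" line.
            current = {"cgroup": m.group("cgroup"), "limits": {}, "killed": None}
            events.append(current)
            continue
        if current is None:
            continue
        m = LIMIT_RE.search(line)
        if m:
            current["limits"][m.group("kind")] = {
                "usage_mib": int(m.group("usage")) / 1024,
                "limit_mib": int(m.group("limit")) / 1024,
                "failcnt": int(m.group("failcnt")),
            }
            continue
        m = KILLED_RE.search(line)
        if m:
            # The "Killed process" line closes the current event.
            current["killed"] = (int(m.group("pid")), m.group("name"))
            current = None
    return events


if __name__ == "__main__":
    # Example: dmesg | python3 this_script.py
    for event in summarize_oom(sys.stdin):
        print(event["cgroup"], "killed:", event["killed"])
        for kind, info in event["limits"].items():
            print(f"  {kind}: {info['usage_mib']:.0f} MiB used "
                  f"of {info['limit_mib']:.0f} MiB, failcnt {info['failcnt']}")
```

The "Task in ... killed as a result of limit of ..." line names the cgroup whose limit was hit, so grouping by that line shows whether each kill came from a per-container limit or from the host as a whole.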
	