PVE kills all containers due to out of memory

Robertas

New Member
May 5, 2016
I am running Proxmox in a cluster with HA enabled. From time to time (at 2-5 day intervals) every LXC container gets killed for no apparent reason. The containers seem to get restarted without any logs; I only see the restarts from their uptimes.

Looking at the dmesg output, I found that some containers run out of memory. It is understandable why such a container would get killed, but why does it kill all of them?

There is one particular container which runs out of memory most of the time. It is running the ELK stack and has 2 GB of memory, which should be plenty.
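
I know I could simply give that container more memory, for example with something like this (assuming CT 110 as in the dmesg below; the values are only illustrative), but that wouldn't explain why all the other containers get restarted as well:
Code:
# show the container's current memory/swap limits (CT 110 assumed from the log below)
pct config 110 | grep -Ei 'memory|swap'

# raise the limits, e.g. to 4 GB RAM and 2 GB swap (illustrative values)
pct set 110 -memory 4096 -swap 2048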

I've included my pveversion output and the relevant dmesg logs below. Any insight into what might be going wrong would be highly appreciated.

Code:
proxmox-ve: 4.2-56 (running kernel: 4.4.13-1-pve)
pve-manager: 4.2-15 (running version: 4.2-2/725d76f0)
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-52
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-42
qemu-server: 4.0-83
pve-firmware: 1.1-8
libpve-common-perl: 4.0-70
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-55
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-70
pve-firewall: 2.0-29
pve-ha-manager: 1.0-32
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5.7-pve10~bpo80

Code:
[1489174.926977] java invoked oom-killer: gfp_mask=0x24000c0, order=0, oom_score_adj=0
[1489174.926981] java cpuset=110 mems_allowed=0
[1489174.926987] CPU: 3 PID: 1749 Comm: java Tainted: P           O    4.4.13-1-pve #1
[1489174.926989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
[1489174.926991]  0000000000000286 00000000fda67529 ffff88019b4cbc70 ffffffff813ed3f3
[1489174.926994]  ffff88019b4cbd48 ffff8801f088a800 ffff88019b4cbcd8 ffffffff8120942b
[1489174.926996]  ffff88019b4cbca8 ffffffff81190c2b ffff88008f0f0000 ffff88008f0f0000
[1489174.926999] Call Trace:
[1489174.927007]  [<ffffffff813ed3f3>] dump_stack+0x63/0x90
[1489174.927011]  [<ffffffff8120942b>] dump_header+0x67/0x1d5
[1489174.927015]  [<ffffffff81190c2b>] ? find_lock_task_mm+0x3b/0x80
[1489174.927017]  [<ffffffff811911f5>] oom_kill_process+0x205/0x3c0
[1489174.927021]  [<ffffffff811fd1a0>] ? mem_cgroup_iter+0x1d0/0x380
[1489174.927024]  [<ffffffff811ff158>] mem_cgroup_out_of_memory+0x2a8/0x2f0
[1489174.927027]  [<ffffffff811ffef7>] mem_cgroup_oom_synchronize+0x347/0x360
[1489174.927047]  [<ffffffff811fb230>] ? mem_cgroup_css_online+0x240/0x240
[1489174.927050]  [<ffffffff811918f4>] pagefault_out_of_memory+0x44/0xc0
[1489174.927054]  [<ffffffff8106af2f>] mm_fault_error+0x7f/0x160
[1489174.927056]  [<ffffffff8106b733>] __do_page_fault+0x3e3/0x410
[1489174.927058]  [<ffffffff8106b7c7>] trace_do_page_fault+0x37/0xe0
[1489174.927064]  [<ffffffff81063f49>] do_async_page_fault+0x19/0x70
[1489174.927069]  [<ffffffff8184d2a8>] async_page_fault+0x28/0x30
[1489174.927071] Task in /lxc/110 killed as a result of limit of /lxc/110
[1489174.927075] memory: usage 1046764kB, limit 1048576kB, failcnt 13595223
[1489174.927077] memory+swap: usage 1572864kB, limit 1572864kB, failcnt 132729882
[1489174.927078] kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
[1489174.927079] Memory cgroup stats for /lxc/110: cache:6464KB rss:1040300KB rss_huge:0KB mapped_file:2816KB dirty:0KB writeback:0KB swap:526100KB inactive_anon:521764KB active_anon:521360KB inactive_file:1852KB active_file:1576KB unevictable:0KB
[1489174.927090] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[1489174.927281] [ 4182]     0  4182     7078      271      19       3       91             0 systemd
[1489174.927284] [ 4539]     0  4539    14478      769      32       3       17             0 systemd-journal
[1489174.927287] [ 4847]     0  4847     6350       58      17       3     1672             0 dhclient
[1489174.927290] [ 5224]     0  5224     9270       15      23       3       84             0 rpcbind
[1489174.927292] [ 5333]     0  5333     4756        0      15       3       46             0 atd
[1489174.927295] [ 5350]     0  5350     6869       16      18       3       45             0 cron
[1489174.927297] [ 5372]     0  5372    13796       28      32       3      139         -1000 sshd
[1489174.927300] [ 5417]     0  5417     4964       23      15       3       38             0 systemd-logind
[1489174.927302] [ 5498]   102  5498    10558       53      25       3       61          -900 dbus-daemon
[1489174.927305] [ 5857]     0  5857    64668       47      29       3      159             0 rsyslogd
[1489174.927308] [ 5983]     0  5983     3559        1      12       3       36             0 agetty
[1489174.927310] [ 6003]     0  6003     3559        1      12       3       38             0 agetty
[1489174.927312] [ 6555]     0  6555     9042       21      23       3      121             0 master
[1489174.927315] [ 6580]   100  6580     9570       22      23       3      114             0 qmgr
[1489174.927444] [10071]     0 10071    54528        0      36       4      302             0 bacula-fd
[1489174.927490] [31528]   999 31528   525345   110987     952     860   118421             0 node
[1489174.927548] [14452]   100 14452     9558       18      24       3      116             0 pickup
[1489174.927563] [32337]     0 32337   151273     3237      42       6       83             0 filebeat
[1489174.927575] [ 1641]   107  1641  1016796    91057     414       8     6995             0 java
[1489174.927583] [20950]   998 20950   894071    52342     245       7        0             0 java
[1489174.927587] [25666]     0 25666    12229      157      27       3        0             0 sshd
[1489174.927591] [25980]     0 25980    12229      158      27       3        0             0 sshd
[1489174.927593] [26454]     0 26454    12199       56      26       3        0             0 sshd
[1489174.927595] Memory cgroup out of memory: Kill process 31528 (node) score 588 or sacrifice child
[1489174.928829] Killed process 31528 (node) total-vm:2101380kB, anon-rss:443948kB, file-rss:0kB
 
I have the same problem.

I've already fixed it, at least in my case. I was using ZFS as the underlying storage for NFS, with the default sync option, so when the storage was under load the containers would get killed.

My fix was to disable synchronous writes, and so far it has been running for two weeks without a single killed container. Keep in mind that this option is quite dangerous: ZFS will report a successful write even if it hasn't actually finished writing the data, so if the server loses power, data may end up corrupted from the container's perspective.

Code:
zfs set sync=disabled pool/dataset
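
For completeness: checking the setting afterwards, and reverting to the default once the load issue is solved, looks roughly like this:
Code:
# confirm the current sync setting on the dataset
zfs get sync pool/dataset

# revert to the safe default behaviour later if needed
zfs set sync=standard pool/dataset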
 
My LXC containers get killed every night during the scheduled backup. I have multiple nodes with different local storages: ZFS and LVM-Thin.

All vzdump backups use the snapshot option. All LVM-Thin storages have enough free space on the VG volumes, and not all nodes have the problem with killed LXC containers.
All dumps are created on NFS storage. Now I suspect a problem with the NFS storage.

My settings in /etc/exports:
Code:
/nfs/b-daily  XXX.XXX.XXX.XXX/XX(rw,nohide,sync,no_root_squash,subtree_check)
/nfs/b-weekly  XXX.XXX.XXX.XXX/XX(rw,nohide,sync,no_root_squash,subtree_check)
 
Are the containers stored on the same NFS server as the backups? That sounds like the problem I had with the ZFS sync option, because my containers were killed exactly when the backups ran. By default NFS uses async mode, meaning it won't wait for local storage to finish writing the data and responds to the client as soon as it has processed the request. In your setup you're requesting synchronous writes, which might be the problem.

Just to test the theory, you can change the sync option in your exports to async. If that doesn't help, you can try limiting the bandwidth in /etc/vzdump.conf, as well as adjusting the tmp directory, which might currently be on the NFS storage and therefore quite slow compared to local storage.
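
A rough sketch of what that would look like with your exports from above (addresses left masked as in your post), followed by re-exporting without restarting the NFS server:
Code:
# /etc/exports - same exports, sync changed to async (illustrative)
/nfs/b-daily  XXX.XXX.XXX.XXX/XX(rw,nohide,async,no_root_squash,subtree_check)
/nfs/b-weekly  XXX.XXX.XXX.XXX/XX(rw,nohide,async,no_root_squash,subtree_check)

# apply the changed export options
exportfs -ra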

What is the output of:
Code:
zfs get sync
 
Are the containers stored on the same NFS server as the backups?
No. The containers are stored on local storages. NFS is used only for hosting templates and backups.

By default NFS uses async mode, meaning it won't wait for local storage to finish writing the data and responds to the client as soon as it has processed the request. In your setup you're requesting synchronous writes, which might be the problem.
I will test with async.

What is the output of:
Code:
zfs get sync
Nothing on the nodes with LVM-Thin local storage.

With ZFS local storage:
Code:
NAME  PROPERTY  VALUE  SOURCE
rpool  sync  standard  local
rpool/ROOT  sync  standard  inherited from rpool
rpool/ROOT/pve-1  sync  standard  inherited from rpool
rpool/data  sync  standard  inherited from rpool
rpool/data/subvol-1019-disk-1  sync  standard  inherited from rpool
rpool/data/subvol-102-disk-1  sync  standard  inherited from rpool
rpool/data/subvol-1066-disk-1  sync  standard  inherited from rpool
rpool/data/subvol-1067-disk-1  sync  standard  inherited from rpool
rpool/data/subvol-1096-disk-1  sync  standard  inherited from rpool
rpool/data/subvol-85126-disk-1  sync  standard  inherited from rpool
rpool/swap  sync  always  local
 
No. The containers are stored on local storages. NFS is used only for hosting templates and backups.

Interesting, the problem seems to be load related. Backups are done using rsync, which shouldn't cause issues with local storage. Have you tried looking at the system stats while a backup is in progress? I personally like nmon, which lets me look at all the stats (disk I/O, network, CPU, RAM) at the same time.
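
If nmon isn't on the node yet, something like this should be enough to get it running (the host is Debian-based, so apt should work):
Code:
# install and start nmon on the PVE node
apt-get install nmon
nmon
# inside nmon: press c (CPU), m (memory), d (disks), n (network) to toggle the views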

If the async option doesn't help, you can next try adjusting /etc/vzdump.conf; for example, I've set bwlimit (in KB/s) to:
Code:
bwlimit: 20000

It should reduce disk I/O and prevent the containers from being starved of I/O.
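
A slightly fuller /etc/vzdump.conf sketch that combines this with the tmp-directory idea mentioned earlier (the ionice value and the tmpdir path are assumptions, adjust them to your nodes):
Code:
# /etc/vzdump.conf - illustrative settings, not a drop-in recommendation
# limit backup bandwidth to roughly 20 MB/s
bwlimit: 20000
# lowest best-effort I/O priority for the backup process
ionice: 7
# keep temporary files on fast local storage instead of NFS
tmpdir: /var/tmp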
 
