Undetermined memory issues in container

luison

Renowned Member
We are stuck with sudden, regular hangs of a container that mainly runs Apache and MySQL for a website.
We are not sure if this is directly related to the container's upgrade to Buster: after a few hours the container becomes unstable, MySQL hangs, and after a while the whole system becomes inaccessible.

Host is:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-28-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-29-pve: 4.15.18-57
pve-kernel-4.15.18-28-pve: 4.15.18-56
pve-kernel-4.15.18-27-pve: 4.15.18-55
pve-kernel-4.15.18-26-pve: 4.15.18-54
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

In the container, the syslog errors are of this kind:

Code:
kernel: [5695832.767617] Memory cgroup out of memory: Kill process 3302 (mysqld) score 126 or sacrifice child
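
To check whether the container is actually hitting its cgroup memory limit, the memory controller files for its cgroup can be read on the host. A quick sketch, assuming cgroup v1 (as used by this 4.15 kernel) and the CT ID 1111 that appears in the host logs below:

Code:
# configured limit and current usage of the CT's memory cgroup (bytes)
cat /sys/fs/cgroup/memory/lxc/1111/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/lxc/1111/memory.usage_in_bytes
# number of times the limit was hit
cat /sys/fs/cgroup/memory/lxc/1111/memory.failcnt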


On the host, referring to that container:
Code:
Jul  8 11:28:33 d18 kernel: [5703815.305923] Memory cgroup stats for /lxc/1111/ns/system.slice/system-container\x2dgetty.slice/container-getty@1.service rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:116KB active_anon:16KB inactive_file:0KB active_file:0Kle:0KB

Jul  8 11:28:33 d18 kernel: [5703815.347015] Memory cgroup stats for /lxc/1111/ns/system.slice/system-container\x2dgetty.slice/container-getty@2.service rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:116KB active_anon:16KB inactive_file:0KB active_file:0Kle:0KB

Jul  8 11:28:33 d18 kernel: [5703815.472013] Memory cgroup stats for /lxc/1111/ns/system.slice/ifupdown-pre.service: cache:0KB rss:0KB rss_huge:0KB shmed_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:33 d18 kernel: [5703815.516867] Memory cgroup stats for /lxc/1111/ns/system.slice/systemd-journald.service: cache:8224KB rss:1024KB rss_hug:8224KB mapped_file:4780KB dirty:0KB writeback:0KB swap:0KB inactive_anon:6828KB active_anon:2420KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:33 d18 kernel: [5703816.013793] Memory cgroup stats for /lxc/1111/ns/system.slice/rsyslog.service: cache:0KB rss:772KB rss_huge:0KB shmem:0ile:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:508KB active_anon:264KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:34 d18 kernel: [5703816.077217] Memory cgroup stats for /lxc/1111/ns/system.slice/cron.service: cache:4KB rss:268KB rss_huge:0KB shmem:4KB :0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:268KB active_anon:4KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:34 d18 kernel: [5703816.437589] Memory cgroup stats for /lxc/1111/ns/system.slice/apt-daily.service: cache:0KB rss:0KB rss_huge:0KB shmem:0ile:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:35 d18 kernel: [5703817.094613] Memory cgroup stats for /lxc/1111/ns/user.slice/user-1013.slice/user@1013.service: cache:0KB rss:1624KB rsshmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:892KB active_anon:732KB inactive_file:0KB active_file:0KB unevictable:0KB

Jul  8 11:28:35 d18 kernel: [5703817.243221] [27080]     0 27080     5501      563    81920        0             0 systemd

Jul  8 11:28:35 d18 kernel: [5703817.524030] [28008]  1128 28008      598       19    45056        0             0 sh

Jul  8 11:28:35 d18 kernel: [5703817.586948] [29008]     0 29008     8017     3215    98304        0             0 crashmailbatch

Jul  8 11:28:35 d18 kernel: [5703817.644904] [29016]  1128 29016   119347     7026   393216        0             0 python

Jul  8 11:28:35 d18 kernel: [5703817.658525] [29018]  1128 29018   119344     7025   401408        0             0 python

Jul  8 11:28:35 d18 kernel: [5703817.756369] [14750]    33 14750    82580     8625   380928     1661             0 /usr/sbin/apach

Jul  8 11:28:35 d18 kernel: [5703817.798094] [19601]    33 19601    82727     9186   385024     1650             0 /usr/sbin/apach

Jul  8 11:28:35 d18 kernel: [5703817.937076] [11604]    33 11604    82115     6129   327680     1674             0 /usr/sbin/apach

Jul  8 11:28:35 d18 kernel: [5703817.948283] [11607]    33 11607    82072     6084   331776     1682             0 /usr/sbin/apach

Jul  8 11:28:35 d18 kernel: [5703817.959520] [11609]    33 11609    82622     6605   327680     1680             0 /usr/sbin/apach

Jul  8 11:28:35 d18 kernel: [5703817.981251] [21808]     0 21808     2038       16    49152        0             0 tail

This container has been upgraded from previous versions of Debian, and I've read that there were some issues with Debian templates in the past, so I wonder whether a clean install from a new Debian 10 template (or an alternative) could fix the issue. However, we are rather lost on how to debug what is happening, and on whether it is purely related to this container, to another one, or to the host itself. I understand there are, in general, some known issues with Debian 10 services and cgroups.
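
As a first debugging step, it seems sensible to compare the CT's configured memory limit against the usage shown in the logs above, and possibly raise it. A sketch using the standard pct tool, again assuming CT ID 1111; the sizes are purely illustrative:

Code:
# show the configured memory and swap of the CT (values in MiB)
pct config 1111 | grep -E '^(memory|swap)'
# raise the limits, picking sizes that fit the host
pct set 1111 --memory 4096 --swap 2048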

I found this thread on the forum and am not sure whether it could be related, or whether something similar is now required for services within the container: https://forum.proxmox.com/threads/oom-killer-activity-on-debian-stretch-lxc.48273/#fromHistory
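
Since mysqld is the process being killed, capping its main memory consumer might at least stop the immediate OOM kills while we debug, independent of the root cause. A minimal sketch for a Debian 10 MariaDB/MySQL setup; the file path and values are illustrative, not a tested recommendation:

Code:
# e.g. /etc/mysql/mariadb.conf.d/50-server.cnf (exact path varies by package)
[mysqld]
# keep InnoDB's cache well below the CT memory limit
innodb_buffer_pool_size = 256M
max_connections = 50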

Any help on how to debug these issues would be appreciated.
 
You can migrate the container to a Proxmox VE 6 host and check whether the CT works there.

The point is that many things have changed since kernel 4.15, and LXCFS and cgroups have also received many improvements.
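
A minimal sketch of such a migration via backup and restore, assuming CT ID 1111 and storage names you would adapt to your setup:

Code:
# on the PVE 5 host: create a backup of the CT
vzdump 1111 --compress gzip --dumpdir /var/lib/vz/dump
# copy the archive to the PVE 6 host, then restore it there
# (<timestamp> stands for the actual file name produced by vzdump)
pct restore 1111 /var/lib/vz/dump/vzdump-lxc-1111-<timestamp>.tar.gz --storage local-lvm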
 
