We are stuck with sudden, recurring hangs of a container that mainly runs Apache and MySQL for a website.
We are not sure whether this is directly related to the container's upgrade to Buster, but after a few hours the container becomes unstable: MySQL hangs, and after a while the whole system becomes inaccessible.
Host is:
Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-28-pve)
pve-manager: 5.4-15 (running version: 5.4-15/d0ec33c6)
pve-kernel-4.15: 5.4-19
pve-kernel-4.15.18-30-pve: 4.15.18-58
pve-kernel-4.15.18-29-pve: 4.15.18-57
pve-kernel-4.15.18-28-pve: 4.15.18-56
pve-kernel-4.15.18-27-pve: 4.15.18-55
pve-kernel-4.15.18-26-pve: 4.15.18-54
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-42
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-56
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
On the container, the errors in syslog are of this kind:
Code:
kernel: [5695832.767617] Memory cgroup out of memory: Kill process 3302 (mysqld) score 126 or sacrifice child
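That message means the container's memory cgroup limit is being hit, not that the host as a whole is out of memory. A first step is to compare the container's configured limit with what the cgroup actually reports on the host. A minimal sketch, assuming cgroup v1 (which PVE 5.x uses) and the container ID 1111 taken from the host log further down:
Code:
# configured memory/swap limits of the container
pct config 1111 | grep -Ei 'memory|swap'

# what the kernel's memory cgroup reports for it
cat /sys/fs/cgroup/memory/lxc/1111/memory.limit_in_bytes
cat /sys/fs/cgroup/memory/lxc/1111/memory.usage_in_bytes
cat /sys/fs/cgroup/memory/lxc/1111/memory.max_usage_in_bytes
# failcnt > 0 means the limit has been hit at least once
cat /sys/fs/cgroup/memory/lxc/1111/memory.failcnt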
On the host, referring to that container:
Code:
Jul 8 11:28:33 d18 kernel: [5703815.305923] Memory cgroup stats for /lxc/1111/ns/system.slice/system-container\x2dgetty.slice/container-getty@1.service rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:116KB active_anon:16KB inactive_file:0KB active_file:0Kle:0KB
Jul 8 11:28:33 d18 kernel: [5703815.347015] Memory cgroup stats for /lxc/1111/ns/system.slice/system-container\x2dgetty.slice/container-getty@2.service rss:132KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:116KB active_anon:16KB inactive_file:0KB active_file:0Kle:0KB
Jul 8 11:28:33 d18 kernel: [5703815.472013] Memory cgroup stats for /lxc/1111/ns/system.slice/ifupdown-pre.service: cache:0KB rss:0KB rss_huge:0KB shmed_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:33 d18 kernel: [5703815.516867] Memory cgroup stats for /lxc/1111/ns/system.slice/systemd-journald.service: cache:8224KB rss:1024KB rss_hug:8224KB mapped_file:4780KB dirty:0KB writeback:0KB swap:0KB inactive_anon:6828KB active_anon:2420KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:33 d18 kernel: [5703816.013793] Memory cgroup stats for /lxc/1111/ns/system.slice/rsyslog.service: cache:0KB rss:772KB rss_huge:0KB shmem:0ile:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:508KB active_anon:264KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:34 d18 kernel: [5703816.077217] Memory cgroup stats for /lxc/1111/ns/system.slice/cron.service: cache:4KB rss:268KB rss_huge:0KB shmem:4KB :0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:268KB active_anon:4KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:34 d18 kernel: [5703816.437589] Memory cgroup stats for /lxc/1111/ns/system.slice/apt-daily.service: cache:0KB rss:0KB rss_huge:0KB shmem:0ile:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:35 d18 kernel: [5703817.094613] Memory cgroup stats for /lxc/1111/ns/user.slice/user-1013.slice/user@1013.service: cache:0KB rss:1624KB rsshmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:892KB active_anon:732KB inactive_file:0KB active_file:0KB unevictable:0KB
Jul 8 11:28:35 d18 kernel: [5703817.243221] [27080] 0 27080 5501 563 81920 0 0 systemd
Jul 8 11:28:35 d18 kernel: [5703817.524030] [28008] 1128 28008 598 19 45056 0 0 sh
Jul 8 11:28:35 d18 kernel: [5703817.586948] [29008] 0 29008 8017 3215 98304 0 0 crashmailbatch
Jul 8 11:28:35 d18 kernel: [5703817.644904] [29016] 1128 29016 119347 7026 393216 0 0 python
Jul 8 11:28:35 d18 kernel: [5703817.658525] [29018] 1128 29018 119344 7025 401408 0 0 python
Jul 8 11:28:35 d18 kernel: [5703817.756369] [14750] 33 14750 82580 8625 380928 1661 0 /usr/sbin/apach
Jul 8 11:28:35 d18 kernel: [5703817.798094] [19601] 33 19601 82727 9186 385024 1650 0 /usr/sbin/apach
Jul 8 11:28:35 d18 kernel: [5703817.937076] [11604] 33 11604 82115 6129 327680 1674 0 /usr/sbin/apach
Jul 8 11:28:35 d18 kernel: [5703817.948283] [11607] 33 11607 82072 6084 331776 1682 0 /usr/sbin/apach
Jul 8 11:28:35 d18 kernel: [5703817.959520] [11609] 33 11609 82622 6605 327680 1680 0 /usr/sbin/apach
Jul 8 11:28:35 d18 kernel: [5703817.981251] [21808] 0 21808 2038 16 49152 0 0 tail
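Since the kill only happens hours after start, it can help to record which processes grow before the OOM event. A rough sketch to run inside the container (the log path /var/log/mem-watch.log is just an example):
Code:
#!/bin/sh
# Log the top memory consumers once a minute, so the state
# just before the next OOM kill can be reconstructed afterwards.
while true; do
    {
        date
        ps aux --sort=-rss | head -n 12
        echo
    } >> /var/log/mem-watch.log
    sleep 60
done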
This container has been upgraded from previous versions of Debian, and I've read that there were some issues with the Debian templates in the past, so I'm wondering whether a clean install from a new Debian 10 template (or an alternative) could fix the issue. But we are rather lost on how to debug what is happening, and on whether it is purely related to that container, to another one, or to the host itself. I understand that there are some known issues in general with Debian 10, services, and cgroups.
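To rule out leftovers from the successive dist-upgrades without touching the production container, one low-risk test is to create a throwaway container from the official Debian 10 template and see whether it shows the same behaviour under similar load. Roughly like this (the exact template file name may differ on your mirror, and 999 is just an unused CT ID picked for the test):
Code:
pveam update
pveam available --section system | grep debian-10
pveam download local debian-10.0-standard_10.0-1_amd64.tar.gz
pct create 999 local:vztmpl/debian-10.0-standard_10.0-1_amd64.tar.gz \
    --hostname buster-test --memory 2048 --swap 512 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp
pct start 999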
I found this thread on the forum and I'm not sure whether it could be related, or whether what it describes is now required for services within the container: https://forum.proxmox.com/threads/oom-killer-activity-on-debian-stretch-lxc.48273/#fromHistory
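Independently of that, since mysqld is the process being killed, it is worth making sure MySQL's own memory footprint fits inside the container limit; the buffer pool defaults can be too large for a small cgroup. A hedged example (the path below is the MariaDB default on Debian 10, adjust it for your MySQL variant, and 512M is only a placeholder to size against your container's memory limit):
Code:
# /etc/mysql/mariadb.conf.d/50-server.cnf
[mysqld]
# keep the buffer pool well below the container's memory limit
innodb_buffer_pool_size = 512M
max_connections         = 50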
Any help on how to debug these issues would be appreciated.