Containers shut down sometimes

Morphushka

Active Member
Jun 25, 2019
Hello!
Sometimes I find my containers shut down, but I didn't stop them, and the task history doesn't contain anything about it.
Where can I look at the container logs to find out when and why they were shut down?
Thanks!
 
Where can I look for container logs?
The container's task logs can be viewed in the container's "Task Log" panel; it also includes entries older than the recent ones shown at the bottom.

You can also check the syslog with journalctl -u pve-container@VMID (replace VMID with the CT's number) to see whether it was actively stopped by someone or something. The CT could also have crashed, for example due to an out-of-memory situation.

You can also check the CT's logs from the inside; maybe you'll find some hints there.
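A minimal sketch of those host-side checks (VMID 107 is only an example; the grep at the end runs on a captured sample line so the filter is reproducible):

```shell
# Run these on the Proxmox host, not inside the container.
# 1) Was the CT stopped by someone/something, or did it crash?
#    journalctl -u pve-container@107
# 2) Kernel OOM kills leave traces in the host syslog/journal; filtering
#    for "oom" finds them quickly. Here we grep a captured example line:
excerpt='Dec 28 19:17:54 share kernel: smbd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0'
hits=$(printf '%s\n' "$excerpt" | grep -c 'oom-killer')
echo "$hits oom-killer line(s) found"
```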
 
Hi @t.lamprecht,
I found information inside the container, in /var/log/syslog.
Here is the full log for Dec 28:
Dec 28 10:56:08 share systemd[1]: apt-daily.service: Failed to reset devices.list: Operation not permitted
Dec 28 10:56:08 share systemd[1]: Starting Daily apt download activities...
Dec 28 10:56:09 share systemd[1]: Started Daily apt download activities.
Dec 28 10:56:09 share systemd[1]: apt-daily.timer: Adding 8h 9min 30.574047s random time.
Dec 28 10:56:09 share systemd[1]: apt-daily.timer: Adding 8h 52min 25.808620s random time.
Dec 28 11:47:01 share CRON[4822]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 12:47:01 share CRON[6017]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 13:47:01 share CRON[7287]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 14:47:01 share CRON[8270]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 15:47:01 share CRON[9125]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 16:47:01 share CRON[9873]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 17:47:01 share CRON[10882]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 18:47:02 share CRON[13164]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Dec 28 19:17:44 share kernel: [6868109.850445] oom_reaper: reaped process 16511 (systemd-journal), now anon-rss:0kB, file-rss:0kB, shmem-rss:7712kB
Dec 28 19:17:46 share kernel: [6868111.792532] oom_reaper: reaped process 16881 (lpqd), now anon-rss:0kB, file-rss:0kB, shmem-rss:8kB
Dec 28 19:17:51 share kernel: [6868116.264400] Killed process 16877 (cleanupd) total-vm:311300kB, anon-rss:252kB, file-rss:0kB, shmem-rss:24kB
Dec 28 19:17:54 share kernel: [6868119.376317] smbd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 28 19:17:54 share kernel: [6868119.376352] mem_cgroup_out_of_memory+0xc4/0xd0
Dec 28 19:17:54 share kernel: [6868119.376384] handle_mm_fault+0xdd/0x210
Dec 28 19:17:54 share kernel: [6868119.376410] R10: 00007f2f7b8916c0 R11: 0000000000000246 R12: 0000000000000001
Dec 28 19:17:54 share kernel: [6868119.376762] [ 16556] 0 16556 62528 350 139264 91 0 rsyslogd
Dec 28 19:17:54 share kernel: [6868119.376788] [ 16874] 0 16874 79851 138 638976 534 0 smbd
Dec 28 19:17:54 share kernel: [6868119.377191] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=ns,mems_allowed=0-1,oom_memcg=/lxc/107,task_memcg=/lxc/107/ns,task=smbd,pid=16874,uid=0
Dec 28 19:17:56 share kernel: [6868121.227581] Call Trace:
Dec 28 19:17:56 share kernel: [6868121.227621] __alloc_pages_nodemask+0x246/0x2e0
Dec 28 19:17:56 share kernel: [6868121.227657] __x64_sys_clone+0x27/0x30
Dec 28 19:17:56 share kernel: [6868121.227682] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 1155847
Dec 28 19:17:56 share kernel: [6868121.228024] [ 16590] 0 16590 3879 0 81920 36 0 agetty
Dec 28 19:17:56 share kernel: [6868121.228416] [ 24135] 102 24135 17488 205 172032 0 0 sshd
Dec 28 19:18:00 share systemd[1]: systemd-journald.service: Main process exited, code=killed, status=9/KILL
Dec 28 19:18:00 share systemd[1]: systemd-journald.service: Unit entered failed state.
Dec 28 19:18:00 share systemd[1]: rsyslog.service: Service hold-off time over, scheduling restart.
Dec 28 19:18:00 share systemd[1]: Stopped System Logging Service.
Dec 28 19:18:01 share systemd[1]: rsyslog.service: Failed to reset devices.list: Operation not permitted
Dec 28 19:18:01 share systemd[1]: Starting System Logging Service...
Dec 28 19:18:01 share systemd[1]: systemd-journal-flush.service: Failed to reset devices.list: Operation not permitted
Dec 28 19:18:02 share systemd[1]: Starting Flush Journal to Persistent Storage...
Dec 28 19:18:03 share liblogging-stdlog: [origin software="rsyslogd" swVersion="8.24.0" x-pid="14320" x-info="http://www.rsyslog.com"] start
Dec 28 19:18:03 share systemd[1]: Started System Logging Service.
Dec 28 19:18:03 share systemd[1]: Started Flush Journal to Persistent Storage.
Dec 28 19:18:07 share kernel: [6868132.446025] sshd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 28 19:18:07 share kernel: [6868132.446049] out_of_memory+0x1c3/0x490
Dec 28 19:18:07 share kernel: [6868132.446071] ext4_filemap_fault+0x31/0x44
Dec 28 19:18:07 share kernel: [6868132.446093] Code: Bad RIP value.
Dec 28 19:18:07 share kernel: [6868132.446110] Memory cgroup stats for /lxc/107/ns: cache:832524KB rss:7308KB rss_huge:0KB shmem:831600KB mapped_file:5412KB dirty:1056KB writeback:0KB swap:189816KB inactive_anon:427844KB active_anon:408696KB inactive_file:504KB active_file:120KB unevictable:0KB
Dec 28 19:18:07 share kernel: [6868132.446465] [ 16772] 0 16772 20296 28 131072 170 0 master
Dec 28 19:18:11 share kernel: [6868136.668025] sshd invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
Dec 28 19:18:11 share kernel: [6868136.668054] oom_kill_process.cold.30+0xb/0x1d6
Dec 28 19:18:11 share kernel: [6868136.668083] ? filemap_map_pages+0x1ae/0x380
Dec 28 19:18:11 share kernel: [6868136.668108] RIP: 0033:0x7fd259cc4d50
Dec 28 19:18:12 share kernel: [6868136.668125] memory+swap: usage 1048576kB, limit 1048576kB, failcnt 2158293
Dec 28 19:18:12 share kernel: [6868136.668470] [ 16558] 107 16558 11280 43 122880 70 -900 dbus-daemon
Dec 28 19:18:12 share kernel: [6868136.668861] [ 27849] 0 27849 19055 207 196608 0 0 sshd
Dec 28 19:18:12 share kernel: [6868136.668898] Memory cgroup out of memory: Kill process 16436 (systemd) score 1 or sacrifice child

As I understand it, the container wanted to use more memory than it has, so it started to kill some processes inside?
It is strange, because there is just Samba for printers there. Maybe something is wrong with Samba, but I see the same behavior with a VPN container and even with a container that is used to run one task per month. I can't find any similarities.
Other containers with a big workload have been working well for the same time. All were created from the same template.
Thanks for the advice!
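One detail worth decoding from the log above: "memory+swap: usage 1048576kB, limit 1048576kB" means the CT was sitting exactly at its combined memory+swap ceiling, which is what triggers the cgroup OOM killer. A small sketch of the conversion, plus hypothetical pct commands (shown only as comments) one might use on the host to inspect or raise the limit:

```shell
# 1048576 kB is the cgroup memory+swap limit reported by the kernel.
limit_kb=1048576
limit_mb=$((limit_kb / 1024))
echo "limit: ${limit_mb} MiB"    # 1024 MiB, i.e. the CT has 1 GiB in total
# On the PVE host you could then check/raise the CT's allowance, e.g.:
#   pct config 107 | grep -E '^(memory|swap)'
#   pct set 107 --memory 2048 --swap 1024    # values in MiB; pick your own
```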
 
Hello,

I have the same problem with OOM on an LXC VPS, even though the memory usage of the container and the PVE host is relatively low in the Proxmox GUI. Any ideas?

Regards,
 
What version do you use (pveversion -v)? There was a change about half a year ago that should reduce how often the OOM killer becomes active for memory-hungry containers: since pve-container 4.4-4, a slightly smaller threshold is set that, once reached, increases memory-reclaim pressure and also throttles the container's new allocations.

https://git.proxmox.com/?p=pve-container.git;a=commit;h=09ea3e7fb3f166d11d245abf26ba9d02a829ab93
I have the output below. So I think we need to update all components and reboot the HW node? Or is there a workaround with no need to reboot the HW node?
pveversion -v
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve2
 
So I think we need to update all components
Yes, you really should.

and reboot HW node? Or is there any workaround with no need to reboot HW node.
Well, for most fixes you do not need a reboot; reboots are mostly needed to apply a newer kernel.
You will probably also get a newer kernel with the update, but you do not need to reboot immediately to get the container fix. Reloading the web interface once (to avoid getting a stale worker) and then restarting the containers is enough to apply this specific fix.
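A sketch of the version check, using sort -V to compare Debian-style version strings (the installed version is taken from the pveversion output above; the upgrade command is shown only as a comment):

```shell
installed="4.4-2"   # pve-container from the pveversion -v output above
required="4.4-4"    # first version with the OOM threshold change
# sort -V orders version strings; if the installed version sorts first
# (and differs from the required one), the fix is not yet present.
oldest=$(printf '%s\n%s\n' "$installed" "$required" | sort -V | head -n1)
if [ "$oldest" = "$installed" ] && [ "$installed" != "$required" ]; then
  echo "needs upgrade"
  # apt update && apt full-upgrade   # then restart the CTs; no host reboot needed
else
  echo "fix already present"
fi
```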
 
