All containers on node suddenly stop

mavillar

Feb 21, 2022
Hello,
I have been running a server for over a year with a number of containers, all under very light load, and until recently they worked without problems.
For the past few days, however, all the containers on the node have been stopping suddenly, while the node itself stays up.
The first time it happened I went back to activate each of the containers and all of them worked fine again.
The next day, all the containers shut down again.
I started them again, and today it happened once more.
I am at a complete loss as to what is causing this.
Does anyone know what the reason might be?
I don't know if there is any log that can tell me what is happening.

Thanks in advance.
 
hi,

mavillar said: "I don't know if there is any log that can tell me what is happening."
you can check the log in /var/log/syslog on that node. try grepping for the container IDs: grep -C 3 999 /var/log/syslog (where 999 would be your container's ID)

the task logs in the GUI might also give you some hints (check for shutdown/stop jobs for that container).
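
The same task history can usually be read on the command line as well. A minimal sketch, assuming the node keeps its task index in /var/log/pve/tasks/index, that each entry starts with a UPID of the form UPID:node:pid:pstart:starttime:type:id:user: (start time in hex), and that container stops are recorded with the task types vzstop and vzshutdown. The function name ct_stop_tasks is made up for illustration:

```shell
# Hypothetical helper: list stop/shutdown tasks recorded for one container.
# Assumes colon-separated UPID fields:
#   UPID:node:pid:pstart:starttime(hex):type:id:user:
ct_stop_tasks() {
    ctid="$1"
    index="${2:-/var/log/pve/tasks/index}"   # assumed default location
    awk -F: -v id="$ctid" '
        $1 == "UPID" && $7 == id && ($6 == "vzstop" || $6 == "vzshutdown") {
            print $5, $6, $8
        }' "$index" |
    while read -r start_hex type user; do
        # Convert the hex start time to a decimal Unix timestamp.
        echo "$((0x$start_hex)) $type $user"
    done
}
```

Calling `ct_stop_tasks 102` would print one line per matching task: start time in epoch seconds, task type, and the user that triggered it. A task started by a real user or by a scheduled job points to a deliberate stop rather than a crash.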

mavillar said: "The first time it happened I went back to activate each of the containers and all of them worked fine again. The next day, all the containers shut down again."
does it happen at a specific hour? or just randomly?
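
One way to answer that from the log itself is to count the relevant events per hour of day. A rough sketch, assuming the traditional syslog layout in which the third whitespace-separated field is HH:MM:SS. events_by_hour is a hypothetical helper name, and the pattern is whatever string identifies the event you care about:

```shell
# Hypothetical helper: count log lines matching a pattern, grouped by hour.
# Assumes lines look like "Feb 21 00:34:46 host ...", so field 3 is the time.
events_by_hour() {
    pattern="$1"
    logfile="${2:-/var/log/syslog}"
    # -a forces text mode in case the log contains stray binary bytes.
    grep -a -- "$pattern" "$logfile" |
    awk '{ split($3, t, ":"); count[t[1]]++ }
         END { for (h in count) print h, count[h] }' |
    sort
}
```

If nearly all hits fall into one or two hours, a scheduled job (cron, backup, unattended upgrade) becomes the more likely suspect. An even spread points to something random.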

which PVE version are you on? can you send us the output from pveversion -v?
 
I suspect it may be caused by the behavior of one of the containers.
When I run the grep command, I get the following:

root@ns3012819:~# grep -C 3 102 /var/log/syslog
Feb 21 00:34:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 00:34:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 00:34:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 00:34:19 ns3012819 kernel: [297303.065102] sshd[11632]: segfault at 0 ip 00007f9e205396ca sp 00007ffc242f8398 error 4 in libc-2.28.so[7f9e20404000+148000]
Feb 21 00:34:19 ns3012819 kernel: [297303.083835] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 00:34:46 ns3012819 kernel: [297329.813799] audit: type=1400 audit(1645403686.578:113): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=12110 comm="mount" flags="rw, noexec, remount"
Feb 21 00:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 00:35:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 00:35:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
--
Feb 21 01:34:28 ns3012819 kernel: [300911.866407] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 01:34:42 ns3012819 kernel: [300925.616298] sshd[9439]: segfault at 0 ip 00007fa10de756ca sp 00007ffc40e049c8 error 4 in libc-2.28.so[7fa10dd40000+148000]
Feb 21 01:34:42 ns3012819 kernel: [300925.634868] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 01:34:46 ns3012819 kernel: [300929.823208] audit: type=1400 audit(1645407286.577:114): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=9575 comm="mount" flags="rw, noexec, remount"
Feb 21 01:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 01:35:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 01:35:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
--
Feb 21 01:36:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 01:36:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 01:36:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 01:36:22 ns3012819 kernel: [301025.606130] sshd[11974]: segfault at 0 ip 00007f9fe7b836ca sp 00007ffdfe4df1b8 error 4 in libc-2.28.so[7f9fe7a4e000+148000]
Feb 21 01:36:22 ns3012819 kernel: [301025.624887] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 01:37:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 01:37:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 01:37:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
--
Feb 21 01:56:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 01:56:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 01:56:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 01:56:27 ns3012819 kernel: [302231.145102] sshd[742]: segfault at 0 ip 00007f6ee42cd6ca sp 00007ffc15428de8 error 4 in libc-2.28.so[7f6ee4198000+148000]
Feb 21 01:56:27 ns3012819 kernel: [302231.163927] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 01:57:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 01:57:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
--
Feb 21 02:34:10 ns3012819 kernel: [304493.213058] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 02:34:19 ns3012819 kernel: [304502.817510] sshd[9387]: segfault at 0 ip 00007ff96bc0a6ca sp 00007ffe6f95c048 error 4 in libc-2.28.so[7ff96bad5000+148000]
Feb 21 02:34:19 ns3012819 kernel: [304502.836146] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 02:34:46 ns3012819 kernel: [304529.835784] audit: type=1400 audit(1645410886.584:115): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=9843 comm="mount" flags="rw, noexec, remount"
Feb 21 02:34:59 ns3012819 kernel: [304542.198852] sshd[10171]: segfault at 0 ip 00007f1616d3b6ca sp 00007fff75f9b918 error 4 in libc-2.28.so[7f1616c06000+148000]
Feb 21 02:34:59 ns3012819 kernel: [304542.217619] Code: 80 7e 10 00 0f 84 f9 fe ff ff e9 a1 a8 f4 ff 90 89 f8 31 d2 c5 c5 ef ff 09 f0 25 ff 0f 00 00 3d 80 0f 00 00 0f 8f 56 03 00 00 <c5> fe 6f 0f c5 f5 74 06 c5 fd da c1 c5 fd 74 c7 c5 fd d7 c8 85 c9
Feb 21 02:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
--
Feb 21 03:34:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 03:34:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 03:34:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 03:34:46 ns3012819 kernel: [308129.848725] audit: type=1400 audit(1645414486.591:116): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=4229 comm="mount" flags="rw, noexec, remount"
Feb 21 03:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 03:35:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 03:35:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
--
Feb 21 04:34:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 04:34:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 04:34:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 04:34:46 ns3012819 kernel: [311729.859100] audit: type=1400 audit(1645418086.590:117): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=32691 comm="mount" flags="rw, noexec, remount"
Feb 21 04:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 04:35:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 04:35:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
--
Feb 21 05:34:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 05:34:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 05:34:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Feb 21 05:34:46 ns3012819 kernel: [315329.871277] audit: type=1400 audit(1645421686.594:118): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-102_</var/lib/lxc>" name="/tmp/" pid=32447 comm="mount" flags="rw, noexec, remount"
Feb 21 05:35:00 ns3012819 systemd[1]: Starting Proxmox VE replication runner...
Feb 21 05:35:00 ns3012819 systemd[1]: pvesr.service: Succeeded.
Feb 21 05:35:00 ns3012819 systemd[1]: Started Proxmox VE replication runner.
Binary file /var/log/syslog matches
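
Two things stand out in this output. Several hits come from the digits 102 inside kernel timestamps (for example 297303.065102) rather than from the container ID, and the final "Binary file /var/log/syslog matches" line means grep treated the file as binary and may have stopped printing early. A tighter search might look like the sketch below. The function name ct_log is made up for illustration, and the pattern is only a heuristic that skips 102 when it is part of a longer number:

```shell
# Hypothetical helper: show syslog lines that mention a container ID as a
# standalone number, with three lines of context around each hit.
ct_log() {
    ctid="$1"
    logfile="${2:-/var/log/syslog}"
    # -a: force text mode even if the file contains binary bytes
    # -E: extended regex; the ID must not be preceded by a digit or a dot,
    #     and must not be followed by a digit
    grep -a -E -C 3 "(^|[^0-9.])${ctid}([^0-9]|\$)" "$logfile"
}
```

With `ct_log 102`, the lxc-102_ AppArmor lines still match, while timestamps such as 065102 do not. Unrelated values such as pid=102 would still slip through, so the result needs a quick read either way.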
 
thanks, could you also post the pveversion -v output?

is it possible that you've made some kernel package upgrades without rebooting afterwards?
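
A quick way to check this is to compare the running kernel with the newest kernel image installed under /boot. The sketch below assumes kernel images are named /boot/vmlinuz-<version> and that GNU sort (with -V) is available. Both function names are made up for illustration, and the newest installed image is not necessarily the one the bootloader would pick:

```shell
# Hypothetical helper: print the highest kernel version found in a boot dir.
newest_kernel() {
    boot_dir="${1:-/boot}"
    for image in "$boot_dir"/vmlinuz-*; do
        # Skip the unexpanded glob when no image exists.
        [ -e "$image" ] && echo "${image##*/vmlinuz-}"
    done | sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel is not the newest one.
reboot_pending() {
    running="${1:-$(uname -r)}"
    newest="$(newest_kernel "${2:-/boot}")"
    [ -n "$newest" ] && [ "$running" != "$newest" ]
}
```

Running `reboot_pending && echo "newer kernel installed than the one running"` on the node gives a first hint. The pveversion -v output is still the better source, since it lists the running kernel next to the installed kernel packages.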