LXC cgroups not cleaned up on container shutdown, can't restart

UrkoM · Jan 16, 2018

Hello,
One of our hosts is not cleaning up the cgroups of containers that have been shut down, which prevents them from starting again. Here is a snippet of the log file I obtained by starting the container with:
/usr/bin/lxc-start -F --logfile=/root/135.log --logpriority=DEBUG -n 135

Code:

lxc-start 135 20180116013019.550 INFO     lxc_cgroup - cgroups/cgroup.c:cgroup_init:67 - cgroup driver cgroupfs-ng initing for 135
lxc-start 135 20180116013019.550 ERROR    lxc_cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1337 - Path "/sys/fs/cgroup/cpu//lxc/135" already existed.
lxc-start 135 20180116013019.550 ERROR    lxc_cgfsng - cgroups/cgfsng.c:cgfsng_create:1433 - Failed to create "/sys/fs/cgroup/cpu//lxc/135"
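
To see every leftover path that blocks the start, the paths can be pulled out of the log file. This is only a minimal sketch: it assumes the log keeps the `Path "..." already existed` wording shown above, and the function name `leftover_cgroup_paths` is hypothetical.

```shell
# Hypothetical helper: print each cgroup path that lxc-start reported as
# already existing. Reads an lxc-start log from a file argument or stdin.
leftover_cgroup_paths() {
    # Assumption: the message format is 'Path "<path>" already existed',
    # as in the log snippet above. Duplicate paths are collapsed.
    sed -n 's/.*Path "\([^"]*\)" already existed.*/\1/p' "$@" | sort -u
}
```

For example, `leftover_cgroup_paths /root/135.log` would list the directories to inspect.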

When we try to start one of these containers, the web interface becomes unresponsive for that host. Two other VMs on the same host are running fine and are unaffected.

I've found some conversations online about similar issues when a container restart doesn't give the system enough time to clean up, but in this case the containers had been off for over 10 minutes.

How can I force the cleanup of the cgroups, at least as a workaround? Where can we look for more clues to what may be causing the problem?

UrkoM · Jan 16, 2018

Using commands from this page:

http://blog.tinola.com/?e=21

I've been able to clean up all cgroups for the container ID.
Going to /sys/fs/cgroup and running this line gets rid of all of the container's cgroups:

Code:

find <container id> -depth -type d -print -exec rmdir {} \;
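
The same idea can be wrapped in a function that walks every controller hierarchy. This is a rough sketch, not a tested fix: it assumes the cgroup v1 layout `/sys/fs/cgroup/<controller>/lxc/<container id>` seen in the log above, and the name `cleanup_lxc_cgroups` and its optional second argument are hypothetical. It skips any cgroup that still lists processes, since such a cgroup cannot be removed.

```shell
# Hypothetical helper: remove leftover cgroup directories for one container.
# Usage: cleanup_lxc_cgroups <container id> [cgroup root]
cleanup_lxc_cgroups() {
    ctid="$1"
    root="${2:-/sys/fs/cgroup}"   # assumption: cgroup v1 mounted here
    # One hierarchy per controller, e.g. /sys/fs/cgroup/cpu/lxc/135
    for dir in "$root"/*/lxc/"$ctid"; do
        [ -d "$dir" ] || continue
        # Collect PIDs still attached to this cgroup or its children.
        pids=$(find "$dir" -name cgroup.procs -exec cat {} + 2>/dev/null)
        if [ -n "$pids" ]; then
            echo "skipping $dir, still has processes: $pids" >&2
            continue
        fi
        # Remove children before parents, as in the find command above.
        find "$dir" -depth -type d -print -exec rmdir {} \;
    done
}
```

Running `cleanup_lxc_cgroups 135` as root would then print each directory it removes. Any PIDs it reports may point to what is keeping the cgroup alive.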

Then I found that the container's network interface was still configured on the vswitch. I used this command to clear it:

Code:

ovs-vsctl del-port <port name>

and then restarted the openvswitch service to get rid of the hidden veth port (possibly not the best way to do it):

Code:

systemctl restart openvswitch.service
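
Before deleting ports, it might help to check which ones on the bridge no longer have a network device behind them. This is a minimal sketch under assumptions: the function name `stale_ports` is hypothetical, and it treats a port as stale when no entry for it exists under `/sys/class/net`.

```shell
# Hypothetical helper: read port names on stdin and print those that have
# no matching network device (candidates for ovs-vsctl del-port).
# Usage: ovs-vsctl list-ports <bridge> | stale_ports [net device dir]
stale_ports() {
    netdir="${1:-/sys/class/net}"
    while IFS= read -r port; do
        [ -n "$port" ] || continue
        # Print the port only if the kernel has no device with that name.
        [ -e "$netdir/$port" ] || printf '%s\n' "$port"
    done
}
```

Assuming the bridge is called `vmbr0` (it may differ on your host), `ovs-vsctl list-ports vmbr0 | stale_ports` would list the candidates, and each could then be removed with `ovs-vsctl --if-exists del-port vmbr0 <port name>`.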

After all this, starting the container still fails. Log files from running it with this line:

Code:

/usr/bin/lxc-start -F --logfile=/root/115.log --logpriority=DEBUG -n 115

show no errors, but the lxc process dies, the container does not respond, and I have to kill it forcibly.

I am really open to ideas...
