LXC cgroups not cleaned up on container shutdown, can't restart

UrkoM

New Member
Oct 15, 2014
Hello,
One of our hosts is not cleaning up the cgroups of shut-down containers, which prevents them from starting again. Here is a snippet of the log file I obtained by starting the container with:
/usr/bin/lxc-start -F --logfile=/root/135.log --logpriority=DEBUG -n 135
Code:
lxc-start 135 20180116013019.550 INFO     lxc_cgroup - cgroups/cgroup.c:cgroup_init:67 - cgroup driver cgroupfs-ng initing for 135
lxc-start 135 20180116013019.550 ERROR    lxc_cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1337 - Path "/sys/fs/cgroup/cpu//lxc/135" already existed.
lxc-start 135 20180116013019.550 ERROR    lxc_cgfsng - cgroups/cgfsng.c:cgfsng_create:1433 - Failed to create "/sys/fs/cgroup/cpu//lxc/135"

When we try to start one of these containers, the web interface becomes unresponsive for that host. Two other VMs on the same host are running well, totally unaffected.

I've found some conversations online about similar issues when a container restart doesn't give the system enough time to clean up, but in this case the containers had been off for over 10 minutes.

How can I force the cleanup of the cgroups, at least as a workaround? Where can we look for more clues to what may be causing the problem?
 
Using commands from this page, I've been able to clean up all cgroups for the container ID.
Going to /sys/fs/cgroup and running this line gets rid of all cgroups:
Code:
find <container id> -depth -type d -print -exec rmdir {} \;
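For anyone repeating this, here is the same cleanup wrapped in a small function, as a rough sketch only. It assumes the cgroup v1 layout seen in the log above (/sys/fs/cgroup/<controller>/lxc/<container id>); the function name and the optional second argument for the cgroup root are made up for illustration. rmdir only succeeds once no processes are left in the cgroup, so a "Device or resource busy" error means something from the container is still running.

```shell
# Hypothetical helper: remove leftover cgroup directories for one container.
# Assumes the layout <root>/<controller>/lxc/<container id>, as in the log above.
cleanup_lxc_cgroups() {
    ctid="$1"
    root="${2:-/sys/fs/cgroup}"
    for dir in "$root"/*/lxc/"$ctid"; do
        # Skip controllers with no leftover directory (also covers the case
        # where a symlinked controller name was already cleaned via its target).
        [ -d "$dir" ] || continue
        # -depth visits children before parents, so rmdir only ever sees
        # directories whose sub-cgroups are already gone.
        find "$dir" -depth -type d -print -exec rmdir {} \;
    done
}
```

Usage would be something like `cleanup_lxc_cgroups 135`, run as root on the affected host.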
Then I found that the network interface stays configured on the vswitch. I used this command to clear it:
Code:
ovs-vsctl del-port <port name>
and then restarted the openvswitch service to get rid of the hidden veth port (possibly not the best way to do it):
Code:
systemctl restart openvswitch.service
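The port removal could be scripted along these lines. This is only a sketch: it assumes the container's port follows the veth<CTID>i<N> naming pattern, which should be confirmed with `ovs-vsctl show` first, and the function name and the OVS_VSCTL override are made up for illustration. `--if-exists` keeps ovs-vsctl from failing when the port is already gone.

```shell
# Hypothetical helper: delete a container's stale veth port from Open vSwitch.
# Assumes the port is named veth<CTID>i<N>; check `ovs-vsctl show` to confirm.
remove_stale_veth() {
    ctid="$1"
    idx="${2:-0}"
    port="veth${ctid}i${idx}"
    # OVS_VSCTL can be set to "echo ovs-vsctl" to preview the command
    # instead of running it.
    ${OVS_VSCTL:-ovs-vsctl} --if-exists del-port "$port"
}
```

For a dry run, `OVS_VSCTL="echo ovs-vsctl" remove_stale_veth 135` just prints the command that would be executed.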

After all this, starting the container still fails. The log files from running it with this line:
Code:
/usr/bin/lxc-start -F --logfile=/root/115.log --logpriority=DEBUG -n 115
show no errors, but the lxc process dies, the container does not respond, and I need to forcefully kill it.

I am really open to ideas... :)