LXC running docker fails to start after upgrade from 6.4 to 7.0

oester

Member
Jan 9, 2021
I did a test upgrade today on one server, following the upgrade guide. I had a number of TurnKey Core 16.0 LXCs running Docker, and after the upgrade Docker failed to start in them. I eventually chased the first issue down to "overlay" not being available, so I added that to /etc/modules-load.d/modules.conf and rebooted. Then I ran into the second issue, with cgroups:

WARN[2021-07-06T14:42:58.934309464-05:00] Your kernel does not support cgroup memory limit
WARN[2021-07-06T14:42:58.934335793-05:00] Unable to find cpu cgroup in mounts
WARN[2021-07-06T14:42:58.934358938-05:00] Unable to find blkio cgroup in mounts
WARN[2021-07-06T14:42:58.934375725-05:00] Unable to find cpuset cgroup in mounts
WARN[2021-07-06T14:42:58.934439826-05:00] mountpoint for pids not found
INFO[2021-07-06T14:42:58.935103296-05:00] stopping event stream following graceful shutdown error="context canceled" module=libcontainerd namespace=plugins.moby
INFO[2021-07-06T14:42:58.935110144-05:00] stopping healthcheck following graceful shutdown module=libcontainerd
INFO[2021-07-06T14:42:58.935494478-05:00] pickfirstBalancer: HandleSubConnStateChange: 0xc00015cd10, TRANSIENT_FAILURE module=grpc
INFO[2021-07-06T14:42:58.935516976-05:00] pickfirstBalancer: HandleSubConnStateChange: 0xc00015cd10, CONNECTING module=grpc
Error starting daemon: Devices cgroup isn't mounted
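
For the first issue (overlay not being available), a quick check before starting Docker might look like the sketch below. It assumes that a loaded overlay module shows up as an `overlay` entry in /proc/filesystems; the function name is only illustrative and not part of any Proxmox or Docker tooling.

```python
from pathlib import Path


def overlay_available(filesystems_text: str) -> bool:
    """Return True if 'overlay' appears in the text of /proc/filesystems."""
    for line in filesystems_text.splitlines():
        fields = line.split()
        # Lines look like "nodev overlay" or just "ext4"; the type is the last field.
        if fields and fields[-1] == "overlay":
            return True
    return False


if __name__ == "__main__":
    proc_file = Path("/proc/filesystems")
    if proc_file.exists():
        print("overlay available:", overlay_available(proc_file.read_text()))
    else:
        print("/proc/filesystems not found; is this a Linux system?")
```

If this prints False, loading the module on the host (as done above via /etc/modules-load.d/modules.conf) is the likely next step.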

I was then pointed to https://forum.proxmox.com/threads/p...trough-not-working-anymore.92025/#post-400916, which discusses allowing certain cgroup devices. However, I don't know which devices to allow. Can anyone help out here? I was hoping the release notes covered (potential) issues with LXC and Docker, but I have not found any such notes. I'm sure this is bound to come up for others running Docker in Debian LXCs.

FYI, an Ubuntu 20.04 LXC starts Docker just fine.

Thanks.
 
I'm experiencing the same issue on Ubuntu 20.04; Docker worked in an LXC container on Proxmox 6.4:

Code:
ERROR: for reverse  Cannot start service reverse: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: can't load program: operation not permitted: unknown
ERROR: Encountered errors while bringing up the project.
 
WARN[2021-07-06T14:42:58.934309464-05:00] Your kernel does not support cgroup memory limit
WARN[2021-07-06T14:42:58.934335793-05:00] Unable to find cpu cgroup in mounts
WARN[2021-07-06T14:42:58.934358938-05:00] Unable to find blkio cgroup in mounts
WARN[2021-07-06T14:42:58.934375725-05:00] Unable to find cpuset cgroup in mounts

On a hunch, I'd guess that the Docker version in the container expects a cgroupv1 or hybrid layout; see
https://pve.proxmox.com/pve-docs/chapter-pct.html#pct_cgroup

Does it work if you switch from the unified layout to hybrid?
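
To see which layout is currently active, one rough approach is to look at the entries directly under /sys/fs/cgroup. The sketch below assumes the usual conventions: a pure cgroupv2 (unified) mount exposes `cgroup.controllers` at its root, while a hybrid layout has a `unified` subdirectory next to the v1 controller directories. The function name and the hint set are illustrative only.

```python
import os

# Controller directories that a cgroup v1 (legacy or hybrid) tree typically contains.
V1_HINTS = {"devices", "memory", "cpu", "cpuset", "blkio", "pids"}


def classify_cgroup_layout(entries) -> str:
    """Classify a listing of /sys/fs/cgroup as unified, hybrid, legacy or unknown."""
    names = set(entries)
    if "cgroup.controllers" in names:
        return "unified"  # cgroupv2 mounted directly on /sys/fs/cgroup
    if "unified" in names:
        return "hybrid"   # v1 controllers plus a cgroup2 mount at ./unified
    if names & V1_HINTS:
        return "legacy"   # only v1 controller hierarchies
    return "unknown"


if __name__ == "__main__":
    root = "/sys/fs/cgroup"
    if os.path.isdir(root):
        print(classify_cgroup_layout(os.listdir(root)))
    else:
        print("no", root, "directory found")
```

Running it on the PVE host and again inside the container shows whether both see the same layout.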
 
I just tested with


systemd.unified_cgroup_hierarchy=0


and I still get:

"ERROR: for reverse Cannot start service reverse: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: can't load program: operation not permitted: unknown

ERROR: for reverse Cannot start service reverse: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: process_linux.go:458: setting cgroup config for procHooks process caused: can't load program: operation not permitted: unknown"

This is with an Ubuntu 20.04 LTS LXC Image running Docker.
 
Just to be sure: did you set that on the PVE host's kernel command line and reboot afterwards? You can check with cat /proc/cmdline
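
The same check can be scripted. This is only a minimal sketch: it treats the kernel command line as whitespace-separated tokens and looks for an exact match, and the helper name is made up for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` is one of the whitespace-separated boot parameters."""
    return param in cmdline.split()


if __name__ == "__main__":
    param = "systemd.unified_cgroup_hierarchy=0"
    proc_file = Path("/proc/cmdline")
    if proc_file.exists():
        # /proc/cmdline reflects what the running kernel was actually booted with.
        print(param, "active:", has_kernel_param(proc_file.read_text(), param))
    else:
        print("/proc/cmdline not found")
```

If the parameter is missing here even though it was added to the boot configuration, the change was not applied or the host has not been rebooted since.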

No you're right, I used a line break when adding the parameter and it wasn't loaded.

After adding systemd.unified_cgroup_hierarchy=0 to /etc/kernel/cmdline as documented in:
https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysboot_edit_kernel_cmdline
(making sure to keep everything on the same line)
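
Since the line break was the culprit here, a small helper that always produces a single line can avoid the mistake. The sketch below works on the text of the file only; the sample content is hypothetical, and the result still has to be written back and applied as the linked documentation describes, followed by a reboot.

```python
def add_param_single_line(cmdline_text: str, param: str) -> str:
    """Return the command line with `param` added, collapsed onto one line."""
    # split() with no arguments also splits on newlines, so a parameter that
    # ended up on a second line is pulled back onto the first one.
    # Caveat: parameters containing quoted spaces are not handled.
    tokens = cmdline_text.split()
    if param not in tokens:
        tokens.append(param)
    return " ".join(tokens) + "\n"


if __name__ == "__main__":
    # Hypothetical file content with the parameter wrongly placed on its own line.
    broken = "root=ZFS=rpool/ROOT/pve-1 boot=zfs\nsystemd.unified_cgroup_hierarchy=0\n"
    print(add_param_single_line(broken, "systemd.unified_cgroup_hierarchy=0"), end="")
```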

It now works; however, most of my existing containers (using bind mounts) no longer work. Also, as Proxmox is migrating to cgroupv2 in future releases (thus dropping this hybrid approach), is there anything that can be done to fix this issue (i.e. running Docker within an LXC container)?

EDIT: Using this hybrid fix causes several errors in my existing lxc containers (almost half no longer load):
Code:
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "103"
TASK ERROR: startup for container '103' failed
 
The hybrid approach will stay around for Proxmox VE 7.x, so for the next ~two years you could be fine with that.

It now works. As proxmox is migrating to cgroupv2 in future releases (thus disabling this hybrid approach), is there anything that can be done to fix this issue (i.e. running docker within LXC container)?
The systemd version in Ubuntu 20.04 supports cgroupv2 just fine, and Docker should support it since version 20.10 (I did not confirm this, but their release notes say so), so maybe check whether you can upgrade to that version.
 
In my case it started working after I upgraded Docker from 20.10.5 to 20.10.7:

Code:
user@Reverse-Proxy:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.2 LTS
Release:        20.04
Codename:       focal


Code:
user@Reverse-Proxy:~$ docker --version
Docker version 20.10.7, build f0df350
user@Reverse-Proxy:~$


Something was fixed in the point release to get it working; thanks for the pointer on the Docker version. For anyone having issues, I recommend upgrading Docker to 20.10.7+ and using Ubuntu 18.04 or later.
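
A version check along these lines can be scripted against the output of `docker --version`. This is a rough sketch: the 20.10.7 threshold is simply the version that worked in this thread, not an officially documented minimum, and the helper names are illustrative.

```python
import re
from typing import Tuple


def parse_docker_version(output: str) -> Tuple[int, ...]:
    """Extract the numeric version from `docker --version` output."""
    match = re.search(r"Docker version (\d+(?:\.\d+)*)", output)
    if match is None:
        raise ValueError("no Docker version found in: %r" % output)
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version: Tuple[int, ...], minimum: Tuple[int, ...]) -> bool:
    """Compare versions component by component (tuples compare lexicographically)."""
    return version >= minimum


if __name__ == "__main__":
    sample = "Docker version 20.10.7, build f0df350"
    version = parse_docker_version(sample)
    print(version, "ok:", is_at_least(version, (20, 10, 7)))
```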
 
I agree, setting "systemd.unified_cgroup_hierarchy=0" causes this same error for me on an LXC container running Docker on TurnKey Core.
 
Following up on my previous comment: the Docker version in the LXC template "debian-10-turnkey-core_16.0-1" is quite old (18.09). The template is based on Debian Buster, which may be the root of the problem. The Docker version in my Ubuntu container is 20.10.2, and that works fine.

I'm going to steer clear of any Debian-based LXCs for a while.
 
I have the same error on CentOS 7:

Code:
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "100"
TASK ERROR: startup for container '100' failed

Is there any solution to fix compatibility with CentOS 7?

Best Regards
 
There are four bullet points over at:
https://pve.proxmox.com/pve-docs/chapter-pct.html#pct_cgroup_compat

You can pick any.
Yes, I have read this and added
Code:
systemd.unified_cgroup_hierarchy=0

Hence the error message above, which appeared after this modification.
But it does not work for CentOS 7; since I can't upgrade CentOS 7 to CentOS 8 yet, I need to run these containers as they are.
PS: I don't use Docker in this LXC. Also, CentOS 8 does not work either: once this option is active, none of the LXCs start.
Another point: I think it would be better to have an option to set this individually, container by container: https://wiki.debian.org/LXC/CGroupV2

I have also tried:

Code:
lxc.init.cmd: /lib/systemd/systemd systemd.unified_cgroup_hierarchy=0
in the LXC config.
 
Just to be sure, did you reboot the host afterwards? You can see the active kernel boot command line with cat /proc/cmdline
 
Yes, it is loaded:

Code:
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-5.11.22-1-pve root=/dev/mapper/pve-root ro quiet systemd.unified_cgroup_hierarchy=0

Code:
pct start 100
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "100"
startup for container '100' failed
 
please post:
* the container config
* the debug-logs when trying to start the container - see https://pve.proxmox.com/pve-docs/chapter-pct.html#_obtaining_debugging_logs
* `ls /sys/fs/cgroup`
* `mount |grep cgroup`
 
No problem, here is all the info, @Stoiko Ivanov:


Code:
cat /etc/pve/nodes/pve-lab/lxc/100.conf
arch: amd64
cores: 1
hostname: myserver.com
memory: 512
net0: name=eth0,bridge=vmbr0,gw=10.10.155.254,hwaddr=4E:F9:5F:BB:CA:B5,ip=10.10.155.122/24,type=veth
ostype: centos
rootfs: netapp:vm-100-disk-0,size=8G
swap: 512

Code:
ls /sys/fs/cgroup
blkio  cpu  cpuacct  cpu,cpuacct  cpuset  devices  freezer  hugetlb  memory  net_cls  net_cls,net_prio  net_prio  perf_event  pids  rdma  systemd  unified

Code:
mount |grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
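
Output like the above can be summarized mechanically. The following sketch parses `mount | grep cgroup` style lines and reports which v1 controllers are mounted and whether a cgroup2 mount exists; the controller list and function name are assumptions made for illustration.

```python
# Controller names that may appear as mount options on cgroup v1 hierarchies.
KNOWN_V1 = {
    "blkio", "cpu", "cpuacct", "cpuset", "devices", "freezer", "hugetlb",
    "memory", "net_cls", "net_prio", "perf_event", "pids", "rdma",
}


def summarize_cgroup_mounts(mount_output: str):
    """Return (set of mounted v1 controllers, whether a cgroup2 mount exists)."""
    v1_controllers = set()
    has_v2 = False
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <source> on <mountpoint> type <fstype> (<options>)
        if len(parts) < 6 or parts[1] != "on" or parts[3] != "type":
            continue
        fstype = parts[4]
        options = parts[5].strip("()").split(",")
        if fstype == "cgroup2":
            has_v2 = True
        elif fstype == "cgroup":
            v1_controllers.update(opt for opt in options if opt in KNOWN_V1)
    return v1_controllers, has_v2


if __name__ == "__main__":
    sample = (
        "cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)\n"
        "cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)\n"
    )
    controllers, has_v2 = summarize_cgroup_mounts(sample)
    print(sorted(controllers), "cgroup2 mounted:", has_v2)
```

A hybrid layout shows both v1 controllers and a cgroup2 mount; Docker's "Devices cgroup isn't mounted" error corresponds to `devices` being absent from the first set.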

Code:
pct start 100 --debug
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "100"
210708081531.779 DEBUG    seccomp - seccomp.c:parse_config_v2:656 - Host native arch is [3221225534]
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "reject_force_umount  # comment this to allow umount -f;  not recommended"
INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
INFO     seccomp - seccomp.c:do_resolve_add_rule:524 - Set seccomp rule to reject force umounts
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "[all]"
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "kexec_load errno 1"
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[246:kexec_load] action[327681:errno] arch[0]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741827]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[246:kexec_load] action[327681:errno] arch[1073741886]
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "open_by_handle_at errno 1"
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[304:open_by_handle_at] action[327681:errno] arch[0]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741827]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[304:open_by_handle_at] action[327681:errno] arch[1073741886]
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "init_module errno 1"
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[175:init_module] action[327681:errno] arch[0]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741827]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[175:init_module] action[327681:errno] arch[1073741886]
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "finit_module errno 1"
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[313:finit_module] action[327681:errno] arch[0]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741827]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[313:finit_module] action[327681:errno] arch[1073741886]
INFO     seccomp - seccomp.c:parse_config_v2:807 - Processing "delete_module errno 1"
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding native rule for syscall[176:delete_module] action[327681:errno] arch[0]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741827]
INFO     seccomp - seccomp.c:do_resolve_add_rule:564 - Adding compat rule for syscall[176:delete_module] action[327681:errno] arch[1073741886]
INFO     seccomp - seccomp.c:parse_config_v2:1017 - Merging compat seccomp contexts into main context
INFO     start - start.c:lxc_init:855 - Container "100" is initialized
INFO     cgfsng - cgroups/cgfsng.c:cgfsng_monitor_create:1070 - The monitor process uses "lxc.monitor/100" as cgroup
DEBUG    storage - storage/storage.c:storage_query:233 - Detected rootfs type "dir"
ERROR    cgfsng - cgroups/cgfsng.c:cgfsng_setup_limits_legacy:2764 - Bad address - Failed to set "devices.deny" to "a"
ERROR    cgfsng - cgroups/cgfsng.c:cgroup_tree_create:808 - Failed to setup legacy device limits
DEBUG    cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1160 - Failed to create cgroup "(null)"
ERROR    cgfsng - cgroups/cgfsng.c:cgfsng_payload_create:1171 - Numerical result out of range - Failed to create container cgroup
ERROR    start - start.c:lxc_spawn:1644 - Failed creating cgroups
DEBUG    network - network.c:lxc_delete_network:4180 - Deleted network devices
ERROR    start - start.c:__lxc_start:2073 - Failed to spawn container "100"
startup for container '100' failed
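
In a long debug log like this, only the ERROR lines matter for the failure. A small filter, sketched below, pulls them out together with the source location; it assumes the `ERROR <subsystem> - <file>:<function>:<line> - <message>` shape seen above, and the helper name is hypothetical.

```python
import re

# Matches e.g.: ERROR    start - start.c:lxc_spawn:1644 - Failed creating cgroups
ERROR_RE = re.compile(r"\bERROR\s+(\S+) - (\S+?):(\w+):(\d+) - (.*)")


def lxc_errors(log_text: str):
    """Return a list of dicts describing each ERROR line in an LXC debug log."""
    errors = []
    for line in log_text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            subsystem, source, function, lineno, message = match.groups()
            errors.append({
                "subsystem": subsystem,
                "source": source,
                "function": function,
                "line": int(lineno),
                "message": message.strip(),
            })
    return errors


if __name__ == "__main__":
    sample = "ERROR    start - start.c:lxc_spawn:1644 - Failed creating cgroups"
    for err in lxc_errors(sample):
        print(err["function"], err["line"], err["message"])
```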

Best Regards
 
Thanks, I managed to reproduce the issue (I had not assumed the container was privileged).

I will look into it. In the meantime, maybe you can recreate the container as an unprivileged one?
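
To find out which containers are affected, one could check each container config for the unprivileged flag. The sketch below assumes that PVE marks unprivileged containers with an `unprivileged: 1` line in the main section of the config and that a missing key means privileged; the function name is illustrative.

```python
def is_unprivileged(config_text: str) -> bool:
    """Return True if the main section of a container config has 'unprivileged: 1'."""
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Snapshot sections follow the main section; stop before them.
            break
        key, sep, value = stripped.partition(":")
        if sep and key.strip() == "unprivileged":
            return value.strip() == "1"
    return False  # no such key: treated as privileged


if __name__ == "__main__":
    sample = "arch: amd64\ncores: 1\nostype: centos\nswap: 512\n"
    print("unprivileged:", is_unprivileged(sample))
```

The config posted earlier in this thread has no such line, which matches the container being privileged.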
 
