Migration from 6.4 to 7.0

baldy

Hi there,

I wanted to migrate some old containers from 6.4 to a fresh Proxmox VE 7.0 cluster.
For some reason I am not able to start some of the old containers because of cgroup v2.

I found some documentation on how to fix it, but it does not seem to work:


Code:
cat /etc/default/grub

# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox VE"
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=0"

# Disable os-prober, it might add menu entries for each guest
GRUB_DISABLE_OS_PROBER=true


Code:
~# cat /etc/kernel/cmdline
systemd.unified_cgroup_hierarchy=0
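For anyone following along, applying and verifying the change looks roughly like this (a sketch; use whichever command applies to your boot setup):

Code:
update-grub                 # hosts booting via GRUB
proxmox-boot-tool refresh   # hosts managed by proxmox-boot-tool (e.g. systemd-boot)
reboot
# after the reboot, the parameter should show up here:
cat /proc/cmdline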

After updating GRUB and proxmox-boot-tool and restarting the host, I get the following error when I start the container:

Code:
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "103"
TASK ERROR: startup for container '103' failed

Any idea how to fix this issue?

Cheers
Daniel
 
Code:
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "103"
TASK ERROR: startup for container '103' failed
Any idea how to fix this issue?

Can you please also share the container config here?

Code:
pct config VMID
 
Hi,

here it comes:


Code:
arch: amd64
cores: 16
hostname: blablub
memory: 16384
net0: name=eth0,bridge=vmbr0,gw=10.0.3.1,hwaddr=82:D4:B9:29:AC:1A,ip=10.0.3.76/24,type=veth
ostype: ubuntu
rootfs: local-lvm:vm-103-disk-0,size=100G
swap: 16384

Cheers
 
Hello, I have the same problem with all LXC containers when using systemd.unified_cgroup_hierarchy=0.
We use docker-in-lxc and I thought it would be an easy way to avoid the cgroupv2 issue.

Example of such an LXC container:

Code:
pct config 150
arch: amd64
cores: 2
features: fuse=1,mknod=1,nesting=1
hostname: DDNS
memory: 256
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=36:06:55:CA:8E:A0,ip=dhcp,ip6=dhcp,type=veth
onboot: 1
ostype: ubuntu
rootfs: tank:subvol-150-disk-0,size=30G
swap: 256
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:

Our error in Proxmox is exactly the same as @baldy's, but also includes a warning about AppArmor (which does not seem to be an actual issue).

Code:
explicitly configured lxc.apparmor.profile overrides the following settings: features:fuse, features:nesting
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "150"
TASK ERROR: startup for container '150' failed

Interestingly, when removing the last 3 lines, the error remains the same:
Code:
arch: amd64
cores: 2
features: fuse=1,mknod=1,nesting=1
hostname: DDNS
memory: 256
net0: name=eth0,bridge=vmbr0,firewall=1,hwaddr=36:06:55:CA:8E:A0,ip=dhcp,ip6=dhcp,type=veth
onboot: 1
ostype: ubuntu
rootfs: tank:subvol-150-disk-0,size=30G
swap: 256

Code:
lxc-start -n 150 -F -lDEBUG -o lxc-150.log
lxc-start: 150: cgroups/cgfsng.c: cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
lxc-start: 150: cgroups/cgfsng.c: cgroup_tree_create: 808 Failed to setup legacy device limits
lxc-start: 150: cgroups/cgfsng.c: cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc-start: 150: start.c: lxc_spawn: 1644 Failed creating cgroups
lxc-start: 150: start.c: __lxc_start: 2073 Failed to spawn container "150"
lxc-start: 150: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 150: tools/lxc_start.c: main: 313 Additional information can be obtained by setting the --logfile and --logpriority options

EDIT: Well, reverting systemd.unified_cgroup_hierarchy=0 and moving to lxc.cgroup2.devices.allow: a showed me a different issue: we run the LXC containers on ZFS, and AUFS is no longer supported... so yeah. That warrants a topic in and of itself. I hope the above helps in some way.
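For reference, in the container config (/etc/pve/lxc/150.conf) that replacement line looks like this:

Code:
# cgroup v2 equivalent of the legacy device rule
lxc.cgroup2.devices.allow: a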
 
Hey all!

I don't think I have any different config on my CT; either way, I'm posting it here:

Code:
arch: amd64
cores: 1
hostname: management
memory: 512
net0: name=eth0,bridge=vmbr0,firewall=1,gw=192.168.8.1,hwaddr=0A:85:99:63:EF:95,ip=192.168.8.21/21,type=veth
ostype: ubuntu
rootfs: local-lvm:vm-101-disk-1,size=8G
swap: 512

Also with the same boot config to enable legacy cgroups and with the same error thrown:

Code:
cgfsng_setup_limits_legacy: 2764 Bad address - Failed to set "devices.deny" to "a"
cgroup_tree_create: 808 Failed to setup legacy device limits
cgfsng_payload_create: 1171 Numerical result out of range - Failed to create container cgroup
lxc_spawn: 1644 Failed creating cgroups
__lxc_start: 2073 Failed to spawn container "101"
TASK ERROR: startup for container '101' failed

My installation of Proxmox 7 is brand new (so no migration/upgrade in this case) and the CT is also brand new (Ubuntu 16).
 
In addition to the GRUB command line "systemd.unified_cgroup_hierarchy=0", I also added the following two lines to my container config files (in /etc/pve/lxc/#id#.conf):

Code:
lxc.cgroup.devices.allow =
lxc.cgroup.devices.deny =

See https://github.com/lxc/lxc/issues/2268 for details.

After that all my containers started up again.
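For anyone applying this to several containers, appending the two lines on the host could look like this (a sketch; 103 is just an example CT ID):

Code:
# append the workaround to one container's config (adjust the CT ID)
cat >> /etc/pve/lxc/103.conf <<'EOF'
lxc.cgroup.devices.allow =
lxc.cgroup.devices.deny =
EOF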
 
@chkern
Thank you for your fix!

It solved my big troubles with Zimbra.

After all, how can I update a container when it does not start anymore?

Two years ago I gave the order: only VMs, no containers, because of live migration when updates are necessary.
But we still have some old containers ...
 
In addition to the GRUB command line "systemd.unified_cgroup_hierarchy=0", I also added the following two lines to my container config files (in /etc/pve/lxc/#id#.conf):

Code:
lxc.cgroup.devices.allow =
lxc.cgroup.devices.deny =

See https://github.com/lxc/lxc/issues/2268 for details.

After that all my containers started up again.
FYI, that should not be required anymore with the new lxc-pve package update, version 4.0.9-4, available on the no-subscription repository at the time of writing.
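One way to check the installed version on the host:

Code:
pveversion -v | grep lxc-pve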
 
FYI, that should not be required anymore with the new lxc-pve package update, version 4.0.9-4, available on the no-subscription repository at the time of writing.
With the new fixes, do we still need to add?

Code:
GRUB_CMDLINE_LINUX_DEFAULT="systemd.unified_cgroup_hierarchy=0 quiet"
 
With the new fixes, do we still need to add?

Code:
GRUB_CMDLINE_LINUX_DEFAULT="systemd.unified_cgroup_hierarchy=0 quiet"
If you run distro releases that break when running in a unified cgroup v2 environment, like CentOS 7, then yes.

The bug it fixed could only happen when enforcing the old, legacy mixed cgroup v1 + v2 environment that CentOS 7 needs to run.

There cannot be any patch from our side that makes CentOS 7 cope with the newer cgroup layout. Either upgrade to CentOS 8 (AppStream or one of the new derivatives), force the old cgroup setting via that kernel command line, switch the workload to another distro with more frequent releases, or switch to VMs.
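To see which cgroup layout a host is currently running, checking the filesystem type mounted at /sys/fs/cgroup is a common approach:

Code:
stat -fc %T /sys/fs/cgroup/
# cgroup2fs -> unified cgroup v2
# tmpfs     -> legacy/hybrid cgroup v1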
 
I spoke too soon. This works on Ubuntu 18.04 LXC containers, but does not work with Ubuntu 20.04 containers. If this were an issue with "old" CentOS 7 style containers only, then it definitely would not appear in 20.04.
 
I spoke too soon. This works on Ubuntu 18.04 LXC containers, but does not work with Ubuntu 20.04 containers. If this were an issue with "old" CentOS 7 style containers only, then it definitely would not appear in 20.04.
Just to confirm, what errors are you seeing on 20.04 containers?
 
I spoke too soon. This works on Ubuntu 18.04 LXC containers, but does not work with Ubuntu 20.04 containers. If this were an issue with "old" CentOS 7 style containers only, then it definitely would not appear in 20.04.
If you force the system back to the old legacy cgroups, then new distros can break.
For Ubuntu 20.04 it sounds a bit weird to me, and rather like some other issue; it would be good to know what you actually changed and, as the other poster asked, what issues you're seeing.
 
If you force the system back to the old legacy cgroups, then new distros can break.
For Ubuntu 20.04 it sounds a bit weird to me, and rather like some other issue; it would be good to know what you actually changed and, as the other poster asked, what issues you're seeing.
Sorry for not being specific. Details below.

What doesn't work:

The error I am seeing is:
Code:
root@vis-ct-clx-08:~# docker run -it centos:7 bash
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
ERRO[0000] error waiting for container: context canceled
root@vis-ct-clx-08:~#

GRUB_CMDLINE_LINUX_DEFAULT is not modified to include systemd.unified_cgroup_hierarchy=0.
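As a side note, assuming a Docker version of 20.10 or newer inside the container, the daemon itself reports which cgroup version it sees:

Code:
docker info --format '{{.CgroupVersion}}'
# prints 1 or 2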

My conf file looks like:
Code:
#mp0%3A /mnt/pve/scratch,mp=/scratch
arch: amd64
cores: 48
features: mount=nfs4,keyctl=1
hostname: vis-ct-clx-08
memory: 57344
net0: name=eth0,bridge=vmbr0,hwaddr=XX:XX:XX:XX:XX:XX,ip=dhcp,type=veth
onboot: 1
ostype: ubuntu
rootfs: local-lvm:vm-159-disk-0,size=192G
snaptime: 1573010169
swap: 16384
lxc.apparmor.profile: unconfined
lxc.cgroup.devices.allow: a
lxc.cap.drop:

Modifications and their effects:

Using:
Code:
lxc.cgroup.devices.allow:
lxc.cgroup.devices.deny:
(which worked for 18.04 containers prior to lxc-pve version 4.0.9-4) still does not work.

An 18.04 container with an identical conf file works.

Adding systemd.unified_cgroup_hierarchy=0 to GRUB_CMDLINE_LINUX_DEFAULT works.

Using cgroup2 instead of cgroup in both setups (i.e. devices.allow: a, and empty devices.allow and devices.deny) does not work.


Thank you!

George
 
The GRUB alteration works to start a CentOS 7 container, but, for example, users running CentOS Web Panel will not be able to turn the CSF firewall on, or it will block the login to the CentOS Web Panel.
So basically this is not a complete solution.
 
LXC containers with Ubuntu 14.04 will not start any services. Setting nesting=1 solves the problem.
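For reference, enabling nesting from the host CLI could look like this (101 as a placeholder CT ID; note that --features replaces the whole feature list, so include any flags already set):

Code:
pct set 101 --features nesting=1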
 
