LXC security.nesting

trystan

New Member
Dec 15, 2017
Upstream LXC/LXD has had a 'security.nesting' option for over a year. It reliably lets a container run other container runtimes inside it without resorting to an unconfined AppArmor profile.

Is there an equivalent lxc.conf option in Proxmox?
 
Not yet. You can add a profile manually for now, e.g.:
Code:
# /etc/apparmor.d/lxc/lxc-default-cgns-with-nesting
profile lxc-container-default-cgns flags=(attach_disconnected,mediate_deleted) {
  #include <abstractions/lxc/container-base>
  #include <abstractions/lxc/start-container>

  deny /dev/.lxc/proc/** rw,
  deny /dev/.lxc/sys/** rw,
  mount fstype=cgroup -> /sys/fs/cgroup/**,
  mount fstype=proc -> /var/cache/lxc/**,
  mount fstype=sysfs -> /var/cache/lxc/**,
  mount options=(rw,bind),
}
and use that via lxc.apparmor.profile in your containers.
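For reference, a minimal sketch of what the container side could look like. It assumes the `profile` line in the file above is renamed to `lxc-container-default-cgns-with-nesting` so that it matches the file name, and that the profile has been reloaded on the host, for example with `apparmor_parser -r /etc/apparmor.d/lxc-containers`. Both the profile name and the reload command are assumptions on my part, not something stated in the post. Only the uncommented line belongs in the config file.

```text
# /etc/pve/lxc/<CTID>.conf
# Assumption: the name below must match the name on the "profile ..." line
# of the AppArmor file, not the file name itself.
lxc.apparmor.profile: lxc-container-default-cgns-with-nesting
```

The container most likely needs a full stop and start for the changed profile to apply.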
 
Using the profile provided (after changing the profile name to match the file name mentioned), reloading AppArmor, and applying it to the container still gives me significant problems that I'm not encountering on an LXD system with 'security.nesting true'. I also tried the 'default with nesting' profile and various others that AppArmor had loaded by default.

There's no reason to have to use an unconfined profile for runc/rkt when upstream docker/LXD has built-in support.
 
I think this would be a good feature to have, for both AppArmor and SELinux.

In my situation, I attempted to use Ubuntu Snaps and was unable to 'confine' them within the container since AppArmor was unable to run.

When looking to set LXC option "security.nesting=true" on the container in question, I was unable to find a way to do that in PVE. So this effectively prevents me from running AppArmor or SELinux in 'unprivileged' containers.

Does the profile mentioned above remove any protections compared to just setting 'security.nesting'?

Are there plans to add the option in PVE to use 'security.nesting'?
 
Yes, this is being worked on. There will be some quirks to deal with, however. (Particularly docker and systemd-networkd manage to have conflicting requirements for when they're running in a user namespace, for example.)
 
Even with the above, I get:

root@server:~# snap install rocketchat-server
error: cannot perform the following tasks:
- Setup snap "core" (5328) security profiles (cannot setup apparmor for snap "core": cannot load apparmor profile "snap-update-ns.core": cannot load apparmor profile: exit status 243
apparmor_parser output:
apparmor_parser: Unable to replace "snap-update-ns.core". Permission denied; attempted to load a profile while confined?
)
- Setup snap "core" (5328) security profiles (cannot load apparmor profile "snap-update-ns.core": cannot load apparmor profile: exit status 243
apparmor_parser output:
apparmor_parser: Unable to replace "snap-update-ns.core". Permission denied; attempted to load a profile while confined?
)
 
Any news on adding 'security.nesting=true' feature to PVE?
 
With pve-container >=2.0-28 you can start testing the `features` setting in containers. Remove any custom `lxc.apparmor.profile` lines and use `features: nesting=1` if you just want to nest lxc or lxd. If you want to nest docker in an _unprivileged_ container, you also need to add 'keyctl' to the features list. Note that this causes systemd-networkd to refuse to work: apparently systemd-networkd is fine with keyctl() not existing while docker is not, but once keyctl() does exist, docker is happy and systemd-networkd absolutely hates being unprivileged...

Edit: fixed the `features` example (missing `=1`)
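To make the two cases concrete, here is a sketch of how the line might look in /etc/pve/lxc/<CTID>.conf. The `keyctl=1` spelling is my assumption, based on the `key=value` form that `nesting=1` uses; check the `pct(1)` manpage for your version. The two `features` lines are alternatives, so use only one of them and leave the comments out of the real file.

```text
# Variant A: nesting lxc or lxd only
features: nesting=1

# Variant B: docker in an unprivileged container
# (expect systemd-networkd inside the container to stop working)
features: keyctl=1,nesting=1
```

Each key needs an explicit value. A bare `features: nesting` is not accepted.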
 
I updated pve-container to 2.0-28. Did you mean that just `features: nesting` should be placed in /etc/pve/lxc/<CTID>.conf? With that line the CT fails to start with the error "unable to parse value of 'features' - value without key, but schema does not define a default key".
 
Yes, the line goes in /etc/pve/lxc/<CTID>.conf.
Sorry, it should be `features: nesting=1`, not `features: nesting`. (Also updated my post above.)

Edit:
You can also check the `pct(1)` manpage for a little more info on the `features` line.
If you scroll down to the `Configuration` section's `Options` subsection, there are also more details about the individual keys in the features line.
 
Thanks. I've tried docker in an unprivileged CT with overlay2 and it failed with "operation not permitted". In 2017, some people said you have to either use a storage driver other than overlay2 or run a privileged container. Is that still the case, or might this be supported in a later update?
 
@wbumiller I just tried your suggestion to add features: nesting=1 in an Ubuntu 18.04 LXC container. It starts fine. Then I go ahead and install snapd, but it fails to start snapd.service. I've tried to run the container both in unprivileged and privileged mode, but it doesn't make a difference.

I'm a little unclear on whether I still need to add /etc/apparmor.d/lxc/lxc-default-cgns-with-nesting, or whether that step is unnecessary now.

I'm running pve-manager/5.2-9/4b30e8f9 and pve-container=2.0-28.
 
snapd requires a lot more than just nesting. If you look at the log output when starting it, you will probably see it complain about not being able to mount a squashfs file system, which you can allow by adding ',mount=squashfs' to the features line. However, in order to mount anything from files it needs to be able to set up a loopback device, which requires access to /dev/loop*. You can find how to set this up in this[1] thread, including reasons why doing so is a bad idea from a security point of view.
Now, since the loop devices are added via bind-mounts, the owning user and group IDs won't be mapped into the container, so this only works with privileged containers.

[1] https://forum.proxmox.com/threads/mount-via-loop-device-in-container.47398
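As a sketch, the features line with the squashfs mount permission added would look roughly like this in /etc/pve/lxc/<CTID>.conf. This covers only the mount permission. The /dev/loop* setup from the linked thread is still needed on top of it and is not shown here. Leave the comment out of the real file.

```text
# Allow squashfs mounts inside the container, in addition to nesting
features: nesting=1,mount=squashfs
```

On its own this line should only get rid of the squashfs mount complaint. snapd will still fail until a loop device is reachable, which per the post above works only in privileged containers.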
 
