Jessie and 3.10.0 OpenVZ kernel and the future

Are LXC containers going to be enforced to be unprivileged? As far as I know, that's the only way to make LXC even somewhat acceptable security-wise. Without it, any attacker who manages to start a shell in a container via remote code execution in $insert_php_application_here basically has full access to the host.
 
amongst others, this one: http://andrea.corbellini.name/2015/02/20/are-lxc-and-docker-secure/

relevant bit:

LXC is somewhat incomplete. What I mean is that some parts of special filesystems like procfs or sysfs are not faked. For example, as of now, I can successfully change the value of host’s /proc/sys/kernel/panic or /sys/class/thermal/cooling_device0/cur_state.

The reason why LXC is “incomplete” doesn’t really matter (it’s actually the kernel that is incomplete, but anyhow…). What matters is that certain nasty actions can be forbidden, not by LXC itself, but by an AppArmor/SELinux profile that blocks read and write access to certain /proc and /sys components. The AppArmor rules have shipped in Ubuntu since 12.10 (Quantal), and have been included upstream since early 2014, together with the SELinux rules.

Therefore, a security context like AppArmor or SELinux is required to run LXC safely. Without it, the root user inside a guest can take control of the host.

This is because if you are uid 0 (root) inside a privileged container, you are also root on the host, and you have access to /proc (which is not emulated).
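To give a rough idea (simplified paths and a made-up profile name, not the exact ruleset Ubuntu ships as lxc-container-default), such a profile is basically a pile of deny rules over /proc and /sys:

Code:
# Simplified sketch, not the shipped lxc-container-default profile.
profile lxc-container-example flags=(attach_disconnected,mediate_deleted) {
  # keep container root from tuning host-wide kernel knobs like kernel.panic
  deny /proc/sys/kernel/** wklx,
  # keep container root away from host hardware state (e.g. cooling devices)
  deny /sys/class/thermal/** wklx,
  deny /sys/devices/** wklx,
}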

So you either need SELinux/AppArmor as mentioned (which are extremely complicated and error-prone to configure) or unprivileged containers. This is not exactly a problem, but it means that you have to enforce unprivileged containers, at least until LXC catches up with OpenVZ. OpenVZ is still WAY ahead of LXC, which is funny since most of the OpenVZ code is now in the kernel (under the name LXC).
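Roughly sketched (the key is lxc.id_map on the LXC 1.x releases and lxc.idmap on newer ones, and the host needs matching /etc/subuid and /etc/subgid ranges), an unprivileged container is just an id mapping in the container config:

Code:
# Host /etc/subuid and /etc/subgid (example range):
#   root:100000:65536
# Container config:
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
# uid 0 inside the container becomes uid 100000 on the host,
# so "container root" has no special rights on the host.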

Another thing that article mentions is that from inside the container you can by default drain the host's entropy pool by reading heavily from /dev/(u)random, which is disastrous! You'd therefore also have to restrict access to those devices or ensure the host is running haveged.
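For illustration only (1:8 and 1:9 are the standard major:minor numbers of /dev/random and /dev/urandom; treat this as a sketch, not a recommended setting), the device cgroup lines in the container config could deny those devices:

Code:
# cgroup v1 syntax: deny the random character devices inside the container
lxc.cgroup.devices.deny = c 1:8 rwm
lxc.cgroup.devices.deny = c 1:9 rwm

Denying /dev/urandom will break plenty of software in the guest though, so running haveged on the host is usually the saner option.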
 
I see. So Solaris Zones and FreeBSD jails still have the upper hand when it comes to security. Unfortunately I have had no time to dig into the internals of Zones yet, but hopefully I will some day.
 
All this information is old. Newer LXC has an emulated /proc and cgroups (via lxcfs), and we also have AppArmor ...
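For example, from inside a container on a host running lxcfs you can see that parts of /proc are FUSE mounts reporting container values (a quick check, assuming a reasonably current setup):

Code:
# inside the container: these entries are overmounted by lxcfs
grep lxcfs /proc/mounts
# uptime and meminfo now reflect the container, not the host
cat /proc/uptime
head -n 1 /proc/meminfo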
 
Is there any rough ETA (are we talking "around February" or "end of 2016" here)?

Because:

Code:
lxc live migration is currently not implemented

Seems to me like the usability of LXC containers is somewhat limited right now

CRIU (needed for LXC live migration) is not stable enough, therefore we did not enable it.

LXC containers have a lot of additional features compared to OpenVZ, and migration also works fast. Only live migration is not there yet.

If you need stable live migration, go for qemu or run LXC inside qemu.
 
Hi,

thanks for the quick response.

I used to have my eye on CRIU... it's still not in a stable state? How unfortunate.

The "problem" with this is not a technical one, but I don't think I can recommend the upgrade to 4.0 just yet for container users then because other parties at the company are always trying to push for either VMware or HyperV (depending on whom you ask and the time of day ;)) and they can't run the containers without live migration because then those people would come screaming that $commercial_product_01 offers this and how its so important and what not (even though they technically don't run a single system where even a minute of downtime wouldnt be tollerable!). I'm mentioning this because a workshop about the future of virtualisation is coming up.

Did you by any chance do some quick performance comparisons between LXC containers and KVM VMs in 4.0 in your labs? I would have to put this on my to-do list if not.
 
Container live migration is very complex and will probably never be stable due to the nature of containers. OpenVZ live migration was never perfect either, btw.

So if you need stable live migration NOW, use qemu.

But again, you CAN migrate LXC containers on Proxmox VE (stop - migrate - start). As containers start fast, you will just have a very minimal downtime.
 
I don't think I can recommend the upgrade to 4.0 for container users just yet. Other parties at the company are always trying to push for either VMware or Hyper-V

Are you sure that Hyper-V or VMware can even run Linux containers (outside of a VM)?
 
Are you sure that Hyper-V or VMware can even run Linux containers (outside of a VM)?

Well, of course not ;) It's really more a matter of Proxmox vs. those commercial products. Gaining the ability to use containers on Proxmox is basically just a bonus then. But this bonus might be diminished now that you can't easily access the files of containers that aren't running (the raw image only seems to be mounted as the rootfs at runtime).
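(If pct on 4.0 already has the mount/unmount subcommands, something I'd still have to verify, that would at least allow poking at a stopped container's files from the host:)

Code:
# mount the stopped container's volumes on the host (pct prints the
# mount point, typically /var/lib/lxc/123/rootfs), inspect, unmount
pct mount 123
ls /var/lib/lxc/123/rootfs/etc
pct unmount 123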

It just dawned on me that I may have to re-evaluate "containers vs. KVM" now that we have LXC. Not having done any (deep) research yet, it seems like LXC might come up short, though. This is why I asked whether you had any performance comparisons you were willing to share.
 
Thanks for explaining the live migration issues of LXC. Well, I will still wait for this feature, and I think I'm not alone :)
 
Container live migration is very complex and will probably never be stable due to the nature of containers.
I am no kernel hacker, but my educated guess would be that the code would share large similarities with the suspend/hibernate code. That has been working for quite some time now.

But again, you CAN migrate LXC containers on Proxmox VE (stop - migrate - start). As containers start fast, you will just have a very minimal downtime.
This one I really wonder how. Say I want to live migrate 1000 CTs from node1 to node2. Obviously the web GUI won't do. However, on the console there seems to be no:
* way to stop a CT on a different node
* way to migrate a CT from a different node (other than the one the console is on)
So it's a lot of "ssh"ing all over the nodes, which works but is pretty ugly.

Is there a cleaner way, eg. "root@node0# pct migrate -shutdown 123 node2" from node1 to node2 with stop and start ...?
 
Is there a cleaner way, eg. "root@node0# pct migrate -shutdown 123 node2" from node1 to node2 with stop and start ...?

No (at least not yet). But what you can do is "pct shutdown 123 && pct migrate 123 othernode && ssh othernode -- pct start 123" - for a container with a working setup (i.e., nothing hanging at boot or shutdown) with volumes on shared storage, this should be very fast.
 
No, that won't work, since shutdown won't finish before migrate starts. I wrote a small script to get it done, but it's ugly, since it watches the output of 'ha-manager status' and tries to wait for the proper state of the CT. If anyone is interested, ask.
Is there a way to 'shutdown and wait for completion' etc?
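Roughly, what my script boils down to is something like this (the real one watches 'ha-manager status' instead, and this sketch assumes pct status prints "status: stopped" once the CT is down):

Code:
#!/bin/sh
# shut down, wait until the CT is really stopped, then migrate and start
CTID=123
TARGET=node2

pct shutdown "$CTID"
until pct status "$CTID" | grep -q stopped; do
    sleep 1
done
pct migrate "$CTID" "$TARGET"
ssh "$TARGET" -- pct start "$CTID"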
 
