Jessie and 3.10.0 OpenVZ kernel and the future

Are lxc containers going to be enforced to be unprivileged? Because as far as I know, that's the only way to make lxc even somewhat acceptable security-wise. Without it, any attacker who manages to start a shell in a container via remote code execution in $insert_php_application_here basically has full access to the host.
 
amongst others, this one: http://andrea.corbellini.name/2015/02/20/are-lxc-and-docker-secure/

relevant bit:

LXC is somewhat incomplete. What I mean is that some parts of special filesystems like procfs or sysfs are not faked. For example, as of now, I can successfully change the value of host’s /proc/sys/kernel/panic or /sys/class/thermal/cooling_device0/cur_state.

The reason why LXC is “incomplete” doesn’t really matter (it’s actually the kernel that is incomplete, but anyhow…). What matters is that certain nasty actions can be forbidden, not by LXC itself, but by an AppArmor/SELinux profile that blocks read and write access to certain /proc and /sys components. The AppArmor rules have shipped in Ubuntu since 12.10 (Quantal), and have been included upstream since early 2014, together with the SELinux rules.

Therefore, a security context like AppArmor or SELinux is required to run LXC safely. Without it, the root user inside a guest can take control of the host.

This is because if you are uid 0 (root) inside the container (meaning that you are also root on the host!), you have access to /proc (which is not emulated).
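For illustration, this is the kind of thing the article means (illustrative sketch only, run as root inside a *privileged* container; both paths are the ones the article itself names):

Code:
# Inside a privileged container these hit the HOST's kernel state:
echo 10 > /proc/sys/kernel/panic                   # host panic timeout
cat /sys/class/thermal/cooling_device0/cur_state   # host cooling device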

So you either need SELinux/AppArmor as mentioned (which are extremely complicated and error-prone to configure) or unprivileged containers. This is not exactly a problem, but it means that you have to enforce unprivileged containers (until lxc catches up with openvz anyway; openvz is still WAY ahead of lxc, which is funny since most of the openvz code is now in the kernel, under the name lxc).
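For reference, enforcing that at the LXC level looks roughly like this (a rough sketch assuming plain LXC 1.x on Debian; "mycontainer" is a placeholder, Proxmox keeps its configs under /etc/pve/lxc/<vmid>.conf instead, and newer LXC renames lxc.id_map to lxc.idmap):

Code:
# Give host root a subordinate uid/gid range to map the container into.
echo "root:100000:65536" >> /etc/subuid
echo "root:100000:65536" >> /etc/subgid

# Map container uid/gid 0-65535 onto host 100000-165535.
cat >> /var/lib/lxc/mycontainer/config <<'EOF'
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
EOF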

Another thing that article mentions is that from inside the container you can by default consume the entropy of the host by reading a lot from /dev/(u)random, which is disastrous! You'd therefore also have to restrict access to those devices or ensure the host is running haveged.
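A quick way to check whether the host pool is actually being drained, plus the haveged fix (rough sketch for a Debian/Ubuntu host):

Code:
# On the host: values persistently in the low hundreds mean the pool
# is being drained faster than it refills.
cat /proc/sys/kernel/random/entropy_avail

# Install and enable haveged to keep the pool topped up.
apt-get install haveged
systemctl enable haveged && systemctl start haveged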
 
I see. So Solaris Zones and FreeBSD jails do still have the upper hand security-wise. Unfortunately I have had no time to dig into the inner workings of Zones yet, but hopefully I will some day.
 
All this information is old. New lxc has an emulated /proc (lxcfs) and cgroups. And we also have AppArmor ...
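You can verify this from inside a container (quick sketch, assuming a PVE 4.x container with lxcfs running):

Code:
# The per-container /proc files are FUSE mounts provided by lxcfs:
grep lxcfs /proc/self/mounts

# uptime/meminfo now reflect the container, not the host:
cat /proc/uptime
head -n 2 /proc/meminfo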
 
Is there any rough ETA (are we talking "around February" or "end of 2016" here)?

Because:

Code:
lxc live migration is currently not implemented

Seems to me like the usability of LXC containers is somewhat limited right now

criu (needed for lxc live migration) is not stable enough, therefore we did not enable it.

LXC containers have a lot of additional features compared to OpenVZ, and migration also works fast. Only live migration is not there yet.

If you need stable live migration, go for qemu or run LXC inside qemu.
 
Hi,

thanks for the quick response.

I used to have my eye on CRIU... it's still not in a stable state? How unfortunate.

The "problem" with this is not a technical one, but I don't think I can recommend the upgrade to 4.0 just yet for container users then because other parties at the company are always trying to push for either VMware or HyperV (depending on whom you ask and the time of day ;)) and they can't run the containers without live migration because then those people would come screaming that $commercial_product_01 offers this and how its so important and what not (even though they technically don't run a single system where even a minute of downtime wouldnt be tollerable!). I'm mentioning this because a workshop about the future of virtualisation is coming up.

Did you by any chance get some quick performance comparisons between LXC containers and KVM VMs in 4.0 in your labs? I would have to put this on my todo list if not.
 
Container live migration is very complex and will probably never be stable due to the nature of containers. Also, OpenVZ live migration was never perfect, btw.

So if you need stable live migration NOW, use qemu.

But again, you CAN migrate LXC containers on Proxmox VE (stop - migrate - start). As containers start fast, you will just have a very minimal downtime.
 
I don't think I can recommend the upgrade to 4.0 just yet for container users, because other parties at the company are always trying to push for either VMware or Hyper-V

Are you sure that Hyper-V or VMware can even run Linux containers (outside of a VM)?
 
Are you sure that Hyper-V or VMware can even run Linux containers (outside of a VM)?

Well of course not ;) It's really more a matter of Proxmox vs. those commercial products. Gaining the ability to use containers on Proxmox is basically just a bonus then. But this bonus might be diminished now that you can't easily access the files of non-running containers (the rootfs raw file only seems to be mounted at runtime).

It just dawned on me that I may have to re-evaluate "containers vs. KVM" now that we have LXC. Not having done any (deep) research yet, it seems like LXC might come up short, though. This is why I asked whether you had any performance comparisons you were willing to share.
 
But this bonus might be diminished now that you can't easily access the files of non-running containers (the rootfs raw file only seems to be mounted at runtime).

If you use ZFS you still have access to files.
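For example (rough sketch, assuming the default ZFS layout where PVE names container volumes subvol-<vmid>-disk-<n>; the pool and vmid below are placeholders):

Code:
# On the host: the container rootfs is a plain ZFS dataset and stays
# mounted even while the CT is stopped.
zfs list -r rpool/data
ls /rpool/data/subvol-123-disk-1/etc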
 
Thanks for explaining the live migration issues of lxc. Well, I will still wait for this feature, and I think I'm not alone :)
 
Container live migration is very complex and will probably never be stable due to the nature of containers.
I am no kernel hacker, but my educated guess would be that the code would share large similarities with the suspend/hibernate code. That one seems to have been working for quite some time now.

But again, you CAN migrate LXC containers on Proxmox VE (stop - migrate - start). As containers start fast, you will just have a very minimal downtime.
This one I really wonder how. Say I want to migrate 1000 CTs from node1 to node2 this way. Obviously the web GUI won't do. However, on the console there seems to be no way to:
* stop a CT on a different node
* migrate a CT from a different node (from where the console is)
So it's a lot of "ssh"ing all over the nodes, which works but is pretty ugly.

Is there a cleaner way, e.g. "root@node0# pct migrate -shutdown 123 node2" from node1 to node2 with stop and start ...?
 
Is there a cleaner way, e.g. "root@node0# pct migrate -shutdown 123 node2" from node1 to node2 with stop and start ...?

No (at least not yet), but what you can do is "pct shutdown 123 && pct migrate 123 othernode && ssh othernode -- pct start 123" - for a container with a working setup (i.e., nothing hanging at boot or shutdown) and with volumes on shared storage, this should be very fast.
 
No, that won't work, since the shutdown won't finish before the migrate starts. I wrote a small script to get it done, but it's ugly, since it watches the output of 'ha-manager status' and tries to wait for the proper state of the CT. But if anyone is interested, ask.
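The gist is roughly this (a simplified sketch; it polls 'pct status' instead of parsing 'ha-manager status', and 123/node2 are placeholders):

Code:
#!/bin/sh
# Shut down CT 123, wait until it is really stopped, then migrate it
# and start it on the target node.
pct shutdown 123
while pct status 123 | grep -q running; do
    sleep 1
done
pct migrate 123 node2
ssh node2 -- pct start 123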
Is there a way to 'shutdown and wait for completion' etc?
 
