Unprivileged LXC container eventually locks up PVE host processes

timdonovan

I am running 2 unprivileged LXC containers on proxmox-ve 7.1-1 (running kernel: 5.13.19-3-pve). Both LXCs are running Docker. The only "exotic" thing about them is using FUSE for the Docker filesystem. They have nesting, keyctl, fuse, and mknod all set to 1.
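
For reference, a sketch of how those features are set (`<CTID>` is a placeholder for the container ID):

```
# Equivalent to the "features:" line in /etc/pve/lxc/<CTID>.conf
pct set <CTID> --features fuse=1,keyctl=1,mknod=1,nesting=1
```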

They run fine for hours, sometimes days, but after a while, the entire host goes "offline" (keep reading..):
  • Grey question mark appears on the PVE node and the LXCs
  • Shell access (via web UI) and ssh is still available to the PVE host
  • Shell access (via web UI) and ssh is still available to the LXC containers
  • Eventually the node turns red (sometimes takes hours)
  • The system refuses to reboot or shut down unless I use the forceful "magic SysRq" option (see the sketch after this list)
  • My guess is that when I do a shutdown, PVE gets stuck trying to do a clean shutdown of the LXC containers
  • I have tried killing the LXC processes and daemons but nothing happens
  • I have tried restarting the various PVE daemons but it does not change the host state
  • Eventually something times out (hours+) and the PVE host reboots, returning the node state to green
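
The forceful option I mean is the SysRq trigger interface; a minimal sketch, assuming root shell access to the host (this is an immediate, unclean reboot):

```
echo 1 > /proc/sys/kernel/sysrq   # enable all SysRq functions
echo b > /proc/sysrq-trigger      # 'b': reboot immediately, without syncing or unmounting
```
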
While I'd love to find a solution to this, I am also perplexed, as I thought the point of LXC was that containers are isolated from the host? Indeed, the whole point of Proxmox, virtual machines, and containers is guest isolation. If an unprivileged LXC container is managing to screw up an entire PVE host, then surely that calls into question the entire nature of LXCs?

I have left the node alone for the time being, so if any debug info is required please let me know, as I can still access everything via ssh.

Thank you!

Edit: I have attached the dmesg log. I cannot attach the journalctl output because it's 1.1 MB zipped, and that is apparently too large a file.
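
If a trimmed journal would help, I can pull an excerpt like this (a sketch, limiting the output to the current boot and warning priority or above):

```
# Current boot only, warnings and above, compressed for attaching
journalctl -b -p warning | gzip > journal-excerpt.gz
```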
 

Attachments

  • dmesg.txt
    95.4 KB
I would have thought unprivileged LXC containers crashing a Proxmox node is quite the urgent bug. Happy to provide more info.
 
Hi,

I've seen multiple threads where they tell you not to run Docker in LXCs, that it isn't supported, and that you need to use a VM when running Docker.
 
Proxmox is free to make recommendations on how best to run Docker. But LXC is a container technology and, especially when running in unprivileged mode, should not impact the host system. To some degree it's unimportant what is running inside an LXC or a VM; if a container is escaping its containerisation, that is a problem.

I'm not asking for support on "help, Docker doesn't work"; I'm asking how I can best provide feedback on a fundamental flaw in the LXC/PVE implementation.
 
I think you are vastly overestimating the level of isolation that running in a container provides, especially once you start allowing things inside the container that are not allowed by default (like mknod, and especially fuse, which is known to cause problematic interactions). Without more details (logs, configs, versions), narrowing down the problem will not be possible in any case.

some things to check:
- fuse interacts badly with the kernel freezer feature (used for snapshots and backups)
- memory overcommitting -> swapping -> ... can lead to symptoms like you are describing
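
a quick sketch for checking the memory angle on the host:

```
# Watch for memory pressure and sustained swap activity
free -h        # overall RAM and swap usage
vmstat 5       # non-zero si/so columns over time indicate active swapping
```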
 
Thanks @fabian, appreciated! Absolutely, it is a shortcoming in my knowledge of LXCs - from what I've read, they are being positioned as a lightweight alternative to VMs, and the word "isolation" appears multiple times in any article or definition of them. I guess "here be dragons" still applies to LXC! Especially with the more exotic options applied.

Thanks for the suggestion of things to check - it doesn't look like overcommitment of memory but I'll keep an eye on it. I've got docker running without FUSE now, so I can take this out of the equation.

Cheers!
 
it does provide some level of isolation (in the sense that users/groups, PIDs, mountpoints, and networking are separate/namespaced from those of the host), but the host and the containers use a single instance of the kernel, which obviously means there are at least some avenues for DoS-like things happening (and also the occasional container escape via bugs in the kernel). A VM uses isolation mechanisms provided by the CPU, which work on a much lower level.
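
a quick way to see both sides of this (a sketch; run the same commands on the host and inside a container):

```
# Same kernel everywhere...
uname -r                # prints the identical kernel version in the CT and on the host
# ...but separate namespaces:
ls -l /proc/self/ns/    # pid/mnt/net/user namespace IDs differ between the two
```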
 

Hi Tim,

Running into this issue here as well.

Hypervisor:
  • pve-manager/7.2-11/b76d3178
  • kernel Linux 5.15.53-1-pve #1 SMP PVE 5.15.53-1.
  • libzfs4linux 2.1.5-pve1

LXC:
  • debian11
    • fuse=1
    • nesting=1
    • keyctl=1
  • root volume is a ZFS subvol
  • docker-ce 5:20.10.17~3-0~debian-bullseye
  • fuse-overlayfs 1.4.0-1

I see the same behaviour you do, in which the stack will run fine for some time (days, even weeks), but then the FUSE-backed Docker LXC will hang in a way that crashes the PVE UI components of the hypervisor. However, I note that other workloads on that hypervisor continue to function, as do its clustering capabilities. I noted, like you did, that the hypervisor will not cleanly restart (presumably hanging while trying to cleanly stop LXC processes).

Note also that doing a `kill -9 $(pidof lxc-start -n $CONTAINER_ID)` will kill the hung container, and the PVE UI immediately begins working again -- the grey question mark goes away, etc.
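
(If matching on `pidof` is fiddly, `lxc-info` can resolve the container's init PID directly; a sketch, with `$CONTAINER_ID` as above:)

```
# Resolve the container's init PID as seen from the host, then kill it
PID=$(lxc-info -n "$CONTAINER_ID" -p -H)   # -p prints the init PID, -H gives the raw value
kill -9 "$PID"
```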

Without getting into the religious questions about whether or not it's sane to run docker inside LXC with fuse on ZFS, I'd like to figure out what's going on here.

One thing I notice is that some of these 'exotic' workloads are fine, but others hang every so often. I will take a look at snapshots/replication as a probable culprit, but I'd appreciate any hints/feedback from folks experiencing something similar.
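
For what it's worth, a hung fuse mount usually leaves processes stuck in uninterruptible sleep, which is easy to spot; a sketch:

```
# List D-state (uninterruptible sleep) processes -- typical of a hung fuse mount
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'
```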

DC
 
I gave up running Docker in LXC with ZFS. You end up compromising somewhere (stability, or losing PVE snapshot/replication features). I'm still hopeful that one day it will be solved. I've moved most of my apps from Docker containers in VMs to LXCs by studying their Dockerfiles, but for some it's just not easily possible.

I think my LXC lockup issue in this thread was to do with resource exhaustion. Now, why that causes the PVE host to act the way it did, I'm not sure... IMO it's problematic that an LXC can cause a PVE host to behave so badly. But it is what it is...
 
This has been happening to my server as well: complete lockups and having to force the node to shut down. It's always after doing a snapshot backup of an LXC with `fuse=1`. It's definitely possible the system's resources were limited at the time of the backup. There were a couple of these in the backup log:

```
failed to open /var/lib/docker/fuse-overlayfs/b9578d3641af3c1df80254296225e96fcb20591f822d0efb473c857868b682a3/merged: Permission denied
```
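
Given the freezer interaction mentioned earlier in the thread, a stop-mode backup that skips the snapshot/freeze path might avoid the hang; a sketch (`<CTID>` and `<storage>` are placeholders):

```
# Back up the container by stopping it instead of freezing a running snapshot
vzdump <CTID> --mode stop --storage <storage>
```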

I'll also be giving up on Docker in LXC. A few blog posts turned me on to this Frankenstein setup, but it hasn't been very reliable.
 
I have had Docker in an unprivileged LXC running for about a week now, haven't taken any backups yet but no issues so far.

Docker is using overlay2 and the LXC is on ZFS - recent updates in Docker (and/or LXC/kernel?) seem to have fixed the issue with Docker not using overlay2. (I did not need to use the ext4 or FUSE workarounds.)
Also have `keyctl=1` enabled on the LXC (the Proxmox docs say it will break systemd-networkd, but that was fixed in systemd around 5 years ago..)

The only issue I encountered was when trying to pull a Docker image that had very high user/group IDs on some files inside it, which would fail in the unprivileged container - I got around this by just building the Docker image in the LXC itself.

Host: Kernel 6.2.6-1-pve
LXC: Arch Linux, Docker version 23.0.2 build 569dd73db1, systemd 253 (253.2-1-arch)
extract from `docker info`:
Server Version: 23.0.2
Storage Driver: overlay2
Backing Filesystem: zfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: false
userxattr: true

Are your versions up to date? Might be worth another shot to see if the overlay2 driver works for you now.
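
If Docker doesn't pick overlay2 on its own, the driver can be pinned in the daemon config; a sketch:

```
# Pin the storage driver, then restart the daemon
cat > /etc/docker/daemon.json <<'EOF'
{
  "storage-driver": "overlay2"
}
EOF
systemctl restart docker
```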
 

Updated to the 6.x kernel and the latest Docker a few weeks ago, and I have yet to run into any more lockups running Docker within LXC on ZFS.

Host: 6.2.11-2-pve
LXC: Debian 12, Docker Version: 5:24.0.2-1~debian.12~bookworm
Server Version: 24.0.2
Storage Driver: overlay2
Backing Filesystem: zfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: false
userxattr: true

Thanks for the suggestion!
 
I have a very similar issue with a container with periodic replication.

> Backing Filesystem: zfs

I see you are using ZFS for the storage backend? I thought that was not supported.

Did you do anything special to make it run?

I'm currently using fuse-overlayfs as the storage driver.
 
I tried creating a new container with Docker, without using fuse-overlayfs and forcing overlay2 with the ZFS backend. The Docker daemon starts, but when trying to pull an image, I get an error in the host kernel:

Code:
upper fs does not support RENAME_WHITEOUT
fs on '/var/lib/docker/overlay2/l/2NZ4KT2DGDVPBTYUYUEGIAFJVC' does not support file handles, falling back to xino=off.

Code:
# docker pull docker.elastic.co/elasticsearch/elasticsearch-oss:7.10.2
7.10.2: Pulling from elasticsearch/elasticsearch-oss
ddf49b9115d7: Pull complete
a752d85b289a: Extracting [==================================================>]  25.12MB/25.12MB
57c9a166c575: Download complete
44fabf20c8a1: Download complete
45ea1d560ab5: Download complete
0dc15e54b214: Download complete
cf11b2a25e23: Download complete
3a66822889ec: Download complete
be7444f2e9d6: Download complete
failed to register layer: unlinkat /usr/lib/.build-id/29: invalid argument
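
In case it helps anyone else hitting this: as far as I know, overlayfs support on ZFS (including RENAME_WHITEOUT for the upper fs) only arrived in OpenZFS 2.2, so checking the running version seems worthwhile; a sketch:

```
# Print the userland and kernel-module OpenZFS versions
zfs version
```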
 
