Another docker experience on Proxmox

Neuer_User

I just want to describe my experience with Docker in an LXC container under Proxmox. Maybe someone has ideas for improvement, or my description is helpful to someone else.

Goals:

1. Use the effortlessness of Docker to install and maintain applications.
2. Centrally monitor and manage all running applications.
3. Separate data from application code, thus simplifying data backup.
4. Allow some basic fallback functionality while some nodes are under maintenance.

Systems:
  • Most services run in a private home environment consisting of one HP MicroServer Gen8 and a fallback machine. In addition, some services run on small ARM-based nodes (RPi, NanoPi or RockPi).
  • The MicroServer and the fallback machine run Proxmox 6.3 on ZFS. They mostly run classical LXC CTs (about 10), but these will be migrated to Docker containers, which I will keep in a few separate LXC CTs.
  • The small ARM-based edge nodes run Docker hosts on lightweight OSs without an additional LXC layer.

Implementation on Proxmox:
  • All LXC machines run a stripped-down Debian Buster. docker-ce is installed via get-docker.sh. All Docker hosts run as root, but use userns-remap for their containers.
  • Proxmox loads additional kernel modules: aufs, ip_vs (see the sketch right below).
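Not from the original notes, just a sketch of one way to make those modules persistent on the Proxmox host; the file name is an arbitrary choice:

Code:
# /etc/modules-load.d/docker-lxc.conf  (file name is an assumption)
aufs
ip_vs

# load them immediately, without a reboot
modprobe aufs
modprobe ip_vs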
Unprivileged LXC container: failed
  • Unprivileged LXC with nesting=1, keyctl=1: Unsuccessful. There are two main issues I couldn't resolve:
  • a.) The only available storage driver is VFS (overlay is not possible on ZFS, aufs is not possible in a non-init namespace). After downloading any Docker image, the storage driver complains about a permission issue while storing the image layers. Why, I have no idea. I completely deleted the /var/lib/docker content; that doesn't help.
  • b.) I haven't managed to get the /proc/sys/net/ipv4/vs/ folder populated in the LXC container. It is always empty. I tried keeping all capabilities, using an unconfined AppArmor profile, mounting /proc as rw, adding /.dockerenv and everything else I found on the net. No chance. Without this folder, no ingress network is possible.
Privileged LXC container: works, but needs proc:rw
  • Privileged LXC container with nesting=1. Works perfectly so far, except there is an issue setting up the iptables entries for the ingress network.
  • Working fix: adding "lxc.mount.auto: proc:rw" to the LXC conf file (see the sketch below). I haven't found any other way yet.
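For reference, a minimal sketch of the relevant lines in /etc/pve/lxc/<CTID>.conf for this privileged setup (the rest of the CT config is omitted):

Code:
# privileged CT: no "unprivileged: 1" line
features: nesting=1
# re-mount /proc read-write inside the CT so the ingress iptables setup works
lxc.mount.auto: proc:rw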

Well, so far I am happy that it works. If anyone has a better working config, I would be interested.

Michael
 
Try this:

1. /etc/sysctl.conf
net.ipv4.ip_forward=1

2. /etc/pve/lxc/XXX.conf
Code:
arch: amd64
cores: 4
features: keyctl=1,nesting=1
hostname: Docker
memory: 2048
net0: name=eth0,bridge=vmbr0,hwaddr=3A:F6:97:F2:5B:7B,ip=dhcp,ip6=auto,type=veth
onboot: 1
ostype: ubuntu
rootfs: SSD:vm-112-disk-0,size=64G
startup: order=12
swap: 512
unprivileged: 1
lxc.apparmor.raw: mount,
The important thing here is the last line: "lxc.apparmor.raw: mount,"

3. /etc/modules-load.d/modules.conf
overlay
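Not part of the original recipe, but a quick sanity check that the three settings above actually took effect could look like this:

Code:
# on the Proxmox host: is the overlay module loaded?
lsmod | grep overlay

# inside the CT: forwarding on, and Docker using a real storage driver?
sysctl net.ipv4.ip_forward
docker info | grep -i "storage driver"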

This config works for me as an unprivileged container.
I have this container running on LVM storage, so overlayfs works.
I didn't try to move the container to ZFS storage; I think it won't work there.

Ah, one last thing: my LXC container is based on Ubuntu 20.04, not Debian 10.
Debian 10 containers have some weird systemd services that show up as errors in "systemctl --failed".
Ubuntu 20.04 containers somehow work better, without any errors.

Cheers :)
 
Try this:

1. /etc/sysctl.conf
net.ipv4.ip_forward=1
This I have. I also checked the ip_forward setting in all the ingress namespaces. All of them are 1 (except on my RPi, where strangely I needed to set it manually...).
2. /etc/pve/lxc/XXX.conf [...]
The important thing here is the last line: "lxc.apparmor.raw: mount,"
My guess is that "lxc.mount.auto: proc:rw" and "lxc.apparmor.raw: mount," probably have a very similar effect, but I will give it a try on my unprivileged LXC container.
Do you also use "userns-remap" for docker?
3. /etc/modules-load.d/modules.conf
overlay
Unfortunately, overlay does not work with ZFS as the backing storage. I read that the zfs-linux project is working on overlay support, though. It could be available in one of the next releases, which would then, however, need to find its way into Proxmox first.
Cheers :)
 
Nope, just a vanilla Ubuntu 20.04 container with get-docker.sh, like you do :)

The "userns-remap" option is in the lxc container (the docker host) in the /etc/docker/daemon.json file and does a complete user namespace remapping for all docker containers. So, in effect I have the following:
  • a privileged lxc container with a docker host running as root, but confined by apparmor
  • all docker containers running in user namespace (so, unprivileged), where root (id 0) is actually only a subuid of 165536 (in my case).

So at least the container processes are isolated and without any privileges. Only if there is a vulnerability in Docker and a container process can escape the Docker confinement and elevate to the Docker host's privileges would it end up as root in a privileged container with /proc mounted rw. :rolleyes: (A minimal sketch of the remap config is below.)
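For anyone wanting to replicate this, a minimal sketch of the remap configuration on the Docker host (the LXC CT). The value "default" makes Docker use the dockremap user; the subordinate ID range shown here simply matches the 165536 example above, so treat it as an illustration:

Code:
# /etc/docker/daemon.json
{
  "userns-remap": "default"
}

# /etc/subuid and /etc/subgid then contain an entry like
dockremap:165536:65536

The Docker daemon needs a restart afterwards, and with the remap enabled it uses a separate storage namespace, so previously pulled images won't show up.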
 
The "userns-remap" option is in the lxc container (the docker host) in the /etc/docker/daemon.json file and does a complete user namespace remapping for all docker containers. So, in effect I have the following:
  • a privileged lxc container with a docker host running as root, but confined by apparmor
  • all docker containers running in user namespace (so, unprivileged), where root (id 0) is actually only a subuid of 165536 (in my case).

So, at least the container processes are isolated and without any privileges. Only, if there is a vulnerability in docker and a container process can escape the docker confinement and elevate to the docker host privileges, then they would be root in a privileged container with proc mounted rw. :rolleyes:

Somehow this sounds the same as if I installed Docker on the Proxmox host itself. And in addition, I could then use the Docker ZFS storage driver :D

But yes, we do it in the LXC container so as not to mess around with the host itself, keeping the host clean and being able to make backups/snapshots the easy way xD

However, did you try the unprivileged container?
I mean, if it works for me, it should work for you too xD
 
If anybody bumps into this thread looking for a solution for LXC + Docker Swarm + Traefik: since the ingress entries are not populated in the iptables of the LXC host container, below is one possible workaround.
  1. Create one VM and make it the Docker Swarm manager. Install Traefik on this VM. The ingress network rules will be added correctly to iptables, and Traefik will be accessible from outside of Proxmox.
  2. The rest of the nodes can be LXC containers; join them to the swarm created in step 1. Now any service running on the other nodes, including the LXC hosts, will be accessible through Traefik.
  3. With this solution everything works, including service discovery across all hosts that are part of the swarm, including the LXC container hosts.
  4. This also has a much lower resource requirement, especially RAM, since we are using a single VM and LXC containers for the rest.
With this method, all your services will only be reachable through the VM created in step 1. True swarm ingress, where any service can be reached via any exposed host IP, is not possible, especially via the LXC hosts. (A rough sketch of the commands involved follows below.)
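A rough sketch of the commands involved, with a placeholder manager IP (192.0.2.10) and the join token shortened; adjust to your environment:

Code:
# on the VM (swarm manager)
docker swarm init --advertise-addr 192.0.2.10

# on each LXC node, using the token printed by the init command
docker swarm join --token SWMTKN-1-<token> 192.0.2.10:2377

# published services (Traefik etc.) are then reached via the VM's IP only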
 
The config suggested above doesn't work for me on Proxmox 8!

I tried it with unprivileged Ubuntu 20.04 and Debian 12 LXCs.

Settings that I applied:
Code:
1. nano /etc/sysctl.conf
# add line: 
net.ipv4.ip_forward=1

2. nano /etc/pve/lxc/XXX.conf
# add as last line: 
lxc.apparmor.raw: mount,

3. nano /etc/modules-load.d/modules.conf
# add line:
overlay

4. reboot the Proxmox hosts!

Deployed the LXC via one of the Proxmox VE Helper-Scripts, installing Docker with get-docker.sh.
After that, I initialized a 2-node Docker Swarm and deployed portainer-agent-stack.yml to test connectivity, but got no response via http://192.168.1.xxx:9000. (A deployment sketch follows below.)
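For context, deploying that Portainer agent stack on the swarm manager is presumably something like the following (the stack name "portainer" is just an example):

Code:
docker stack deploy -c portainer-agent-stack.yml portainer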

Code:
curl http://192.168.1.168:9000
curl: (7) Failed to connect to 192.168.1.168 port 9000 after 107 ms: Couldn't connect to server

PS: I followed this to the letter.
 
That's the problem with Docker in LX(C) containers, as pointed out numerous times: it will break from time to time. Spare yourself the lifetime and just go with a real VM. This cannot be said enough, yet most people just don't care and do it anyway.
I'd love to go with a VM, but I implemented a Proxmox/Ceph full-mesh HCI cluster with dynamic routing and an IPv6 loopback-only cluster network. I can't find a way to connect to my CephFS storage besides bind mounts via LXC.
 
After struggling for some days, and since I really needed this to work, I managed to get Docker to work reliably in privileged Debian 12 LXC containers on Proxmox 8.

This became a 'live document', so I decided to put it in a GitHub Gist to keep this page clean.

Any feedback on how to improve upon this is welcome!


Screenshots: (attached in the original post)
 
Thank you for the detailed writeup.

This doesn't survive a reboot, so I created a oneshot systemd service to set this on reboot.
Adding it to rc.local and enabling the service might be easier and faster.

Now Docker in LXC seems to behave like Docker in a VM; workloads are accessible via the IP address of any node in the Swarm cluster.
... until the next PVE update that could potentially kill it. It happened before and it will probably happen again.

Also, you're unable to live-migrate the container to another node in your cluster. That is another big advantage of QEMU/KVM.
 
That's the problem with Docker in LX(C) containers, as pointed out numerous times: it will break from time to time. Spare yourself the lifetime and just go with a real VM. This cannot be said enough, yet most people just don't care and do it anyway.
Nothing should be as natural and easy as nesting containers to arbitrary depth, because computer science loves trees as much as accountants love budgets.

Eric Brewer, not only the man behind the CAP theorem, but also the main promoter behind pushing Google's container-nesting software out as Kubernetes to torpedo all the 'weaker' container solutions, has quite openly said and admitted both.

And I couldn't agree more that (at least one layer of) IaaS abstraction containers should manage local resources, and then (potentially several layers of) PaaS abstraction containers should take care of scale-out, distributed namespaces and overlay networks to make that happen.

Unfortunately, proper and globally accepted abstractions haven't evolved yet, and all the current approaches only work well in exclusivity and as a single layer.

So yes, unfortunately it's easier to run Docker on a VM and the pretense of bare metal underneath than to do what should be natural, because Docker and other container runtimes, just like the kernel underneath, each believe they are God and can mess with the network in any way they like to create the illusion their applications want.

And much of that is because the Unix creators realized far too late that networks were much more important than files; but by that time, nobody heard their "Plan 9" call!
 
I tweaked the setup in the post a bit; I do prefer systemd over rc.local, and now it seems to work reliably on my systems. It needs a delay for the run-docker-netns-ingress_sbox.mount unit to become active after docker.service is active; without that, the net.ipv4.ip_forward=1 value can't be set. (A minimal sketch of such a unit is below.)
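Not the exact unit from the Gist, just a minimal sketch of what such a oneshot service could look like; the unit name, the Debian paths and the 10-second delay are assumptions:

Code:
# /etc/systemd/system/docker-ingress-forward.service  (name is an assumption)
[Unit]
Description=Enable ip_forward inside Docker's ingress_sbox namespace
After=docker.service run-docker-netns-ingress_sbox.mount
Requires=docker.service

[Service]
Type=oneshot
# give the swarm a moment to create the ingress_sbox namespace
ExecStartPre=/bin/sleep 10
ExecStart=/usr/bin/nsenter --net=/run/docker/netns/ingress_sbox /usr/sbin/sysctl -w net.ipv4.ip_forward=1

[Install]
WantedBy=multi-user.target

Enabled once with "systemctl enable docker-ingress-forward.service".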

This workaround seems to have been working since 2017, so I'll take my chances with it!

I have 4 Swarm nodes, each on a separate Proxmox node (screenshot attached in the original post).
I don't really need live migration for these LXCs, but when needed they do migrate fine, just with a few seconds of downtime.
 
Hey, so I've tried everything in every post I can find, and I'm unable to get past the point of getting overlay/ingress networking functional.

references:
* https://gist.github.com/Drallas/e03eb5a4f68bb526f920a423455bc0c9
* https://forum.proxmox.com/threads/container-linux-in-a-kvm.43136/#post-207264
* https://stackoverflow.com/questions/74987323/docker-swarm-get-access-from-outside-network
* https://github.com/moby/moby/issues/38723

Symptom:

Everything appears to come up, but dockerd doesn't appear to bind to the network interface for ingress connections.

Code:
tcp        0      0 :::8084                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::8081                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::8082                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::8088                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::7946                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::8000                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::5005                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::8754                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::9000                 :::*                    LISTEN      560/dockerd
tcp        0      0 :::9443                 :::*                    LISTEN      560/dockerd

So that looks right -- most of those are ports from various overlay networks, for ingress traffic. Check.

However

Code:
docker-lxc-5:~# nmap -sT -p081,8082,8088,7946,8000,5005,8754,9000,9443 localhost
Starting Nmap 7.93 ( https://nmap.org ) at 2024-01-26 18:37 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.00042s latency).
Other addresses for localhost (not scanned): ::1

PORT     STATE  SERVICE
81/tcp   closed hosts2-ns
5005/tcp closed avt-profile-2
7946/tcp open   unknown
8000/tcp closed http-alt
8082/tcp closed blackice-alerts
8088/tcp closed radan-http
8754/tcp closed unknown
9000/tcp closed cslistener
9443/tcp closed tungsten-https

So the Swarm communication port is working, but the other ports are all ACTIVELY refused on all interfaces: loopback, primary ethernet, etc. Somehow the ingress isn't being bound.

I did notice there are multiple entries in netns, so I even tried setting them ALL

Code:
find /run/docker/netns/ -type f | xargs -i%x nsenter --net="%x" sysctl -w net.ipv4.ip_forward=1

Still no dice. This appears to be the issue I'm running into: https://github.com/portainer/portainer/issues/7736

help?
 
Did you try https://gist.github.com/Drallas/e03eb5a4f68bb526f920a423455bc0c9 from scratch? I set it up twice like that before, without any issue!

 
Yes, repeatedly. :( Your guide was the closest I got to success, thank you. (In hindsight, I missed that you had to use privileged containers as well.)

In addition, I set up Alpine, Ubuntu, Debian, etc., all from scratch, and tried the same steps. No love.

HOWEVER, I just repeated the exercise but made the container privileged, and...
Code:
arch: amd64
cores: 4
cpulimit: 8
hostname: docker-test-5
memory: 4096
nameserver: 10.4.10.11 10.4.10.12 10.4.10.13 10.4.10.14 10.4.10.15 10.4.10.16 10.4.10.17 10.4.10.18
net0: name=eth10,bridge=vmbr11,gw=10.4.10.1,hwaddr=BC:24:11:D1:60:48,ip=10.4.10.85/24,mtu=1500,tag=10,type=veth
ostype: alpine
rootfs: RP32:vm-9105-disk-0,mountoptions=lazytime;noatime;nosuid,size=10G
searchdomain: docker
swap: 128

(As an aside, it's worth noting I'm using Ceph as the backing file system.)

Bingo, ingress can bind.

So there's still something not being enabled/granted in unprivileged mode that no "features" flags are fixing. :(


Now that the container is privileged, the bind happens and we're listening!!

Code:
docker-test-5:~# /etc/local.d/fix.docker.start
net.ipv4.ip_forward = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward = 1
docker-test-5:~# nmap -sT -p081,8082,8088,7946,8000,5005,8754,9000,9443 localhost
Starting Nmap 7.93 ( https://nmap.org ) at 2024-01-26 19:06 UTC
Nmap scan report for localhost (127.0.0.1)
Host is up (0.0012s latency).
Other addresses for localhost (not scanned): ::1

PORT     STATE  SERVICE
81/tcp   closed hosts2-ns
5005/tcp open   avt-profile-2
7946/tcp open   unknown
8000/tcp open   http-alt
8082/tcp open   blackice-alerts
8088/tcp open   radan-http
8754/tcp open   unknown
9000/tcp open   cslistener
9443/tcp open   tungsten-https

Because I forgot to mention it:

Code:
pve-manager/8.1.3/b46aac3b42da5d15 (running kernel: 6.2.16-20-pve)
 
I did find a more elegant way to set the forwarding (it's absolutely still needed for this to function), using this as "/etc/local.d/fix.ingress.start":

Code:
#!/bin/bash
# wait up to 60 seconds for the ingress namespace, then enable forwarding inside it
for lp in {1..60}; do
        if [ -f /run/docker/netns/ingress_sbox ]; then
                nsenter --net=/run/docker/netns/ingress_sbox sysctl -w net.ipv4.ip_forward=1
                exit 0
        else
                echo "waiting $lp/60 - ingress_sbox does not exist"
                sleep 1
        fi
done


EDIT: I didn't realize local.d scripts can hang the boot, so I had to give it an escape after 60 seconds. It needs bash, not ash (the {1..60} brace expansion is a bashism). Enabling it is shown below.
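For completeness, and assuming Alpine's OpenRC (this CT is Alpine-based): the script has to be executable and the "local" service enabled for it to run at boot:

Code:
chmod +x /etc/local.d/fix.ingress.start
rc-update add local default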
 
And then VFS bit me, and I had 16G in /var/lib/docker/vfs/

* https://c-goes.github.io/posts/proxmox-lxc-docker-fuse-overlayfs/
* https://github.com/containers/fuse-overlayfs/releases


Code:
ON PROXMOX HOST
pmx1:~# apt -y install fuse-overlayfs

and then add the FUSE feature to the container:
Code:
IN LXC CONFIG ON PROXMOX HOST
features: fuse=1,nesting=1

Code:
IN THE LXC CONTAINER (DOCKER HOST)
wget https://github.com/containers/fuse-overlayfs/releases/download/v1.13/fuse-overlayfs-x86_64 && chmod +x fuse-overlayfs-x86_64 && cp fuse-overlayfs-x86_64 /usr/local/bin/fuse-overlayfs && reboot
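If dockerd doesn't switch over to fuse-overlayfs on its own, the driver can also be pinned explicitly; this is just a sketch (create daemon.json if it doesn't exist), followed by a restart of the Docker service:

Code:
# /etc/docker/daemon.json
{
  "storage-driver": "fuse-overlayfs"
}

# then restart the Docker daemon
rc-service docker restart      # Alpine / OpenRC
systemctl restart docker       # Debian / Ubuntu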


Check to make sure it's changed over to the new driver
Code:
IN THE LXC CONTAINER (DOCKER HOST)
docker-lxc-1:~# docker info | grep -i storage
 Storage Driver: fuse-overlayfs



Now to clean up the /var/lib/docker/vfs folders; but all the containers are running, the ingress networking is working, and... until the next issue.
 
OK, so with that extra work, I appear to have the following working:

Docker Swarm inside Alpine 3.18 within privileged LXC containers on Proxmox 8.1.x, with Ceph as the backing file system.

This allows me to bind-mount a /ceph folder in each Docker host and then use bind mounts in the YAML files for persistent storage.

LXC config
Code:
mp0: /mnt/pve/cephfs/STORAGE/docker,mp=/ceph,shared=1

docker-compose.yaml
Code:
    volumes:
      - type: bind
        source: /ceph/data/adsb/tar1090/heatmap
        target: /var/globe_history
      - type: bind
        source: /ceph/data/adsb/tar1090/timelapse
        target: /var/timelapse1090

This reduces the interfaces required in the Docker host: no separate interfaces for storage, no NFS/Ceph/SMB mounts, just a bind mount from the host.


(I had it working just fine with KVM, but being able to move both the filesystem layer and the networking layer to Proxmox, AND not have the full-VM KVM overhead, was the demon I was chasing. Thanks to all the docs I referenced; it appears none of them was a complete solution yet.)
 
