Docker on LXC slow at startup

hugolet

Member
Feb 10, 2023
Hi everyone!

I'm having trouble at startup with Docker running inside an LXC. The LXC boots correctly, but Docker inside it takes 3-5 minutes to launch...
It looks similar to these issues: here and here; maybe there are other threads I haven't seen...

So it clearly seems to be caused by a broken network setup (e.g., waiting for DHCP on container start...)
But none of those posts solved my issue. My cluster network configuration is a bit tricky (see below).
I have tried:
- rebooting the server / update & upgrade / fix missing...
- selecting SLAAC mode for IPv6
- DHCP for IPv4, as well as a static IPv4 and a static IPv4/gateway
- adding a new network device (net0 on vmbr0, net1 on vmbr1, see my config below)
- different storage locations and types (ZFS / Directory / local / shared / HDD & SSD...)
- toggling nesting 0/1 and keyctl 0/1

In the end it works, but I really don't want to wait 5 minutes every time a CT is migrated or rebooted... any suggestions?
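One way to narrow this down (a hedged sketch, not a definitive procedure) is to capture the CT's console during the slow boot — e.g. with `pct console 104` on the PVE host — save it to a file, and grep for units that fail or wait on the network. The sample lines below are stand-ins for a real captured log:

```shell
# Sketch: capture the container console during the slow boot and
# save it to boot.log, then pull out the suspicious units.
# The heredoc below fakes a captured log for demonstration.
cat > boot.log <<'EOF'
Starting Wait for network to be configured by ifupdown...
[FAILED] Failed to start Raise network interfaces.
[ OK ] Started containerd container runtime.
EOF

# Units that fail or block on the network are the usual culprits
# for multi-minute delays before dockerd comes up.
grep -E 'FAILED|Wait for network|Raise network' boot.log
```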


proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph-fuse: 15.2.17-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-4
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
address 192.168.1.24/24
gateway 192.168.1.254
bridge-ports eno1
bridge-stp off
bridge-fd 0
#Interface proxmox et VM/CT en local

iface enp4s0 inet manual

auto vmbr1
iface vmbr1 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
#Interface sécurisée et propagée pour CT/VM

post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o vmbr0 -j MASQUERADE

auto eno1.1819
iface eno1.1819 inet static
address XX.XX.XX.XX/24
#tailscale

Example with LXC 104:
CT template: debian-11-turnkey-core_17.1-1_amd64.tar.gz
Docker + Portainer
Bash:
apt-get update
apt-get -y upgrade
curl -fsSL https://get.docker.com -o get-docker.sh
sh get-docker.sh


docker run -d -p 8000:8000 -p 9443:9443 --name portainer --restart=always -v /var/run/docker.sock:/var/run/docker.sock -v portainer_data:/data portainer/portainer-ce:latest
Adding vmbr1

lxc.cgroup.relative = 0
lxc.cgroup.dir.monitor = lxc.monitor/104
lxc.cgroup.dir.container = lxc/104
lxc.cgroup.dir.container.inner = ns
lxc.arch = amd64
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.include = /usr/share/lxc/config/debian.userns.conf
lxc.seccomp.profile = /var/lib/lxc/104/rules.seccomp
lxc.apparmor.profile = generated
lxc.apparmor.allow_nesting = 1
lxc.mount.auto = sys:mixed
lxc.monitor.unshare = 1
lxc.idmap = u 0 100000 65536
lxc.idmap = g 0 100000 65536
lxc.tty.max = 2
lxc.environment = TERM=linux
lxc.uts.name = testdocker
lxc.cgroup2.memory.max = 1073741824
lxc.cgroup2.memory.swap.max = 536870912
lxc.rootfs.path = /var/lib/lxc/104/rootfs
lxc.net.0.type = veth
lxc.net.0.veth.pair = veth104i0
lxc.net.0.hwaddr = C2:A3:B4:0D:2C:4F
lxc.net.0.name = eth0
lxc.net.0.script.up = /usr/share/lxc/lxcnetaddbr
lxc.net.1.type = veth
lxc.net.1.veth.pair = veth104i1
lxc.net.1.hwaddr = 0A:DE:5C:B1:72:1A
lxc.net.1.name = eth1
lxc.net.1.script.up = /usr/share/lxc/lxcnetaddbr
lxc.cgroup2.cpuset.cpus = 0,2
systemd 247.3-7+deb11u1 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +ZSTD +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified)
Detected virtualization lxc.
Detected architecture x86-64.

Welcome to Debian GNU/Linux 11 (bullseye)!

Set hostname to <testdocker>.
Queued start job for default target Graphical Interface.
[ OK ] Created slice system-container\x2dgetty.slice.
[ OK ] Created slice system-modprobe.slice.
[ OK ] Created slice system-postfix.slice.
[ OK ] Created slice system-stunnel4.slice.
[ OK ] Created slice User and Session Slice.
[ OK ] Started Dispatch Password Requests to Console Directory Watch.
[ OK ] Started Forward Password Requests to Wall Directory Watch.
[ OK ] Reached target Local Encrypted Volumes.
[ OK ] Reached target Remote Encrypted Volumes.
[ OK ] Reached target Remote File Systems.
[ OK ] Reached target Slices.
[ OK ] Reached target TLS tunnels for network services - per-config-file target.
[ OK ] Reached target Swap.
[ OK ] Listening on Device-mapper event daemon FIFOs.
[ OK ] Listening on LVM2 poll daemon socket.
[ OK ] Listening on Syslog Socket.
[ OK ] Listening on initctl Compatibility Named Pipe.
systemd-journald-audit.socket: Failed to create listening socket (audit 1): Operation not permitted
systemd-journald-audit.socket: Failed to listen on sockets: Operation not permitted
systemd-journald-audit.socket: Failed with result 'resources'.
[FAILED] Failed to listen on Journal Audit Socket.
See 'systemctl status systemd-journald-audit.socket' for details.
[ OK ] Listening on Journal Socket (/dev/log).
[ OK ] Listening on Journal Socket.
[ OK ] Listening on Network Service Netlink Socket.
Mounting POSIX Message Queue File System...
sys-kernel-debug.mount: Failed to check directory /sys/kernel/debug: Permission denied
Mounting Kernel Debug File System...
[ OK ] Finished Availability of block devices.
Starting Wait for network to be configured by ifupdown...
Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
Starting Load Kernel Module configfs...
Starting Load Kernel Module drm...
Starting Load Kernel Module fuse...
[ OK ] Started Nameserver information manager.
[ OK ] Reached target Network (Pre).
Starting Journal Service...
Starting Load Kernel Modules...
Starting Remount Root and Kernel File Systems...
Starting Helper to synchronize boot up for ifupdown...
[ OK ] Mounted POSIX Message Queue File System.
sys-kernel-debug.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-debug.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount Kernel Debug File System.
See 'systemctl status sys-kernel-debug.mount' for details.
modprobe@configfs.service: Succeeded.
[ OK ] Finished Load Kernel Module configfs.
modprobe@drm.service: Succeeded.
[ OK ] Finished Load Kernel Module drm.
Mounting Kernel Configuration File System...
sys-kernel-config.mount: Mount process exited, code=exited, status=32/n/a
sys-kernel-config.mount: Failed with result 'exit-code'.
[FAILED] Failed to mount Kernel Configuration File System.
See 'systemctl status sys-kernel-config.mount' for details.
modprobe@fuse.service: Succeeded.
[ OK ] Finished Load Kernel Module fuse.
[ OK ] Finished Helper to synchronize boot up for ifupdown.
[ OK ] Finished Remount Root and Kernel File Systems.
Starting Create System Users...
[ OK ] Finished Create System Users.
Starting Create Static Device Nodes in /dev...
[ OK ] Finished Load Kernel Modules.
Starting Apply Kernel Variables...
[ OK ] Finished Create Static Device Nodes in /dev.
[ OK ] Finished Apply Kernel Variables.
Starting Network Service...
[ OK ] Finished Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling.
[ OK ] Reached target Local File Systems (Pre).
[ OK ] Reached target Local File Systems.
Starting Raise network interfaces...
[ OK ] Started Journal Service.
Starting Flush Journal to Persistent Storage...
[ OK ] Started Network Service.
Starting Wait for Network to be Configured...
[ OK ] Finished Wait for Network to be Configured.
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
[ OK ] Finished Flush Journal to Persistent Storage.
Starting Create Volatile Files and Directories...
[ OK ] Finished Create Volatile Files and Directories.
Starting Network Name Resolution...
Starting Update UTMP about System Boot/Shutdown...
[ OK ] Finished Update UTMP about System Boot/Shutdown.
[ OK ] Reached target System Initialization.
[ OK ] Started resolvconf-pull-resolved.path.
[ OK ] Started Daily apt download activities.
[ OK ] Started Daily apt upgrade and clean activities.
[ OK ] Started Periodic ext4 Online Metadata Check for All Filesystems.
[ OK ] Started Daily autocommit of changes in /etc directory.
[ OK ] Started Daily rotation of log files.
[ OK ] Started Daily man-db regeneration.
[ OK ] Started Daily Cleanup of Temporary Directories.
[ OK ] Reached target Paths.
[ OK ] Reached target Timers.
[ OK ] Listening on D-Bus System Message Bus Socket.
Starting Docker Socket for the API.
[ OK ] Listening on Docker Socket for the API.
[ OK ] Reached target Sockets.
[ OK ] Reached target Basic System.
[ OK ] Started Regular background program processing daemon.
[ OK ] Started D-Bus System Message Bus.
Starting Remove Stale Online ext4 Metadata Check Snapshots...
Starting inithooks-lxc: firstboot and everyboot initialization scripts (lxc)...
Starting System Logging Service...
Starting User Login Management...
[ OK ] Started System Logging Service.
[ OK ] Finished inithooks-lxc: firstboot and everyboot initialization scripts (lxc).
[ OK ] Finished Remove Stale Online ext4 Metadata Check Snapshots.
[ OK ] Started User Login Management.
[ OK ] Started Network Name Resolution.
[ OK ] Reached target Network.
[ OK ] Reached target Host and Network Name Lookups.
Starting containerd container runtime...
[ OK ] Started Fail2Ban Service.
Starting resolvconf-pull-resolved.service...
Starting OpenBSD Secure Shell server...
Starting Universal SSL tunnel for network daemons (shellinabox)...
Starting Universal SSL tunnel for network daemons (webmin)...
Starting Permit User Sessions...
[ OK ] Finished Permit User Sessions.
[ OK ] Started Console Getty.
[ OK ] Started Container Getty on /dev/tty1.
[ OK ] Started Container Getty on /dev/tty2.
[ OK ] Reached target Login Prompts.
[ OK ] Started Universal SSL tunnel for network daemons (shellinabox).
[ OK ] Started OpenBSD Secure Shell server.
[ OK ] Finished resolvconf-pull-resolved.service.
[ OK ] Started Universal SSL tunnel for network daemons (webmin).
[ OK ] Started containerd container runtime.

Debian GNU/Linux 11 testdocker console

testdocker login:

In the debug output above, it stops at "Started containerd container runtime". When the issue does not occur, the Docker container ID also shows up in a started state right after that.

For example, here is a brand-new LXC that has had no network modification since its creation (IPv4 on vmbr0, full DHCP):

...
[ OK ] Started containerd container runtime.
Starting Docker Application Container Engine...
Starting resolvconf-pull-resolved.service...
[ OK ] Finished resolvconf-pull-resolved.service.
Starting resolvconf-pull-resolved.service...
[ OK ] Finished resolvconf-pull-resolved.service.
[ OK ] Started Postfix Mail Transport Agent (instance -).
Starting Postfix Mail Transport Agent...
[ OK ] Finished Postfix Mail Transport Agent.
[ OK ] Started libcontainer container 41fdb09bf174bdd859e43879b4f6fbdd5bbb02fdf751d7cc607d2532657b8d63.
[FAILED] Failed to start resolvconf-pull-resolved.service.
See 'systemctl status resolvconf-pull-resolved.service' for details.
[ OK ] Started Webmin server daemon.
[ OK ] Started Docker Application Container Engine.
[ OK ] Reached target Multi-User System.
[ OK ] Reached target Graphical Interface.
Starting Update UTMP about System Runlevel Changes...
[ OK ] Finished Update UTMP about System Runlevel Changes.

... See next post ...
 
Back on LXC 104:

I log in and run "docker ps"; nothing shows up for a few minutes...
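To put a number on the delay, a small polling loop can time how long it takes for a command to first succeed (a sketch; `wait_for` is a hypothetical helper, not an existing tool — inside the CT you would call it as `wait_for docker ps` right after logging in):

```shell
# Hypothetical helper: retry a command until it succeeds and report
# how many seconds that took.
wait_for() {
    start=$(date +%s)
    until "$@" >/dev/null 2>&1; do
        sleep 1
    done
    echo "$(( $(date +%s) - start ))s"
}

wait_for true    # trivial demo; succeeds immediately, prints "0s"
```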

Before it starts:

* docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: inactive (dead)
TriggeredBy: * docker.socket
Docs: https://docs.docker.com

After it starts:

* docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-03-30 12:55:21 UTC; 37s ago
TriggeredBy: * docker.socket
Docs: https://docs.docker.com
Main PID: 1413 (dockerd)
Tasks: 34
Memory: 30.2M
CPU: 372ms
CGroup: /system.slice/docker.service
|-1413 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
|-2216 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9443 -container-ip 172.17.0.2 -container-port 9443
|-2222 /usr/bin/docker-proxy -proto tcp -host-ip :: -host-port 9443 -container-ip 172.17.0.2 -container-port 9443
|-2236 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8000 -container-ip 172.17.0.2 -container-port 8000
`-2242 /usr/bin/docker-proxy -proto tcp -host-ip :: -host-port 8000 -container-ip 172.17.0.2 -container-port 8000

Mar 30 12:55:20 testdocker dockerd[1413]: time="2023-03-30T12:55:20.722594866Z" level=info msg="Removing stale sandbox 5de69948d9258d07fd6d3e58c062ac1e73351554171bb547b520b7083d2e68e8 (848fa8a1d91d77bf41a883dfb>
Mar 30 12:55:20 testdocker dockerd[1413]: time="2023-03-30T12:55:20.746069501Z" level=warning msg="Error (Unable to complete atomic operation, key modified) deleting object [endpoint b175600f2e402aa3a2b9d096dd3>
Mar 30 12:55:20 testdocker dockerd[1413]: time="2023-03-30T12:55:20.791557019Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a pr>
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.147972259Z" level=info msg="Loading containers: done."
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.153302121Z" level=warning msg="Not using native diff for overlay2, this may cause degraded performance for building images: running in a user >
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.153456461Z" level=info msg="Docker daemon" commit=219f21b graphdriver=overlay2 version=23.0.2
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.153525399Z" level=info msg="Daemon has completed initialization"
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.175745820Z" level=info msg="[core] [Server #7] Server created" module=grpc
Mar 30 12:55:21 testdocker systemd[1]: Started Docker Application Container Engine.
Mar 30 12:55:21 testdocker dockerd[1413]: time="2023-03-30T12:55:21.186875318Z" level=info msg="API listen on /run/docker.sock"

I have googled the different issues from the Docker status above, but haven't found much...

Maybe someone here has an idea or a solution!
Please help!

Thanks!
 
Same issue here, and changing the IPv6 configuration of the LXC didn't fix it either.

I "solved" my problem:

When I deleted a network card in Proxmox, the /etc/network/interfaces inside the LXC was not updated to match. So the problem is not on the Proxmox side.

Try editing the /etc/network/interfaces of your LXC and removing the leftover lines. Reboot your LXC, and the Docker services should start instantly.

I would like to find a way for /etc/network/interfaces to be updated automatically... but this works for me.
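To spot such leftover lines without reading the file by eye, a small sketch like this could compare the stanzas in the config against the interfaces that actually exist (the `list_stale_ifaces` helper and the demo files are hypothetical; on a real CT you would pass /etc/network/interfaces and /sys/class/net):

```shell
# Hypothetical helper: list interface names that appear in an
# ifupdown config but do not exist on the system. Paths are taken
# as arguments so the demo can run against throwaway files.
list_stale_ifaces() {
    cfg="$1"; sysnet="$2"
    awk '$1 == "auto" || $1 == "iface" { print $2 }' "$cfg" | sort -u |
    while read -r ifname; do
        [ "$ifname" = "lo" ] && continue          # lo always exists
        [ -e "$sysnet/$ifname" ] || echo "$ifname"
    done
}

# Demo against throwaway files: eth0 "exists", eth1 is a leftover.
tmp=$(mktemp -d)
mkdir -p "$tmp/net/eth0"
cat > "$tmp/interfaces" <<'EOF'
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet dhcp
EOF

list_stale_ifaces "$tmp/interfaces" "$tmp/net"    # prints: eth1
rm -rf "$tmp"
```

Anything the helper prints is a stanza worth deleting before rebooting the CT.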
 
@hugolet you are a star!! I just removed all the extra lines in /etc/network/interfaces and the issue is gone. This is clearly an issue in Proxmox; I'm not sure if it has been reported before.
 
It is advisable to run Docker in a VM instead of a container. With containers, there is always the possibility for weird interactions, since Docker and LXC somewhat intersect in their feature sets/responsibilities.
 
I "solved" my problem:

When I deleted a network card in Proxmox, the /etc/network/interfaces inside the LXC was not updated to match. So the problem is not on the Proxmox side.

Try editing the /etc/network/interfaces of your LXC and removing the leftover lines. Reboot your LXC, and the Docker services should start instantly.

I would like to find a way for /etc/network/interfaces to be updated automatically... but this works for me.
Thank you habibi, I created an account just to comment on this! <3
 
Man, I was all excited to finally "fix" this, but I guess there must be other reasons for this long pause/timeout on Docker start, as my interfaces file contains just the vanilla/correct contents for the LXC:

Code:
root@portainer:/etc/network# cat interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
        address 192.168.50.9/24
        gateway 192.168.50.1

root@portainer:/etc/network#

It happens on every LXC I have with Docker installed. I believe most (all?) were originally set up via the PVE community script, so I'll dig into it more when I have the chance and will update here with what I find to be the ultimate cause (for me).