Slow boot times with lxc

azop · Jan 26, 2016

My LXC containers are taking 4+ minutes to boot. The console log shows the boot moving quickly, then stalling at "Stopping Populate /dev filesystem" for 3-4 minutes. SSH connections are refused until this finishes.

Any suggestions?

Code:

* Stopping flush early job output to logs       [ OK ]
* Stopping Mount network filesystems            [ OK ]
* Starting Mount network filesystems            [ OK ]
* Starting system logging daemon                [ OK ]
* Stopping Mount network filesystems            [ OK ]
* Starting configure virtual network devices    [ OK ]
* Starting Bridge file events into upstart      [ OK ]
* Starting Bridge socket events into upstart    [ OK ]
* Stopping Populate /dev filesystem             [ OK ]

azop · Jan 26, 2016

Appears to be AppArmor related?

root@rovio:/var/log# tail -f /var/log/kern.log
Jan 26 10:14:17 rovio kernel: [1252484.880847] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.069130] audit: type=1400 audit(1453821257.491:942): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default" name="/" pid=14272 comm="mount" flags="ro, remount, noatime"
Jan 26 10:14:17 rovio kernel: [1252485.167294] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.177961] vmbr2: port 1(veth102i0) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.178427] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:35 rovio kernel: [1252502.623272] device veth102i0 entered promiscuous mode
Jan 26 10:14:35 rovio kernel: [1252502.623354] vmbr2: port 1(veth102i0) entered forwarding state
Jan 26 10:14:35 rovio kernel: [1252502.623395] vmbr2: port 1(veth102i0) entered forwarding state
Jan 26 10:14:35 rovio kernel: [1252502.661676] vmbr2: port 1(veth102i0) entered disabled state
Jan 26 10:14:35 rovio kernel: [1252503.355106] audit: type=1400 audit(1453821275.779:943): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-container-default" name="/sys/" pid=14839 comm="mount" flags="rw, nosuid, nodev, noexec, remount"
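To pull just the denials out of a noisy kernel log, something like the following could help. This is only a sketch: the function name `apparmor_denials` and the output layout (profile, operation, target) are made up for illustration, and it assumes the audit lines keep the `operation=`, `profile=`, `name=` field order seen above. It says nothing about whether the denials actually cause the delay.

```shell
# Print AppArmor denials from a kernel log as "profile operation target".
# Hypothetical helper; the default path matches the log tailed above.
apparmor_denials() {
    logfile="${1:-/var/log/kern.log}"
    # Keep only denial lines, then extract three quoted fields.
    grep 'apparmor="DENIED"' "$logfile" |
        sed -n 's/.*operation="\([^"]*\)".*profile="\([^"]*\)".*name="\([^"]*\)".*/\2 \1 \3/p'
}
```

Run as `apparmor_denials /var/log/kern.log` on the host; bridge port state messages are filtered out.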

RobFantini · Jan 26, 2016

Start the container from the CLI in debug mode; the log may have info that helps fix the issue:

Code:

lxc-start -n 102 -F --logfile=lxc.log --logpriority=debug

azop · Jan 26, 2016

Thanks for the response. The log is attached.

azop · Jan 26, 2016

I found out the delay at "Stopping Populate /dev filesystem" was caused by libpam-systemd:amd64. I removed that package, and now on reboot it flies past "Stopping Populate /dev filesystem" but then stalls at "Starting Bridge file events into upstart".

gemini · Jul 25, 2017

azop said:
I found out the delay at "Stopping Populate /dev filesystem" was caused by libpam-systemd:amd64. I removed that package, and now on reboot it flies past "Stopping Populate /dev filesystem" but then stalls at "Starting Bridge file events into upstart".

Was this ever resolved? I've always had slow boot times on Proxmox.

Where I've had slow boot times:
- Containers hosted in ZFS on machine with P6X58D-E mobo on PVE 4.4
- Containers hosted in Linux LVM on HP ProLiant DL360 G7 on PVE 5

FastLaneJB · Jul 25, 2017

gemini said:
Was this ever resolved? I've always had slow boot times on Proxmox.

Where I've had slow boot times:
- Containers hosted in ZFS on machine with P6X58D-E mobo on PVE 4.4
- Containers hosted in Linux LVM on HP ProLiant DL360 G7 on PVE 5

That's not normal. My containers probably take around 1 second to boot. It's amazingly quick compared to a VM, and even those I don't consider slow.

Even during a storage migration on PVE 5 with local ZFS storage between 2 nodes, where it stops the container, replicates the changes since the last sync, and then starts the container on the 2nd host, I only miss around 2 ping packets across stop, replication, and restart. At least on containers that don't write lots of data.

gemini · Jul 25, 2017

FastLaneJB said:
That's not normal. My containers probably take around 1 second to boot. It's amazingly quick compared to a VM, and even those I don't consider slow.

Even during a storage migration on PVE 5 with local ZFS storage between 2 nodes, where it stops the container, replicates the changes since the last sync, and then starts the container on the 2nd host, I only miss around 2 ping packets across stop, replication, and restart. At least on containers that don't write lots of data.

That's incredible. I really want to get to the bottom of this. I'll keep researching and post back in this thread when I fix it.

Here's the process, typically:
1. Click Start in WebUI
2. After 2-3 minutes from Start, the container IP is pingable (but no console access)
3. After 6+ minutes from Start, the console is accessible, and I can SSH to the container.

FastLaneJB · Jul 25, 2017

gemini said:
That's incredible. I really want to get to the bottom of this. I'll keep researching and post back in this thread when I fix it.

Here's the process, typically:
1. Click Start in WebUI
2. After 2-3 minutes from Start, the container IP is pingable (but no console access)
3. After 6+ minutes from Start, the console is accessible, and I can SSH to the container.

Ouch, that's bad. I don't think there's anything special about my setup, so I think most people would get performance similar to mine.

Is this on a heavily loaded server that might have huge IO wait, or does it take this long even after a fresh boot with nothing else running?

If it's on shared storage, what about trying a container on local storage to see whether there's a storage issue?

Have you run pveperf to see what kind of performance you're getting on the host? Mainly the FSYNC figure.

gemini · Jul 25, 2017

It is on local storage. This is what I get while the container is starting. Also, it appears that the container is pingable as soon as it is started, but the console or SSH is not functional for several minutes:
root@gemini:/tmp# pveperf
CPU BOGOMIPS: 70557.96
REGEX/SECOND: 1634983
HD SIZE: 57.88 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 178.48
DNS EXT: 256.41 ms
DNS INT: 89.40 ms (will.mx)

joulester · Nov 29, 2017

Any solution to this problem?

gemini · Nov 29, 2017

joulester said:
Any solution to this problem?

I never found the solution. I actually ditched Proxmox partly because of this. Using ESXi now. It has problems too of course, but this was a show stopper.

joulester · Nov 29, 2017

Too bad! I like it, but it should not take 6 minutes to start an LXC container. I hope someone can help me with this. I hit the problem the first time I installed Debian in an LXC container.

fabian · Nov 30, 2017

Usually issues like this are caused by broken network setups (e.g., waiting for DHCP on container start when no DHCP server is running on the network). Just check the logs inside the container for clues about what is blocking startup.
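As a quick first check alongside the logs, one can list which interfaces inside the container are set to wait on DHCP. A minimal sketch, assuming a Debian or Ubuntu container that uses an ifupdown-style `/etc/network/interfaces`; the function name `dhcp_ifaces` is hypothetical, and other distributions keep this configuration elsewhere.

```shell
# List "interface family" pairs configured for DHCP in an ifupdown file.
# Hypothetical helper; pass a different path to inspect another file.
dhcp_ifaces() {
    file="${1:-/etc/network/interfaces}"
    # Stanza header format: iface <name> <inet|inet6> <method>
    awk '$1 == "iface" && $4 == "dhcp" { print $2, $3 }' "$file"
}
```

An output line such as `eth0 inet6` means the container requests a DHCPv6 lease on eth0 at startup, which is worth comparing against what the network actually provides.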

joulester · Nov 30, 2017

Solved it by using SLAAC for IPv6; now it boots in under 10 seconds. Thanks for the help. @gemini I think Proxmox is worth going back to.
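For anyone landing here later, a rough sketch of checking for this from the host shell. It assumes the container config is at `/etc/pve/lxc/<vmid>.conf` and that network devices are `netN:` lines carrying an `ip6=` option; the function name `ip6_dhcp_nets` is made up for this example. The `pct set` line in the comment is an assumption about a typical `net0` definition: the ID 102, interface name, and bridge are placeholders, and since the option rewrites the whole `net0` entry, copy your existing values rather than these.

```shell
# Print the netN keys in a Proxmox container config that request DHCPv6.
# Hypothetical helper; takes the path to the config file as its argument.
ip6_dhcp_nets() {
    conf="$1"
    grep -E '^net[0-9]+:' "$conf" |
        grep -E '(^|[ ,])ip6=dhcp(,|$)' |
        cut -d: -f1
}

# Possible follow-up, switching net0 to SLAAC (placeholder values):
#   pct set 102 -net0 name=eth0,bridge=vmbr0,ip=dhcp,ip6=auto
```

For example, `ip6_dhcp_nets /etc/pve/lxc/102.conf` prints nothing when no device is set to DHCPv6.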

jnecr · May 16, 2018

joulester said:
Solved it by using SLAAC for IPv6; now it boots in under 10 seconds. Thanks for the help. @gemini I think Proxmox is worth going back to.

Just wanted to reply that I've had this issue for well over a year and never found an answer. But changing IPv6 to SLAAC also solved my problem. I guess that when the network was starting, it was stuck waiting on DHCP for IPv6 on my network. I will have to read up on what SLAAC is and why it solved the problem...

The strange part is that when I first create and start a container it never has this issue, only when it starts for the second and subsequent times.

gemini · May 17, 2018

joulester said:
Solved it by using SLAAC for IPv6; now it boots in under 10 seconds. Thanks for the help. @gemini I think Proxmox is worth going back to.

Thanks for posting the solution! What a thread, eh?

I'm actually mid-migration from ESXi to Ubuntu 18.04 with LXD now. I started last weekend. I've had a lot of problems trying to dedicate one of the host's two physical NICs to the LXC containers and not the host. The new way Ubuntu 18.04 handles networking with Netplan has made this very hard to figure out, as I'd like to use a macvlan bridge for the reduced CPU usage compared to a Linux bridge. There's just so little documentation.

Long story short, I was going to host Docker containers inside the LXC containers as LXC has a reputation of being easier to work with networking than Docker (I want each app that Docker hosts to have its own IP address). Now I think I'm just going to use Docker directly on the host. I don't know...still experimenting.

Of course you're wondering...why not Proxmox? Answer: I'm trying to learn more low-level stuff. I want to become better with scripting, Ansible, Python, automation, etc. and Proxmox does too much for me. It makes it too easy. I want to do it the hard way, 100% CLI-driven without a GUI crutch.

fabian · May 17, 2018

gemini said:
Of course you're wondering...why not Proxmox? Answer: I'm trying to learn more low-level stuff. I want to become better with scripting, Ansible, Python, automation, etc. and Proxmox does too much for me. It makes it too easy. I want to do it the hard way, 100% CLI-driven without a GUI crutch.

Just so you know, you don't need the GUI for anything in PVE (everything the GUI offers and more is exposed over the CLI + pvesh + API).

kappclark · Oct 23, 2019

Late to the thread, but I was having the exact same problem with very slow boot times with LXC. I took your advice and changed the IPv6 setting from DHCP to SLAAC, and now the startup problem has gone away.

Thank you for sharing your solution.

I wonder if there is any way to eliminate IPv6, as I am using this in a home network environment.

gemini · Oct 23, 2019

Another year, another hypervisor... I'm back on PVE! And I no longer have the LXC slow start issue. Didn't change any default IPv6 settings...I don't even use IPv6 at home. It's all IPv4, and my LXC containers come up fast whether they are static or DHCP. No problems at all.

I have one host on 5.4 and another host on 6.0 - no problems on either one.

Wish I knew why the problem was happening before or why it's not happening now - but ever since nuking ESXi/Ubuntu on these two machines and spinning PVE back up, I haven't had the issue.
