Slow boot times with LXC

azop

Well-Known Member
My LXC containers are taking 4+ minutes to boot. The console log shows things booting quickly until it stalls at "Stopping Populate /dev filesystem" for 3-4 minutes. SSH connections are refused until this finishes.

Any suggestions?


Code:
 * Stopping flush early job output to logs                                                                                                                       [ OK ]
* Stopping Mount network filesystems                                                                                                                            [ OK ]
* Starting Mount network filesystems                                                                                                                            [ OK ]
* Starting system logging daemon                                                                                                                                [ OK ]
* Stopping Mount network filesystems                                                                                                                            [ OK ]
* Starting configure virtual network devices                                                                                                                    [ OK ]
* Starting Bridge file events into upstart                                                                                                                      [ OK ]
* Starting Bridge socket events into upstart                                                                                                                    [ OK ]
* Stopping Populate /dev filesystem                                                                                                                             [ OK ]
 
This appears to be AppArmor-related?

Code:
root@rovio:/var/log# tail -f /var/log/kern.log
Jan 26 10:14:17 rovio kernel: [1252484.880847] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.069130] audit: type=1400 audit(1453821257.491:942): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default" name="/" pid=14272 comm="mount" flags="ro, remount, noatime"
Jan 26 10:14:17 rovio kernel: [1252485.167294] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.177961] vmbr2: port 1(veth102i0) entered disabled state
Jan 26 10:14:17 rovio kernel: [1252485.178427] vmbr0: port 4(veth102i1) entered disabled state
Jan 26 10:14:35 rovio kernel: [1252502.623272] device veth102i0 entered promiscuous mode
Jan 26 10:14:35 rovio kernel: [1252502.623354] vmbr2: port 1(veth102i0) entered forwarding state
Jan 26 10:14:35 rovio kernel: [1252502.623395] vmbr2: port 1(veth102i0) entered forwarding state
Jan 26 10:14:35 rovio kernel: [1252502.661676] vmbr2: port 1(veth102i0) entered disabled state
Jan 26 10:14:35 rovio kernel: [1252503.355106] audit: type=1400 audit(1453821275.779:943): apparmor="DENIED" operation="mount" info="failed type match" error=-13 profile="lxc-container-default" name="/sys/" pid=14839 comm="mount" flags="rw, nosuid, nodev, noexec, remount"
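Those denials suggest AppArmor is blocking remounts inside the container. Purely as a diagnostic (and only for a trusted container, since this removes the AppArmor confinement), one could try starting the CT with a relaxed profile by adding a raw LXC key to its config under /etc/pve/lxc/; which key name applies depends on the LXC version shipped with your PVE release:
Code:
# /etc/pve/lxc/102.conf -- add ONE of the following, then restart the container
lxc.aa_profile: unconfined          # older LXC 1.x/2.0 (PVE 4.x)
lxc.apparmor.profile: unconfined    # LXC 2.1 and later (PVE 5.x+)
If the stall disappears with the profile relaxed, the proper fix is a custom profile rather than leaving the container unconfined.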
 
From the CLI, start the container in debug mode; the log may contain information to help pin down the issue:
Code:
lxc-start -n 102 -F --logfile=lxc.log --logpriority=debug
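If the resulting log is large, a generic grep can surface the likely culprits (nothing Proxmox-specific here, just filtering for denials and errors):
Code:
grep -Ei 'denied|error|fail' lxc.log | less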
 
Thanks for the response; the log is attached.
 

Attachments

  • lxc.txt (76 KB)
I found that the delay at "Stopping Populate /dev filesystem" was caused by libpam-systemd:amd64. I removed that package, and on reboot the container flies past "Stopping Populate /dev filesystem" but then stalls at "Starting Bridge file events into upstart".
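For reference, removing that package inside the container would look roughly like this (the package name is taken from the line above; it is worth checking what depends on it first):
Code:
apt-cache rdepends libpam-systemd   # see what would be affected
apt-get remove libpam-systemd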
 
Was this ever resolved? I've always had slow boot times on Proxmox.

Where I've had slow boot times:
- Containers hosted in ZFS on machine with P6X58D-E mobo on PVE 4.4
- Containers hosted in Linux LVM on HP ProLiant DL360 G7 on PVE 5
 

That's not normal; my containers take around 1 second to boot. It's amazingly quick compared to a VM, and even those I don't consider slow.

Even doing a storage migration on PVE 5 with local ZFS storage between 2 nodes, where it stops the container, replicates the changes since the last sync, and then fires the container up on the second host, I only miss around 2 ping packets across the stop, replication, and restart. At least on containers that don't write lots of data.
 
That's incredible. I really want to get to the bottom of this. I'll keep researching and post back in this thread when I fix it.

Here's the process, typically:
1. Click Start in WebUI
2. After 2-3 minutes from Start, the container IP is pingable (but no console access)
3. After 6+ minutes from Start, the console is accessible, and I can SSH to the container.
 

Ouch, that's bad. I don't think there's anything special about my setup, so I'd expect most people to see performance similar to what I get.

Is this on a heavily loaded server with perhaps huge IO wait, or does it take this long after a fresh boot with nothing else running at all?

If it's on shared storage, what about trying a container on local storage to see if there's a storage issue?

Have you run pveperf to see what kind of performance you're getting on the host, mainly around FSYNC?
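A quick way to check for IO pressure on the host while a container is starting (assuming the sysstat package is installed for iostat):
Code:
iostat -xz 2 5         # extended per-device stats every 2s; watch %util and await
top -bn1 | head -n 5   # the 'wa' value in the %Cpu(s) line is IO wait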
 
It is on local storage. This is what I get while the container is starting. Also, it appears the container is pingable as soon as it is started, but console and SSH are not functional for several minutes:
Code:
root@gemini:/tmp# pveperf
CPU BOGOMIPS: 70557.96
REGEX/SECOND: 1634983
HD SIZE: 57.88 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 178.48
DNS EXT: 256.41 ms
DNS INT: 89.40 ms (will.mx)
 
Too bad! I like it, but it should not take 6 minutes to start an LXC container. I hope someone can help me with this. I have had the problem since the first time I installed Debian in an LXC container.
 
Usually issues like this are caused by broken network setups (e.g., waiting for DHCP on container start, but no DHCP server running on the network). Just check the logs inside the container for clues on what is blocking startup.
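A rough way to do that from the PVE host, assuming container 102 (the systemd commands only apply to systemd-based guests; the Ubuntu/upstart containers earlier in this thread would need the files under /var/log/ instead):
Code:
pct enter 102                     # shell inside the container
tail -n 100 /var/log/syslog       # recent boot/network messages
systemd-analyze blame | head      # systemd guests: slowest units at boot
journalctl -b -u networking       # systemd guests: network unit log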
 
Solved it by using SLAAC for IPv6; now it boots in under 10 seconds. Thanks for the help. @gemini I think Proxmox is worth going back to ;)

Just wanted to reply that I've had this issue for well over a year and never found an answer. But changing IPv6 to SLAAC also solved my problems. I guess when the network was starting it was stuck waiting on DHCP for IPv6 on my network. I will have to read up on what SLAAC is and why it solved the problem...

The strange part is that when I first create and start a container it never has this issue, only when it starts for the second and subsequent times.
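For anyone wanting to make the same change from the CLI, a sketch for a hypothetical container 102 on bridge vmbr0 (note that pct set -net0 replaces the whole NIC definition, so include your existing IPv4 settings as well):
Code:
# ip6=auto selects SLAAC for the container's IPv6 configuration
pct set 102 -net0 name=eth0,bridge=vmbr0,ip=dhcp,ip6=auto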
 
Thanks for posting the solution! What a thread, eh?

I'm actually mid-migration from ESXi to Ubuntu 18.04 with LXD now; I started last weekend. I've had a lot of problems trying to dedicate one of the host's two physical NICs explicitly to the LXC containers rather than the host, but the new way Ubuntu 18.04 handles networking with Netplan has made this very hard to figure out, as I'd like to use a macvlan bridge for the reduced CPU usage compared to a Linux bridge. There's just so little documentation.
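For what it's worth, the macvlan side on LXD can be handled with a profile; a sketch assuming the spare NIC is called eno2 (substitute whatever name Netplan shows for it):
Code:
lxc profile create macvlan
lxc profile device add macvlan eth0 nic nictype=macvlan parent=eno2
# launch a container with the macvlan profile stacked on the default one
lxc launch ubuntu:18.04 c1 -p default -p macvlan
One caveat with macvlan is that containers attached this way can't reach the host itself over that interface.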

Long story short, I was going to host Docker containers inside the LXC containers, as LXC has a reputation for easier networking than Docker (I want each app that Docker hosts to have its own IP address). Now I think I'm just going to use Docker directly on the host. I don't know...still experimenting.

Of course you're wondering...why not Proxmox? Answer: I'm trying to learn more low-level stuff. I want to become better with scripting, Ansible, Python, automation, etc. and Proxmox does too much for me. It makes it too easy. I want to do it the hard way, 100% CLI-driven without a GUI crutch.
 

Just so you know, you don't need the GUI for anything in PVE (everything the GUI offers, and more, is exposed via the CLI, pvesh, and the API) ;)
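A few examples of what's there without the GUI (container ID 102 is a placeholder, and the last line assumes the node name matches the hostname):
Code:
pct list                           # containers on this node
pct start 102                      # start a container
pct config 102                     # show a container's configuration
pvesh get /nodes/$(hostname)/lxc   # the same data via the REST API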
 
Late to the thread, but I was having the exact same problem with very slow LXC boot times. I took your advice and changed the IPv6 setting from DHCP to SLAAC, and now the slow-start problem has gone away.

Thank you for sharing your solution.

I wonder if there is any way to eliminate IPv6 entirely, as I am using this in a home network environment.
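One general-purpose option (not something from this thread, just a common sysctl sketch) is to disable IPv6 via sysctl; depending on container privileges this may need to be done on the host, and whether it's a good idea depends on your network:
Code:
# append to /etc/sysctl.conf (or a file under /etc/sysctl.d/) and reload
echo 'net.ipv6.conf.all.disable_ipv6 = 1'     >> /etc/sysctl.conf
echo 'net.ipv6.conf.default.disable_ipv6 = 1' >> /etc/sysctl.conf
sysctl -p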
 
Another year, another hypervisor... I'm back on PVE! And I no longer have the LXC slow-start issue. I didn't change any default IPv6 settings... I don't even use IPv6 at home. It's all IPv4, and my LXC containers come up fast whether they are static or DHCP. No problems at all.

I have one host on 5.4 and another host on 6.0 - no problems on either one.

I wish I knew why the problem was happening before, or why it's not happening now, but ever since nuking ESXi/Ubuntu on these two machines and spinning PVE back up, I haven't had the issue.
 
