LXC containers randomly fail to start

raoulh

Hi,

I have a few servers running Proxmox 4 (up to date) with a bunch of LXC containers on them. For about a week now, some of the containers have no longer been starting correctly.

At regular intervals all containers are stopped for maintenance and then started again. This is done by a nightly cron job that runs:

Code:
pct stop 150
# some work on rootfs
pct start 150

For some reason, starting about a week ago, two or more containers randomly fail to restart, and the cron log shows:
Code:
command 'lxc-start -n 150' failed: exit code 1
<root@pam> end task UPID:svr-linux2:0000599F:0355671C:56E89B37:vzstart:150:root@pam: command 'lxc-start -n 150' failed: exit code 1

I have to start the LXC manually after that, and it then works correctly.
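
Since a manual start succeeds afterwards, one possible stopgap is to retry the start from the cron job. This is only a minimal sketch: the `retry` helper is hypothetical (not part of Proxmox), and the attempt count and delay are arbitrary values. It would hide the symptom, not fix the underlying cause.

```shell
# Hypothetical helper: run a command, retrying up to ATTEMPTS times
# with DELAY seconds between attempts.
# Usage: retry ATTEMPTS DELAY command [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
}

# In the nightly job this could replace the plain start, for example:
#   retry 3 5 pct start 150
```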

After some digging, it seems to come from the veth network interface not being destroyed or recreated correctly:
Code:
─➤  tail /var/log/lxc/150.log
      lxc-start 1458083560.397 ERROR    lxc_conf - conf.c:instantiate_veth:2767 - failed to create veth pair (veth150i0 and veth1XGOX6): File exists
      lxc-start 1458083560.447 ERROR    lxc_conf - conf.c:lxc_create_network:3084 - failed to create netdev
      lxc-start 1458083560.447 ERROR    lxc_start - start.c:lxc_spawn:954 - failed to create the network
      lxc-start 1458083560.447 ERROR    lxc_start - start.c:__lxc_start:1211 - failed to spawn '150'
      lxc-start 1458083567.068 ERROR    lxc_start_ui - lxc_start.c:main:344 - The container failed to start.
      lxc-start 1458083567.068 ERROR    lxc_start_ui - lxc_start.c:main:346 - To get more details, run the container in foreground mode.
      lxc-start 1458083567.068 ERROR    lxc_start_ui - lxc_start.c:main:348 - Additional information can be obtained by setting the --logfile and --logpriority options.
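
The "File exists" error suggests that the host-side interface `veth150i0` from the previous run is still present when the container is started again. A rough sketch of a check that could run between `pct stop` and `pct start` follows. It assumes the host-side name is always `veth<CTID>i<INDEX>`, as in the log above; the function names and the `IP_CMD` override are made up for illustration.

```shell
# Hypothetical helpers: detect and remove a leftover host-side veth.
# Assumes the naming scheme veth<CTID>i<INDEX> seen in the log.

# Print the host-side interface name for a container ID and NIC index.
veth_name() {
    printf 'veth%si%s\n' "$1" "$2"
}

# Delete the interface if it still exists. Only call this while the
# container is stopped. IP_CMD can be overridden for a dry run.
# Usage: cleanup_stale_veth CTID [INDEX]
cleanup_stale_veth() {
    ifname=$(veth_name "$1" "${2:-0}")
    if ${IP_CMD:-ip} link show dev "$ifname" >/dev/null 2>&1; then
        echo "stale interface $ifname found, deleting"
        ${IP_CMD:-ip} link delete "$ifname"
    fi
}

# Example between stop and start:
#   pct stop 150
#   cleanup_stale_veth 150
#   pct start 150
```

Whether deleting the interface by hand is safe here is an open question; it may only paper over a race between the teardown of the old interface and the creation of the new one.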

This is the journald log for the relevant container:
Code:
Mar 16 00:11:39 svr-linux1 systemd-timesyncd[910]: interval/delta/delay/jitter/drift 2048s/-0.017s/0.052s/0.011s/-16ppm (ignored)
Mar 16 00:12:39 svr-linux1 pct[27324]: <root@pam> starting task UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam:
Mar 16 00:12:39 svr-linux1 pct[27325]: starting CT 150: UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam:
Mar 16 00:12:40 svr-linux1 kernel: EXT4-fs (loop4): mounted filesystem with ordered data mode. Opts: (null)
Mar 16 00:12:40 svr-linux1 kernel: vmbr0: port 6(veth150i0) entered disabled state
Mar 16 00:12:40 svr-linux1 kernel: device veth150i0 left promiscuous mode
Mar 16 00:12:40 svr-linux1 kernel: vmbr0: port 6(veth150i0) entered disabled state
Mar 16 00:12:47 svr-linux1 pct[27325]: command 'lxc-start -n 150' failed: exit code 1
Mar 16 00:12:47 svr-linux1 pct[27324]: <root@pam> end task UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam: command 'lxc-start -n 150' failed: exit code 1

My question is: what is causing the containers not to start? Is it a bug somewhere? Where should I look?

Thanks,
Raoul
 
