Hi,
I have a few servers running Proxmox 4 (up to date) with a number of LXC containers on them. For about a week, some containers have failed to start correctly.
All containers are stopped at regular intervals for maintenance and then started again. A nightly cron job does this:
Code:
pct stop 150
# some work on rootfs
pct start 150
Since about a week ago, two or more containers at random fail to restart, and the cron log shows:
Code:
command 'lxc-start -n 150' failed: exit code 1
<root@pam> end task UPID:svr-linux2:0000599F:0355671C:56E89B37:vzstart:150:root@pam: command 'lxc-start -n 150' failed: exit code 1
I then have to start the container manually, which works correctly.
After some digging, it seems to come from the veth network interface not being destroyed or recreated correctly:
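Since a later manual start succeeds, one stopgap for the cron job would be to retry the start a few times. This is only a minimal sketch and does not address the root cause; the function name `start_with_retry` and the default retry count and delay are hypothetical choices, not anything provided by Proxmox.

```shell
#!/bin/sh
# Sketch: retry `pct start` a few times before giving up.
# Assumption: a failed start exits non-zero, as the cron log suggests.

start_with_retry() {
    ctid=$1          # container ID, e.g. 150
    tries=${2:-3}    # hypothetical default: 3 attempts
    delay=${3:-5}    # hypothetical default: 5 seconds between attempts
    i=1
    while [ "$i" -le "$tries" ]; do
        if pct start "$ctid"; then
            return 0
        fi
        echo "start of CT $ctid failed (attempt $i/$tries)" >&2
        sleep "$delay"
        i=$((i + 1))
    done
    return 1
}
```

In the cron job, `pct start 150` would then become `start_with_retry 150`.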
Code:
─➤ tail /var/log/lxc/150.log
lxc-start 1458083560.397 ERROR lxc_conf - conf.c:instantiate_veth:2767 - failed to create veth pair (veth150i0 and veth1XGOX6): File exists
lxc-start 1458083560.447 ERROR lxc_conf - conf.c:lxc_create_network:3084 - failed to create netdev
lxc-start 1458083560.447 ERROR lxc_start - start.c:lxc_spawn:954 - failed to create the network
lxc-start 1458083560.447 ERROR lxc_start - start.c:__lxc_start:1211 - failed to spawn '150'
lxc-start 1458083567.068 ERROR lxc_start_ui - lxc_start.c:main:344 - The container failed to start.
lxc-start 1458083567.068 ERROR lxc_start_ui - lxc_start.c:main:346 - To get more details, run the container in foreground mode.
lxc-start 1458083567.068 ERROR lxc_start_ui - lxc_start.c:main:348 - Additional information can be obtained by setting the --logfile and --logpriority options.
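The "File exists" error suggests that the host-side interface `veth150i0` is still present when the container is started again. A possible check before `pct start`, as a sketch only: it assumes the host interface is always named `veth<CTID>i<N>` as in the log above, and the function names `veth_name` and `cleanup_stale_veth` are hypothetical.

```shell
#!/bin/sh
# Sketch: remove a leftover host-side veth before starting a container.
# Assumption: the host interface is named veth<CTID>i<N>, as seen in the log.

veth_name() {
    # $1 = container ID, $2 = interface index (defaults to 0)
    echo "veth${1}i${2:-0}"
}

cleanup_stale_veth() {
    ifname=$(veth_name "$1" "$2")
    # `ip link show dev NAME` exits non-zero when the interface does not exist
    if ip link show dev "$ifname" >/dev/null 2>&1; then
        echo "removing stale interface $ifname"
        ip link delete "$ifname"
    fi
}
```

It would be called between the maintenance work and the start, for example `cleanup_stale_veth 150 && pct start 150`, and only while the container is stopped.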
This is the journald log for the relevant container:
Code:
Mar 16 00:11:39 svr-linux1 systemd-timesyncd[910]: interval/delta/delay/jitter/drift 2048s/-0.017s/0.052s/0.011s/-16ppm (ignored)
Mar 16 00:12:39 svr-linux1 pct[27324]: <root@pam> starting task UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam:
Mar 16 00:12:39 svr-linux1 pct[27325]: starting CT 150: UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam:
Mar 16 00:12:40 svr-linux1 kernel: EXT4-fs (loop4): mounted filesystem with ordered data mode. Opts: (null)
Mar 16 00:12:40 svr-linux1 kernel: vmbr0: port 6(veth150i0) entered disabled state
Mar 16 00:12:40 svr-linux1 kernel: device veth150i0 left promiscuous mode
Mar 16 00:12:40 svr-linux1 kernel: vmbr0: port 6(veth150i0) entered disabled state
Mar 16 00:12:47 svr-linux1 pct[27325]: command 'lxc-start -n 150' failed: exit code 1
Mar 16 00:12:47 svr-linux1 pct[27324]: <root@pam> end task UPID:svr-linux1:00006ABD:035363C8:56E896E7:vzstart:150:root@pam: command 'lxc-start -n 150' failed: exit code 1
My question is: what is preventing the container from starting? Is it a bug somewhere? Where should I look?
Thanks,
Raoul