[SOLVED] Since kernel 4.15 start problems with LXC containers

fireon

Hello,

pve-manager/5.2-1/0fcd7879 (running kernel: 4.15.17-1-pve)

Since kernel 4.15 there have been problems starting containers. Not at boot, but if you stop a container, it sporadically can't be started again without rebooting the host. The web interface doesn't respond for the affected container.

Processes:
Code:
26026 ?        Ds     0:00 [lxc monitor] /var/lib/lxc 104
26365 ?        S      0:00 lxc-info -n 104 -p
26366 ?        S      0:00 lxc-info -n 104 -p
30070 ?        S      0:00 lxc-info -n 104 -p
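The `Ds` in the STAT column of the `[lxc monitor]` process stands for uninterruptible sleep; a process in that state generally does not act on signals until the kernel call it is waiting on returns. A minimal sketch for picking such processes out of a `ps ax` listing (the function name `list_dstate` is only illustrative, and it assumes the column layout PID, TTY, STAT, TIME, COMMAND shown above):

```shell
# list_dstate: print the lines whose STAT column (3rd field) starts with "D".
# Expects `ps ax`-style input on stdin: PID TTY STAT TIME COMMAND
list_dstate() {
    awk '$3 ~ /^D/'
}

# Usage on a live host:
#   ps ax | list_dstate
```
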
The log shows:
Code:
May 19 22:15:47 virtu01 pvedaemon[7984]: <root@pam> starting task UPID:virtu01:000044D7:01A8C6BD:5B0085F3:vzstart:104:root@pam:
May 19 22:15:47 virtu01 pvedaemon[17623]: starting CT 104: UPID:virtu01:000044D7:01A8C6BD:5B0085F3:vzstart:104:root@pam:
May 19 22:15:47 virtu01 systemd[1]: Starting PVE LXC Container: 104...
May 19 22:15:47 virtu01 zed[17701]: eid=1248 class=history_event pool_guid=0x8BF9CFD7A2BDFFE2
May 19 22:15:47 virtu01 lxc-start[17626]: lxc-start: 104: lxccontainer.c: wait_on_daemonized_start: 824 Received container state "STOPPING" instead of "RUNNING"
May 19 22:15:47 virtu01 lxc-start[17626]: The container failed to start.
May 19 22:15:47 virtu01 lxc-start[17626]: To get more details, run the container in foreground mode.
May 19 22:15:47 virtu01 lxc-start[17626]: Additional information can be obtained by setting the --logfile and --logpriority options.
May 19 22:15:47 virtu01 systemd[1]: pve-container@104.service: Control process exited, code=exited status=1
May 19 22:15:47 virtu01 systemd[1]: pve-container@104.service: Killing process 17628 (lxc-start) with signal SIGKILL.
May 19 22:15:47 virtu01 pvedaemon[6465]: unable to get PID for CT 104 (not running?)
May 19 22:15:47 virtu01 systemd[1]: pve-container@104.service: Killing process 17778 (sh) with signal SIGKILL.
May 19 22:15:47 virtu01 systemd[1]: Failed to start PVE LXC Container: 104.
May 19 22:15:47 virtu01 systemd[1]: pve-container@104.service: Unit entered failed state.
May 19 22:15:47 virtu01 systemd[1]: pve-container@104.service: Failed with result 'exit-code'.
May 19 22:15:47 virtu01 pvedaemon[17623]: command 'systemctl start pve-container@104' failed: exit code 1
May 19 22:15:47 virtu01 pvedaemon[7984]: <root@pam> end task UPID:virtu01:000044D7:01A8C6BD:5B0085F3:vzstart:104:root@pam: command 'systemctl start pve-container@104' failed: exit code 1
I can kill the processes, but that doesn't help; the affected container still doesn't start. On the older kernels 4.13 and 4.10 I never had this problem.
Before PVE 5.2 I had downgraded to kernel 4.10, so I was hoping this issue would be fixed in the enterprise repository.
So what can I do to debug this? :)
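The lxc-start message in the log above points to foreground mode together with the --logfile and --logpriority options. A small sketch that only prints such a command line rather than running it; the helper name `debug_start_cmd` and the default log path under /tmp are assumptions chosen for illustration:

```shell
# debug_start_cmd: print (do not run) an lxc-start call for foreground debugging.
#   $1 = container ID
#   $2 = log file (optional; the /tmp default is an assumption)
debug_start_cmd() {
    ctid="$1"
    logfile="${2:-/tmp/lxc-$1.log}"
    printf 'lxc-start -n %s -F --logpriority=DEBUG --logfile=%s\n' "$ctid" "$logfile"
}

debug_start_cmd 104
```

The printed command can be reviewed and then run by hand as root while the container is stopped.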
 
Here is some debug output:
Code:
lxc-start 104 20180519204046.524 ERROR    lxc_network - network.c:instantiate_veth:130 - Failed to create veth pair "veth104i0" and "vethDQG92W": File exists
lxc-start 104 20180519204046.524 ERROR    lxc_network - network.c:lxc_create_network_priv:2441 - Failed to create network device
lxc-start 104 20180519204046.524 ERROR    lxc_start - start.c:lxc_spawn:1545 - Failed to create the network
lxc-start 104 20180519204046.524 ERROR    lxc_start - start.c:__lxc_start:1883 - Failed to spawn container "104"
Looks like a problem with the network.
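"File exists" for "veth104i0" suggests that the host-side interface from the previous run is still present. A sketch for checking this, assuming the usual one-line-per-interface format of `ip -o link show`; the function name `stale_veth` is hypothetical:

```shell
# stale_veth: read `ip -o link show` output on stdin and print the host-side
# interface names that belong to container $1 (pattern veth<ID>i<N>).
stale_veth() {
    awk -F': ' -v id="$1" '{
        name = $2
        sub(/@.*/, "", name)    # drop the "@ifN" peer suffix
        if (name ~ ("^veth" id "i[0-9]+$")) print name
    }'
}

# Usage on a live host:
#   ip -o link show | stale_veth 104
```

If this prints veth104i0 while CT 104 is stopped, removing it with `ip link delete veth104i0` might allow the container to start again; that is an assumption, not something confirmed in this thread.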

And another problem:
Code:
ls /sys/fs/cgroup/systemd/lxc
104/  104-1/    cgroup.clone_children  cgroup.procs  notify_on_release  tasks
So the container ID appears there more than once.
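To check other containers for the same duplication, a sketch along these lines could be used. The function name `ct_cgroups` is illustrative, and the default base path is simply the one from the listing above:

```shell
# ct_cgroups: print the cgroup directories for container $1 below base dir $2,
# matching both "<ID>" and any suffixed sibling "<ID>-*".
ct_cgroups() {
    base="${2:-/sys/fs/cgroup/systemd/lxc}"
    for d in "$base/$1" "$base/$1"-*; do
        # an unmatched glob stays literal and is skipped by the -d test
        [ -d "$d" ] && basename "$d"
    done
    return 0
}

# Usage on a live host:
#   ct_cgroups 104
```

More than one line of output for a single ID corresponds to the situation shown above (104 and 104-1).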
 