[SOLVED] LXC Containers won't start after reboot

osmos

Sep 2, 2020
Hello,
I had a container that wouldn't shut down, so I killed its process; I didn't need it anymore anyway. After that I destroyed the container and checked for updates. I updated the host system, and since there was a kernel update I decided to shut down all of my containers and reboot.
After the reboot none of my containers would start again.
So I restored a backup, which worked, and the container started again. Then I rebooted once more to test whether the container would start after a reboot, but again it didn't :(
I also tried restoring the container to a different ID, so that I would have one that runs and one I could experiment with to solve the issue. This is what I got:

Code:
TASK ERROR: unable to restore CT 106 - command 'tar xpf - --zstd --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' -C /var/lib/lxc/106/rootfs --skip-old-files --anchored --exclude './dev/*'' failed: received interrupt

But I don't know if that is related.

I can mount the containers with pct mount.

I tried "lxc-start -n 103 -F -l DEBUG --logfile=boot101.log --logpriority=9", which gives:

Code:
lxc-start: 103: sync.c: __sync_wait: 41 An error occurred in another process (expected sequence number 7)
lxc-start: 103: start.c: __lxc_start: 1950 Failed to spawn container "103"
lxc-start: 103: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 103: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options

And in the log:

Code:
lxc-start 103 20200902174105.292 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"
lxc-start 103 20200902174105.292 ERROR    sync - sync.c:__sync_wait:41 - An error occurred in another process (expected sequence number 7)
lxc-start 103 20200902174105.294 ERROR    start - start.c:__lxc_start:1950 - Failed to spawn container "103"
lxc-start 103 20200902174108.538 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 103 20200902174108.539 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
lxc-start 103 20200902174150.432 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"
lxc-start 103 20200902174150.432 ERROR    sync - sync.c:__sync_wait:41 - An error occurred in another process (expected sequence number 7)
lxc-start 103 20200902174150.432 ERROR    start - start.c:__lxc_start:1950 - Failed to spawn container "103"
lxc-start 103 20200902174153.157 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 103 20200902174153.157 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
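Because the same errors repeat on every start attempt, it can help to collapse the log down to its distinct ERROR messages. The following is only a sketch: the function name `summarize_lxc_errors` is made up for illustration, and it assumes the log lines follow the `lxc-start <id> <timestamp> ERROR ...` layout shown above.

```shell
# Collapse an lxc-start log to its distinct ERROR messages, with counts.
# Reads the files given as arguments, or stdin when none are given.
# Assumes lines look like: lxc-start <id> <timestamp> ERROR <message>
summarize_lxc_errors() {
    grep -hE '^lxc-start [^ ]+ [0-9.]+ ERROR' "$@" |
        # Drop the prefix so identical messages from different runs compare equal.
        sed -E 's/^lxc-start [^ ]+ [0-9.]+ ERROR +//' |
        sort | uniq -c | sort -rn
}
```

It could be run as `summarize_lxc_errors boot101.log`, using the log file name from the command above.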

For some context:

Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.60-1-pve: 5.4.60-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ifupdown2: residual config
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-11
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1


I tried googling for the issue but couldn't find a solution. Maybe someone can help.

Thank you :)
 


Hi,

Could you please attach the full log output and post the container's config as well?
 
Here is the config of one container; the others look quite similar:

Code:
arch: amd64
cores: 1
hostname: OnlyOffice
memory: 1024
net0: name=eth0,bridge=vmbr0,gw=192.168.0.1,hwaddr=2E:8E:E6:0F:F0:78,ip=192.168.0.105/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: PVE-RAID5:subvol-103-disk-0,size=40G
startup: order=2
swap: 1024

I attached the log output of "lxc-start -n 103 -F -l DEBUG --logfile=boot101.log --logpriority=9".
There wasn't anything more in that log, but I'll also attach some other logs. I couldn't find a log in /var/log/lxc for that container, but I'll attach the logs I did find there as well.

I hope there is something useful in there; if not, maybe you could point me to the specific log you need.

Thank you
 


I don't know exactly what you mean, but here is the output of that command:

Code:
PVE-RAID5/subvol-103-disk-0  2.56G  37.4G     2.56G  /PVE-RAID5/subvol-103-disk-0
 
Hi again,

Thank you for the logs.

In 100.log I noticed these two lines:
Code:
lxc-start 100 20200902161841.696 NOTICE   start - start.c:start:2039 - Exec'ing "/sbin/init"
lxc-start 100 20200902161841.696 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"

Can you please check whether the init file exists?

Bash:
pct mount 100

ls /var/lib/lxc/100/rootfs/sbin/init
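
The same check can be wrapped in a small function so it is easy to repeat for several containers. This is a minimal sketch; `check_init` is a hypothetical helper name, and it assumes the container is already mounted with pct mount at the path it is given.

```shell
# Report whether sbin/init is present under a mounted container rootfs.
# Usage: check_init /var/lib/lxc/100/rootfs
check_init() {
    rootfs="$1"
    init="$rootfs/sbin/init"
    # -L also accepts a symlink whose absolute target would only
    # resolve from inside the container.
    if [ -e "$init" ] || [ -L "$init" ]; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

For example: `pct mount 100 && check_init /var/lib/lxc/100/rootfs; pct unmount 100`.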
 
Sorry for the late reply, busy weekend.
That's exactly what's missing. Now there are two things I'm wondering about:

1. Why would it be missing after a reboot of the host? Or how can I find out?
2. How can I restore that file?

Thank you for the help
 
Hi,

1. why would it be missing after a reboot of the host? Or how can I find out?
2. how can I restore that file?
Actually, that depends on whether it's an unprivileged or a privileged container.

Check whether the file exists. If it does, you can try to chroot into the container, or install the missing packages using lxc-usernsexec.
 
Hi,
So I examined some more containers, and the others seem to have no files left in them except for the dev and proc folders :(
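
A check like this can be scripted so every mounted rootfs is tested the same way. The sketch below is only illustrative: `rootfs_state` is a made-up name, and "nothing but dev and proc" is simply the symptom described here, not an official health test.

```shell
# Print "suspicious" if ROOTFS holds nothing besides dev and proc,
# otherwise "populated". A missing or unreadable directory also
# counts as suspicious.
rootfs_state() {
    rootfs="$1"
    # Top-level entries (dotfiles included), minus dev and proc.
    extra=$(ls -A "$rootfs" 2>/dev/null | grep -vxE 'dev|proc')
    if [ -z "$extra" ]; then
        echo "suspicious"
    else
        echo "populated"
    fi
}
```

After pct mount 103, this could be called as `rootfs_state /var/lib/lxc/103/rootfs`. Since the rootfs here is a ZFS subvol, it may also be worth ruling out that the dataset simply was not mounted after boot, for example with `zfs get mounted PVE-RAID5/subvol-103-disk-0`; that is only a possibility to check, not a cause confirmed in this thread.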

So a restore from a backup is probably the only solution that would work.
But I want to prevent that from happening again.
My containers are privileged, so how would I go about finding out what went wrong there?

Thanks
 
So my containers are privileged how would I go about finding out what went wrong there?

Please restore the container and make sure the restored rootfs contains the init file, then reboot the host and see whether it happens again.

The journalctl/syslog output always helps.
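
To narrow the journal down to lines that are likely relevant here, a simple keyword filter can help. This is a rough sketch: `filter_storage_events` is a hypothetical name and the keyword list is a guess at what matters for this problem, so adjust it as needed.

```shell
# Keep only log lines that mention storage or container keywords.
# Reads the files given as arguments, or stdin when none are given.
# The keyword list is an assumption, not an exhaustive set.
filter_storage_events() {
    grep -iE 'zfs|lxc|pve-container|mount' "$@"
}
```

For the current boot this could be used as `journalctl -b --no-pager | filter_storage_events`.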