[SOLVED] LXC Containers won't start after reboot

osmos

Sep 2, 2020
Hello,
I had a problem with a container that wouldn't shut down, so I killed its process; I didn't need the container anymore. After that I destroyed the container and checked for updates. I updated the host system, and since there was a kernel update I shut down all of my containers and rebooted.
After the reboot none of my containers would start again.
So I restored a backup, which worked, and the container started again. I then rebooted once more to test whether the container would start after a reboot, but again it didn't :(
I also tried restoring the container to a different ID, so that I would have one that runs and one I could experiment with to solve the issue. This is what I got:

Code:
TASK ERROR: unable to restore CT 106 - command 'tar xpf - --zstd --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*' '--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' -C /var/lib/lxc/106/rootfs --skip-old-files --anchored --exclude './dev/*'' failed: received interrupt

But I don't know if that is related.

I can mount the containers with pct mount.

I tried "lxc-start -n 103 -F -l DEBUG --logfile=boot101.log --logpriority=9" where I get:

Code:
lxc-start: 103: sync.c: __sync_wait: 41 An error occurred in another process (expected sequence number 7)
lxc-start: 103: start.c: __lxc_start: 1950 Failed to spawn container "103"
lxc-start: 103: tools/lxc_start.c: main: 308 The container failed to start
lxc-start: 103: tools/lxc_start.c: main: 314 Additional information can be obtained by setting the --logfile and --logpriority options

And in the log:

Code:
lxc-start 103 20200902174105.292 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"
lxc-start 103 20200902174105.292 ERROR    sync - sync.c:__sync_wait:41 - An error occurred in another process (expected sequence number 7)
lxc-start 103 20200902174105.294 ERROR    start - start.c:__lxc_start:1950 - Failed to spawn container "103"
lxc-start 103 20200902174108.538 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 103 20200902174108.539 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
lxc-start 103 20200902174150.432 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"
lxc-start 103 20200902174150.432 ERROR    sync - sync.c:__sync_wait:41 - An error occurred in another process (expected sequence number 7)
lxc-start 103 20200902174150.432 ERROR    start - start.c:__lxc_start:1950 - Failed to spawn container "103"
lxc-start 103 20200902174153.157 ERROR    lxc_start - tools/lxc_start.c:main:308 - The container failed to start
lxc-start 103 20200902174153.157 ERROR    lxc_start - tools/lxc_start.c:main:314 - Additional information can be obtained by setting the --logfile and --logpriority options
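
The same errors repeat for every start attempt, so it can help to collapse them. A minimal sketch, assuming the log format shown above (tool, container ID, timestamp, level, message); the function name `lxc_unique_errors` is made up for illustration:

```shell
# Print each distinct ERROR message of an lxc-start log once,
# ignoring the tool name, container ID and timestamp in front of it.
# $1: path to the log file
lxc_unique_errors() {
    awk '$4 == "ERROR" {
        msg = $0
        # strip "lxc-start <id> <timestamp> ERROR" and the padding after it
        sub(/^[^ ]+ +[^ ]+ +[^ ]+ +ERROR +/, "", msg)
        if (!seen[msg]++) print msg
    }' "$1"
}
```

Called as `lxc_unique_errors boot103.log`, this prints every distinct error only once, in the order of first appearance, which makes the `Failed to exec "/sbin/init"` line easier to spot.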

For some context:

Code:
root@pve1:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
pve-manager: 6.2-11 (running version: 6.2-11/22fb4983)
pve-kernel-5.4: 6.2-6
pve-kernel-helper: 6.2-6
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.60-1-pve: 5.4.60-1
pve-kernel-5.4.55-1-pve: 5.4.55-1
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ifupdown2: residual config
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-1
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-6
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-11
pve-cluster: 6.1-8
pve-container: 3.1-13
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-3
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-13
pve-xtermjs: 4.7.0-2
qemu-server: 6.2-14
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.4-pve1


I tried googling the issue but couldn't find a solution. Maybe someone could help.

Thank you :)
 

Attachments

  • boot103.log
    1.3 KB
Hi,

Could you please attach the full log output and post the config of the container as well?
 
Here is the config of one container; the others look quite similar:

Code:
arch: amd64
cores: 1
hostname: OnlyOffice
memory: 1024
net0: name=eth0,bridge=vmbr0,gw=192.168.0.1,hwaddr=2E:8E:E6:0F:F0:78,ip=192.168.0.105/24,type=veth
onboot: 1
ostype: ubuntu
rootfs: PVE-RAID5:subvol-103-disk-0,size=40G
startup: order=2
swap: 1024
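
To relate this config to a dataset on the host, the volume can be read from the `rootfs` line. A small sketch, assuming the config file is passed as an argument (on a Proxmox VE host the container configs live under /etc/pve/lxc/); `rootfs_volume` is a hypothetical helper name:

```shell
# Print the storage:volume part of the rootfs line of a container config.
# $1: path to the config file, e.g. /etc/pve/lxc/103.conf
rootfs_volume() {
    # fields are split at ": "; options such as size=40G follow after a comma
    awk -F': ' '$1 == "rootfs" { split($2, parts, ","); print parts[1] }' "$1"
}
```

For the config above this prints `PVE-RAID5:subvol-103-disk-0`, that is, the storage name followed by the volume name.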

I sent the log output of: "lxc-start -n 103 -F -l DEBUG --logfile=boot101.log --logpriority=9"
There wasn't more in the log, but I'll also send some other logs. I couldn't find a log in /var/log/lxc for that container, but I'll send the logs I found there as well.

I hope there is something useful in them, or maybe you could tell me which specific log you need.

Thank you
 

Attachments

  • 100.log
    27.2 KB
  • 101.log
    27.2 KB
  • lxc-monitord.log
    5.4 KB
  • boot103.log
    1.3 KB
  • debug.1.log
    21 KB
I don't know what exactly you mean but here is the output of that command:

Code:
PVE-RAID5/subvol-103-disk-0  2.56G  37.4G     2.56G  /PVE-RAID5/subvol-103-disk-0
 
Hi again,

Thank you for the logs.

From 100.log I noted these two lines:
Code:
lxc-start 100 20200902161841.696 NOTICE   start - start.c:start:2039 - Exec'ing "/sbin/init"
lxc-start 100 20200902161841.696 ERROR    start - start.c:start:2042 - No such file or directory - Failed to exec "/sbin/init"

Can you please check whether /sbin/init exists in the container's root filesystem?

Bash:
pct mount 100    # mounts the rootfs at /var/lib/lxc/100/rootfs

ls -l /var/lib/lxc/100/rootfs/sbin/init

pct unmount 100  # release the mount when done
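
The same check can be repeated for several containers. A minimal sketch, assuming the rootfs has already been mounted with pct mount; `has_init` is a hypothetical helper. It also accepts a dangling symlink, because on systemd-based distributions /sbin/init is often a symlink whose absolute target only resolves from inside the container:

```shell
# Report whether a mounted container rootfs still contains /sbin/init.
# $1: path to the mounted rootfs, e.g. /var/lib/lxc/100/rootfs
has_init() {
    init_path="$1/sbin/init"
    # -e follows symlinks, -L also accepts a symlink whose target is not reachable from the host
    if [ -e "$init_path" ] || [ -L "$init_path" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

With container 100 mounted, `has_init /var/lib/lxc/100/rootfs` prints either `present` or `missing`.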
 
Sorry for the late reply, busy weekend.
That's exactly what's missing. Now there are two things I'm wondering about:

1. Why would it be missing after a reboot of the host? Or how can I find out?
2. How can I restore that file?

Thank you for the help
 
Hi,

1. Why would it be missing after a reboot of the host? Or how can I find out?
2. How can I restore that file?
Actually, that depends on whether it's an unprivileged or a privileged container.

See whether the file exists. If it does, you can try to chroot into the container, or install the missing packages using lxc-usernsexec.
 
Hi,
so I examined some more containers, and the others seem to have no files left in them except for the dev and proc folders :(
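
A quick way to tell such an emptied root filesystem from an intact one is to ignore dev and proc and see whether anything else is left. A sketch under that assumption; `rootfs_state` is a hypothetical name, and the path is wherever the container's rootfs is mounted:

```shell
# Classify a mounted rootfs as "emptied" (nothing besides dev and proc) or "has data".
# $1: path to the mounted rootfs
rootfs_state() {
    # ls -A also lists dotfiles; grep -v -x drops entries named exactly dev or proc
    rest=$(ls -A "$1" | grep -v -x -e dev -e proc || true)
    if [ -z "$rest" ]; then
        echo "emptied"
    else
        echo "has data"
    fi
}
```

Note that a completely empty directory is also reported as `emptied`.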

So a restore from a backup is probably the only solution that would work.
But I want to prevent that from happening again.
My containers are privileged. How would I go about finding out what went wrong there?

Thanks
 
My containers are privileged. How would I go about finding out what went wrong there?

Please restore the container and make sure the restored root filesystem contains the init file, then reboot the host and see whether it happens again.

journalctl and the syslog are always helpful for this.
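
The thread does not say what emptied the containers. One thing that can be ruled out after the test reboot (purely an assumption about where to look, not a diagnosis) is whether the subvolume's mountpoint from the output further up, /PVE-RAID5/subvol-103-disk-0, is actually mounted. A minimal sketch that reads the kernel mount table; `mount_state` is a hypothetical helper, and the second argument exists only so it can be tried against a saved copy of the table:

```shell
# Report whether a path appears as a mountpoint in a mount table.
# $1: mountpoint to look for, e.g. /PVE-RAID5/subvol-103-disk-0
# $2: mount table to read (defaults to /proc/self/mounts)
mount_state() {
    path=$1
    table=${2:-/proc/self/mounts}
    # the second column of the mount table is the mountpoint
    if awk -v p="$path" '$2 == p { found = 1 } END { exit !found }' "$table"; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}
```

Alongside this, `journalctl -b -p err` lists the error-level messages of the current boot, which is a reasonable place to start reading.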
 
