[SOLVED] LXC container won't start

BenDDD

Hello,

An LXC container no longer starts. Here are the error messages:

pve-container@205.service - PVE LXC Container: 205
Loaded: loaded (/lib/systemd/system/pve-container@.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2020-08-28 09:30:40 CEST; 16min ago
Docs: man:lxc-start
man:lxc
man:pct
Process: 636 ExecStart=/usr/bin/lxc-start -n 205 (code=exited, status=1/FAILURE)

Aug 28 09:30:39 galaxie8 systemd[1]: Starting PVE LXC Container: 205...
Aug 28 09:30:40 galaxie8 lxc-start[636]: lxc-start: 205: lxccontainer.c: wait_on_daemonized_start: 865 No such file or directory - Failed to receive the container state
Aug 28 09:30:40 galaxie8 systemd[1]: pve-container@205.service: Control process exited, code=exited, status=1/FAILURE
Aug 28 09:30:40 galaxie8 lxc-start[636]: lxc-start: 205: tools/lxc_start.c: main: 329 The container failed to start
Aug 28 09:30:40 galaxie8 lxc-start[636]: lxc-start: 205: tools/lxc_start.c: main: 332 To get more details, run the container in foreground mode
Aug 28 09:30:40 galaxie8 lxc-start[636]: lxc-start: 205: tools/lxc_start.c: main: 335 Additional information can be obtained by setting the --logfile and --logpriority options
Aug 28 09:30:40 galaxie8 systemd[1]: pve-container@205.service: Failed with result 'exit-code'.
Aug 28 09:30:40 galaxie8 systemd[1]: Failed to start PVE LXC Container: 205.

lxc-start: 205: conf.c: run_buffer: 352 Script exited with status 2
lxc-start: 205: start.c: lxc_init: 897 Failed to run lxc.hook.pre-start for container "205"
lxc-start: 205: start.c: __lxc_start: 2032 Failed to initialize container "205"
Segmentation fault

I have seen several forum posts dealing with this subject, each with different suggested solutions, but I don't want to try everything at random and risk making the problem worse.

Thank you in advance for your feedback.
 
Hi Moayad,

Thank you for your answer. Here is the requested output:

#net0%3A name=eth0,bridge=vmbr1,firewall=1,gw=147.215.191.1,ip=147.215.150.43,tag=191,type=veth
arch: amd64
cores: 1
hostname: be-annuaire
memory: 512
onboot: 0
ostype: debian
rootfs: KVM:205/vm-205-disk-0.raw,size=1G
swap: 512
unprivileged: 1

proxmox-ve: 6.1-2 (running kernel: 5.3.18-2-pve)
pve-manager: 6.1-7 (running version: 6.1-7/13e58d5e)
pve-kernel-5.3: 6.1-5
pve-kernel-helper: 6.1-5
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
pve-kernel-4.13.13-2-pve: 4.13.13-33
pve-kernel-4.13.4-1-pve: 4.13.4-26
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.14-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-12
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-4
libpve-storage-perl: 6.1-4
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-4
pve-container: 3.0-19
pve-docs: 6.1-6
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-10
pve-firmware: 3.0-5
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-3
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-6
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

Looking at the config of the LXC container, I see that it references the disk vm-205-disk-0.raw, while the disk actually present on the filesystem is vm-205-disk-1.raw:


Code:
ls -al /mnt/pve/KVM/images/205
total 12637456
drwxr-----   2 root root        4096 Jul 28 19:02 .
drwxr-xr-x 122 root root       12288 Jul  8 12:45 ..
-rw-r--r--   1 root root  6619398144 Jul 28 19:11 vm-205-disk-1.qcow2
-rw-r-----   1 root root 34359738368 Jul 25 12:36 vm-205-disk-1.raw
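
For reference, a quick read-only way to compare what the config references with what is actually on the storage (paths taken from above, nothing here modifies the container):

Code:
# disk referenced by the container config
pct config 205 | grep rootfs
# disks actually present on the NFS storage
ls -al /mnt/pve/KVM/images/205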
 
Hmm, first upgrade to the current version:

Bash:
apt update && apt full-upgrade -y

After the upgrade, restart the node and try to start the container again. If the error persists, please start the container with lxc-start -n 205 -F -l DEBUG -o /tmp/lxc-205.log and attach the resulting log file.

Also, please post the output of systemctl status lxcfs.service.
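
For example, the whole sequence could look like this (only reboot when the node can safely be restarted):

Bash:
# bring the node up to date
apt update && apt full-upgrade -y
# restart the node
reboot
# after the reboot, start the container in foreground mode with debug logging
lxc-start -n 205 -F -l DEBUG -o /tmp/lxc-205.log
# and check that lxcfs is running
systemctl status lxcfs.service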
 
When I restarted the node, the whole cluster crashed:

[Attachment: proxmox.png]
I will try to fix it before doing anything else.
 
I have successfully synchronized the cluster.

I attached the requested logs, and here is the output of the command:
Code:
systemctl status lxcfs.service
● lxcfs.service - FUSE filesystem for LXC
   Loaded: loaded (/lib/systemd/system/lxcfs.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2020-08-28 12:26:18 CEST; 24min ago
     Docs: man:lxcfs(1)
 Main PID: 676 (lxcfs)
    Tasks: 3 (limit: 4915)
   Memory: 1.3M
   CGroup: /system.slice/lxcfs.service
           └─676 /usr/bin/lxcfs /var/lib/lxcfs

Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_diskstats
Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_loadavg
Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_meminfo
Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_stat
Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_swaps
Aug 28 12:26:18 galaxie8 lxcfs[676]: - proc_uptime
Aug 28 12:26:18 galaxie8 lxcfs[676]: - shared_pidns
Aug 28 12:26:18 galaxie8 lxcfs[676]: - cpuview_daemon
Aug 28 12:26:18 galaxie8 lxcfs[676]: - loadavg_daemon
Aug 28 12:26:18 galaxie8 lxcfs[676]: - pidfds

Don't you think the disk numbering issue matters?
 


Code:
lsblk -f
NAME               FSTYPE      LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda                                                                                       
├─sda1                                                                                     
├─sda2             vfat              72FF-0255                                             
└─sda3             LVM2_member       GscOm0-Nzdi-6Rds-WoRr-9xiF-oe0M-U4xGbo               
  ├─pve-swap       swap              2342d24a-bb55-4742-b268-e210f928cbb0                  [SWAP]
  ├─pve-root       ext4              fd0f6968-cc71-42b3-aae4-cc99f15176df     41.6G    12% /
  ├─pve-data_tmeta                                                                         
  │ └─pve-data                                                                             
  └─pve-data_tdata                                                                         
    └─pve-data                                                                             
sr0

Code:
lvs -a
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  data            pve twi-a-tz-- 129.75g             0.00   10.43                           
  [data_tdata]    pve Twi-ao---- 129.75g                                                   
  [data_tmeta]    pve ewi-ao----  68.00m                                                   
  [lvol0_pmspare] pve ewi-------  68.00m                                                   
  root            pve -wi-ao----  51.00g                                                   
  swap            pve -wi-ao----   8.00g

Code:
vgs -a
  VG  #PV #LV #SN Attr   VSize   VFree
  pve   1   3   0 wz--n- 204.75g 15.86g
 
Code:
pct mount 205
volume 'KVM:205/vm-205-disk-0.raw' does not exist


This brings up the disk numbering problem again.
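
If it helps, I can also list what the storage layer reports for this CT (assuming the storage name KVM from the volume ID above):

Code:
pvesm list KVM --vmid 205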
 
Code:
pct rescan --vmid 205
rescan volumes...
CT is locked (mounted)

Code:
cat /etc/pve/storage.cfg
dir: local
    path /var/lib/vz
    content iso,backup,vztmpl

lvmthin: local-lvm
    thinpool data
    vgname pve
    content rootdir,images

nfs: ISO
    export /vol/iso
    path /mnt/pve/ISO
    server yfiler
    content iso
    maxfiles 1
    options vers=3

nfs: KVM
    export /vol/kvm
    path /mnt/pve/KVM
    server yfiler
    content rootdir,vztmpl,images
    maxfiles 1
    options vers=3

nfs: KVM2
    export /volume1/KVM
    path /mnt/pve/KVM2
    server vfiler
    content images,rootdir
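
And in case it's relevant, I can check that the NFS storage itself is mounted and responding (just status checks, run on the node):

Code:
pvesm status
df -h /mnt/pve/KVM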
 
hi,

can you please try: pct unmount 205 && pct unlock 205 and afterwards pct rescan --vmid 205 again?
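
i.e. something like this, run on the node where CT 205 is located:

Code:
pct unmount 205 && pct unlock 205
pct rescan --vmid 205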
 
Code:
pct unmount 205 && pct unlock 205
no lock found trying to remove any lock

pct rescan --vmid 205
rescan volumes...
CT 205: add unreferenced volume 'KVM:205/vm-205-disk-1.qcow2' as 'unused0' to config.
CT 205: add unreferenced volume 'KVM:205/vm-205-disk-1.raw' as 'unused1' to config.
CT 205: updated volume size of 'KVM:205/vm-205-disk-0.raw' in config.
 
okay and does the container start now? or do you get an error again?
 
I no longer get an error message when I start the container, but it does not seem to start correctly:

Code:
pct start 205     (no output)
... (after waiting a few minutes)
pct status 205
status: stopped
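
In case it helps, this is how I could pull a bit more detail on why it stops right away (read-only checks):

Code:
pct status 205 --verbose
journalctl -u pve-container@205.service -n 50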
 

Attachment: be.png
could you obtain debug logs again with lxc-start -n 205 -F -l DEBUG -o /tmp/lxc-205.log and attach the file here?
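
for example (the log path is just a suggestion, any writable path works):

Code:
lxc-start -n 205 -F -l DEBUG -o /tmp/lxc-205.log
# the tail of the log usually shows which hook or step failed
tail -n 50 /tmp/lxc-205.log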
 
it still says: lxc-start 205 20200831123127.496 DEBUG conf - conf.c:run_buffer:312 - Script exec /usr/share/lxc/hooks/lxc-pve-prestart-hook 205 lxc pre-start produced output: volume 'KVM:205/vm-205-disk-0.raw' does not exist

so it seems the disk is just missing... normally if it's on the filesystem pct rescan will find it and update the config, but i think since it's not finding it, it's probably just updating the size to 0? you can confirm it by checking pct config 205
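
e.g. just filtering for the relevant lines:

Code:
pct config 205 | grep -E 'rootfs|unused'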

were you doing something before the container problem appeared? maybe someone deleted or moved the disk by mistake?