Can't start lxc after migration

Salzi

Hello,
I have a cluster of 3 servers, all running Proxmox 5.1. All servers are also Ceph nodes, and all Ceph pools have a size of 3 and a min size of 2.
On server 2 there is a container that is managed by HA and stored in a Ceph pool. This morning servers 1 and 2 were restarted (I don't know why yet, but that's not the topic here). The ha-manager therefore migrated the HA container to server 3, which was still running. Unfortunately it was not able to start the container there, and the container has been in an error state ever since. Now all 3 servers are back online, and both the system status and the Ceph status are good, but I'm still not able to start the container. I removed it from HA, started it by hand, and got the following error:
Code:
     lxc-start 101 20171212123545.646 INFO     lxc_start_ui - tools/lxc_start.c:main:277 - using rcfile /var/lib/lxc/101/config
      lxc-start 101 20171212123545.646 INFO     lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .reject_force_umount  # comment this to allow umount -f;  not recommended.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for reject_force_umount action 0(kill).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for reject_force_umount action 0(kill).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .[all].
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .kexec_load errno 1.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for kexec_load action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for kexec_load action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .open_by_handle_at errno 1.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for open_by_handle_at action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for open_by_handle_at action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .init_module errno 1.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for init_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for init_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .finit_module errno 1.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for finit_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for finit_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .delete_module errno 1.
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for delete_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for delete_module action 327681(errno).
      lxc-start 101 20171212123545.647 INFO     lxc_seccomp - seccomp.c:parse_config_v2:624 - Merging in the compat Seccomp ctx into the main one.
      lxc-start 101 20171212123545.647 INFO     lxc_conf - conf.c:run_script_argv:457 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "101", config section "lxc".
      lxc-start 101 20171212123546.397 ERROR    lxc_conf - conf.c:run_buffer:438 - Script exited with status 32.
      lxc-start 101 20171212123546.397 ERROR    lxc_start - start.c:lxc_init:639 - Failed to run lxc.hook.pre-start for container "101".
      lxc-start 101 20171212123546.397 ERROR    lxc_start - start.c:__lxc_start:1436 - Failed to initialize container "101".
      lxc-start 101 20171212123546.397 ERROR    lxc_start_ui - tools/lxc_start.c:main:368 - The container failed to start.
      lxc-start 101 20171212123546.397 ERROR    lxc_start_ui - tools/lxc_start.c:main:372 - Additional information can be obtained by setting the --logfile and --logpriority options.

I couldn't find any useful information in this output. Error 32 is a broken pipe, but beyond that I don't know what it means here. What can I do?
Thanks in advance for any help!
Regards
Salzi
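
The last log line above points at the --logfile and --logpriority options, so one way to get more detail is to start the container again with DEBUG logging and then look only at the ERROR lines. This is a rough sketch, not a confirmed procedure: the function names start_with_debug_log and show_errors are made up for illustration, and the filter assumes the level is the fourth whitespace-separated field, as in the log above.

```shell
#!/bin/sh
# Hypothetical helper names; only the lxc-start options come from LXC itself.

# Start container $1 in the foreground and write a DEBUG-level log to file $2.
start_with_debug_log() {
    lxc-start -n "$1" -F --logfile "$2" --logpriority DEBUG
}

# Print only the ERROR lines of an lxc-start log file ($1).
# Assumes the layout seen above: tool, container name, timestamp, level, message.
show_errors() {
    awk '$4 == "ERROR"' "$1"
}
```

For example, `start_with_debug_log 101 /tmp/lxc-101.log` followed by `show_errors /tmp/lxc-101.log` (the log path is only an example).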
 
It was probably caused by a corrupt file system. I was able to fix it with pct fsck.

Code:
pct fsck 101
fsck from util-linux 2.29.2
/dev/rbd0: recovering journal
JBD2: Invalid checksum recovering block 1 in log
Journal checksum error found in /dev/rbd0
/dev/rbd0: Clearing orphaned inode 395757 (uid=0, gid=0, mode=0140777, size=0)
/dev/rbd0: Clearing orphaned inode 367 (uid=0, gid=0, mode=0100755, size=207336)
/dev/rbd0: Clearing orphaned inode 346 (uid=0, gid=0, mode=0100755, size=117160)
/dev/rbd0: Clearing orphaned inode 368 (uid=0, gid=0, mode=0100644, size=2217104)
/dev/rbd0: Clearing orphaned inode 140 (uid=0, gid=0, mode=0100644, size=557552)
/dev/rbd0: Clearing orphaned inode 100 (uid=0, gid=0, mode=0100644, size=170128)
/dev/rbd0: Clearing orphaned inode 135 (uid=0, gid=0, mode=0100644, size=325776)
/dev/rbd0: Clearing orphaned inode 143907 (uid=0, gid=0, mode=0100755, size=224208)
/dev/rbd0: Clearing orphaned inode 253 (uid=0, gid=0, mode=0100644, size=155400)
/dev/rbd0: Clearing orphaned inode 276 (uid=0, gid=0, mode=0100644, size=1108088)
/dev/rbd0: Clearing orphaned inode 251 (uid=0, gid=0, mode=0100644, size=31744)
/dev/rbd0: Clearing orphaned inode 87 (uid=0, gid=0, mode=0100755, size=135440)
/dev/rbd0: Clearing orphaned inode 259 (uid=0, gid=0, mode=0100644, size=47688)
/dev/rbd0: Clearing orphaned inode 111 (uid=0, gid=0, mode=0100644, size=47632)
/dev/rbd0: Clearing orphaned inode 88 (uid=0, gid=0, mode=0100644, size=31616)
/dev/rbd0: Clearing orphaned inode 123 (uid=0, gid=0, mode=0100644, size=89064)
/dev/rbd0: Clearing orphaned inode 103 (uid=0, gid=0, mode=0100644, size=14640)
/dev/rbd0: Clearing orphaned inode 188 (uid=0, gid=0, mode=0100755, size=1689360)
/dev/rbd0: Clearing orphaned inode 274 (uid=0, gid=0, mode=0100755, size=153288)
/dev/rbd0: Clearing orphaned inode 9231 (uid=0, gid=0, mode=0100600, size=0)
/dev/rbd0 was not cleanly unmounted, check forced.
/dev/rbd0: 31967/524288 files (0.2% non-contiguous), 263059/2097152 blocks
command 'fsck -a -l /dev/rbd/ceph-lxc/vm-101-disk-1' failed: exit code 1

Now I am able to start the container. I still don't know why this happened. Any ideas?
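
A note on the last line of the fsck output: fsck(8) documents its exit status as a bit mask in which 1 means "filesystem errors corrected". So "failed: exit code 1" looks consistent with a repair that went through rather than one that failed, which would fit the container starting afterwards. Below is a minimal sketch of decoding that status. The function name describe_fsck_status is made up for illustration, and the meanings are the ones listed in the fsck(8) man page.

```shell
#!/bin/sh
# Hypothetical helper: decode an fsck(8) exit status given as $1.
# The status is a bit mask, so several conditions can be set at once.
describe_fsck_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    out=""
    for pair in \
        "1:filesystem errors corrected" \
        "2:system should be rebooted" \
        "4:filesystem errors left uncorrected" \
        "8:operational error" \
        "16:usage or syntax error" \
        "32:checking canceled by user request" \
        "128:shared-library error"
    do
        bit=${pair%%:*}     # number before the first colon
        text=${pair#*:}     # description after the first colon
        if [ $(( status & bit )) -ne 0 ]; then
            out="${out:+$out; }$text"
        fi
    done
    # Bits not listed in the man page end up here.
    echo "${out:-unknown status $status}"
}
```

For the run above, `describe_fsck_status 1` prints `filesystem errors corrected`.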