pct start vs lxc-start

Discussion in 'Proxmox VE: Installation and configuration' started by sigxcpu, Oct 21, 2015.

  1. sigxcpu

    sigxcpu Member

    Joined:
    May 4, 2012
    Messages:
    433
    Likes Received:
    9
    Is there a reason why pct start fails but lxc-start works?

    Code:
    # pct start 101
    lxc-start: lxc_start.c: main: 344 The container failed to start.
    lxc-start: lxc_start.c: main: 346 To get more details, run the container in foreground mode.
    lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
    
    Code:
    root@gen8:/usr/bin# lxc-start -n 101
    root@gen8:/usr/bin# pct list
    VMID       Status     Name
    101        running    SeedBox
    
    Of course debugging this would be impossible because --logfile and --logpriority belong to lxc-start, which works.


    Later edit: the GUI seems perfectly happy with the error and still reports TASK OK:

    Code:
    lxc-start: lxc_start.c: main: 344 The container failed to start.
    lxc-start: lxc_start.c: main: 346 To get more details, run the container in foreground mode.
    lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
    TASK OK
     
    #1 sigxcpu, Oct 21, 2015
    Last edited: Oct 21, 2015
  2. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,431
    Likes Received:
    298
    Well, the code from 'pct start' looks like this:

    Code:
    my $cmd = ['lxc-start', '-n', $vmid];
    run_command($cmd);
    
    So that is exactly the same.

    Yes, I know that this does not really help ;-)
     
  3. dietmar

    dietmar Proxmox Staff Member
    Maybe it works if you start twice?
     
  4. sigxcpu

    sigxcpu Member

    This happened after replacing lxcfs (my previous issue). Only a single container has this issue and, after restarting pve-manager, another strange thing now happens:

    - although it is marked as auto-start, it is not started when pve-manager comes up. The task is reported as started (with those 3 error lines and "TASK OK"), but the container is not up.
    - if I run "pct start 101" from the command line now, it works

    So it seems that the first attempt fails and the second one works. This might be related to the lxcfs that I replaced (to fix the corrupted systemd folder).
    I'll modify pct to add logging to the startup; maybe I'll get more information.
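
    A minimal sketch of how such a log could be captured by hand, using the --logfile and --logpriority options that the error output points to. The function name start_with_log, the DRY_RUN switch and the default log path are illustrative assumptions, not part of pct:

```shell
#!/bin/sh
# Start a container with verbose LXC logging so that a failed start leaves a trace.
# Set DRY_RUN=1 to print the command instead of executing it.
start_with_log() {
    vmid="$1"
    # Hypothetical default log location; pass a second argument to override it.
    logfile="${2:-/tmp/lxc-start-${vmid}.log}"

    # Reuse the positional parameters to hold the command line.
    set -- lxc-start -n "$vmid" --logfile="$logfile" --logpriority=DEBUG

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

    Calling start_with_log 101 twice in a row (with a pct stop in between) should then leave both the failing and the succeeding attempt in the log file for comparison.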
     
  5. sigxcpu

    sigxcpu Member

    I've managed to reproduce it. One starts, one fails.

    Here's the relevant part:

    Code:
          lxc-start 1445432235.705 ERROR    lxc_conf - conf.c:instantiate_veth:2643 - failed to create veth pair (veth101i0 and vethLJ33G1): File exists
          lxc-start 1445432235.718 ERROR    lxc_conf - conf.c:lxc_create_network:2960 - failed to create netdev
          lxc-start 1445432235.718 ERROR    lxc_start - start.c:lxc_spawn:920 - failed to create the network
    


    Failed:
    Code:
          lxc-start 1445432235.355 INFO     lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/101/config
          lxc-start 1445432235.355 WARN     lxc_confile - confile.c:config_pivotdir:1825 - lxc.pivotdir is ignored.  It will soon become an error.
          lxc-start 1445432235.356 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
          lxc-start 1445432235.356 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 4
          lxc-start 1445432235.356 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432235.356 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432235.359 INFO     lxc_container - lxccontainer.c:do_lxcapi_start:708 - Attempting to set proc title to [lxc monitor] /var/lib/lxc 101
          lxc-start 1445432235.359 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432235.359 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432235.359 INFO     lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .reject_force_umount  # comment this to allow umount -f;  not recommended.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for reject_force_umount action 0
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:210 - Setting seccomp rule to reject force umounts
    
    
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for reject_force_umount action 0
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:210 - Setting seccomp rule to reject force umounts
    
    
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .[all].
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .kexec_load errno 1.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for kexec_load action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for kexec_load action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .open_by_handle_at errno 1.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for open_by_handle_at action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for open_by_handle_at action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .init_module errno 1.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for init_module action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for init_module action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .finit_module errno 1.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for finit_module action 327681
          lxc-start 1445432235.360 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:227 - Seccomp: got negative # for syscall: finit_module
          lxc-start 1445432235.360 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:228 - This syscall will NOT be blacklisted
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for finit_module action 327681
          lxc-start 1445432235.360 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:227 - Seccomp: got negative # for syscall: finit_module
          lxc-start 1445432235.360 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:228 - This syscall will NOT be blacklisted
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .delete_module errno 1.
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for delete_module action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for delete_module action 327681
          lxc-start 1445432235.360 INFO     lxc_seccomp - seccomp.c:parse_config_v2:420 - Merging in the compat seccomp ctx into the main one
          lxc-start 1445432235.360 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxc/hooks/lxc-pve-prestart-hook' for container '101', config section 'lxc'
          lxc-start 1445432235.360 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 4
          lxc-start 1445432235.360 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432235.360 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432235.363 INFO     lxc_monitor - monitor.c:lxc_monitor_sock_name:178 - using monitor sock name lxc/ad055575fe28ddd5//var/lib/lxc
          lxc-start 1445432235.704 INFO     lxc_start - start.c:lxc_init:454 - '101' is initialized
          lxc-start 1445432235.705 ERROR    lxc_conf - conf.c:instantiate_veth:2643 - failed to create veth pair (veth101i0 and vethLJ33G1): File exists
          lxc-start 1445432235.718 ERROR    lxc_conf - conf.c:lxc_create_network:2960 - failed to create netdev
          lxc-start 1445432235.718 ERROR    lxc_start - start.c:lxc_spawn:920 - failed to create the network
          lxc-start 1445432235.718 ERROR    lxc_start - start.c:__lxc_start:1172 - failed to spawn '101'
          lxc-start 1445432235.718 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxc/hooks/lxc-pve-poststop-hook' for container '101', config section 'lxc'
          lxc-start 1445432236.060 WARN     lxc_commands - commands.c:lxc_cmd_rsp_recv:172 - command get_init_pid failed to receive response
          lxc-start 1445432236.060 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
          lxc-start 1445432241.065 ERROR    lxc_start_ui - lxc_start.c:main:344 - The container failed to start.
          lxc-start 1445432241.065 ERROR    lxc_start_ui - lxc_start.c:main:346 - To get more details, run the container in foreground mode.
          lxc-start 1445432241.065 ERROR    lxc_start_ui - lxc_start.c:main:348 - Additional information can be obtained by setting the --logfile and --logpriority options.
    lxc-start: lxc_start.c: main: 344 The container failed to start.
    lxc-start: lxc_start.c: main: 346 To get more details, run the container in foreground mode.
    lxc-start: lxc_start.c: main: 348 Additional information can be obtained by setting the --logfile and --logpriority options.
    
    Success:

    Code:
          lxc-start 1445432297.910 INFO     lxc_start_ui - lxc_start.c:main:264 - using rcfile /var/lib/lxc/101/config
          lxc-start 1445432297.910 WARN     lxc_confile - confile.c:config_pivotdir:1825 - lxc.pivotdir is ignored.  It will soon become an error.
          lxc-start 1445432297.911 WARN     lxc_cgmanager - cgmanager.c:cgm_get:993 - do_cgm_get exited with error
          lxc-start 1445432297.911 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 4
          lxc-start 1445432297.911 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432297.911 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432297.913 INFO     lxc_container - lxccontainer.c:do_lxcapi_start:708 - Attempting to set proc title to [lxc monitor] /var/lib/lxc 101
          lxc-start 1445432297.914 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432297.914 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432297.914 INFO     lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .reject_force_umount  # comment this to allow umount -f;  not recommended.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for reject_force_umount action 0
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:210 - Setting seccomp rule to reject force umounts
    
    
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for reject_force_umount action 0
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:210 - Setting seccomp rule to reject force umounts
    
    
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .[all].
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .kexec_load errno 1.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for kexec_load action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for kexec_load action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .open_by_handle_at errno 1.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for open_by_handle_at action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for open_by_handle_at action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .init_module errno 1.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for init_module action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for init_module action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .finit_module errno 1.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for finit_module action 327681
          lxc-start 1445432297.914 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:227 - Seccomp: got negative # for syscall: finit_module
          lxc-start 1445432297.914 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:228 - This syscall will NOT be blacklisted
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for finit_module action 327681
          lxc-start 1445432297.914 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:227 - Seccomp: got negative # for syscall: finit_module
          lxc-start 1445432297.914 WARN     lxc_seccomp - seccomp.c:do_resolve_add_rule:228 - This syscall will NOT be blacklisted
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:318 - processing: .delete_module errno 1.
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:410 - Adding native rule for delete_module action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:413 - Adding compat rule for delete_module action 327681
          lxc-start 1445432297.914 INFO     lxc_seccomp - seccomp.c:parse_config_v2:420 - Merging in the compat seccomp ctx into the main one
          lxc-start 1445432297.914 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxc/hooks/lxc-pve-prestart-hook' for container '101', config section 'lxc'
          lxc-start 1445432297.914 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 4
          lxc-start 1445432297.914 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 6
          lxc-start 1445432297.914 INFO     lxc_start - start.c:lxc_check_inherited:224 - closed inherited fd 7
          lxc-start 1445432297.918 INFO     lxc_monitor - monitor.c:lxc_monitor_sock_name:178 - using monitor sock name lxc/ad055575fe28ddd5//var/lib/lxc
          lxc-start 1445432298.270 INFO     lxc_start - start.c:lxc_init:454 - '101' is initialized
          lxc-start 1445432298.271 INFO     lxc_conf - conf.c:run_script:406 - Executing script '/usr/share/lxc/lxcnetaddbr' for container '101', config section 'net'
          lxc-start 1445432298.614 INFO     lxc_cgroup - cgroup.c:cgroup_init:65 - cgroup driver cgmanager initing for 101
          lxc-start 1445432298.618 INFO     lxc_cgmanager - cgmanager.c:cgm_setup_limits:1397 - cgroup limits have been setup
          lxc-start 1445432298.637 INFO     lxc_conf - conf.c:setup_utsname:919 - 'SeedBox' hostname has been setup
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:setup_network:2492 - network has been setup
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:mount_autodev:1148 - Mounting /dev under /usr/lib/x86_64-linux-gnu/lxc/rootfs
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:mount_autodev:1169 - Mounted tmpfs onto /usr/lib/x86_64-linux-gnu/lxc/rootfs/dev
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:mount_autodev:1187 - Mounted /dev under /usr/lib/x86_64-linux-gnu/lxc/rootfs
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:mount_file_entries:2026 - mount points have been setup
          lxc-start 1445432298.642 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxcfs/lxc.mount.hook' for container '101', config section 'lxc'
          lxc-start 1445432298.670 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxc/hooks/lxc-pve-mount-hook' for container '101', config section 'lxc'
          lxc-start 1445432299.008 INFO     lxc_conf - conf.c:fill_autodev:1215 - Creating initial consoles under /usr/lib/x86_64-linux-gnu/lxc/rootfs/dev
          lxc-start 1445432299.008 INFO     lxc_conf - conf.c:fill_autodev:1226 - Populating /dev under /usr/lib/x86_64-linux-gnu/lxc/rootfs
          lxc-start 1445432299.008 INFO     lxc_conf - conf.c:fill_autodev:1258 - Populated /dev under /usr/lib/x86_64-linux-gnu/lxc/rootfs
          lxc-start 1445432299.008 INFO     lxc_conf - conf.c:setup_ttydir_console:1528 - created /usr/lib/x86_64-linux-gnu/lxc/rootfs/dev/lxc
          lxc-start 1445432299.008 INFO     lxc_conf - conf.c:setup_ttydir_console:1574 - console has been setup on lxc/console
          lxc-start 1445432299.008 INFO     lxc_utils - utils.c:mount_proc_if_needed:1430 - I am 1, /proc/self points to '1'
          lxc-start 1445432299.014 INFO     lxc_conf - conf.c:lxc_create_tty:3374 - tty's configured
          lxc-start 1445432299.014 INFO     lxc_conf - conf.c:setup_tty:1071 - 2 tty(s) has been setup
          lxc-start 1445432299.014 INFO     lxc_conf - conf.c:setup_personality:1462 - set personality to '0x0'
          lxc-start 1445432299.014 NOTICE   lxc_conf - conf.c:lxc_setup:3912 - '101' is setup.
          lxc-start 1445432299.016 INFO     lxc_cgmanager - cgmanager.c:cgm_setup_limits:1397 - cgroup limits have been setup
          lxc-start 1445432299.016 INFO     lxc_apparmor - lsm/apparmor.c:apparmor_process_label_set:187 - changed apparmor profile to lxc-container-default
          lxc-start 1445432299.017 NOTICE   lxc_start - start.c:start:1249 - exec'ing '/sbin/init'
          lxc-start 1445432299.017 NOTICE   lxc_start - start.c:post_start:1260 - '/sbin/init' started with pid '14424'
          lxc-start 1445432299.017 WARN     lxc_start - start.c:signal_handler:310 - invalid pid for SIGCHLD
          lxc-start 1445432299.017 ERROR    lxc_commands - commands.c:lxc_cmd_rsp_send:237 - failed to send command response -1 Broken pipe
          lxc-start 1445432299.017 INFO     lxc_start - start.c:signal_handler:296 - forwarded signal 13 to pid 14424
    
     
  6. sigxcpu

    sigxcpu Member

    It looks like this interface stays up after pct stop.

    Code:
    veth101i0 Link encap:Ethernet  HWaddr fe:6a:ab:18:eb:3f
              inet6 addr: fe80::fc6a:abff:fe18:eb3f/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:596 errors:0 dropped:0 overruns:0 frame:0
              TX packets:682 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:114192 (111.5 KiB)  TX bytes:74472 (72.7 KiB)
    
     
  7. dietmar

    dietmar Proxmox Staff Member
    I just uploaded updates to the pvetest repository, including some additional lxcfs fixes. Could you please test with those packages?
     
  8. sigxcpu

    sigxcpu Member

    I will update right now. Meanwhile, here's the error on stop when the container is running:

    Code:
          lxc-start 1445432676.451 INFO     lxc_error - error.c:lxc_error_set_and_log:55 - child <28447> ended on signal (9)
          lxc-start 1445432676.451 WARN     lxc_conf - conf.c:lxc_delete_network:2995 - failed to remove interface 'eth0'
          lxc-start 1445432676.451 INFO     lxc_conf - conf.c:run_script_argv:356 - Executing script '/usr/share/lxc/hooks/lxc-pve-poststop-hook' for container '101', config section 'lxc'
    
     
  9. dietmar

    dietmar Proxmox Staff Member
  10. sigxcpu

    sigxcpu Member

    Nope, the updates didn't fix the issue, unless there is something in the kernel (I didn't reboot).

    It looks like a race condition, because I've managed to start it twice without errors, then moved back to one good, one bad.

    Maybe some service (I'm running nginx with SSL and WebDAV, plus transmission-daemon) somehow keeps the network device busy, and only sometimes manages to shut down before the network cleanup happens.
     
    #10 sigxcpu, Oct 21, 2015
    Last edited: Oct 21, 2015
  11. sigxcpu

    sigxcpu Member

    Nope, just rebooted, so everything is new (4.0-51), including the kernel. Same problem. Anyway, considering that only a single container does this and all containers use the same template, it means that a specific service is at fault.
     
  12. dietmar

    dietmar Proxmox Staff Member
    Would you mind posting the container config? I am interested in the network setup.
     
  13. sigxcpu

    sigxcpu Member

    Here it is:

    Code:
    arch: amd64
    cpulimit: 2
    cpuunits: 1024
    hostname: SeedBox
    memory: 512
    net0: name=eth0,hwaddr=12:B0:87:0B:B4:6E,bridge=vmbr0,ip=192.168.27.21/24,gw=192.168.27.1
    onboot: 1
    ostype: ubuntu
    rootfs: Containers:subvol-101-rootfs
    swap: 512
    lxc.mount.entry: /raid0/torrents data/torrents none bind 0 0
    protection: true
    
     
  14. wbumiller

    wbumiller Proxmox Staff Member
    Staff Member

    Joined:
    Jun 23, 2015
    Messages:
    642
    Likes Received:
    80
    Can you reproduce it with any other containers? Otherwise, maybe you can clone it, wipe all unnecessary data (including package cache, home dirs etc.) and provide a backup we can test? Without a way for us to reproduce it, and given LXC's unhelpful error messages, it's hard to debug.
     
  15. sigxcpu

    sigxcpu Member

    No, I can't.

    Unfortunately, I don't think a stripped-down clone will help: this error is so hard to hit that I'm afraid any change in the configuration (like the number of torrents in cache for transmission) will make it go away.
    I'm pretty sure that the second attempt to clear the virtual interface (on the failed startup) succeeds because all the services belonging to that container are long gone. The stupid liblxc does not log the reason why deleting the interface failed and, unfortunately, that part lives in a binary, so it is not easy to add logging there (like doing a ps right before clearing the interface).
    I'll dig more into it when I have time.

    Anyway, the good news is that this happens to a "warm" container, one that was used at least once, so the automatic startup after a real reboot (i.e. not a service pve-manager restart) will be honored and not many people will hit it. Also, I don't think it happens on a container "reboot" from inside, because the network interface is not destroyed then.

    Later edit: I'll try to stop the services before stopping the container and come back with the results.
     
  16. sigxcpu

    sigxcpu Member

    No, it didn't work. For now, I've added a hack; maybe somebody else can elaborate on it.

    In /usr/share/lxc/hooks/lxc-pve-poststop-hook I've put "ip link delete veth${vmid}i0" and it works every time.

    Btw, this issue goes back to 2013, it seems:
    https://lists.linuxcontainers.org/pipermail/lxc-users/2013-September/005620.html

    Considering that ip link delete is the big hammer here, it would not be a bad idea for Proxmox to do that, just in case. The container is already torn down at this stage anyway, so no harm will be done.
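
    A slightly more defensive variant of that hack could look like the sketch below. It assumes the veth<vmid>i<index> naming seen in the log above (veth101i0) and that the container ID is passed in as an argument; the function name cleanup_veth is hypothetical and this is not the actual Proxmox hook code:

```shell
#!/bin/sh
# Remove the host-side veth of a stopped container if it is still present.
# Interface naming follows the thread: veth<vmid>i<index>, e.g. veth101i0.
cleanup_veth() {
    vmid="$1"
    idx="${2:-0}"               # network device index, defaults to net0
    ifname="veth${vmid}i${idx}"

    # Only call "ip link delete" when the stale interface really exists,
    # so a clean shutdown does not produce a spurious error.
    if ip link show "$ifname" >/dev/null 2>&1; then
        ip link delete "$ifname"
    fi
}
```

    For the container in this thread that would be cleanup_veth 101, run once the container has stopped.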

     
    #16 sigxcpu, Oct 22, 2015
    Last edited: Oct 22, 2015
  17. wbumiller

    wbumiller Proxmox Staff Member
    We're patching lxc to spit out a more useful error message with the next package updates ;-)
    There's a theory floating around but we can't really be sure (and can't reproduce/test it) until we see the real error message.
     
  18. sigxcpu

    sigxcpu Member

    Based on what I've read so far, it seems to happen when there is high network traffic (or many open sockets?) and the veth peer is pinned in the container's namespace, so the outside peer can't be gracefully removed.
    My opinion, again, is to simply ip link delete all of the container's interface peers (veth*) in the poststop hook. The container namespace is more or less deleted, or about to be deleted, at that point, so no harm will be done.
    This opinion is based on the fact that this error dates from 2013 and still randomly happens in 2015, so fixing it is non-trivial, as with all race conditions. Somebody added a 10-second delay before tearing down the interfaces, which confirms the race.
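
    Deleting every leftover peer of one container could be sketched roughly as follows. This assumes the host-side interfaces are all named veth<vmid>i<index> and show up under /sys/class/net; the function name cleanup_all_veth and the optional second argument (a directory to scan, handy for testing) are illustrative assumptions:

```shell
#!/bin/sh
# Delete all host-side veth interfaces that belong to one container.
# Matches veth<vmid>i*, so vmid 10 does not touch the interfaces of vmid 101.
cleanup_all_veth() {
    vmid="$1"
    netdir="${2:-/sys/class/net}"   # where the kernel lists network interfaces

    for path in "$netdir"/veth"${vmid}"i*; do
        # If nothing matched, the unexpanded pattern is left over; skip it.
        [ -e "$path" ] || continue
        ip link delete "${path##*/}"
    done
}
```

    With the container from this thread, cleanup_all_veth 101 would remove veth101i0 and any further veth101i* devices that were left behind.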
     