[SOLVED] LXC container reboot fails - LXC becomes unusable

Discussion in 'Proxmox VE: Installation and configuration' started by denos, Feb 7, 2018.

  1. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    Overview
    We're seeing a new issue since an in-place upgrade from Proxmox 4.x to Proxmox 5.x. For 2 different LXC containers, shutdown fails after 1-3 reboots. This occurs whether it is triggered inside the OS (shutdown -r now) or from the hypervisor using "pct shutdown XXX --forceStop 1". Once it occurs, pct and the underlying lxc-* commands all hang.

    A kill -9 of the lxc monitor process for the CT lets pct and lxc-* commands function again, but from that point forward it is impossible to start any LXC container without a hypervisor reboot.
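
    For anyone else hitting this, the recovery step is roughly the following (a sketch; CTID 2022 is used as an example, matching the traces below):
    Code:
    # The monitor's process title is "[lxc monitor] /var/lib/lxc CTID"
    pgrep -fa 'lxc monitor.*2022'
    # Killing it unblocks pct/lxc-* queries, but container starts stay broken until a host reboot
    pkill -9 -f 'lxc monitor.*2022'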

    Debug
    pveversion
    Code:
    proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
    pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
    pve-kernel-4.4.98-5-pve: 4.4.98-105
    pve-kernel-4.4.35-1-pve: 4.4.35-77
    pve-kernel-4.13.13-5-pve: 4.13.13-38
    pve-kernel-4.4.67-1-pve: 4.4.67-92
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-19
    qemu-server: 5.0-20
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-25
    libpve-guest-common-perl: 2.0-14
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-17
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-3
    pve-docs: 5.1-16
    pve-qemu-kvm: 2.9.1-6
    pve-container: 2.0-18
    pve-firewall: 3.0-5
    pve-ha-manager: 2.0-4
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.1-2
    lxcfs: 2.0.8-1
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.4-pve2~bpo9
    

    An strace of a pct start just shows Proxmox waiting for the underlying thread to start, then failing on timeout.

    "strace -f lxc-start -l trace -o /tmp/trace.log -n CTID", yields too much debug to post in it's entirety, but here is the end of the dump showing that it's hanging on a clone operation (12161 is the lxc-monitor for the container):
    Code:
    ... snip ...
    [pid 12161] write(2, "cgroups/cgfsng.c: cgfsng_create:"..., 38) = 38
    [pid 12161] write(2, "Failed to create \"/sys/fs/cgroup"..., 51) = 51
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] stat("/sys/fs/cgroup/systemd//lxc/2022-1", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 171) = 171
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: create_path_fo"..., 50) = 50
    [pid 12161] write(2, "Path \"/sys/fs/cgroup/systemd//lx"..., 58) = 58
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 154) = 154
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: cgfsng_create:"..., 38) = 38
    [pid 12161] write(2, "Failed to create \"/sys/fs/cgroup"..., 53) = 53
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] stat("/sys/fs/cgroup/systemd//lxc/2022-2", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 171) = 171
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: create_path_fo"..., 50) = 50
    [pid 12161] write(2, "Path \"/sys/fs/cgroup/systemd//lx"..., 58) = 58
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 154) = 154
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: cgfsng_create:"..., 38) = 38
    [pid 12161] write(2, "Failed to create \"/sys/fs/cgroup"..., 53) = 53
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] stat("/sys/fs/cgroup/systemd//lxc/2022-3", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 171) = 171
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: create_path_fo"..., 50) = 50
    [pid 12161] write(2, "Path \"/sys/fs/cgroup/systemd//lx"..., 58) = 58
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 154) = 154
    [pid 12161] write(2, "lxc-start: 2022: ", 17) = 17
    [pid 12161] write(2, "cgroups/cgfsng.c: cgfsng_create:"..., 38) = 38
    [pid 12161] write(2, "Failed to create \"/sys/fs/cgroup"..., 53) = 53
    [pid 12161] write(2, "\n", 1)           = 1
    [pid 12161] stat("/sys/fs/cgroup/systemd//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/systemd//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/systemd//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/systemd//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/memory//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/memory//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/memory//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/memory//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/freezer//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/freezer//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/freezer//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/freezer//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/cpu//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpu//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpu//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpu//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/pids//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/pids//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/pids//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/pids//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/hugetlb//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/hugetlb//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/hugetlb//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/hugetlb//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/net_cls//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/net_cls//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/net_cls//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/net_cls//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/blkio//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/blkio//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/blkio//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/blkio//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/cpuset//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/sys/fs/cgroup/cpuset//lxc", 0755) = -1 EEXIST (File exists)
    [pid 12161] stat("/sys/fs/cgroup/cpuset//lxc/cgroup.clone_children", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    [pid 12161] open("/sys/fs/cgroup/cpuset//lxc/cgroup.clone_children", O_RDONLY|O_CLOEXEC) = 14
    [pid 12161] read(14, "1", 1)            = 1
    [pid 12161] close(14)                   = 0
    [pid 12161] open("/sys/fs/cgroup/cpuset//cpuset.cpus", O_RDONLY) = 14
    [pid 12161] fstat(14, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
    [pid 12161] read(14, "0-31\n", 4096)    = 5
    [pid 12161] read(14, "", 4096)          = 0
    [pid 12161] close(14)                   = 0
    [pid 12161] stat("/sys/devices/system/cpu/isolated", {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
    [pid 12161] open("/sys/devices/system/cpu/isolated", O_RDONLY) = 14
    [pid 12161] fstat(14, {st_mode=S_IFREG|0444, st_size=4096, ...}) = 0
    [pid 12161] read(14, "\n", 4096)        = 1
    [pid 12161] read(14, "", 4096)          = 0
    [pid 12161] close(14)                   = 0
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 132) = 132
    [pid 12161] write(3, "      lxc-start 2022 20180207193"..., 157) = 157
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpuset//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpuset//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/cpuset//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/devices//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/devices//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/devices//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/devices//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/rdma//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/rdma//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/rdma//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/rdma//lxc/2022-4", 0755) = 0
    [pid 12161] stat("/sys/fs/cgroup/perf_event//lxc/2022-4", 0x7ffe5a3f1c50) = -1 ENOENT (No such file or directory)
    [pid 12161] mkdir("/", 0755)            = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/", 0755)        = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/", 0755)     = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/perf_event//", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/perf_event//lxc/", 0755) = -1 EEXIST (File exists)
    [pid 12161] mkdir("/sys/fs/cgroup/perf_event//lxc/2022-4", 0755) = 0
    [pid 12161] lstat("/var", {st_mode=S_IFDIR|0755, st_size=13, ...}) = 0
    [pid 12161] lstat("/var/lib", {st_mode=S_IFDIR|0755, st_size=49, ...}) = 0
    [pid 12161] lstat("/var/lib/lxc", {st_mode=S_IFDIR|0755, st_size=15, ...}) = 0
    [pid 12161] lstat("/var/lib/lxc/2022", {st_mode=S_IFDIR|0755, st_size=4, ...}) = 0
    [pid 12161] lstat("/var/lib/lxc/2022/rootfs", {st_mode=S_IFDIR|0755, st_size=36, ...}) = 0
    [pid 12161] access("/var/lib/lxc/2022/rootfs", F_OK) = 0
    [pid 12161] stat("/var/lib/lxc/2022/rootfs", {st_mode=S_IFDIR|0755, st_size=36, ...}) = 0
    [pid 12161] open("/var/lib/lxc/2022/rootfs/lxc.hold", O_RDWR|O_CREAT, 0600) = 14
    [pid 12161] unlink("/var/lib/lxc/2022/rootfs/lxc.hold") = 0
    [pid 12161] getpid()                    = 12161
    [pid 12161] open("/proc/12161/ns", O_RDONLY|O_CLOEXEC) = 15
    [pid 12161] close(15)                   = 0
    [pid 12161] clone(
    netstat -xp shows a failure to connect to the command socket for the container:
    Code:
    root@dvm2:/proc/7600# netstat -xp | grep lxc | grep 2022
    unix  3      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    unix  3      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    unix  3      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    unix  2      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    unix  3      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    unix  3      [ ]         STREAM     CONNECTING    0        -                    @/var/lib/lxc/2022/command
    
    The trace-level debug log of the lxc-start command that was strace'd:
    Code:
          lxc-start 2022 20180207175421.764 INFO     lxc_start_ui - tools/lxc_start.c:main:280 - using rcfile /var/lib/lxc/2022/config
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_init_pid tries to connect command socket
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_init_pid failed to connect command socket: Connection refused
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_cgroup tries to connect command socket
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_cgroup failed to connect command socket: Connection refused
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:do_lxc_cmd_get_cgroup_path:439 - command get_cgroup failed for container "2022": Connection refused.
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_state tries to connect command socket
          lxc-start 2022 20180207175421.765 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_state failed to connect command socket: Connection refused
          lxc-start 2022 20180207175421.765 TRACE    lxc_start - start.c:lxc_init_handler:583 - Created anonymous pair {4,5} of unix sockets
          lxc-start 2022 20180207175421.765 TRACE    lxc_start - start.c:lxc_init_handler:592 - unix domain socket 6 for command server is ready
          lxc-start 2022 20180207175421.765 INFO     lxc_container - lxccontainer.c:do_lxcapi_start:883 - Attempting to set proc title to [lxc monitor] /var/lib/lxc 2022
          lxc-start 2022 20180207175421.767 INFO     lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_init:607 - initialized LSM
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .reject_force_umount  # comment this to allow umount -f;  not recommended.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for reject_force_umount action 0(kill).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for reject_force_umount action 0(kill).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .[all].
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .kexec_load errno 1.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for kexec_load action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for kexec_load action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .open_by_handle_at errno 1.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for open_by_handle_at action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for open_by_handle_at action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .init_module errno 1.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for init_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for init_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .finit_module errno 1.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for finit_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for finit_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .delete_module errno 1.
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for delete_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for delete_module action 327681(errno).
          lxc-start 2022 20180207175421.767 INFO     lxc_seccomp - seccomp.c:parse_config_v2:624 - Merging in the compat Seccomp ctx into the main one.
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_init:613 - read seccomp policy
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_serve_state_clients:373 - set container state to STARTING
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_serve_state_clients:376 - no state clients registered
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_init:620 - set container state to "STARTING"
          lxc-start 2022 20180207175421.767 TRACE    lxc_start - start.c:lxc_init:648 - set environment variables
          lxc-start 2022 20180207175421.767 INFO     lxc_conf - conf.c:run_script_argv:457 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "2022", config section "lxc".
          lxc-start 2022 20180207175422.297 TRACE    lxc_start - start.c:lxc_init:654 - ran pre-start hooks
          lxc-start 2022 20180207175422.297 DEBUG    lxc_start - start.c:setup_signal_fd:301 - Set SIGCHLD handler with file descriptor: 7.
          lxc-start 2022 20180207175422.297 TRACE    lxc_start - start.c:lxc_init:665 - set up signal fd
          lxc-start 2022 20180207175422.297 DEBUG    console - console.c:lxc_console_peer_default:450 - process does not have a controlling terminal
          lxc-start 2022 20180207175422.297 TRACE    lxc_start - start.c:lxc_init:672 - created console
          lxc-start 2022 20180207175422.297 TRACE    lxc_start - start.c:lxc_init:678 - shifted tty ids
          lxc-start 2022 20180207175422.297 INFO     lxc_start - start.c:lxc_init:680 - container "2022" is initialized
          lxc-start 2022 20180207175422.299 ERROR    lxc_network - network.c:instantiate_veth:130 - Failed to create veth pair "veth2022i0" and "vethC5KAG8": File exists
          lxc-start 2022 20180207175422.299 ERROR    lxc_network - network.c:lxc_create_network_priv:2407 - Failed to create network device
          lxc-start 2022 20180207175422.299 ERROR    lxc_start - start.c:lxc_spawn:1206 - Failed to create the network.
          lxc-start 2022 20180207175422.299 ERROR    lxc_start - start.c:__lxc_start:1477 - Failed to spawn container "2022".
          lxc-start 2022 20180207175422.299 TRACE    lxc_start - start.c:lxc_serve_state_socket_pair:446 - Sent container state "STOPPING" to 5
          lxc-start 2022 20180207175422.299 TRACE    lxc_start - start.c:lxc_serve_state_clients:373 - set container state to STOPPING
          lxc-start 2022 20180207175422.299 TRACE    lxc_start - start.c:lxc_serve_state_clients:376 - no state clients registered
          lxc-start 2022 20180207175422.299 ERROR    lxc_container - lxccontainer.c:wait_on_daemonized_start:760 - Received container state "STOPPING" instead of "RUNNING"
          lxc-start 2022 20180207175422.299 TRACE    lxc_start - start.c:lxc_serve_state_clients:373 - set container state to STOPPED
          lxc-start 2022 20180207175422.299 TRACE    lxc_start - start.c:lxc_serve_state_clients:376 - no state clients registered
          lxc-start 2022 20180207175422.299 ERROR    lxc_start_ui - tools/lxc_start.c:main:371 - The container failed to start.
          lxc-start 2022 20180207175422.299 INFO     lxc_conf - conf.c:run_script_argv:457 - Executing script "/usr/share/lxc/hooks/lxc-pve-poststop-hook" for container "2022", config section "lxc".
          lxc-start 2022 20180207175422.299 ERROR    lxc_start_ui - tools/lxc_start.c:main:373 - To get more details, run the container in foreground mode.
          lxc-start 2022 20180207175422.299 ERROR    lxc_start_ui - tools/lxc_start.c:main:375 - Additional information can be obtained by setting the --logfile and --logpriority options.
          lxc-start 2022 20180207175437.694 INFO     lxc_conf - conf.c:run_script_argv:457 - Executing script "/usr/share/lxcfs/lxc.reboot.hook" for container "2022", config section "lxc".
          lxc-start 2022 20180207175505.412 INFO     lxc_start_ui - tools/lxc_start.c:main:280 - using rcfile /var/lib/lxc/2022/config
          lxc-start 2022 20180207175505.412 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_init_pid tries to connect command socket
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_init_pid failed to connect command socket: Connection refused
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_cgroup tries to connect command socket
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_cgroup failed to connect command socket: Connection refused
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:do_lxc_cmd_get_cgroup_path:439 - command get_cgroup failed for container "2022": Connection refused.
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:lxc_cmd:290 - command get_state tries to connect command socket
          lxc-start 2022 20180207175505.413 TRACE    lxc_commands - commands.c:lxc_cmd:295 - command get_state failed to connect command socket: Connection refused
          lxc-start 2022 20180207175505.413 TRACE    lxc_start - start.c:lxc_init_handler:592 - unix domain socket 4 for command server is ready
          lxc-start 2022 20180207175505.413 INFO     lxc_lsm - lsm/lsm.c:lsm_init:48 - LSM security driver AppArmor
          lxc-start 2022 20180207175505.413 TRACE    lxc_start - start.c:lxc_init:607 - initialized LSM
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .reject_force_umount  # comment this to allow umount -f;  not recommended.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for reject_force_umount action 0(kill).
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for reject_force_umount action 0(kill).
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:do_resolve_add_rule:276 - Setting Seccomp rule to reject force umounts.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .[all].
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .kexec_load errno 1.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for kexec_load action 327681(errno).
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for kexec_load action 327681(errno).
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .open_by_handle_at errno 1.
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for open_by_handle_at action 327681(errno).
          lxc-start 2022 20180207175505.413 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for open_by_handle_at action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .init_module errno 1.
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for init_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for init_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .finit_module errno 1.
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for finit_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for finit_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:435 - processing: .delete_module errno 1.
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:610 - Adding native rule for delete_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:614 - Adding compat rule for delete_module action 327681(errno).
          lxc-start 2022 20180207175505.414 INFO     lxc_seccomp - seccomp.c:parse_config_v2:624 - Merging in the compat Seccomp ctx into the main one.
          lxc-start 2022 20180207175505.414 TRACE    lxc_start - start.c:lxc_init:613 - read seccomp policy
          lxc-start 2022 20180207175505.414 TRACE    lxc_start - start.c:lxc_serve_state_clients:373 - set container state to STARTING
          lxc-start 2022 20180207175505.414 TRACE    lxc_start - start.c:lxc_serve_state_clients:376 - no state clients registered
          lxc-start 2022 20180207175505.414 TRACE    lxc_start - start.c:lxc_init:620 - set container state to "STARTING"
          lxc-start 2022 20180207175505.414 TRACE    lxc_start - start.c:lxc_init:648 - set environment variables
          lxc-start 2022 20180207175505.414 INFO     lxc_conf - conf.c:run_script_argv:457 - Executing script "/usr/share/lxc/hooks/lxc-pve-prestart-hook" for container "2022", config section "lxc".
          lxc-start 2022 20180207175505.945 TRACE    lxc_start - start.c:lxc_init:654 - ran pre-start hooks
          lxc-start 2022 20180207175505.945 DEBUG    lxc_start - start.c:setup_signal_fd:301 - Set SIGCHLD handler with file descriptor: 5.
          lxc-start 2022 20180207175505.945 TRACE    lxc_start - start.c:lxc_init:665 - set up signal fd
          lxc-start 2022 20180207175505.946 DEBUG    console - console.c:lxc_console_peer_default:459 - using "/dev/tty" as peer tty device
          lxc-start 2022 20180207175505.946 DEBUG    console - console.c:lxc_console_sigwinch_init:151 - process 42013 created signal fd 9 to handle SIGWINCH events
          lxc-start 2022 20180207175505.946 DEBUG    console - console.c:lxc_console_winsz:71 - set winsz dstfd:6 cols:89 rows:26
          lxc-start 2022 20180207175505.946 TRACE    lxc_start - start.c:lxc_init:672 - created console
          lxc-start 2022 20180207175505.946 TRACE    lxc_start - start.c:lxc_init:678 - shifted tty ids
          lxc-start 2022 20180207175505.946 INFO     lxc_start - start.c:lxc_init:680 - container "2022" is initialized
          lxc-start 2022 20180207175505.948 INFO     lxc_conf - conf.c:run_script:507 - Executing script "/usr/share/lxc/lxcnetaddbr" for container "2022", config section "net".
          lxc-start 2022 20180207175506.465 DEBUG    lxc_network - network.c:instantiate_veth:219 - Instantiated veth "veth2022i0/vethSUM8EJ", index is "50"
          lxc-start 2022 20180207175506.465 INFO     lxc_cgroup - cgroups/cgroup.c:cgroup_init:67 - cgroup driver cgroupfs-ng initing for 2022
          lxc-start 2022 20180207175506.465 ERROR    lxc_cgfsng - cgroups/cgfsng.c:create_path_for_hierarchy:1337 - Path "/sys/fs/cgroup/systemd//lxc/2022" already existed.
          lxc-start 2022 20180207175506.465 ERROR    lxc_cgfsng - cgroups/cgfsng.c:cgfsng_create:1433 - Failed to create "/sys/fs/cgroup/systemd//lxc/2022"
          lxc-start 2022 20180207175506.466 DEBUG    lxc_cgfsng - cgroups/cgfsng.c:filter_and_set_cpus:469 - No isolated cpus detected.
          lxc-start 2022 20180207175506.466 DEBUG    lxc_cgfsng - cgroups/cgfsng.c:handle_cpuset_hierarchy:640 - "cgroup.clone_children" was already set to "1".
    
    Any attempt to start any container now hangs, even after a kill -9 of the monitor process, which only allows informational pct and lxc commands to execute. The containers are CentOS 6-based images that were migrated from OpenVZ; they were restarting without issue on Proxmox 4.x.

    Any help would be greatly appreciated.
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,097
    Likes Received:
    184
    Could you please try a 4.10 kernel from our repository? AFAIK, once this has happened, only a reboot of the host will get container starts working again.
     
  3. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    I have been trying different kernels, network changes, LXC container image changes, etc., without solving this problem. However, I regularly notice this on the console:
    kernel:[ 1987.452238] unregister_netdevice: waiting for lo to become free. Usage count = 1

    I've seen that message before without it causing problems, but some research shows that it has caused issues similar to mine for Docker:
    https://github.com/coreos/bugs/issues/254
    https://github.com/moby/moby/issues/5618
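
    A simple way to watch for these messages while reproducing (nothing special, just following the kernel log):
    Code:
    dmesg -w | grep unregister_netdevice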

    From the comments, it looks like there is a Jan 2018 patch for this issue in 4.4.114 (so it's likely not in 4.13.13):
    https://github.com/moby/moby/issues/5618#issuecomment-365484617
    It's the second commit in the changelog:
    https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.4.114

    The fix is also in 4.14.16:
    https://cdn.kernel.org/pub/linux/kernel/v4.x/ChangeLog-4.14.16

    Patch Details:
    https://patchwork.kernel.org/patch/10174111/

    Any chance this could be ported into 4.13.13 testing?
     
    #3 denos, Feb 20, 2018
    Last edited: Feb 20, 2018
  4. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    While reading kernel commits yesterday, I noticed multiple namespace fixes in 4.14.16 - 4.14.20, so I built a vanilla 4.14.20 kernel (using the config, but no patchsets, from 4.13.13-6), and there have been no further namespace cloning issues after almost 1000 reboots.

    I can confirm the issue is still present in 4.13.13-6.
     
  5. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    2,097
    Likes Received:
    184
    Our newest 4.13.13-6 kernel has the patches included; please try the kernel from our pvetest repository.
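
    For reference, enabling pvetest on PVE 5.x and pulling that kernel looks roughly like this (a sketch; double-check the repository line against the reference documentation):
    Code:
    echo "deb http://download.proxmox.com/debian/pve stretch pvetest" > /etc/apt/sources.list.d/pvetest.list
    apt-get update
    apt-get install pve-kernel-4.13.13-6-pve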
     
  6. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    The issue is still present in 4.13.13-6. I tried with KPTI on and off -- both broken -- so it must be another change that resolved it. 4.14.20 works.
     
  7. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,195
    Likes Received:
    494
    Which exact kernel did you test? There are three kernel versions with the ABI 4.13.13-6-pve; only the latest (package version 4.13.13-41) includes the patches.

    uname -a should output:
    Code:
    Linux host 4.13.13-6-pve #1 SMP PVE 4.13.13-41 (Wed, 21 Feb 2018 10:07:54 +0100) x86_64 GNU/Linux
    
     
  8. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    I didn't realize the pve-kernel versions aren't unique. I evaluated pve-kernel-4.13.13-6 backed by kernel 4.13.13-40, which still had the issue. I see the update you referenced to pve-kernel-4.13.13-6 (4.13.13-41) and will test that shortly. Thanks for clarifying.
     
  9. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    4.13.13-6 (4.13.13-41) still has the issue. I test for it by triggering a reboot every minute via cron inside a CentOS 6 or Ubuntu 14.04 container; it died within a few minutes with the same symptoms (stuck cloning the network namespace).
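
    The test cron job is nothing fancy; inside the container it's along these lines (a sketch, using system crontab syntax):
    Code:
    # /etc/crontab inside the test container: reboot once a minute
    * * * * * root /sbin/shutdown -r now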

    Here's the stack for the monitor process:

    Code:
    # cat /proc/23276/stack
    [<ffffffff8e1dfd1b>] copy_net_ns+0xab/0x220
    [<ffffffff8daa9beb>] create_new_namespaces+0x11b/0x1c0
    [<ffffffff8daa9cfd>] copy_namespaces+0x6d/0xa0
    [<ffffffff8da84cbc>] copy_process.part.32+0x91c/0x1a40
    [<ffffffff8da85fbf>] _do_fork+0xdf/0x3f0
    [<ffffffff8da86379>] SyS_clone+0x19/0x20
    [<ffffffff8da03a91>] do_syscall_64+0x61/0xd0
    [<ffffffff8e40024d>] entry_SYSCALL64_slow_path+0x25/0x25
    [<ffffffffffffffff>] 0xffffffffffffffff
    
     
  10. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I also have exactly the same problem.
    I have 5 clusters with 5 nodes each.
    All clusters have the same issue.
    I didn't know LXC had so many issues; I should have stayed with OpenVZ.
    KVM has no issues in Proxmox 5.1, though.
     
  11. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    uname -a
    Linux T064 4.13.13-6-pve #1 SMP PVE 4.13.13-41 (Wed, 21 Feb 2018 10:07:54 +0100) x86_64 GNU/Linux

    I also have the same version.

    Is any newer version available?
     
  12. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    If you're desperate for an immediate resolution, I have Debian packages built using GCC 7.3 (full generic retpoline) for 4.14.23, along with ZFS 0.7.6 DKMS packages. The issue is definitely resolved in any kernel 4.14.20 or later (possibly earlier, but that's where I started).

    Things to consider:
    * I'm a stranger.
    * This is a different kernel branch (4.14) than the one Proxmox is tracking (4.13) for the pve-kernel (i.e. untested).
    * Although I've structured the changes so that new pve-kernels will still work, some future cleanup should be done once pve-kernel has this issue resolved.

    If you still want to go ahead, send me a private message and I'll get you the files with some instructions.
     
  13. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,195
    Likes Received:
    494
    If you can reliably reproduce the issue, it would be great to verify whether mainline 4.14 itself is affected or not. So far, we cannot reproduce the issue in our test lab, so we cannot bisect it either (we did cherry-pick some likely fixes in related areas to our 4.13 kernel, but it seems those did not fix the issue at hand).
     
  14. tom

    tom Proxmox Staff Member
    Staff Member

    Joined:
    Aug 29, 2006
    Messages:
    13,449
    Likes Received:
    391
    Can you provide a test container (backup) with which we can reproduce this behavior? So far we cannot see this with our containers, so there must be something different about yours.
    (Send a download link to office(at)proxmox.com, with no sensitive data inside.)
     
  15. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    The kernels (4.14.20 and 4.14.23 -- both resolve the issue) are built from the source tarballs on kernel.org with no patches applied. I just copied over the Proxmox config from the 4.13.13 kernel (make oldconfig), then used the make target "deb-pkg" to get the kernel and headers packaged so I could distribute them to other servers.
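
    Roughly, the build went like this (a sketch from memory; the config path assumes the 4.13.13-6-pve kernel is installed in /boot):
    Code:
    cd linux-4.14.20
    cp /boot/config-4.13.13-6-pve .config
    make oldconfig                # accept the defaults for options new since 4.13
    make -j"$(nproc)" deb-pkg     # produces linux-image/linux-headers .debs one directory up
    dpkg -i ../linux-image-4.14.20*.deb ../linux-headers-4.14.20*.deb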

    By mainline, did you mean built from source (tested - works) or the Debian 4.14 backport (which I haven't tested)?
     
  16. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    There are 5.1 hypervisors (running pve-kernel) where I don't see the issue either, and I'm at a loss as to why some hypervisors can reproduce this easily while others appear unaffected (at exactly the same patch level). My best guess is that it's an interaction or timing problem caused by a specific mix of containers.

    Unfortunately, the container I test with is one of 17 on my home server and is currently very large (it holds a database). I'll investigate whether the issue is repeatable on the container without the database, as that would be a size that could be shared.

    Another user reported that they were able to reproduce this issue on a clean install with 4 generic containers. I'll encourage them to post the details to this thread.
     
  17. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    You mentioned in another thread that you were able to duplicate this issue on a clean install. Can you please share the details here so Proxmox can attempt to duplicate the environment? Thanks!
     
  18. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    Yes, I took a fresh node and installed 5 CTs with CentOS 6.

    I kept them in a loop that stops and starts each CT every minute (see the sketch below).

    The issue happened within 10-20 minutes.
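
    The loop itself was along these lines (a sketch; the CT IDs are examples):
    Code:
    while true; do
        for ct in 101 102 103 104 105; do
            pct stop "$ct"
            pct start "$ct"
        done
        sleep 60
    done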

    Keep in mind I had ZFS, and the ARC cache was set at 8GB.

    When I changed the ARC cache to 1GB (see the sketch below), the issue was reduced to almost 10% of its previous frequency; the node ran for 2-4 hours without issues.

    On my production nodes (25 of them), I avoid stops and restarts as much as possible, and the nodes have been running for the last 48 hours without an issue (with a 1GB ARC cache).
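
    For anyone wanting to try the same mitigation, the ARC cap is a ZFS module option (a sketch; the 1GB value from above is shown):
    Code:
    # /etc/modprobe.d/zfs.conf
    options zfs zfs_arc_max=1073741824
    # Apply to the running system immediately:
    echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
    # Rebuild the initramfs so the cap survives reboots:
    update-initramfs -u -k all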
     
  19. denos

    denos Member

    Joined:
    Jul 27, 2015
    Messages:
    72
    Likes Received:
    32
    This could indicate that the trigger for the issue is sensitive to timing/performance at the storage layer. With a smaller ARC (especially 1GB), performance will be degraded, and the added delays may reduce the occurrence of the issue. All of our environments run large ARCs (32GB minimum) tuned for very high cache hit rates; the servers that regularly experience this issue show ARC hit rates between 84% and 95%.
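
    For comparison, the hit rate can be read straight from the kernel's ARC stats (no extra tooling assumed):
    Code:
    awk '/^hits / {h=$3} /^misses / {m=$3} END {printf "ARC hit rate: %.1f%%\n", 100*h/(h+m)}' /proc/spl/kstat/zfs/arcstats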
     
  20. Vasu Sreekumar

    Vasu Sreekumar Active Member

    Joined:
    Mar 3, 2018
    Messages:
    123
    Likes Received:
    34
    I can confirm 100% that reducing the ARC cache from 8GB to 1GB reduced the issue by almost 90%.

    But you did test with 4.14.20, and the issue was not there even with a 32GB ARC cache, right?
     