Hi there,
We're currently running a four-node cluster with about 250 LXC containers per node (evenly distributed). Primary storage for almost all containers (all but 4) is on the Ceph storage integrated into Proxmox.
Kernel Version: Linux 5.3.13-1-pve #1 SMP PVE 5.3.13-1 (Thu, 05 Dec 2019 07:18:14 +0100)
PVE Manager Version: pve-manager/6.1-3/37248ce6
We've had three outages within the last week, all of them caused by lxcfs fubar*ing up:
Code:
Apr 27 03:10:16 lxc-prox1 kernel: [741590.180559] cgroup: fork rejected by pids controller in /system.slice/lxcfs.service
Apr 27 03:10:16 lxc-prox1 lxcfs[1771]: fuse: error creating thread: Resource temporarily unavailable
Apr 27 03:10:18 lxc-prox1 lxcfs[1771]: bindings.c: 2473: recv_creds: Timed out waiting for scm_cred: No such file or directory
Restarting lxcfs so we could properly shut down the running containers (which are zombies at that point, without a working /proc) and then rebooting the cluster node solved the problem each time, but that approach has its pain points...
Are we hitting any limit here?
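For anyone wondering the same thing, this is roughly how I'd check how close lxcfs is to its systemd task limit (just a sketch; the unit name lxcfs.service and the cgroup v1 path are what I'd expect on a stock PVE 6.1 node, so no guarantees):

Code:
# Ask systemd for the pids-controller limit and the current task count of the unit
systemctl show -p TasksMax -p TasksCurrent lxcfs.service

# Or read the cgroup v1 pids controller directly
cat /sys/fs/cgroup/pids/system.slice/lxcfs.service/pids.max
cat /sys/fs/cgroup/pids/system.slice/lxcfs.service/pids.current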
Googling around brought https://www.suse.com/support/kb/doc/?id=000019044 to my attention, which suggests setting a higher (or unlimited) TasksMax for the service:
Code:
[Service]
# "MAX_TASKS|infinity" is the KB's syntax: either a concrete task count or "infinity" for no limit
TasksMax=MAX_TASKS|infinity
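If we go down that route, I assume the override would be applied on each node roughly like this (just my sketch based on the KB article, not something we've rolled out yet; the drop-in path and unit name are what I'd expect on a stock install, and infinity vs. a concrete number is still an open question):

Code:
# Drop-in override for the stock lxcfs.service unit
mkdir -p /etc/systemd/system/lxcfs.service.d
cat > /etc/systemd/system/lxcfs.service.d/tasksmax.conf <<'EOF'
[Service]
TasksMax=infinity
EOF

# Pick up the drop-in and apply it (restarting lxcfs has the /proc
# side effects described above, so probably a maintenance-window job)
systemctl daemon-reload
systemctl restart lxcfs.service

# Verify the new limit
systemctl show -p TasksMax lxcfs.service

Alternatively, systemctl set-property lxcfs.service TasksMax=infinity should raise the limit at runtime without restarting the service, if I'm not mistaken.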