Cannot open /proc/stat: Transport endpoint is not connected

SPQRInc

Member
Jul 27, 2015
57
1
6
Hello,

Since a short while ago, I can no longer use htop or top in my LXC containers:
Cannot open /proc/stat: Transport endpoint is not connected

The same problem occurs when reloading php5-fpm:

Dez 18 13:40:31 systemd[1]: Failed to create cgroup /lxc/103/system.slice/systemd-random-seed.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to realize cgroups for queued unit systemd-random-seed.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to create cgroup /lxc/103/system.slice/networking.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to realize cgroups for queued unit networking.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to create cgroup /lxc/103/system.slice/kbd.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to realize cgroups for queued unit kbd.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to create cgroup /lxc/103/system.slice/systemd-journald.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: Failed to realize cgroups for queued unit systemd-journald.service: Transport endpoint is not connected

Dez 18 13:40:31 systemd[1]: php5-fpm.service: control process exited, code=exited status=219

Dez 18 13:40:31 systemd[1]: Reload failed for The PHP FastCGI Process Manager.
This is the syslog:

Dec 18 13:41:13 systemd[30954]: Received SIGRTMIN+24 from PID 30962 (kill).

Dec 18 13:42:13 systemd[31685]: Starting Paths.

Dec 18 13:42:13 systemd[31685]: Reached target Paths.

Dec 18 13:42:13 systemd[31685]: Starting Timers.

Dec 18 13:42:13 systemd[31685]: Reached target Timers.

Dec 18 13:42:13 systemd[31685]: Starting Sockets.

Dec 18 13:42:13 systemd[31685]: Reached target Sockets.

Dec 18 13:42:13 systemd[31685]: Starting Basic System.

Dec 18 13:42:13 systemd[31685]: Reached target Basic System.

Dec 18 13:42:13 systemd[31685]: Starting Default.

Dec 18 13:42:13 systemd[31685]: Reached target Default.

Dec 18 13:42:13 systemd[31685]: Startup finished in 5ms.

Dec 18 13:42:13 systemd[31685]: Stopping Default.

Dec 18 13:42:13 systemd[31685]: Stopped target Default.

Dec 18 13:42:13 systemd[31685]: Stopping Basic System.

Dec 18 13:42:13 systemd[31685]: Stopped target Basic System.

Dec 18 13:42:13 systemd[31685]: Stopping Paths.

Dec 18 13:42:13 systemd[31685]: Stopped target Paths.

Dec 18 13:42:13 systemd[31685]: Stopping Timers.

Dec 18 13:42:13 systemd[31685]: Stopped target Timers.

Dec 18 13:42:13 systemd[31685]: Stopping Sockets.

Dec 18 13:42:13 systemd[31685]: Stopped target Sockets.

Dec 18 13:42:13 systemd[31685]: Starting Shutdown.

Dec 18 13:42:13 systemd[31685]: Reached target Shutdown.

Dec 18 13:42:13 systemd[31685]: Starting Exit the Session...

Dec 18 13:42:13 systemd[31685]: Received SIGRTMIN+24 from PID 31693 (kill).

Dec 18 13:42:54 systemd-timesyncd[535]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.000s/0.003s/+3ppm

Dec 18 13:43:13 systemd[32540]: Starting Paths.

Dec 18 13:43:13 systemd[32540]: Reached target Paths.

Dec 18 13:43:13 systemd[32540]: Starting Timers.

Dec 18 13:43:13 systemd[32540]: Reached target Timers.

Dec 18 13:43:13 systemd[32540]: Starting Sockets.

Dec 18 13:43:13 systemd[32540]: Reached target Sockets.

Dec 18 13:43:13 systemd[32540]: Starting Basic System.

Dec 18 13:43:13 systemd[32540]: Reached target Basic System.

Dec 18 13:43:13 systemd[32540]: Starting Default.

Dec 18 13:43:13 systemd[32540]: Reached target Default.

Dec 18 13:43:13 systemd[32540]: Startup finished in 5ms.

Dec 18 13:43:13 systemd[32540]: Stopping Default.

Dec 18 13:43:13 systemd[32540]: Stopped target Default.

Dec 18 13:43:13 systemd[32540]: Stopping Basic System.

Dec 18 13:43:13 systemd[32540]: Stopped target Basic System.

Dec 18 13:43:13 systemd[32540]: Stopping Paths.

Dec 18 13:43:13 systemd[32540]: Stopped target Paths.

Dec 18 13:43:13 systemd[32540]: Stopping Timers.

Dec 18 13:43:13 systemd[32540]: Stopped target Timers.

Dec 18 13:43:13 systemd[32540]: Stopping Sockets.

Dec 18 13:43:13 systemd[32540]: Stopped target Sockets.

Dec 18 13:43:13 systemd[32540]: Starting Shutdown.

Dec 18 13:43:13 systemd[32540]: Reached target Shutdown.

Dec 18 13:43:13 systemd[32540]: Starting Exit the Session...

Dec 18 13:43:13 systemd[32540]: Received SIGRTMIN+24 from PID 32548 (kill).

Dec 18 13:43:27 kernel: [49158.991100] audit: type=1400 audit(1450442607.793:1547): apparmor="DENIED" operation="file_perm" profile="lxc-container-default" name="private/trace" pid=26896 comm="qmgr" requested_mask="r" denied_mask="r" fsuid=105 ouid=0

Dec 18 13:43:27 kernel: [49158.991104] audit: type=1400 audit(1450442607.793:1548): apparmor="DENIED" operation="file_perm" profile="lxc-container-default" name="private/trace" pid=26896 comm="qmgr" requested_mask="r" denied_mask="r" fsuid=105 ouid=0

This is my PVE version:

proxmox-ve: 4.1-26 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-1 (running version: 4.1-1/2f9650d4)
pve-kernel-4.2.6-1-pve: 4.2.6-26
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-41
pve-firmware: 1.1-7
libpve-common-perl: 4.0-41
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-17
pve-container: 1.0-32
pve-firewall: 2.0-14
pve-ha-manager: 1.0-14
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1

What can I do here?
 
As a workaround, rebooting the LXC containers helped. I will monitor whether the error comes back.

Edit: I noticed that the error appears even after rebooting the host machine, so I have to shut down every machine and turn it on again.
 
Bump: Is there any solution for this error?

Since the last upgrade, the error appears 2-3 times a week. The problem is that I cannot even restart individual services:

Failed to kill control group: Transport endpoint is not connected
 
The transport error is very likely caused by lxcfs, a tool that allows containers running systemd to interact with the cgroup system and to show correct uptime and memory values inside the container.

https://linuxcontainers.org/lxcfs/introduction/
https://s3hh.wordpress.com/2015/02/23/introducing-lxcfs/

Is there anything special displayed on the host with:
Code:
journalctl -u lxcfs
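
If the service itself looks healthy, it might also be worth checking on the host whether the lxcfs FUSE mount still answers at all - a quick sketch, assuming the default /var/lib/lxcfs mountpoint from the shipped unit file:
Code:
# on the Proxmox host: both reads should return immediately;
# a hang or "Transport endpoint is not connected" points at a dead lxcfs
cat /var/lib/lxcfs/proc/uptime
head -n 3 /var/lib/lxcfs/proc/stat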

I also see some qmgr errors with AppArmor DENIED. A newer kernel has been available in the no-subscription repository for a few days now, which may fix that error.
 
Hi windinternet,

Thanks a lot for your reply. I installed the new kernel - now I'm waiting for a chance to reboot and will double-check whether the problem persists.

This is the output of journalctl -u lxcfs:

Dez 20 19:10:59 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 20 19:11:01 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 20 19:11:01 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 20 19:11:01 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 20 19:11:04 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 02:10:44 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 02:14:20 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/103/system.slice/apache2.service: Device or resource busy

Dez 21 02:33:30 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/105/system.slice/apache2.service: Device or resource busy

Dez 21 02:35:40 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/104/system.slice/apache2.service: Device or resource busy

Dez 21 02:37:16 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 03:04:42 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 03:16:31 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 03:16:35 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 09:42:59 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 09:43:01 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 09:43:01 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 09:43:01 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 09:51:09 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 10:03:17 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 10:03:19 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 10:03:19 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 10:03:19 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 10:03:19 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 10:03:21 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 10:03:21 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 10:03:21 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 10:04:01 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 10:04:03 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 10:04:03 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 10:33:55 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/102/system.slice/apache2.service: Device or resource busy

Dez 21 11:06:45 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 11:20:01 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/104/system.slice/mysql.service: Device or resource busy

Dez 21 11:20:03 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 11:20:05 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 11:20:05 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 11:20:05 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 12:20:07 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/105/system.slice/apache2.service: Device or resource busy

Dez 21 12:21:24 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/105/system.slice/apache2.service: Device or resource busy

Dez 21 12:31:49 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 12:31:51 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 12:31:51 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 12:31:51 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 12:34:23 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 12:35:22 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 12:35:24 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 12:35:24 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 12:35:57 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 12:46:59 wirtssystem lxcfs[919]: recursive_rmdir: failed to delete /run/lxcfs/controllers/name=systemd/lxc/101/system.slice/apache2.service: Device or resource busy

Dez 21 16:43:46 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 16:43:48 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 16:43:48 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 16:43:48 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 16:43:48 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 16:43:50 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 16:43:50 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 16:43:50 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 16:43:50 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 16:43:52 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 16:43:52 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 16:43:52 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 17:44:18 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 17:44:20 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 17:44:20 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 17:44:20 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Dez 21 17:44:20 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 17:44:22 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 17:44:22 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 17:44:22 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 17:44:22 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 17:44:24 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 17:44:24 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 17:44:24 wirtssystem lxcfs[919]: Failed to select for scm_cred: Success

Dez 21 22:17:34 wirtssystem lxcfs[919]: send_creds: failed at sendmsg: No such process

Dez 21 22:17:36 wirtssystem lxcfs[919]: send_creds: Error getting reply from server over socketpair

Dez 21 22:17:36 wirtssystem lxcfs[919]: do_read_pids: failed to ask child to exit: No such process

Dez 21 22:17:36 wirtssystem lxcfs[919]: Failed to select for scm_cred: No such file or directory

Edit: Due to the forum's character limit, the full log is pasted here: http://d.pr/18cna
 
Can you tie the recursive_rmdir errors to the moments when that container was shut down?

It's probably a race condition on shutdown, where lxc-stop tries to remove those directories while the container is still holding on to them. It can sometimes happen on start too, but only if the container is improperly configured, and it might affect later starts of the container. If all containers are stopped, you should be able to refresh lxcfs by restarting the lxcfs service.
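
A rough sketch of that sequence (the container IDs are taken from your logs - adjust to your setup):
Code:
# stop all containers, then refresh lxcfs, then start them again
for id in 101 102 103 104 105; do pct stop $id; done
service lxcfs restart        # or: systemctl restart lxcfs
for id in 101 102 103 104 105; do pct start $id; done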
 
It's possible that the error appears on those events. I never saw it "live" - I always noticed it later, when I tried to restart services like dovecot or php5-fpm.

So you would suggest stopping all containers, restarting lxcfs (service lxcfs restart), and then starting the containers again?

What drives me crazy is that I have already restarted the whole system and started the containers, and they all work fine for hours - sometimes for days.
 
It may be that one container's shutdown ruins things for the other running containers, or it may be that these error messages are actually totally unrelated. In any case, not being able to restart services and the problems with top indicate that lxcfs is no longer responding. Consequently systemd cannot function, because it works with the cgroups that lxcfs emulates inside the container - and the same goes for the emulated files that top and uptime read.

If that is the case, trying a service lxcfs restart wouldn't hurt.
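
To confirm that from inside an affected container, something like this should already tell you - if lxcfs has gone away, these reads typically fail with exactly the "Transport endpoint is not connected" error you are seeing:
Code:
# run inside the container
cat /proc/uptime
head -n 3 /proc/stat
systemctl status cron    # any unit will do; systemd needs the emulated cgroup tree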
 
I looked up some logs, and actually the recursive_rmdir error seems 'normal'.

The send_creds and other errors are not, however.
 
Hello windinternet,

unfortunately I'm still not able to reproduce this error. I'm also unable to fix it without rebooting the whole machine.

I just don't have any idea how to fix this. :-(
 
Looking back at your package list, I notice that you have (or had) lxcfs 0.13-pve1. The current version is lxcfs 0.13-pve2, which contains some fixes. The newest kernel may also help with the Postfix (qmgr) errors that were visible in the logs.

It may be that you can't reproduce anymore because you did the updates.
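
If the host is not fully up to date yet, the usual update run should pull both in (a sketch for the no-subscription repository):
Code:
apt-get update
apt-get dist-upgrade
# afterwards, verify the installed versions
dpkg -l lxcfs pve-kernel-4.2.6-1-pve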
 
Sorry, I forgot to give you an update. Last night I upgraded to the following packages:

proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-2.6.32-43-pve: 2.6.32-166
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1

The server has already been rebooted, but the error has occurred again.
 
Okay. Is it an error with one container while the others keep running, or do all your systemd LXC containers get stuck, unable to restart services or display top? Did you get the send_creds errors in the syslog again? Any other errors around the same time in the syslog?

In the stuck container:
What is the output of systemctl --version? What is the content of /etc/*-release or /etc/issue?
 
Hello and thanks a lot for your reply :)

Well, all containers are running, but they all have the same problem at the same time with top/htop and with reloading/starting services via systemd.

This is the output of systemctl --version on one of the containers (currently showing this error):
systemd 215

+PAM +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ -SECCOMP -APPARMOR

Output of /etc/*-release:

cat /etc/*-release

PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support/"
BUG_REPORT_URL="https://bugs.debian.org/"
12.5.30 debian8.0.build1205150826.19


Output of /etc/issue:

Debian GNU/Linux 8 \n \l


The syslog shows these lines if I grep for send_creds:

Dec 23 09:40:43 wirtssystem lxcfs[917]: send_creds: failed at sendmsg: No such process
Dec 23 09:40:45 wirtssystem lxcfs[917]: send_creds: Error getting reply from server over socketpair
Dec 23 10:32:10 wirtssystem lxcfs[5841]: send_creds: failed at sendmsg: No such process
Dec 23 10:32:12 wirtssystem lxcfs[5841]: send_creds: Error getting reply from server over socketpair
Dec 23 14:26:26 wirtssystem lxcfs[5841]: send_creds: failed at sendmsg: No such process
Dec 23 14:26:28 wirtssystem lxcfs[5841]: send_creds: Error getting reply from server over socketpair
Dec 23 15:08:48 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:08:48 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:08:51 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:08:53 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:09:34 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:09:36 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:09:36 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:09:38 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:09:39 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:09:39 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:09:41 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:10:20 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:10:22 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:15:20 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:15:56 wirtssystem lxcfs[26620]: send_creds: failed at sendmsg: No such process
Dec 23 15:15:58 wirtssystem lxcfs[26620]: send_creds: Error getting reply from server over socketpair
Dec 23 15:41:23 wirtssystem lxcfs[920]: send_creds: failed at sendmsg: No such process
Dec 23 15:41:25 wirtssystem lxcfs[920]: send_creds: Error getting reply from server over socketpair
Dec 23 15:41:37 wirtssystem lxcfs[920]: send_creds: failed at sendmsg: No such process
Dec 23 15:44:23 wirtssystem lxcfs[920]: send_creds: failed at sendmsg: No such process
Dec 23 15:44:25 wirtssystem lxcfs[920]: send_creds: Error getting reply from server over socketpair
Dec 23 16:01:06 wirtssystem lxcfs[12474]: send_creds: failed at sendmsg: No such process
Dec 23 16:01:08 wirtssystem lxcfs[12474]: send_creds: Error getting reply from server over socketpair

I have not restarted the containers for a while now - maybe that's why there are no new entries in the syslog.


I have already thought about faulty mount options. This is the fstab file in a container:

cat /etc/fstab

proc /proc proc defaults 0 0
none /dev/pts devpts rw,gid=5,mode=620 0 0
none /run/shm tmpfs defaults 0 0
 
No, I think what is hurting you is lxcfs not being able to keep up with quickly spawning daemons while the process scheduler is under stress. It uses a process in the container to read process IDs, and normal processes get scheduled out as well.

Maybe it helps to give the busiest container more CPU priority.
 
Hello windinternet,

At the moment the CPU limit is set to 12 and all 5 containers have 2048 CPU units.

I thought that should be enough - maybe I should increase it to 4000 units?
 
It's a relative setting. If you increase it for all containers, it has no effect. You have to interpret the value as the weight a container gets relative to the other containers: if they all have the same weight, they all get the same share of CPU time.
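
So if one container should get priority, raise the weight for that one only - for example (a sketch using the pct syntax of PVE 4.x; the container ID is only an example):
Code:
# give container 104 roughly twice the weight of the others (which stay at 2048)
pct set 104 -cpuunits 4096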
 
Okay, I did not express it correctly: I could increase the value for the most CPU-intensive container, right?

The CPU limit means the percentage of total CPU time, correct?
 
Maybe it would also make a difference to add:
Code:
Nice=-20

in the [Service] section of the /etc/systemd/system/multi-user.target.wants/lxcfs.service file and then restart the lxcfs service.
 
Is it important where it is added?

[Unit]
Description=FUSE filesystem for LXC
ConditionVirtualization=!container
Before=lxc.service

[Service]
ExecStart=/usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/
KillMode=none
Restart=on-failure
ExecStop=/bin/fusermount -u /var/lib/lxcfs

[Install]
WantedBy=multi-user.target
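
I would guess it belongs in the [Service] section, something like this (please correct me if that's wrong) - followed by a systemctl daemon-reload and a restart of lxcfs:
Code:
[Service]
Nice=-20
ExecStart=/usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs/
KillMode=none
Restart=on-failure
ExecStop=/bin/fusermount -u /var/lib/lxcfs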
 
