Permission error w/ sockets inside CT since migration to PVE 4.1

I've migrated a number of hosts from PVE 3.4 to PVE 4.1, following the instructions (stop CT, back up CT, copy backup, restore, reconfigure network).

Most of my hosts use an internal init script to start an application server. That application server creates a socket, to which an internal Nginx web server tries to connect. This worked before.

With PVE 4.1 the web server can no longer read from or write to that socket. The socket gets created as root:root 0755, whereas it should be root:root 0777. When I run pct exec <vid> chmod 777 <socket_path> it works again. I've tried every way I could think of to set the umask and/or file permissions, but I wasn't able to get the socket created with the right mode from the start.
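
For reference, this is roughly what the workaround looks like in practice (the socket path is a placeholder; the fix only lasts until the application server recreates the socket):

Code:
root@ct:~# ls -l /path/to/app.socket
srwxr-xr-x 1 root root 0 Dec 20 17:41 /path/to/app.socket
root@ct:~# chmod 777 /path/to/app.socket
root@ct:~# ls -l /path/to/app.socket
srwxrwxrwx 1 root root 0 Dec 20 17:41 /path/to/app.socket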

This happens both with my own CTs which were based on a Debian 7 OpenVZ template, as well as a gitlab installation which uses different init scripts.

Since I've seen loads of AppArmor messages in /var/log/messages reporting denied mounts from the CTs, I've added lxc.aa_profile: unconfined to the LXC configs, but this did not help. I've also noticed that, unlike in my OpenVZ setup, entering an LXC container does not create a fresh login environment, e.g. paths set in .bash_profile.
  1. Do I need to have those lxc.aa_profile entries in my LXC config?
  2. Why are my UNIX sockets created with different permissions than before?
  3. Is it intended that LXC won't create a fresh environment when entering a container?
I need to fix this as soon as I can, because right now none of my app servers will work after a restart. Any help is greatly appreciated.

Regards

Christian
 
Can you try adding a line to your /etc/pve/lxc/{vmid}.conf file to test this? If you add something yourself to the config file in /var/lib/lxc/{vmid}/config, it will be overwritten by Proxmox at startup.

Code:
lxc.aa_profile: unconfined

...and shut down and start the container after that.

aa-status should then show that it is not confined. If you look up the PID of your container's init process, it should not be listed in the enforce section.
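
Something along these lines should do it (vmid 106 is just an example; lxc-info ships with LXC):

Code:
# print the PID of the container's init process
lxc-info -n 106 -p
# then check that this PID is not among the confined processes
aa-status | grep <pid>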
 
I've already added the stanza to /etc/pve/lxc/<vid>.conf and can confirm that it is included in /var/lib/lxc/<vid>/config, too.

I'm not sure about aa-status, though. My process list shows [lxc monitor] /var/lib/lxc 210, which is the parent of an init process at runlevel 2. I assume that's the init of VID 210? However, the output of aa-status does not show the process ID of that init process, neither in the enforce section nor elsewhere.
 
Actually the init process is a child of the lxc-start process. It appears that the lxc-start process itself remains confined by the usr.bin.lxc-start profile.

Anyway, after this test it doesn't seem to be an AppArmor issue. I am wondering why the sockets are now created with different permissions, if before they came out as 777. Maybe on OpenVZ the init process started with a different umask, and this is exposing an assumption in the init script of your application. If the script set its own umask, you would not have this problem, methinks.
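
If the umask theory holds, a quick test would be to set it explicitly in the script before the daemon starts; a minimal sketch, assuming a sysvinit-style script (the daemon path and name are placeholders):

Code:
#!/bin/sh
# illustrative sysvinit fragment, not your actual script
case "$1" in
  start)
    # a unix socket is created with mode 0777 & ~umask, so umask 0000
    # yields srwxrwxrwx regardless of what init passed down
    umask 0000
    start-stop-daemon --start --quiet --exec /usr/local/bin/appserver
    ;;
esac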

What init system are you using in Debian 7?
 
I have checked, and the umask of the init process did not change. Just to be sure: if you look in /var/log/syslog on the host, you are not seeing messages like this?

pve007 kernel: [ 3007.731120] audit: type=1400 audit(1450625064.487:69): apparmor="DENIED" operation="file_perm" profile="lxc-container-default" name="public/showq" pid=9083 comm="postqueue" requested_mask="r" denied_mask
 
I've seen AppArmor errors regarding Postfix and others, but nothing about sockets. Here's /var/log/messages while starting an LXC container:

Code:
Dec 20 17:41:25 s6 pct[23333]: <root@pam> starting task UPID:s6:00005B26:001E52C8:5676DA35:vzstart:106:root@pam:
Dec 20 17:41:25 s6 kernel: [19890.792165] IPv6: ADDRCONF(NETDEV_UP): veth106i0: link is not ready
Dec 20 17:41:25 s6 pct[23333]: <root@pam> end task UPID:s6:00005B26:001E52C8:5676DA35:vzstart:106:root@pam: OK
Dec 20 17:41:26 s6 kernel: [19891.169835] audit: type=1400 audit(1450629685.992:252): apparmor="ALLOWED" operation="mount" info="failed flags match" error=-13 profile="lxc-container-default" name="/sys/" pid=23494 comm="mount" flags="rw, nosuid, nodev, noexec, remount"

… after which I need to reset the permissions again on the socket used by gitlab/nginx:

Code:
root@host:~# pct exec $VID chmod 777 /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
 
I have tried a standard Debian 7 template from Proxmox 4.1 (which is 64-bit). I can say with certainty that to run it correctly, or at least not to run into socket problems right away, you MUST set lxc.aa_profile to unconfined in your VM config.

If you create a new Debian 7 container and don't change the AppArmor profile, then running postqueue you will see this:

Code:
# postqueue -p
Mail queue is empty
# postqueue -p
postqueue: warning: close: Permission denied

And in /var/log/syslog of the host you will see:
pve007 kernel: [ 3007.731120] audit: type=1400 audit(1450625064.487:69): apparmor="DENIED" operation="file_perm" profile="lxc-container-default" name="public/showq" pid=9083 comm="postqueue" requested_mask="r" denied_mask

So it seems you can only connect once to a socket with the default apparmor profiles running. :-(
 
This can certainly curdle your milk.

LXC depends heavily on AppArmor to keep the inhabitants of a container confined to their little inside world, unable to affect the host OS. If you are forced to disable AppArmor protection AND, like Proxmox, you run privileged containers, then there is little protection left against manipulating the host system.

The problem with AppArmor and sockets seems to be that the current profile for LXC containers cannot deal with sockets created by programs that chroot themselves first. In the case of Postfix, it chroots to /var/spool/postfix and creates a socket at the path public/showq. AppArmor misses the / in front of the socket name and concludes that it lies outside the filesystem namespace. This put me on the trail, after I looked up what a disconnected path means in an AppArmor profile:

https://bugs.launchpad.net/apparmor/+bug/1378054

The disconnected flag is supposed to solve this, it seems, but it only does so once, or not both inside and outside the chroot, or something like that.

See also:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1446906
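
For anyone who wants to experiment with those flags: they go into the profile header. A sketch of what the test could look like, assuming the profile lives in /etc/apparmor.d/lxc/lxc-default as on stock Ubuntu (paths may differ on Proxmox):

Code:
# /etc/apparmor.d/lxc/lxc-default (excerpt)
profile lxc-container-default flags=(attach_disconnected) {
  # ...existing rules stay as they are...
}

# reload the profiles and restart the container afterwards
apparmor_parser -r /etc/apparmor.d/lxc-containers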
 
One further observation,

If you store the LXC container on ext4 storage it behaves worse. I wanted to test the chroot_attach flag in the AppArmor profile, but if your container storage is a raw file on ext4, the socket cannot even be created while AppArmor confinement is active.

My first test was on ZFS storage, and there the AppArmor profile at least seems to allow the creation of the socket file, apparently because it can compare the path with the path on disk. If there is a solution, it may be easier with ZFS, which also allows easy snapshots of the container.
 
The same is true for socket-based communication, e.g. for MySQL servers. This is ridiculous: migrating perfectly working OpenVZ containers to LXC renders at least one service in each container unusable.

Does disabling AppArmor help? How would I do that, given the fact that PVE 4.1 lists AppArmor as a requirement?
 
I took another look at this problem.

Tested LXC on Ubuntu Trusty, and the postfix test *just* worked. Also tried with a 4.2.6 kernel on Ubuntu 14.04, and it also worked. The AppArmor profiles look pretty much the same. Starting on a loopback-mounted file as disk also worked.

So it seems the problem should be fixable.
 
Thanks for your tests, that sounds encouraging. Maybe Proxmox staff can spot a difference in the source code?
 
This happened to me.

After investigating, I found that Proxmox 4.1 created the VM with acltype=posixacl:

Code:
# zfs get -r acltype /rpool/vm
NAME                        PROPERTY  VALUE     SOURCE
rpool/vm                    acltype   off       default
rpool/vm/subvol-100-disk-1  acltype   off       default
rpool/vm/subvol-101-disk-1  acltype   off       default
rpool/vm/subvol-102-disk-1  acltype   off       default
rpool/vm/subvol-103-disk-1  acltype   off       default
rpool/vm/subvol-104-disk-1  acltype   off       default
rpool/vm/subvol-105-disk-1  acltype   off       default
rpool/vm/subvol-106-disk-1  acltype   off       default
rpool/vm/subvol-107-disk-1  acltype   posixacl  local

VM 107 was created with Proxmox 4.1.
So I turned off the ACLs with this command:

Code:
# zfs inherit acltype rpool/vm/subvol-107-disk-1

After restarting the VM, everything is working again.
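
After the zfs inherit, the property should be back at the default, which can be checked like this (dataset name as in the listing above):

Code:
# zfs get acltype rpool/vm/subvol-107-disk-1
NAME                        PROPERTY  VALUE  SOURCE
rpool/vm/subvol-107-disk-1  acltype   off    default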
 
@bank, interesting. I did try removing the POSIX ACLs on the /var/spool/postfix path in the Debian install myself, but that didn't seem to matter by itself. Maybe with a ZFS install this would do the trick.

Also, I can reproduce the postqueue close warning on a clean Ubuntu Wily, which has about the same kernel and LXC stack as Proxmox, but with only Ubuntu customizations and none of the specific Proxmox modifications. I don't know if the MySQL socket problem is connected, but at least some of this seems to be upstream, appearing somewhere between the kernel 3.10 / LXC 1.0.8 stack and kernel 4.2 / LXC 1.1.5 with AppArmor confinement enabled. A kernel 4.2.6 with the LXC 1.0.8 stack, which I also tested, worked.
 
@bank,

What problems were you having? I tested your acltype suggestion on a ZFS version of the Debian 7 template, and I still get the postqueue close warning. Only, with ZFS it doesn't happen on the first call, as it does with an LXC raw image, but from the second call onward.
 
Hi,

I'm having exactly the same problem as @datenimperator with an LXC container (containing a GitLab instance) that was migrated to Proxmox. The file /var/opt/gitlab/gitlab-rails/sockets/gitlab.socket does not get the correct permissions (it was working before the migration). I tried adding lxc.aa_profile: unconfined with no luck.

I'm running the storage as an ext4 folder, so no ZFS.

What is strange is that I can change the permissions as root from inside the running container, but for some reason it does not happen automatically.
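
As a stopgap until the real cause is fixed, one could reapply the mode automatically right after GitLab starts; a minimal sketch to run inside the container (e.g. from rc.local), assuming the socket path from above:

Code:
#!/bin/sh
SOCK=/var/opt/gitlab/gitlab-rails/sockets/gitlab.socket
# wait up to 60 seconds for the socket to appear, then open it up
for i in $(seq 60); do
  [ -S "$SOCK" ] && { chmod 777 "$SOCK"; break; }
  sleep 1
done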
 