LXC Containers not mounting local mounts

memyselfandi

New Member
Nov 9, 2021
We have several Proxmox VE 6.2-6 hosts running LXC containers (CentOS 7) which house an application. Each container has four mounts from the underlying storage. Some of these mounts are common between the containers (e.g. for data which they only read), others are individual (e.g. logging directories whose underlying mounted path includes the container ID). The storage is all local; it is not mounted onto the server from elsewhere, and the hosts are not clustered. The servers are Dell R740XD and the storage is all SSD.

On a regular basis we need to replace the container image (for the usual sorts of reasons, e.g. patching), at which point the application is stopped across these containers. The containers themselves are stopped and destroyed (using pct commands). The underlying data disk structures which serve the mounts are not touched; nothing is deleted or removed from there.
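
For reference, the teardown step per container is roughly the following (a simplified sketch rather than our exact script; ${ID} is the container ID, as in the creation script below):

# stop and remove the container; the host directories backing the
# bind mounts are left in place and reused on the next creation
pct stop ${ID}
pct destroy ${ID}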

The containers are then created using a script which does the following:

pct create ${ID} ${IMAGE} -cores ${CPU_CORES} -hostname ${HOSTNAME} -memory ${RAM_MB} -net0 ${NET0} -onboot 1 -ostype ${OSTYPE} -ssh-public-keys ${KEY}
pct resize ${ID} rootfs ${DISK_GB}G

mkdir -p ${DATA_HOST_MOUNT}
pct set ${ID} -mp0 ${DATA_HOST_MOUNT},mp=${DATA_CONT_MOUNT}
sleep 1
mkdir -p ${MIRROR_HOST_MOUNT}
pct set ${ID} -mp1 ${MIRROR_HOST_MOUNT},mp=${MIRROR_CONT_MOUNT}
sleep 1
mkdir -p ${QUER_HOST_MOUNT}
pct set ${ID} -mp2 ${QUER_HOST_MOUNT},mp=${QUER_CONT_MOUNT}
sleep 1
mkdir -p ${LOGS_HOST_MOUNT}
pct set ${ID} -mp3 ${LOGS_HOST_MOUNT},mp=${LOGS_CONT_MOUNT}

This sequence is run at 1-minute intervals to create each container in turn. The variables are being passed correctly, and I would note that the mkdir is really only needed on the first, initial creation since, as said before, the directories already exist on subsequent creations. The mount points within the containers already exist in the image.
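
As an aside, if I am reading the pct documentation correctly, the mount points could also be passed directly at creation time instead of via the separate pct set calls. We have not tried this; a sketch using the same variables as above:

pct create ${ID} ${IMAGE} -cores ${CPU_CORES} -hostname ${HOSTNAME} -memory ${RAM_MB} \
    -net0 ${NET0} -onboot 1 -ostype ${OSTYPE} -ssh-public-keys ${KEY} \
    -mp0 ${DATA_HOST_MOUNT},mp=${DATA_CONT_MOUNT} \
    -mp1 ${MIRROR_HOST_MOUNT},mp=${MIRROR_CONT_MOUNT} \
    -mp2 ${QUER_HOST_MOUNT},mp=${QUER_CONT_MOUNT} \
    -mp3 ${LOGS_HOST_MOUNT},mp=${LOGS_CONT_MOUNT}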

The problem is that if we bring up a set of 21 containers in this way, then ~17 of them will fail to mount one or more of the mounts, frequently mp2, and we don't know why. If you subsequently stop the container, wait two minutes, and then start it again, then on the restart it will mount all of its mounts correctly.
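
For reference, the manual workaround we currently apply to each affected container is simply (sketch; ${ID} is the container ID):

# stop, wait ~2 minutes, start again - after this the mounts come up correctly
pct stop ${ID}
sleep 120
pct start ${ID}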

Does anyone have any ideas what is going on here, and what logs we can look into to find more information about why this is failing, in the hope of stopping it from happening?
 
hi,

couple of questions:

The problem is that if we bring up a set of 21 containers in this way, then ~17 of them will fail to mount one or more of the mounts, frequently mp2, and we don't know why. If you subsequently stop the container, wait two minutes, and then start it again, then on the restart it will mount all of its mounts correctly.
* does it make a difference if you start the containers sequentially with a delay? e.g. wait 30 seconds before starting the next container (see the sketch below)
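
something along these lines (just a sketch, the container IDs and delay are examples):

# start the containers one at a time with a pause in between
for CTID in 6101 6102 6103; do
    pct start ${CTID}
    sleep 30
done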

Some of these mounts are common between the containers (e.g. for data which they only read)
* have you tried passing the ro flag to the mount point? e.g.:
mp0: local:105/vm-105-disk-1.raw,mp=/foo,backup=1,ro=1,size=8G
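
for a directory bind mount like yours that would be something along the lines of (sketch, reusing the variables from your script):

pct set ${ID} -mp0 ${DATA_HOST_MOUNT},mp=${DATA_CONT_MOUNT},ro=1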

Does anyone have any ideas what is going on here, and what logs we can look into to find more information about why this is failing, in the hope of stopping it from happening?
for the failing container(s) it would be helpful to see the configuration file and the debug logs from container start.

to get debug logs run the container with:
* pct start CTID --debug

configuration files are located in the /etc/pve/lxc/ directory
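
e.g. (CTID being the container ID; the tee filename is just an example):

# show the container configuration
cat /etc/pve/lxc/CTID.conf

# capture the debug output of a failing start to a file for posting
pct start CTID --debug 2>&1 | tee /tmp/ct-CTID-start-debug.log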

hope this helps
 
Hi,

Putting a longer delay between the creation of the containers does not appear to change what happens.

We can't mark the mounts as read-only, as that causes issues with the application. While it only reads the data during normal usage, the application assumes it has write access so it can update the data, and some of the areas need to be specifically writeable for updates, for instance the queries one (mp2), which is one of those that tends to have the issue and is specific to the container ID.

An example (slightly redacted) configuration file is as follows:

arch: amd64
cores: 4
hostname: redacted
memory: 32768
mp0: /var/lib/vz/data/prod,mp=/mnt/disk1/var/lib/application/app-data
mp1: /var/lib/vz/mirror/prod,mp=/mnt/disk1/var/lib/application/app-mirror
mp2: /var/lib/vz/app/6110,mp=/mnt/disk1/var/lib/application/queries
mp3: /var/lib/vz/logs/6110,mp=/var/log/application
net0: name=eth0,bridge=vmbr0,gw=10.xxx.xxx.254,hwaddr=36:74:xx:xx:xx:xx,ip=10.xxx.xxx.110/24,rate=512,tag=xxx,type=veth
onboot: 1
ostype: centos
rootfs: local:6110/vm-6110-disk-0.raw,size=80G
swap: 512
lxc.cgroup.cpuset.cpus: 9,53,11,55
 
Putting a longer delay between the creation of the containers does not appear to change what happens.
i meant a delay when starting the containers, since it can make a difference when the directories are being mounted during start. try waiting 30 seconds after starting one container before starting the next.

once you reproduce the issue (container not starting) please post the debug log.

We can't mark the mounts as read-only, as that causes issues with the application. While it only reads the data during normal usage, the application assumes it has write access so it can update the data, and some of the areas need to be specifically writeable for updates, for instance the queries one (mp2), which is one of those that tends to have the issue and is specific to the container ID.
what about the shared mount where they read the data? can't it be marked ro or would that break the app?
 
