Nothing works anymore: "can't lock file '/run/lock/lxc/pve-config-xxx.lock'"

Noobie

Member
Aug 4, 2023
Hello everyone,

I mounted an SMB share into an LXC container. I then removed this network folder using the GUI. However, the folder is still listed as "Unused Disk 0" under Resources.

Deleting it does not work; this error message appears:
Code:
can't lock file '/run/lock/lxc/pve-config-190.lock' - got timeout (500)

Then I shut down the container and tried to delete this Unused Disk 0 again via the GUI. Error message:
Code:
can't lock file '/run/lock/lxc/pve-config-190.lock' - got timeout (500)

Then I tried to restart the container, and this error message appears:
Code:
TASK ERROR: can't lock file '/run/lock/lxc/pve-config-190.lock' - got timeout

Restoring a backup does not work either:
Code:
unable to restore CT 190 - can't lock file '/run/lock/lxc/pve-config-190.lock' - got timeout (500)

Deleting the container does not work either:
Code:
TASK ERROR: can't lock file '/run/lock/lxc/pve-config-190.lock' - got timeout

So now nothing works anymore... Do you have any ideas on how to solve this issue?
 
Hi,
I had a similar issue with SMB in a Debian LXC. After upgrading to the latest version, it would just hang at 100% CPU usage after a while.
Only deleting the .lock file and rebooting the Proxmox node helped.
I'm still a bit disappointed in Proxmox (8.2.4) here, in that it cannot forcefully stop a container that does not play by the rules.

I will revert to an older (05/2023) SMB version, which seemed to run better (it only crashed after a few days, which required a restart of the LXC).
 
I just had this happen to me.

It wasn't very straightforward, but this seems to work:
  1. Kill the lxc-start process that started the container
  2. Manually remove the lock file
  3. Use lxc-stop with the --kill and --nolock arguments to try to stop the container (most likely it has already stopped)
  4. Run pct unmount (this will likely fail because the lock file no longer exists)
  5. Mount using pct mount, then run pct unmount again
However, there is still something in /sys/fs/cgroup/lxc/103/ns/user.slice that prevents the container from starting.

I tried manually killing the processes listed by ps, but they are already defunct and are not being removed :( .
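
A note on the defunct processes: a zombie cannot be killed, since it has already exited; it only disappears once its parent reaps it or the parent itself goes away. The sketch below lists zombies together with their parent PIDs, so you can see which parent is holding them. It assumes the leftovers are ordinary zombies (state Z in ps); list_zombies is just a helper name made up for this example.

```shell
#!/bin/bash
# List defunct (zombie) processes together with their parent PIDs.
# Input format on stdin: "PID PPID STAT COMMAND", one process per line.
# list_zombies is a hypothetical helper name used only in this sketch.
list_zombies() {
    # Keep only lines whose state field starts with Z (zombie)
    awk '$3 ~ /^Z/ { print "zombie pid=" $1 " parent=" $2 " cmd=" $4 }'
}

# The trailing "=" after each column name suppresses the ps header line
ps -eo pid=,ppid=,stat=,comm= | list_zombies
```

If the parent shown is the container's lxc-start or its init process, the zombies should only go away once that parent does.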

Code:
#!/bin/bash

# Get Container ID from Arguments
ctid="$1"

# Ask User Interactively
if [[ -z "${ctid}" ]]
then
    read -p "Enter the Container ID to Debug: " ctid
fi

# Get PID of the "/usr/bin/lxc-start" Process that started the Container
# This is the only Thing that seems to kill the Container, even "lxc-stop --kill --nolock" just hung without doing anything
# The trailing "( |$)" keeps e.g. ID 10 from also matching Container 103
LXC_START_PID=$(pgrep -f "/usr/bin/lxc-start -F -n ${ctid}( |\$)" | head -n1)

# Kill Process (skip if it is no longer running)
if [[ -n "${LXC_START_PID}" ]]
then
    kill -9 "${LXC_START_PID}"
fi

# Remove Lock
rm -f "/run/lock/lxc/pve-config-${ctid}.lock"

# Stop while killing Container (might already be killed)
lxc-stop --kill --nolock -n "${ctid}" --logfile "/tmp/${ctid}_stop_debug.log" -l trace

# Try to unmount first of all
pct unmount "${ctid}"

# Try to mount, then unmount, which seems to "clean" the Situation by recreating the Lock File, which then allows the unmount Command to succeed
pct mount "${ctid}"
pct unmount "${ctid}"
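
Before removing the lock file by hand, it may be worth checking which process still holds it. This is a rough sketch, assuming the lock is a kernel file lock that shows up in /proc/locks (I have not confirmed that this is how Proxmox takes it). The helper name lock_holders is made up for this example, and it matches on inode number only, so a file on another filesystem with the same inode would also match.

```shell
#!/bin/bash
# Print the PIDs holding a lock on the file with the given inode number.
# Reads lines in /proc/locks format on stdin, e.g.
#   1: FLOCK  ADVISORY  WRITE 1234 00:19:567 0 EOF
# lock_holders is a hypothetical helper name used only in this sketch.
lock_holders() {
    awk -v ino="$1" '
        $2 == "->" { next }              # skip waiters, which only queue for the lock
        {
            n = split($6, dev, ":")      # field 6 is MAJOR:MINOR:INODE
            if (n == 3 && dev[3] == ino) print $5   # field 5 is the PID
        }'
}

ctid="${1:-190}"   # 190 is the container ID from the first post
lockfile="/run/lock/lxc/pve-config-${ctid}.lock"

if [ -e "${lockfile}" ]; then
    lock_holders "$(stat -c %i "${lockfile}")" < /proc/locks
else
    echo "No lock file at ${lockfile}"
fi
```

Any PID it prints can then be looked up with ps to see whether it is a stuck task or the lxc-start process itself.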