Should there always be lock files in the /var/lock/* folders?

bqq100

New Member
Jun 8, 2021
9
0
1
38
I was doing some server maintenance today, migrating some containers/VMs, and twice ran into migration failures with the following error:

Code:
TASK ERROR: can't lock file '/var/lock/pve-manager/pve-migrate-xxx' - got timeout

This happened with 2 different servers and 2 different containers. I went in and cleared out the pve-migrate-xxx.lock files and the migration completed without any issues.

Now I'm reviewing all of my Proxmox servers, and they all have quite a few lock files in place even though there are no active tasks. Some of the lock files may even belong to containers/VMs that migrated successfully, but I'm not 100% sure of this.

Is it normal to see these lock files existing in the lxc/pve-manager/qemu-server folders even when there are no active migrations/config changes happening? If it's not normal, any idea why these files aren't being cleaned up? If it is normal, any idea why one might get stuck in a state that would prevent a migration?

Appreciate any insight! Thanks!
 

bqq100

Well, another migration failed with the same error.

Can someone please at least do an ls on the /var/lock/pve-manager folder and let me know if you see any files? It would be helpful to see this on nodes with and without replication running.

Thanks!
 

UdoB

Well-Known Member
Nov 1, 2016
249
61
48
Germany
Good morning,

Actually, I do find artefacts of a finished replication from last night:
Code:
~# ls -al /var/lock/pve-manager/
total 0
drwxr-xr-x 2 root root 160 Jan 26 03:20 .
drwxrwxrwt 7 root root 280 Jan 25 14:21 ..
-rw-r--r-- 1 root root   0 Jan 26 03:20 pve-migrate-1103
-rw-r--r-- 1 root root   0 Jan 25 13:51 pve-migrate-1114
-rw-r--r-- 1 root root   0 Jan 26 01:50 pve-migrate-1122

In the GUI the state is "OK" and the task log states:
Code:
Proxmox Virtual Environment 7.1-10
Virtual Machine 1103 (cloud) on node 'pvee'
2022-01-27 03:20:01 1103-0: start replication job
2022-01-27 03:20:01 1103-0: guest => VM 1103, running => 126610
...
...
2022-01-27 03:21:42 1103-0: (remote_finalize_local_job) delete stale replication snapshot '__replicate_1103-0_1643163602__' on ssd1:vm-1103-disk-1
2022-01-27 03:21:42 1103-0: end replication job


The dates are strange: the lock file's timestamp is from the night before (Jan 26 03:20), while the task log is from the latest run (2022-01-27 03:20)...
Best regards
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
7,292
1,340
164
The lock file is created the first time the lock is obtained. It is never removed, and shouldn't be removed manually either: removing it would allow a second process to obtain the lock while the first one still holds it, but with a reference to the removed file! This particular lock file keeps replication and migration from running simultaneously, so most likely a replication was running in the background and took longer than the migration was willing to wait.
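
To illustrate why deleting the file defeats the lock, here is a minimal Python sketch. It assumes an flock-style advisory lock on Linux; the file name, the temporary directory and the helper names `try_lock` and `demo` are made up for the illustration and are not taken from the Proxmox code. Two separate opens of the same path conflict as expected, but once the path is unlinked, the next open creates a new file and gets its own lock while the first holder still believes it is protected.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if needed) and try a non-blocking exclusive flock.

    Returns the open file object if the lock was obtained, otherwise None.
    """
    f = open(path, "a")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "pve-migrate-demo")  # hypothetical file name

    first = try_lock(path)    # e.g. a replication job takes the lock
    blocked = try_lock(path)  # a second locker is refused, as intended

    os.unlink(path)           # someone deletes the lock file by hand
    second = try_lock(path)   # same path, but a new file: the lock is granted

    result = {
        "first_holds_lock": first is not None,
        "second_blocked_before_unlink": blocked is None,
        "second_locked_after_unlink": second is not None,
    }

    # Clean up: closing the files releases both locks.
    for f in (first, second):
        if f is not None:
            f.close()
    os.unlink(path)
    os.rmdir(workdir)
    return result


if __name__ == "__main__":
    print(demo())
```

After the unlink, both "holders" think they own the lock at the same time, which is exactly the situation the lock is supposed to prevent.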
 

bqq100

Replication for the container that failed migration typically takes less than 15 seconds, and at least one of the times it failed I retried a few minutes later and got the same error message.

In any case, I just removed replication for the container and there are no pending tasks for it, but there is still a lock file. If there is no replication and no migration, shouldn't the lock file be removed?
 

fabian

No. Like I said, the lock file is never removed (well, some of them are in paths that are gone upon reboot ;)). This is okay and expected.

Edit: to make this clearer, the lock file existing doesn't mean anything is locked (it might be, or it might just have been at some point in the past). Locking is a separate operation that just uses that file path and requires the file to be there.
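
The difference between "the file exists" and "the lock is held" can be sketched in Python. This again assumes an flock-style advisory lock on Linux, and the names `is_locked` and `demo` and the temporary file are hypothetical. Note that the probe briefly takes the lock itself when it is free, so treat it as an illustration rather than something to run against a busy node.

```python
import fcntl
import os
import tempfile


def is_locked(path):
    """Return True if some other open file currently holds an flock on path."""
    with open(path, "a") as probe:
        try:
            fcntl.flock(probe, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return True
        # We got the lock, so nobody else held it; release it again.
        fcntl.flock(probe, fcntl.LOCK_UN)
        return False


def demo():
    fd, path = tempfile.mkstemp(prefix="pve-migrate-demo-")  # hypothetical name
    os.close(fd)

    states = [is_locked(path)]       # file exists, but nothing holds the lock

    holder = open(path, "a")
    fcntl.flock(holder, fcntl.LOCK_EX)
    states.append(is_locked(path))   # now the lock really is held

    holder.close()                   # closing the file releases the lock
    states.append(is_locked(path))   # free again

    still_exists = os.path.exists(path)  # the file outlives the lock
    os.unlink(path)
    return states, still_exists


if __name__ == "__main__":
    print(demo())
```

The file is present through all three checks; only the middle one reports a held lock.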
 
