trying to acquire cfs lock 'file-replication_cfg'

Are you running a cluster? That message might indicate that corosync communication between the nodes has issues. Can you check with `journalctl -u corosync` whether there are messages in there, and potentially restart corosync on the nodes?
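For example, something along these lines on each node:

Code:
# show the log messages from corosync
journalctl -u corosync

# if there are communication errors, restart corosync (one node at a time)
systemctl restart corosync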
 
Hi,

Yes, I have two different clusters on two different networks. I see this line in my logs for nodes in both clusters. Unfortunately, there is nothing in the corosync log at the times when I see the error message.

Thank you!
 
Hi
I'm new here.

I have the same message as above, so I'll post the details of my problem in this thread. When I run the command `pvesr status`, I get the following:

Code:
root@pve2:~# pvesr status
trying to aquire cfs lock 'file-replication_cfg' ...
trying to aquire cfs lock 'file-replication_cfg' ...
trying to aquire cfs lock 'file-replication_cfg' ...
trying to aquire cfs lock 'file-replication_cfg' ...
trying to aquire cfs lock 'file-replication_cfg' ...
trying to aquire cfs lock 'file-replication_cfg' ...
error with cfs lock 'file-replication_cfg': got lock request timeout


Interesting output from strace:
Code:
root@pve2:~# cat log.txt | grep -i permi
utimes("/etc/pve/priv/lock/file-replication_cfg", [{tv_sec=0, tv_usec=0}, {tv_sec=0, tv_usec=0}]) = -1 EACCES (Permission denied)
utimes("/etc/pve/priv/lock/file-replication_cfg", [{tv_sec=0, tv_usec=0}, {tv_sec=0, tv_usec=0}]) = -1 EACCES (Permission denied)
utimes("/etc/pve/priv/lock/file-replication_cfg", [{tv_sec=0, tv_usec=0}, {tv_sec=0, tv_usec=0}]) = -1 EACCES (Permission denied)


I currently have two nodes. Replication worked for several weeks. I do not know why or when it stopped working and the errors above appeared.

Does anyone have any idea what I could check before I take more radical steps? Thank you in advance.

I solved this by removing the contents of the replication.cfg file (I copied the contents as a backup first) and recreating the replication entries in the GUI.
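Roughly like this; the backup location is just the one I happened to pick:

Code:
root@pve2:~# cp /etc/pve/replication.cfg /root/replication.cfg.bak
root@pve2:~# echo -n > /etc/pve/replication.cfg

After that I recreated the replication entries under Datacenter -> Replication in the GUI.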
 
I have the same problem now. Could this be a bug that came in with the latest updates? I am now running kernel 4.15.18-7-pve #1 SMP PVE 4.15.18-27 (Wed, 10 Oct 2018 10:50:11 +0200).

I have quorum, corosync is not reporting any problems, but I am not seeing any replication status in the GUI, and I am also getting:

Code:
# pvesr status
trying to acquire cfs lock 'file-replication_cfg' ...
trying to acquire cfs lock 'file-replication_cfg' ...
trying to acquire cfs lock 'file-replication_cfg' ...
trying to acquire cfs lock 'file-replication_cfg' ...
error with cfs lock 'file-replication_cfg': got lock timeout - aborting command

Checking for file-replication_cfg manually, it looks like this:

Code:
root@carrier:~# cd /etc/pve/priv/lock/
root@carrier:/etc/pve/priv/lock# ls -l
total 0
drwx------ 2 root www-data 0 Oct 28 20:05 file-replication_cfg
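
If that lock turns out to be stale, I assume it could be removed by hand once no pvesr process is running, something like this (I have not dared to try it yet):

Code:
root@carrier:/etc/pve/priv/lock# rmdir file-replication_cfg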


I am worried because I don't know whether my replication still works :-(

As I take hourly snapshots using znapzend (and these get replicated), I can tell that replication seems to have stopped after 3am last night.

Who can help? I don't want to reconfigure my complex replication scenario as it contains 40 configured replications...
 
Hi,

Same issue for me here.

Code:
root@tco-prox-03:~# systemctl status pvesr.service
● pvesr.service - Proxmox VE replication runner
   Loaded: loaded (/lib/systemd/system/pvesr.service; static; vendor preset: enabled)
   Active: failed (Result: exit-code) since Mon 2018-10-29 00:04:09 CET; 27s ago
  Process: 8355 ExecStart=/usr/bin/pvesr run --mail 1 (code=exited, status=17)
 Main PID: 8355 (code=exited, status=17)
      CPU: 320ms

Oct 29 00:04:04 tco-prox-03 pvesr[8355]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 29 00:04:05 tco-prox-03 pvesr[8355]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 29 00:04:06 tco-prox-03 pvesr[8355]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 29 00:04:07 tco-prox-03 pvesr[8355]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 29 00:04:08 tco-prox-03 pvesr[8355]: trying to acquire cfs lock 'file-replication_cfg' ...
Oct 29 00:04:09 tco-prox-03 pvesr[8355]: error with cfs lock 'file-replication_cfg': got lock request timeout
Oct 29 00:04:09 tco-prox-03 systemd[1]: pvesr.service: Main process exited, code=exited, status=17/n/a
Oct 29 00:04:09 tco-prox-03 systemd[1]: Failed to start Proxmox VE replication runner.
Oct 29 00:04:09 tco-prox-03 systemd[1]: pvesr.service: Unit entered failed state.
Oct 29 00:04:09 tco-prox-03 systemd[1]: pvesr.service: Failed with result 'exit-code'.

Code:
proxmox-ve: 5.2-2 (running kernel: 4.15.18-7-pve)
pve-manager: 5.2-9 (running version: 5.2-9/4b30e8f9)
pve-kernel-4.15: 5.2-10
pve-kernel-4.15.18-7-pve: 4.15.18-27
pve-kernel-4.15.17-1-pve: 4.15.17-9
ceph: 12.2.8-pve1
corosync: 2.4.2-pve5
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.0-8
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-40
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-30
libqb0: 1.0.1-1
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-20
pve-cluster: 5.0-30
pve-container: 2.0-28
pve-docs: 5.2-8
pve-firewall: 3.0-14
pve-firmware: 2.0-5
pve-ha-manager: 2.0-5
pve-i18n: 1.0-6
pve-libspice-server1: 0.12.8-3
pve-qemu-kvm: 2.11.2-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-36
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.11-pve1~bpo1

As fitful did, I had to clear the contents of replication.cfg and recreate the replication tasks in the GUI. Unfortunately, after a while, it crashes again.
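In the meantime I am keeping an eye on the cluster filesystem, since /etc/pve is provided by pmxcfs (the pve-cluster service); something along these lines:

Code:
root@tco-prox-03:~# systemctl status pve-cluster
root@tco-prox-03:~# pvecm status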
Thanks in advance for your help.

Antoine
 
I can confirm the problem. Unfortunately, it has appeared again; recreating the tasks only helps for a moment, if at all.