pvescheduler: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout

ecolaizzi

New Member
May 4, 2023
Good morning,

I have a 3-node hyperconverged Proxmox VE cluster (Ceph).
Lately I have started getting some strange syslog errors that refer to file lock timeouts and Ceph mon failures:

Code:
[2023-05-05 06:54:15.000] pve02 pvescheduler[232656]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
[2023-05-05 06:32:12.000] pve03 pvescheduler[256736]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout

[2023-05-05 04:15:22.000] pve01 pvescheduler[200412]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:15:15.000] pve03 pvescheduler[251472]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:15:11.000] pve02 pvescheduler[227323]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:00:13.000] pve03 pvescheduler[247824]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout

[2023-05-05 01:14:07.000] pve01 ceph-mgr[28824]: 2023-05-05T01:14:07.732+0200 7fc102be6500 -1 failed for service _ceph-mon._tcp
[2023-05-05 01:14:07.000] pve01 systemd[1]: ceph-mgr@pve-bgp-rm01.service: Failed with result 'exit-code'.

[2023-05-05 00:28:15.000] pve01 pvescheduler[145218]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
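
To make the pattern easier to see, here is a minimal sketch that tallies the lock timeouts per node and per lock name. It assumes the syslog export uses exactly the line format pasted above; the regex and the function name `tally_lock_timeouts` are my own and not part of any Proxmox tooling. The sample lines are copied from the excerpt above.

```python
import re
from collections import Counter

# Assumed line format (as pasted above):
# [timestamp] node pvescheduler[pid]: <job>: cfs-lock '<name>' error: got lock request timeout
LOCK_TIMEOUT = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<node>\S+)\s+pvescheduler\[\d+\]:\s+"
    r"\S+\s+cfs-lock '(?P<lock>[^']+)' error: got lock request timeout"
)


def tally_lock_timeouts(lines):
    """Count cfs-lock timeout messages per (node, lock name) pair."""
    counts = Counter()
    for line in lines:
        match = LOCK_TIMEOUT.match(line.strip())
        if match:  # lines from other services (e.g. ceph-mgr) are skipped
            counts[(match.group("node"), match.group("lock"))] += 1
    return counts


# Sample lines copied from the syslog excerpt in this post.
sample = [
    "[2023-05-05 06:54:15.000] pve02 pvescheduler[232656]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout",
    "[2023-05-05 04:15:22.000] pve01 pvescheduler[200412]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout",
    "[2023-05-05 04:15:15.000] pve03 pvescheduler[251472]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout",
    "[2023-05-05 04:15:11.000] pve02 pvescheduler[227323]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout",
    "[2023-05-05 04:00:13.000] pve03 pvescheduler[247824]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout",
    "[2023-05-05 01:14:07.000] pve01 ceph-mgr[28824]: 2023-05-05T01:14:07.732+0200 7fc102be6500 -1 failed for service _ceph-mon._tcp",
    "[2023-05-05 00:28:15.000] pve01 pvescheduler[145218]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout",
]

counts = tally_lock_timeouts(sample)
for (node, lock), n in sorted(counts.items()):
    print(f"{node}  {lock}  {n}")
```

Run against a full export, this should show whether the timeouts cluster on one node or one lock file, or are spread evenly across all three nodes.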

Is this something to worry about? Can anyone tell me what's causing this?

To be clear, the servers are not short on resources: all three nodes run at a constant load below 10% of their capacity (RAM, disk, CPU). I also ran failover and migration tests, and everything completes successfully in less than 5 seconds.

At the network level, each cluster node has a 20 Gbit (2x10 bonded) uplink to the core switch, and traffic is segmented by VLAN (HA has a dedicated VLAN).
The Proxmox version is 7.4.3. We had the same issue on 7.3.6 and upgraded, suspecting a kernel bug or something similar in that version.

Thank you in advance,
Edwin.
 
New syslogs:

Code:
[2023-05-07 22:00:25.000] pve02 ceph-mon[1800]: 2023-05-08T00:00:34.696+0200 7f601cfff700 -1 Fail to read '/proc/1211621/cmdline' error = (3) No such process
[2023-05-07 22:00:25.000] pve02 ceph-osd[2247]: 2023-05-08T00:00:34.696+0200 7f8f20910700 -1 Fail to open '/proc/1211621/cmdline' error = (2) No such file or directory
[2023-05-08 05:46:18.000] pve01 ceph-mgr[1282602]: 2023-05-08T07:46:18.302+0200 7f75512a6500 -1 failed for service _ceph-mon._tcp
[2023-05-08 05:46:18.000] pve01 systemd[1]: ceph-mgr@pve-bgp-rm01.service: Failed with result 'exit-code'.
[2023-05-08 05:46:18.000] pve01 systemd[1]: ceph-mgr@pve-bgp-rm01.service: Main process exited, code=exited, status=1/FAILURE
[2023-05-08 05:46:18.000] pve01 ceph-mgr[1282602]: failed to fetch mon config (--no-mon-config to skip)
 
Has anyone encountered similar errors? I would like to understand whether this is normal behaviour.
Thank you.
 
