Good morning,
I have a three-node hyperconverged Proxmox VE cluster (Ceph).
Lately I have started seeing some strange syslog errors that refer to file lock timeouts and Ceph mon failures:
Code:
[2023-05-05 06:54:15.000] pve02 pvescheduler[232656]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
[2023-05-05 06:32:12.000] pve03 pvescheduler[256736]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
[2023-05-05 04:15:22.000] pve01 pvescheduler[200412]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:15:15.000] pve03 pvescheduler[251472]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:15:11.000] pve02 pvescheduler[227323]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 04:00:13.000] pve03 pvescheduler[247824]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 01:14:07.000] pve01 ceph-mgr[28824]: 2023-05-05T01:14:07.732+0200 7fc102be6500 -1 failed for service _ceph-mon._tcp
[2023-05-05 01:14:07.000] pve01 systemd[1]: ceph-mgr@pve-bgp-rm01.service: Failed with result 'exit-code'.
[2023-05-05 00:28:15.000] pve01 pvescheduler[145218]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
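To see how often each lock times out on each node, the entries can be tallied with a small script. This is only a minimal sketch that assumes the log layout shown above; the function name `count_lock_timeouts` and the `SAMPLE` excerpt are hypothetical helpers for illustration, not part of Proxmox.

```python
import re
from collections import Counter

# Assumed layout (taken from the excerpt above):
# [timestamp] host pvescheduler[pid]: scope: cfs-lock 'name' error: got lock request timeout
LOCK_TIMEOUT = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<host>\S+)\s+pvescheduler\[\d+\]:\s+"
    r"\w+:\s+cfs-lock\s+'(?P<lock>[^']+)'\s+error:\s+got lock request timeout"
)


def count_lock_timeouts(lines):
    """Count cfs-lock timeouts per (host, lock name); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = LOCK_TIMEOUT.match(line.strip())
        if match:
            counts[(match.group("host"), match.group("lock"))] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = """\
[2023-05-05 06:54:15.000] pve02 pvescheduler[232656]: replication: cfs-lock 'file-replication_cfg' error: got lock request timeout
[2023-05-05 04:15:22.000] pve01 pvescheduler[200412]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
[2023-05-05 01:14:07.000] pve01 ceph-mgr[28824]: 2023-05-05T01:14:07.732+0200 7fc102be6500 -1 failed for service _ceph-mon._tcp
[2023-05-05 00:28:15.000] pve01 pvescheduler[145218]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
"""

if __name__ == "__main__":
    for (host, lock), n in sorted(count_lock_timeouts(SAMPLE.splitlines()).items()):
        print(f"{host} {lock}: {n}")
```

Grouping by node and lock name makes it easier to tell whether the timeouts are concentrated on one node or spread evenly across the cluster.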
Is this something to worry about? Can anyone tell me what's causing this?
To be clear, the servers are not short on resources: all three nodes have a steady load, below 10% of their capacity (RAM, disk, CPU). I also ran failover and migration tests, and everything completes successfully in less than 5 seconds.
At the network level, each cluster has a 20 Gbit/s (2x10 bond) uplink to the core switch, and traffic is segmented by VLAN (HA has a dedicated VLAN).
The Proxmox version is 7.4.3. We had the same issue with 7.3.6 and upgraded in case it was caused by a kernel bug or something similar in that version.
Thank you in advance,
Edwin.