pvesr.service failed "error with cfs lock 'file-replication_cfg': no quorum!"

encore

Hi,

we have been facing issues with one node for some days now.
It shows a red X beside the node name. When I restart corosync, it works again for ~1 minute, then it falls back to the red X state.

When that happens, pvesr.service fails:
systemctl status pvesr
● pvesr.service - Proxmox VE replication runner
Loaded: loaded (/lib/systemd/system/pvesr.service; static; vendor preset: enabled)
Active: failed (Result: exit-code) since Tue 2018-07-17 11:44:09 CEST; 6s ago
Process: 7757 ExecStart=/usr/bin/pvesr run --mail 1 (code=exited, status=13)
Main PID: 7757 (code=exited, status=13)
CPU: 516ms

Jul 17 11:44:04 captive005-74001 pvesr[7757]: trying to aquire cfs lock 'file-replication_cfg' ...
Jul 17 11:44:05 captive005-74001 pvesr[7757]: trying to aquire cfs lock 'file-replication_cfg' ...
Jul 17 11:44:06 captive005-74001 pvesr[7757]: trying to aquire cfs lock 'file-replication_cfg' ...
Jul 17 11:44:07 captive005-74001 pvesr[7757]: trying to aquire cfs lock 'file-replication_cfg' ...
Jul 17 11:44:08 captive005-74001 pvesr[7757]: trying to aquire cfs lock 'file-replication_cfg' ...
Jul 17 11:44:09 captive005-74001 pvesr[7757]: error with cfs lock 'file-replication_cfg': no quorum!
Jul 17 11:44:09 captive005-74001 systemd[1]: pvesr.service: Main process exited, code=exited, status=13/n/a
Jul 17 11:44:09 captive005-74001 systemd[1]: Failed to start Proxmox VE replication runner.
Jul 17 11:44:09 captive005-74001 systemd[1]: pvesr.service: Unit entered failed state.
Jul 17 11:44:09 captive005-74001 systemd[1]: pvesr.service: Failed with result 'exit-code'.

Any ideas what can cause that issue?
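For reference, the quorum state on the node can be checked directly with the standard tools (just a sketch, the output will of course differ per cluster):

# Proxmox view of cluster membership and quorum
pvecm status

# lower-level corosync view of votes and quorum
corosync-quorumtool -s

# corosync ring / totem status
corosync-cfgtool -s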
 
Hi,
I guess you have a network problem with multicast; maybe IGMP snooping is not working correctly.
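
A quick way to verify multicast between the nodes is omping, run on all nodes at the same time (the commands below follow the Proxmox Multicast notes wiki; node1/node2/node3 are placeholders for your node names):

# install the test tool on every node
apt-get install omping

# fast test; run this simultaneously on all cluster nodes
omping -c 10000 -i 0.001 -F -q node1 node2 node3

# longer ~10 minute test to catch IGMP snooping timeouts that only appear after a few minutes
omping -c 600 -i 1 -q node1 node2 node3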
 
Hi,

They are correct that it is a multicast issue. Please see the link below and follow the instructions for your network switch equipment.

https://pve.proxmox.com/wiki/Multicast_notes

My switch equipment is a Netgear M4300 stack, and even though the defaults say IGMP snooping is enabled "globally", there are a few more configuration changes that need to be made. Once I made those, my issues disappeared.
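
If the switch cannot be reconfigured right away, another option (a sketch, assuming the cluster traffic runs over vmbr0; adjust the bridge name to yours) is to enable an IGMP querier on the Linux bridge of each node:

# enable an IGMP querier on the bridge (takes effect immediately, but does not survive a reboot)
echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier

To make it persistent it would have to be added as a post-up line for the bridge in /etc/network/interfaces.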