Node now in wait_for_agent_lock state no matter what I do

mgiammarco

Renowned Member
Hello,
I have rebooted a node of a Proxmox 5.4 cluster.
Now it is stuck in the wait_for_agent_lock state.
Some virtual machines are in the fence state.
Corosync works, and I have looked through many logs without finding any errors.
Every minute I receive these messages via mail:

Code:
FENCE: Try to fence node 'pvehp2'
SUCCEED: fencing: acknowledged - got agent lock for node 'pvehp2'

and then nothing happens (no reboot)
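For context, the HA and lock state on the node can be inspected with something like the following (standard PVE commands; the lock directory assumes the default /etc/pve layout, so treat this as a sketch):

Code:
# overall HA manager view (master, lrm states, services)
ha-manager status
# recent log of the HA services on this node
journalctl -u pve-ha-lrm -u pve-ha-crm --since "1 hour ago"
# the per-node agent locks live on the cluster filesystem
ls -l /etc/pve/priv/lock/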

What can I do?
Thanks,
Mario
 
It is doing it again.

The fenced node and the other nodes give the same result; below is the ha-manager status output, followed by pvecm status:

Code:
quorum OK
master pvedell (active, Fri Jan  8 18:19:10 2021)
lrm pvedell (active, Fri Jan  8 18:19:13 2021)
lrm pvedell1 (active, Fri Jan  8 18:19:06 2021)
lrm pvehpbig (active, Fri Jan  8 18:19:15 2021)
service vm:100 (pvedell, started)
service vm:101 (pvedell, started)
service vm:102 (pvedell, started)
service vm:103 (pvedell, started)
service vm:111 (pvedell, started)
service vm:113 (pvedell, started)
service vm:115 (pvedell1, started)
service vm:119 (pvehpbig, started)
service vm:120 (pvehpbig, started)
service vm:121 (pvedell, started)
service vm:122 (pvedell1, started)
service vm:124 (pvedell1, started)
service vm:131 (pvedell1, started)
service vm:140 (pvehpbig, started)
service vm:146 (pvedell1, started)
service vm:164 (pvedell1, started)
service vm:169 (pvedell, started)
service vm:179 (pvedell1, started)

Code:
Cluster information
-------------------
Name:             giammar1
Config Version:   24
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jan  8 18:20:14 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000005
Ring ID:          1.1210
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.0.6
0x00000003          1 10.1.0.7
0x00000005          1 10.1.0.4 (local)
 
The original post was made well over a year ago; did you upgrade to Proxmox VE 6.3 in the meantime? What is the output of pveversion -v?

Also, were there any changes made in the environment (e.g., network switch maintenance)? It seems odd that this happens again now, out of the blue.
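If it helps, something along these lines gathers the version info and the cluster logs around the incident to post here (standard systemd units; adjust the time range to when it happened):

Code:
pveversion -v
journalctl -u corosync -u pve-ha-crm -u pve-ha-lrm --since "2021-01-08" --until "2021-01-09"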
 
I upgraded to the latest version. It seems that when there are many VMs to migrate (I set the shutdown policy to 'migrate' for the cluster), the reboot takes too long and the ha-manager gets confused.
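For anyone finding this thread later: the cluster-wide policy I mean is the HA shutdown policy in /etc/pve/datacenter.cfg, roughly like this (option name as documented for PVE 6.x, so double-check against your version):

Code:
ha: shutdown_policy=migrate

With 'migrate' set, a node shutdown or reboot first tries to live-migrate the HA-managed VMs to the other nodes, so with many VMs it can take quite a while before the node actually goes down.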
 
