Node now in wait_for_agent_lock state no matter what I do

mgiammarco

Renowned Member
Hello,
I have rebooted a node of a Proxmox 5.4 cluster.
Now it is stuck in the wait_for_agent_lock state.
Some virtual machines are in the fence state.
Corosync works, and I have looked through many logs without finding any errors.
Every minute I receive these messages via mail:

Code:
FENCE: Try to fence node 'pvehp2'
SUCCEED: fencing: acknowledged - got agent lock for node 'pvehp2'

and then nothing happens (no reboot)
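For context, the HA and lock state on the node can be inspected with something like the following (standard PVE commands; the lock directory assumes the default /etc/pve layout, so treat this as a sketch):

Code:
# overall HA manager view (master, lrm states, services)
ha-manager status
# recent log of the HA services on this node
journalctl -u pve-ha-lrm -u pve-ha-crm --since "1 hour ago"
# the per-node agent locks live on the cluster filesystem
ls -l /etc/pve/priv/lock/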

What can I do?
Thanks,
Mario
 
It is doing it again.

The fenced node and the other nodes give the same result; below is the ha-manager status output, followed by pvecm status:

Code:
quorum OK
master pvedell (active, Fri Jan  8 18:19:10 2021)
lrm pvedell (active, Fri Jan  8 18:19:13 2021)
lrm pvedell1 (active, Fri Jan  8 18:19:06 2021)
lrm pvehpbig (active, Fri Jan  8 18:19:15 2021)
service vm:100 (pvedell, started)
service vm:101 (pvedell, started)
service vm:102 (pvedell, started)
service vm:103 (pvedell, started)
service vm:111 (pvedell, started)
service vm:113 (pvedell, started)
service vm:115 (pvedell1, started)
service vm:119 (pvehpbig, started)
service vm:120 (pvehpbig, started)
service vm:121 (pvedell, started)
service vm:122 (pvedell1, started)
service vm:124 (pvedell1, started)
service vm:131 (pvedell1, started)
service vm:140 (pvehpbig, started)
service vm:146 (pvedell1, started)
service vm:164 (pvedell1, started)
service vm:169 (pvedell, started)
service vm:179 (pvedell1, started)

Code:
Cluster information
-------------------
Name:             giammar1
Config Version:   24
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jan  8 18:20:14 2021
Quorum provider:  corosync_votequorum
Nodes:            3
Node ID:          0x00000005
Ring ID:          1.1210
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.1.0.6
0x00000003          1 10.1.0.7
0x00000005          1 10.1.0.4 (local)
 
The original post was made well over a year ago; did you upgrade to Proxmox VE 6.3 in the meantime? What is the output of pveversion -v?

Also, were there any changes made in the environment (e.g., network switch maintenance)? It seems odd that this happens again now, out of the blue.
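If it helps, something along these lines gathers the version info and the cluster logs around the incident to post here (standard systemd units; adjust the time range to when it happened):

Code:
pveversion -v
journalctl -u corosync -u pve-ha-crm -u pve-ha-lrm --since "2021-01-08" --until "2021-01-09"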
 
I upgraded to the latest version. It seems that when there are many VMs to migrate (I set the shutdown policy to 'migrate' for the cluster), the reboot takes too long and the ha-manager gets confused.
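For anyone finding this thread later: the cluster-wide policy I mean is the HA shutdown policy in /etc/pve/datacenter.cfg, roughly like this (option name as documented for PVE 6.x, so double-check against your version):

Code:
ha: shutdown_policy=migrate

With 'migrate' set, a node shutdown or reboot first tries to live-migrate the HA-managed VMs to the other nodes, so with many VMs it can take quite a while before the node actually goes down.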
 
