HA VM always in "starting" status

I don't understand; I know that you have a subscription, as the banner under your member name (on the right) shows.


What do you mean by that?

Hm, do you maybe refer to my signature under my posts? The signature is added to all my posts with the same text.

What does it say in the logs? Can you please post the HA configuration? Is the 'pve-ha-crm.service' running on the node?

Sorry, I had answered from a very small screen and hadn't seen the banner ;).

For the problem: I updated Proxmox, with corosync 2.4.4-pve1. This time HA was no longer working on any node, yet the monitoring was green. But syslog had a new message: "pve-ha-crm: master seems offline". So I removed /etc/pve/ha/manager_status, then ran "pve-ha-crm stop" followed by "pve-ha-crm start". That corrected the problem.

Honestly, I do not understand the reason. I know that during the previous update the servers restarted spontaneously; I think that update left HA in an unstable state.

This problem is disconcerting because the indicators are green, with very little information in the logs.

You may be able to offer a more logical explanation than mine.

Sincerely
 
You may be able to offer a more logical explanation than mine.

Not without information and logs from the time this happened...
Code:
journalctl -u pve-ha-crm -u pve-ha-lrm -u corosync
or syslog(s) in /var/log (if, e.g., persistent journal isn't enabled)
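Since the journal being non-persistent is what cost us the logs here: journald can be switched to persistent storage so that the output of those journalctl commands survives reboots. A minimal excerpt of the relevant setting (standard systemd configuration, nothing Proxmox-specific):

```ini
# /etc/systemd/journald.conf (excerpt)
[Journal]
Storage=persistent
```

After editing, restart journald with `systemctl restart systemd-journald`. Alternatively, with the default `Storage=auto`, simply creating the directory /var/log/journal and restarting the service also enables persistent logging.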
 
I've attached the logs. The journal is not persistent, so for the "master seems offline" message I included the syslogs of 30/11 and 1/12.

Sincerely
 

Attachments

  • dump-of-journalctl.zip (2.9 KB)
  • syslog.4.gz (17 KB)
  • syslog.5.gz (14.5 KB)
Hi, I still have this problem that I can't fix:

Code:
root@vs1:~# ha-manager status --verbose
quorum OK
master vs3 (active, Thu Mar 28 11:05:13 2019)
lrm vs1 (idle, Thu Mar 28 11:05:17 2019)
lrm vs2 (active, Thu Mar 28 11:05:17 2019)
lrm vs3 (active, Thu Mar 28 11:05:13 2019)
full cluster state:
{
  "lrm_status" : {
    "vs1" : {
      "mode" : "active",
      "results" : {},
      "state" : "wait_for_agent_lock",
      "timestamp" : 1553767517
    },
    "vs2" : {
      "mode" : "active",
      "results" : {
        "HjEHP6tjyxqCk8IiqQTCGg" : {
          "exit_code" : 0,
          "sid" : "vm:125",
          "state" : "started"
        },
        "M1pCy1AopoS3sfjuGv5GVA" : {
          "exit_code" : 0,
          "sid" : "vm:127",
          "state" : "started"
        },
        "g5xPisRUOupFl30WEwtiLg" : {
          "exit_code" : 7,
          "sid" : "vm:124",
          "state" : "started"
        }
      },
      "state" : "active",
      "timestamp" : 1553767517
    },
    "vs3" : {
      "mode" : "active",
      "results" : {
        "ZUMGXbrIgu72xX8C5ChnIw" : {
          "exit_code" : 0,
          "sid" : "vm:126",
          "state" : "started"
        },
        "bIEKaH6FyIvJxQFinmshxA" : {
          "exit_code" : 0,
          "sid" : "vm:114",
          "state" : "started"
        }
      },
      "state" : "active",
      "timestamp" : 1553767513
    }
  },
  "manager_status" : {
    "master_node" : "vs3",
    "node_status" : {
      "vs1" : "fence",
      "vs2" : "online",
      "vs3" : "online"
    },
    "service_status" : {},
    "timestamp" : 1553767513
  },
  "quorum" : {
    "node" : "vs1",
    "quorate" : "1"
  }
}

I don't know how I can set vs1 back to "online" instead of "fence".
Can someone help me, please?
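(An aside, not from the original thread: the field that matters in the dump above is node_status inside manager_status. If you save the verbose output to a file, non-online nodes can be spotted with a couple of greps. The sample below is trimmed from the post, and /tmp/ha-status.json is just a hypothetical path; on a real cluster you would redirect the command's output there instead.)

```shell
# Trimmed sample of 'ha-manager status --verbose' output, saved to a file.
cat > /tmp/ha-status.json <<'EOF'
{
  "manager_status" : {
    "master_node" : "vs3",
    "node_status" : {
      "vs1" : "fence",
      "vs2" : "online",
      "vs3" : "online"
    }
  }
}
EOF
# List only the nodes whose HA state is not "online":
grep -E '"vs[0-9]+" : "[a-z]+"' /tmp/ha-status.json | grep -v '"online"'
```

Here this prints the vs1 line with its "fence" state, which is exactly the node that needs attention.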
 
Sorry, but for me this doesn't solve the problem...

Could you please try a slightly altered fix like @jdelbecque proposed?

Code:
# on all nodes:
systemctl stop pve-ha-crm
# on a single node
rm /etc/pve/ha/manager_status
# again on all nodes
systemctl start pve-ha-crm

All CRMs need to be stopped before resetting the manager status; otherwise a still-running one may become the master and write the status out again from memory after you have removed the file (a race condition).
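The race can be sketched in plain shell (a toy stand-in only, no Proxmox daemon involved; /tmp/manager_status.demo is a hypothetical path):

```shell
# A "daemon" that still holds the old status in memory will recreate the
# file right after you delete it, which is why every CRM must be stopped first.
echo '{"master_node":"vs3"}' > /tmp/manager_status.demo   # status file on disk
state=$(cat /tmp/manager_status.demo)      # a still-running CRM has this in memory
rm /tmp/manager_status.demo                # the admin removes the file...
echo "$state" > /tmp/manager_status.demo   # ...and the CRM writes it straight back
cat /tmp/manager_status.demo               # prints the old status: the reset never happened
```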
 
Sorry @t.lamprecht, but I've only just read your answer. I solved the problem by reinstalling the node, and that fixed it.
Your solution probably would have solved my problem too, and I surely would have spent less time.
Thanks
 
Could you please try a slightly altered fix like @jdelbecque proposed?

Code:
# on all nodes:
systemctl stop pve-ha-crm
# on a single node
rm /etc/pve/ha/manager_status
# again on all nodes
systemctl start pve-ha-crm

All CRMs need to be stopped before resetting the manager status; otherwise a still-running one may become the master and write the status out again from memory after you have removed the file (a race condition).
Solved here using this! ;-)
 
