HA problems

jms

New Member
Jul 21, 2017
Hi all

I have just picked up supporting a Proxmox setup with no prior experience and am having an issue with HA.

We have a master and two slaves. If I create a new VM from a template, migrate it to one of the slaves, etc., I can start it as long as it is not added to HA. If I stop the VM, add it to HA and then try to start it, it doesn't start, although the web interface reports an OK status.

The only change I have made recently was to add an NFS share for backups: I did a vzdump of a VM, deleted the VM and then restored it on the slave it was on. I can't see how this would break it, but then what do I know :)

Any pointers would be much appreciated
 
Just to update this: the master and both slaves show that pve-ha-crm has failed. I restarted it on all nodes, but it fails again.

Any hints?
 
There is no master/slave concept in a Proxmox VE cluster. I suggest you read the docs below; they should help you find the right settings and understand the issue.

https://pve.proxmox.com/wiki/Cluster_Manager
https://pve.proxmox.com/wiki/Proxmox_Cluster_File_System_(pmxcfs)
https://pve.proxmox.com/wiki/High_Availability

Also make sure you run the latest version (4 or 5).
 
Thanks for the clarification.

I read the links and a bit more, and I am still not sure how to proceed. pve-ha-crm is stopped on all 3 nodes, and pve-ha-lrm is stopped on one of them. "ha-manager status" shows services that no longer exist, e.g. VM 250, so when I run "ha-manager disable 250" it quite rightly tells me there is no such service.
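One way to see exactly which entries are stale is to compare the services tracked in /etc/pve/ha/manager_status with the ones still declared in the HA resource configuration. The Python sketch below does this on text you paste in, so it never touches the live files. It assumes that manager_status is JSON with a "service_status" object keyed by IDs such as "vm:250", and that /etc/pve/ha/resources.cfg uses section headers such as "vm: 100" or "ct: 101"; check both against your own files first.

```python
"""Sketch: list HA services that ha-manager still tracks but that are no
longer configured. ASSUMPTIONS: manager_status is JSON with a
"service_status" object keyed by IDs like "vm:250", and resources.cfg
declares each resource with a header line like "vm: 100" or "ct: 101"."""
import json
import re

# Assumed header format of a resource section in resources.cfg.
_SECTION_HEADER = re.compile(r"^(vm|ct):\s*(\d+)\s*$")


def configured_services(resources_cfg: str) -> set[str]:
    """Return the service IDs (e.g. "vm:100") declared in resources.cfg text."""
    ids = set()
    for line in resources_cfg.splitlines():
        match = _SECTION_HEADER.match(line)
        if match:  # indented option lines such as "\tstate started" are skipped
            ids.add(f"{match.group(1)}:{match.group(2)}")
    return ids


def stale_services(manager_status: str, resources_cfg: str) -> list[str]:
    """Return tracked service IDs that are missing from the configuration."""
    tracked = json.loads(manager_status).get("service_status", {})
    return sorted(set(tracked) - configured_services(resources_cfg))


if __name__ == "__main__":
    status = '{"service_status": {"vm:100": {}, "vm:250": {}}}'
    config = "vm: 100\n\tstate started\n"
    print(stale_services(status, config))  # ['vm:250']
```

An empty result would mean the two views agree, which is what "ha-manager status" and "ha-manager config" should show on a healthy cluster.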

Can you manually edit /etc/pve/ha/manager_status to remove stale information, and do you need to stop both pve-ha-lrm and pve-ha-crm before doing so?

And would this fix the problem?

If so, what effect does stopping and starting those daemons have on the running VMs?
 
For those who may still be wondering, the short answer is yes: you need to manually write the current state to /etc/pve/ha/manager_status, stop pve-ha-lrm and then pve-ha-crm, and start them again in the same order on each node.
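The restart order from this answer can be sketched as a dry-run plan in Python. It assumes one reading of the procedure: stop pve-ha-lrm on every node, then pve-ha-crm on every node, edit the file while no CRM is running, then start the daemons back in the same order. Root SSH access and the node names are assumptions. By default the sketch only prints the commands, and it says nothing about how the daemons affect running guests.

```python
"""Dry-run sketch of the two-phase HA daemon restart described above.
ASSUMPTIONS: root SSH access to every node; node names are placeholders."""
import subprocess

ORDER = ["pve-ha-lrm", "pve-ha-crm"]  # LRM first, then CRM, for stop and start


def phase(action: str, nodes: list[str]) -> list[list[str]]:
    """Commands for one phase ("stop" or "start"): LRM on all nodes, then CRM."""
    return [
        ["ssh", f"root@{node}", "systemctl", action, unit]
        for unit in ORDER
        for node in nodes
    ]


def execute(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; only run it when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise on the first failing command


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # hypothetical node names
    execute(phase("stop", nodes))
    # ...edit /etc/pve/ha/manager_status here, while no CRM is running...
    execute(phase("start", nodes))
```

Keeping the plan as plain data makes it easy to review before passing dry_run=False.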

I had a similar issue: one of three nodes was marked "fence" while the other nodes were correctly marked "online", no matter how many times and in which order I restarted nodes and services.
HA wasn't working regardless of the configuration. Some VMs/containers were marked "starting", others "started", but they weren't moving anywhere, whether by group preference, manually or by fencing.

I even went through an upgrade from Proxmox 5.4 to 6.0 (and from corosync 2.x to 3.x) and added a fourth node while having this issue; fortunately it didn't seem to affect anything other than HA.

To be on the safe side I emptied the HA configuration and moved all VMs/containers to a single node before starting.
While two nodes were marked "idle", the other two were "active".
All of the VMs and containers were marked as "ignored" both in the Web GUI and "ha-manager status".

After fixing the file (setting "online" instead of "fence" for the stale node and removing the stale service_status object entirely) and restarting all the services, all nodes were "idle" and "ha-manager status" finally returned empty.
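The file edit from this post can be sketched in Python as a transformation on the file's text. The sketch assumes that manager_status is JSON with a "node_status" map (node name to state string) and a "service_status" map. The node name in the example is hypothetical. Back up the real file before writing anything back to it.

```python
"""Sketch of the manual manager_status fix described above.
ASSUMPTIONS: the file is JSON with "node_status" (node -> state) and
"service_status" maps; node names are placeholders."""
import json


def fix_manager_status(text: str, stale_nodes: set[str]) -> str:
    """Mark the given fenced nodes "online" and drop service_status entirely."""
    status = json.loads(text)
    node_status = status.get("node_status", {})
    for node in stale_nodes:
        if node_status.get(node) == "fence":  # only touch nodes stuck in "fence"
            node_status[node] = "online"
    # Remove the whole service_status object, as done in the post above.
    status.pop("service_status", None)
    return json.dumps(status)


if __name__ == "__main__":
    before = ('{"master_node": "node1", "node_status": {"node1": "online", '
              '"node3": "fence"}, "service_status": {"vm:250": {"state": "started"}}}')
    print(fix_manager_status(before, {"node3"}))
    # {"master_node": "node1", "node_status": {"node1": "online", "node3": "online"}}
```

Working on a string rather than the live path keeps the sketch side-effect free; writing the result back is left to you.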

I was then able to restore the HA configuration and finally see the VMs and containers being moved to the right locations by preference.
This was the only way to have "ha-manager status" match "ha-manager config" and have a correct HA behaviour.

Hope this helps anyone in a similar situation.