[SOLVED] Reset Cluster HA-Resources

Nov 23, 2022
Hello there,

We are having trouble with our Proxmox cluster and need your help.

We are running a PVE cluster with 29 nodes and approximately 1500 VMs with HA configured. After a failure we do not yet understand, the complete cluster rebooted and tried to start every resource once quorum was reached. Because the nodes took quite different amounts of time to reboot, quorum returned before all nodes were back, and some resources were relocated to other nodes.
Those nodes in turn restarted again due to high memory usage, and the cycle started all over.

We now have all nodes of the cluster up and running and all VMs stopped; Ceph is clean.

The problem now is that the HA state is kind of broken; see the attached picture "HA-state-1.png".

What we have tried so far: we are afraid that if we fix the HA master node, all resources will be started again, so we tried to delete all HA resources via the API. This worked for some resources but not for all.
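
A minimal sketch of what such a bulk deletion could look like, assuming the standard `/cluster/ha/resources` API endpoints and that each listed entry carries a `sid` field such as `vm:123`. The helper name `ha_delete_paths` and the sample data are made up for illustration; this is not the exact script that was run, and it only prints the commands instead of executing them.

```python
from urllib.parse import quote


def ha_delete_paths(resources):
    """Build the API path to DELETE for every HA resource entry.

    `resources` is assumed to be shaped like the list returned by
    GET /cluster/ha/resources, i.e. dicts with a "sid" key.
    """
    paths = []
    for entry in resources:
        sid = entry["sid"]  # e.g. "vm:123" or "ct:200"
        # Keep the colon readable; escape anything else unusual.
        paths.append("/cluster/ha/resources/" + quote(sid, safe=":"))
    return paths


# Hypothetical sample data, not taken from the real cluster.
sample = [
    {"sid": "vm:123", "state": "stopped"},
    {"sid": "ct:200", "state": "started"},
]

# Dry run: print one pvesh command per resource for review.
for path in ha_delete_paths(sample):
    print("pvesh delete " + path)
```

Printing the commands first makes it possible to review the list before anything is removed, which seems prudent while the HA state is inconsistent.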

At the moment it is not possible to stop, start, or migrate any resource with HA configured. The task "HA 123 - stop" is visible, but the VM never gets stopped...

The cluster is currently running pve-manager/7.4-3/9002ab8a. If there is any further information I can provide, just ask here and I will post it.

Can someone provide a solution to completely remove all HA-Resources and get the cluster to a clean state?

Thank you in advance for any help
 

Attachments

  • HA-state-1.png
