I have some questions about HA.

totae

Member
May 27, 2023
Hello,

I have a question regarding HA.

When the network is lost (worst case), nodes in the cluster are rebooted (I understand this is part of the HA process and watchdog behavior).

If I want to immediately disable HA to prevent nodes from rebooting, could you please suggest the correct command? Since I have many VMs in HA groups, I’d like to know the safest way.

For example, can I stop crm or lrm on all nodes?

I would appreciate your guidance.

Best Regards,
 
When the network is lost (worst case), nodes in the cluster are rebooted (I understand this is part of the HA process and watchdog behavior).
A node will fence and reboot itself only if all corosync connections to the remaining nodes are lost. Build a setup with redundancy, with at least two (and up to eight) corosync links and redundant switches, and this isn't a problem.

The CRM is responsible for migrating resources from one node to another, but won't do anything regarding the automatic reboot.
The LRM executes the commands of the CRM and resets the watchdog. If you stop the LRM (systemctl stop pve-ha-lrm), the watchdog should be stopped too, so no reboot will happen.
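
To confirm on a node that the HA services really are down after stopping them, something like the following could be used. This is only a sketch: the function name ha_services_stopped and the SYSTEMCTL override variable are my own assumptions for illustration, and it checks the service state only, not the watchdog itself.

```shell
#!/bin/sh
# SYSTEMCTL can be overridden for a dry run (assumption, not a Proxmox convention).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: succeeds only if neither HA service is active on this node.
ha_services_stopped() {
    for svc in pve-ha-lrm pve-ha-crm; do
        # "is-active --quiet" returns 0 when the unit is active
        if $SYSTEMCTL is-active --quiet "$svc"; then
            echo "$svc is still active"
            return 1
        fi
    done
    echo "HA services are stopped on this node"
}
```

Run ha_services_stopped on each node after stopping the services; ha-manager status gives the cluster-wide HA view.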

But honestly: fix the network setup first.
 
1) On each node: systemctl stop pve-ha-lrm

Then, once all LRMs are down:

2) On each node: systemctl stop pve-ha-crm
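
With many nodes, the two steps above could be scripted roughly like this. It is a sketch under assumptions: the node names pve1, pve2, pve3 are placeholders, root SSH access between nodes is assumed, and the NODES and RUN variables and the function names are made up for this example. By default it only prints the commands (dry run).

```shell
#!/bin/sh
# Placeholder node names; replace with the real cluster nodes.
NODES="${NODES:-pve1 pve2 pve3}"
# Command prefix: defaults to "echo" (dry run). Set RUN="" to really execute.
RUN="${RUN-echo}"

# Stop one service ($1) on every node, one node after another.
stop_ha_service() {
    for node in $NODES; do
        $RUN ssh "root@$node" systemctl stop "$1" || return 1
    done
}

# Step 1: LRM on all nodes. Step 2: CRM on all nodes, only if step 1 succeeded.
stop_ha() {
    stop_ha_service pve-ha-lrm && stop_ha_service pve-ha-crm
}
```

Calling stop_ha prints the ssh commands in order; with RUN="" it runs them. Because systemctl stop returns only after the unit has stopped, every LRM is down before the first CRM is touched.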
 