Proxmox 4 HA not working... Feedback

Hi All!

Last night the air conditioner in the server room broke down, and one of the three servers had to make an emergency shutdown. In the morning I saw that all the VMs from this server had migrated to the other two servers according to their HA group settings.
A real failure occurred!
The Proxmox HA cluster has shown that it works as expected!

P.S.
But after that server booted again, the VMs did not fail back.


---
Best regards!
Gosha
 

You must explicitly configure the VM to fail back automatically. We didn't want to trigger a mass migration when the server comes back online; that would probably do more harm than good.

What you can do to set up this behaviour is add the VM to a group that contains only one node, namely the one where you want the VM to run.
Be sure that 'nofailback' and 'restricted' are _NOT_ ticked. Then the VM will still be migrated if that node fails, and when the failed node comes back online a failback will be executed.
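A minimal sketch of such a group as it might appear in `/etc/pve/ha/groups.cfg`, assuming a Proxmox VE release that still uses HA groups. The group name `failback_node1` and the node name `node1` are hypothetical placeholders. The group lists a single node and leaves 'nofailback' and 'restricted' at 0 (unticked); the `comment` line only marks the example. Normally you would create this through the GUI (Datacenter → HA) instead of editing the file by hand.

```text
group: failback_node1
	comment hypothetical example group: prefer node1, fail back to it
	nodes node1
	nofailback 0
	restricted 0
```

The VM (for example vm:101) is then assigned to this group through its `group` property. Each HA resource has a single `group` setting, so assigning it here replaces whatever group it was in before.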
 

None of my VMs' HA groups have 'nofailback' or 'restricted' ticked.

Yes, but the option is purposely called "nofailback" and not "failback": leaving it unticked does not by itself cause an automatic failback, whereas ticking it prevents a failback.

At the moment you also need to configure the group with only one preferred node, to be sure that the VM always fails back to that node when possible.
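To see why default group settings do not produce a failback, here is a deliberately simplified model of the placement decision. It is a hypothetical bash illustration, not the actual pve-ha-manager code (which also handles node priorities, restricted groups and more), and it assumes all group members have equal priority and the group is not restricted. The rule it models: a service stays on its current node if that node is an online group member, otherwise it moves to an online group member, and 'nofailback' keeps it where it is for as long as its current node is online. The function names `pick_node` and `contains` and the node names are made up for this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical, simplified model of HA placement. Assumes all group members
# have equal priority and the group is NOT restricted. Illustration only;
# this is not the real pve-ha-manager implementation.

# contains LIST WORD: succeeds if WORD is an element of the space-separated LIST
contains() {
    [[ " $1 " == *" $2 "* ]]
}

# pick_node CURRENT "GROUP_NODES" "ONLINE_NODES" NOFAILBACK
# Prints the node the service should run on after this decision round.
pick_node() {
    local current=$1 group=$2 online=$3 nofailback=$4
    local n first preferred=""

    # nofailback=1: stay on the current node for as long as it is online.
    if [[ $nofailback == 1 ]] && contains "$online" "$current"; then
        echo "$current"
        return
    fi

    # Preferred nodes are the group members that are currently online
    # ($group is split on spaces on purpose).
    for n in $group; do
        if contains "$online" "$n"; then
            preferred+=" $n"
        fi
    done

    if [[ -n $preferred ]]; then
        if contains "$preferred" "$current"; then
            # Already on a preferred node, so no migration is triggered.
            echo "$current"
        else
            # Move to the first preferred node: this is the failback case.
            read -r first _ <<< "$preferred"
            echo "$first"
        fi
        return
    fi

    # No group member online (unrestricted group): keep the current node if
    # it is up, otherwise use the first online node.
    if contains "$online" "$current"; then
        echo "$current"
    else
        read -r first _ <<< "$online"
        echo "$first"
    fi
}
```

With a group containing all three nodes, `pick_node node2 "node1 node2 node3" "node1 node2 node3" 0` prints `node2`: the VM stays where it was recovered, as in the first post. With a one-node group, `pick_node node2 "node1" "node1 node2 node3" 0` prints `node1`, i.e. the VM fails back, while setting the last argument to 1 keeps it on `node2`.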
 

In all my HA groups, neither 'nofailback' nor 'restricted' is ticked. That is the default setting.
 

Yes, I know :)

I was only saying that if you want a service to fail back, you additionally need to add it to a group that contains only the preferred node (in addition to the default settings).

Like this: ha_failback_group.png
 

I created a new group for failback to node 1:

pic1.png

and tried to add vm:101 to this group:

pic2.png

Hm... maybe I misunderstood something... o_O
Can't one resource be placed in two groups?
 
Try:
Code:
ha-manager disable vm:102
ha-manager enable vm:102

This worked.

Also, can you please attach the logs from the CRM master at that time (from your post I guess it's pve20)?
Maybe filter them a bit, something like:
Code:
journalctl -u pve-ha-crm.service -u pve-ha-lrm.service -u pve-cluster.service > journal-`date +%Y-%m-%d-%H%M%S`.log

I cannot reproduce such issues, so it's important to have this information so that we can find and fix a possible bug or help you with the configuration. Thanks.

Ran it on both pve20 (the survivor) and pve22 (the victim).
 

Attachment: journals.zip (8 KB)

Update: I had a power outage this morning. I had one VM stuck the same way as described above, and I had to use Thomas's suggestion to remove and reenable HA on it to bring it back.

Why is this happening?
 
