Understanding LXC Migration Behavior in a Proxmox HA Cluster with 3 Nodes

ensnare · May 3, 2024

I'm currently managing a Proxmox cluster with three nodes configured for high availability (HA). I've observed some behaviors regarding LXC container management and failover mechanisms, and I'd appreciate any insights or clarifications you might offer.

1. Preventing Duplicate LXC Instances: In our HA setup, I'm curious about the safeguards that Proxmox has in place to prevent an LXC container from starting on two nodes simultaneously. How does the system ensure that the same container does not accidentally run on multiple nodes at the same time, particularly if a node fails and then comes back online quickly? Is the duplicate instance scenario possible? Is there anything I can do to prevent it?
2. Behavior During Sequential Node Failures: In a scenario where we have three nodes, if the node hosting an LXC container fails, the system successfully starts the container on a second node. However, if this second node also fails shortly after, the container does not attempt to migrate to the third node. Is this behavior expected? Are there specific configurations or limitations that prevent the container from migrating to the remaining operational node?

Understanding these aspects will help me better manage our resources and ensure maximum uptime. Thanks in advance for your help and guidance!

UdoB · May 3, 2024

I do not use containers, but:

- A service (a VM or a container) can only run on one node at any given point in time. This is fundamental, and the core of the PVE software takes care of it.
- When the second node fails (and both stay off/dead), the third one is alone. It loses quorum and fences itself, i.e. it reboots. When it comes back up, quorum is still not present (as I assume the first two nodes are still dead). For this reason it will not try to start any LXC or VM.
- Only when a second node starts up successfully can the two form a quorum (two out of three votes); these two will then start all HA services as expected.
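
As a rough illustration of the quorum arithmetic above, here is a minimal Python sketch. It is not Proxmox or corosync code; the function names `votes_needed` and `is_quorate` are made up for this example, and it assumes each node carries exactly one vote.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of total_votes."""
    return total_votes // 2 + 1


def is_quorate(online_votes: int, total_votes: int = 3) -> bool:
    """True if the reachable nodes hold a strict majority of all votes."""
    return online_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # Assumption: a three-node cluster with one vote per node.
    for online in range(4):
        state = "quorate" if is_quorate(online) else "no quorum"
        print(f"{online} of 3 nodes online: {state}")
```

With three votes the majority is two, so a single surviving node can never be quorate on its own, which matches the behavior described in the list above.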

See also https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum and several threads in this forum...

ensnare · May 3, 2024

Interesting, thanks. If I could ask more specifically, what might happen in this situation:

A VM is running on Node 1. Nodes 1,2,3 are all connected to the same VLAN. HA is enabled such that the VM can migrate to nodes 2 or 3 if node 1 goes down.

Now, Node 1 is partitioned to a different VLAN and can no longer communicate with Nodes 2 or 3, but still maintains internet access.

Presumably, the VM would begin to run on Node 2 or 3.

My questions are:
- What happens to the running instance of the VM on Node 1?
- Will there ever be any overlap of the VM instance running on Node 1, or Nodes 2/3?
- What actually happens here?

UdoB · May 4, 2024

ensnare said:
- What happens to the running instance of the VM on Node 1?

Node 1 will reboot; it will "fence" itself. The VM will not start again on this node, because Node 1 can never become quorate without the network connectivity needed to reach Nodes 2/3.

ensnare said:
- Will there ever be any overlap of the VM instance running on Node 1, or Nodes 2/3?

No.
It would be possible if the VM on Node 2/3 started immediately and/or Node 1 waited too long to shut down. I did not test this scenario myself, but I am fairly sure the Proxmox developers were aware of this and set the timing accordingly.
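
The timing argument can be sketched as a simple comparison. This is only a toy model, not how Proxmox implements it: `overlap_window_s` and both parameter names are hypothetical, and the numbers in the usage lines are placeholders for illustration, not verified Proxmox defaults.

```python
def overlap_window_s(self_fence_after_s: float, recovery_starts_after_s: float) -> float:
    """Seconds during which both copies of the VM could be running.

    self_fence_after_s: time after the partition until the isolated node
        has rebooted (hypothetical parameter).
    recovery_starts_after_s: time after the partition until the remaining
        nodes start the VM elsewhere (hypothetical parameter).
    Returns 0.0 when recovery only begins after fencing has completed.
    """
    return max(0.0, self_fence_after_s - recovery_starts_after_s)


if __name__ == "__main__":
    # Placeholder values only, chosen to show both cases.
    print(overlap_window_s(60, 120))  # recovery waits longer than fencing
    print(overlap_window_s(60, 10))   # recovery starts before fencing is done
```

In this model an overlap is only possible when recovery starts before the isolated node has gone down, which is the case the timing is meant to rule out.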
