We had an interesting situation this morning. For some reason, one node in our cluster was not showing as active (going by the green "running" arrows on the guest icons in the tree), and none of the LXCs on it were responding. We resolved it as quickly as possible by simply resetting the node, and everything came back up. (If this happens again, we will have to investigate the cause more closely.)
However, a number of those non-responsive containers are under HA management. Of course, if the node is shut down in an orderly fashion, they will migrate to other nodes, but in this case (as has happened in previous instances) the node was responding while the guests were not. Is there a way to tell HA to restart the LXCs (and VMs, for that matter) on another node if they have not responded for an extended, definable period of time? Personally, I've almost never had node failures, but I've had hanging guests for various reasons. HA should be able to address that, shouldn't it?
This is what it looks like, but no services are shown as running?
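To make concrete what I mean by "not responding for a definable period", here is a rough Python sketch of the kind of check I have in mind. It is not an existing HA feature, just an illustration. It assumes that running a trivial command through `pct exec` is an acceptable liveness probe for a container, and the names `guest_responds` and `HangTracker` are made up for this example. What to do once a guest is flagged (for example, relocating it by hand) is deliberately left out.

```python
import subprocess
import time
from typing import Callable, Dict


def guest_responds(vmid: int, timeout: float = 10.0) -> bool:
    """Probe a container by running `true` inside it via `pct exec`.

    Assumption: a container that cannot run a trivial command within
    `timeout` seconds counts as "not responding". A missing `pct` binary
    is also reported as not responding.
    """
    try:
        result = subprocess.run(
            ["pct", "exec", str(vmid), "--", "true"],
            capture_output=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return result.returncode == 0


class HangTracker:
    """Remember when each guest first failed a probe and report the ones
    that have been failing for at least `grace_seconds`."""

    def __init__(self, grace_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.grace_seconds = grace_seconds
        self.clock = clock
        self.first_failure: Dict[int, float] = {}

    def update(self, vmid: int, responding: bool) -> bool:
        """Record one probe result; return True if the guest is overdue."""
        now = self.clock()
        if responding:
            # Any successful probe resets the timer for this guest.
            self.first_failure.pop(vmid, None)
            return False
        started = self.first_failure.setdefault(vmid, now)
        return now - started >= self.grace_seconds
```

A cron job or small loop could call `tracker.update(vmid, guest_responds(vmid))` for each HA-managed container and raise an alert once it returns `True`. That is essentially the behaviour I am hoping HA can provide on its own.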