Proxmox 4 HA VM Freeze State

Discussion in 'Proxmox VE: Installation and configuration' started by adamb, Nov 16, 2015.

  1. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,322
    Likes Received:
    285
    This is still the same behavior in 4.x
     
  2. AhmedF

    AhmedF New Member

    Joined:
    Dec 26, 2012
    Messages:
    26
    Likes Received:
    1
    So what about this freeze state? Shouldn't the HA CTs be moved to another node when the node fails?
     
  3. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,322
    Likes Received:
    285
    That state is only entered when you manually shut down a node.

    Note: shutdown != failure

    Sure - that is exactly what happens when a node fails.
     
  4. AhmedF

    AhmedF New Member

    Joined:
    Dec 26, 2012
    Messages:
    26
    Likes Received:
    1
    Agreed, but in Proxmox 3.x, when I shut down a node manually (and I did that a lot :)),
    rgmanager would first relocate all HA CTs to other nodes, then stop, and then the node would complete its manual shutdown.

    Is there a difference in Proxmox 4.x?
     
  5. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,322
    Likes Received:
    285
    Yes, this is different.
     
  6. tatyrza

    tatyrza New Member
    Proxmox VE Subscriber

    Joined:
    Nov 15, 2015
    Messages:
    11
    Likes Received:
    1
    Hello, I tested HA by resetting one of the hosts via IPMI. The VM running on that host wasn't relocated and was put into the error state. Worse, after the host came back online, the virtual machine could not run - it was stuck in the error state! Only after a clean reboot of the host did the affected machine start.

    I do not like it. PVE 3.4 was better.
     
  7. AhmedF

    AhmedF New Member

    Joined:
    Dec 26, 2012
    Messages:
    26
    Likes Received:
    1

    I was afraid of such cases :(
    In my 3.x cluster, if one node goes down for any reason, it gets fenced and all CTs are moved to other nodes automatically during the downtime - and that's what I call HA.

    Don't we all agree on that?
     
  8. t.lamprecht

    t.lamprecht Proxmox Staff Member
    Staff Member

    Joined:
    Jul 28, 2015
    Messages:
    947
    Likes Received:
    97
    First, the error state also existed in PVE 3.4; we adapted it from rgmanager with the same triggers.
    A service gets placed in the error state when one of the following happens:
    * The VM fails to stop - this is highly unlikely, as we do a normal shutdown with a 60s timeout and then stop the VM via Qemu (and that's normally a secured stop)
    * The VM cannot be started after all relocate and restart tries - read http://pve.proxmox.com/wiki/Manual:_ha-manager, especially the "RECOVERY POLICY" and "ERROR RECOVERY" sections

    But you should already know that; I hope nobody who wants to do HA-related stuff uses software without reading its documentation first ;)

    So please check the logs and show what really happened, and also describe your setup (how many nodes, which shared storage, ...).
    At first glance I would suspect a VM that is misconfigured (at least in the sense of HA), maybe with local storage or something else that binds the VM to a host.
    And if it's a bug, we also need this information so that we can reproduce and fix it, thanks.

    But a graceful shutdown is not "any reason" - it's by no means a failure, and it's planned; we agree on that too? So we should not start automatic actions by default. I could imagine that a ha-manager command which handles such a case would be better, or at least an option. I understand that this is inconvenient, but there are workarounds (scripting the relocate - see the sketch below - or killing the lrm), and it's simply a matter of opinion, where the lazy side argues for automatic relocation.

    Killing the VM and restarting it automatically on another host is not a really clean solution, in my opinion.
    Online migration would be a quite clean solution, but with possibly hundreds of gigabytes of RAM it would put a huge load on the infrastructure and should not be triggered automatically.

    This is why we made the freeze state: to avoid triggering automatic actions on a planned downtime event, with the thought behind it that human intelligence is far better at acting on and planning such downtimes, and that an automatic relocate of a lot of VMs in an unnecessary case could do more harm than good.
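
    A rough sketch of the "scripting the relocate" workaround mentioned above, to drain a node before a planned reboot. It assumes every running VM on the node is HA-managed; the script name and the target-node argument are hypothetical:

        #!/bin/sh
        # drain-node.sh (hypothetical): ask the HA manager to move every
        # running VM on this node to the given target before a reboot.
        TARGET=${1:?usage: drain-node.sh <target-node>}

        # qm list shows the VMs on the local node; column 3 is the status.
        for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
            # works for HA-managed VMs; plain VMs would need "qm migrate"
            ha-manager migrate "vm:$vmid" "$TARGET"
        done

        # ha-manager migrate only queues the request, so wait until
        # no VM is left running on this node.
        while qm list | grep -q running; do
            sleep 5
        done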
     
    #28 t.lamprecht, Nov 22, 2015
    Last edited: Nov 22, 2015
  9. tatyrza

    tatyrza New Member
    Proxmox VE Subscriber

    Joined:
    Nov 15, 2015
    Messages:
    11
    Likes Received:
    1
    Hello! I still use PVE 4. What about a UPS? I have three hosts, each with its own UPS. The NUT program is installed on each host and monitors its UPS. If one of the UPSes loses power, will the VMs be moved?
     
  10. t.lamprecht

    t.lamprecht Proxmox Staff Member
    Staff Member

    Joined:
    Jul 28, 2015
    Messages:
    947
    Likes Received:
    97
    Not automatically, for now. You could write a small script which the NUT program calls on a failure event so that all VMs are migrated to another host.
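
    A minimal sketch of such a hook, assuming upsmon is configured to call it via NOTIFYCMD with an EXEC flag for the ONBATT event; the script path and the target node pve2 are placeholders:

        #!/bin/sh
        # /usr/local/bin/ups-evacuate.sh (hypothetical): called by upsmon,
        # which exports NOTIFYTYPE (e.g. ONBATT) and UPSNAME to the script.
        TARGET=pve2   # a node fed by a healthy UPS

        # Only react when this host's UPS switches to battery power.
        [ "$NOTIFYTYPE" = "ONBATT" ] || exit 0

        # Queue a migration for every running (HA-managed) VM on this node.
        for vmid in $(qm list | awk '$3 == "running" {print $1}'); do
            ha-manager migrate "vm:$vmid" "$TARGET"
        done

    The matching upsmon.conf entries would be something along these lines:

        NOTIFYCMD /usr/local/bin/ups-evacuate.sh
        NOTIFYFLAG ONBATT SYSLOG+EXEC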
     
  11. adamb

    adamb Member
    Proxmox VE Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    955
    Likes Received:
    19
    I do understand the logic to a degree, but I honestly don't see an issue with the logic behind 3.4. We support these clusters remotely. Monthly reboots of the HA nodes have proven to be very valuable, including the failover of the VM when the reboot happens. These reboots weed out any random issues which could arise when a legit failover is needed.

    Obviously, live migrating a VM with well over 700G of RAM without human intervention just sounds crazy.

    It would be great to have the option to choose how we want the VMs to be handled when a planned shutdown takes effect. 4.0 was a big enough change in itself; I don't think changing the logic was the best idea.

    If I kill pve-ha-lrm, the node typically gets fenced. If I simply stop the service, it puts the VM in a freeze state.
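
    For reference, the two cases above map roughly to these commands (a sketch; the fence/freeze outcomes are as reported in this thread):

        # Hard-killing the LRM stops its watchdog updates,
        # so the node typically ends up fenced:
        killall -9 pve-ha-lrm

        # A clean service stop lets the LRM exit gracefully and
        # puts its HA services into the freeze state instead:
        systemctl stop pve-ha-lrm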
     
    #31 adamb, Dec 1, 2015
    Last edited: Dec 1, 2015
  12. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,322
    Likes Received:
    285
    We are working on a patch with the following behavior:

    system shutdown: stop the VMs, then move them to other nodes

    system reboot: stop the VMs and put them into the freeze state
     
  13. adamb

    adamb Member
    Proxmox VE Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    955
    Likes Received:
    19
    I appreciate the offer, but it wouldn't do us much good. We need the ability to choose this option for reboots even more than for shutdowns.
     
  14. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,322
    Likes Received:
    285
    I guess we can also make it configurable.
     
  15. adamb

    adamb Member
    Proxmox VE Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    955
    Likes Received:
    19
    You guys are the best! Let me know if there is anything I can do.
     
  16. AhmedF

    AhmedF New Member

    Joined:
    Dec 26, 2012
    Messages:
    26
    Likes Received:
    1
    I tried to stop rgmanager manually. It stopped all CTs on the node successfully, but they didn't start on other nodes as configured with the failover domains. Can you please tell me how to fix this?
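
    (An aside, not confirmed in this thread: on PVE 3.x you could relocate each HA service yourself with rgmanager's clusvcadm before stopping rgmanager; the service name pvevm:101 and the node name node2 below are placeholders.)

        # Relocate one HA-managed CT/VM to another cluster member:
        clusvcadm -r pvevm:101 -m node2

        # Once nothing is left on this node, rgmanager can be stopped:
        service rgmanager stop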
     
  17. Belokan

    Belokan Member

    Joined:
    Apr 27, 2016
    Messages:
    149
    Likes Received:
    10
    Hello all,

    I'm quite new to Proxmox and started directly with 4.x (pvetest) for my lab@home a few weeks ago.
    I have a 2-node HA cluster with NFS/iSCSI storage provided by 2 NAS.

    I've been able to configure my VMs to be live migrated in a few clicks (including storage); PM is quite impressive, I admit.
    I've added two VMs as HA resources: the first is a VDI used to remotely connect @office for support, and the second provides DHCP/DNS/VPN/HTTP/... services to my local network... so those two have to be up and running all the time.

    But I was quite "disappointed" when I first stopped a PVE node hosting one of those HA VMs for a hardware upgrade (a reboot, then a shutdown, in order to be sure the host reboots correctly before the maintenance). I expected the HA VM to first be live migrated to the second node before the shutdown, but it was not. I was trained on VMware 4.x in the past but never had the chance to really work with it; part of my BAU, though, is to configure/administer/maintain Veritas clusters, and in both cases (memories from the VMware training and actual VCS behavior) the cluster always moves resources before shutting down. Is it possible to implement this behavior in PM (with/without an automatic fail-back option)?

    Another point: if I try to migrate an HA VM to another node, it just does nothing:

    Executing HA migrate for VM 104 to node pve1
    TASK OK

    104 is running on pve2, and even with an OK status, nothing happens. I have to remove the VM from HA, and then I can migrate it manually.

    Am I missing something here ?

    Thanks !

    Olivier
     
  18. adamb

    adamb Member
    Proxmox VE Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    955
    Likes Received:
    19
    If you're using Proxmox 4 with two nodes, that is your issue. For HA you need 3 nodes in total.
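
    With only two nodes, the surviving node can never reach quorum on its own, so the HA stack refuses to act. A quick way to see this state (a sketch):

        # Show membership and quorum; in a 2-node cluster a single
        # surviving node (1 of 2 expected votes) is not quorate.
        pvecm status

        # Emergency override on the surviving node only - this defeats
        # the protection quorum provides, so use it with great care:
        pvecm expected 1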
     
  19. Belokan

    Belokan Member

    Joined:
    Apr 27, 2016
    Messages:
    149
    Likes Received:
    10
    Ok, thanks, got the point (fencing)!
    It explains the "at least 3 nodes" requirement...

    But in my current situation, what will happen if a node crashes? Will the HA VMs be restarted on the second node despite the 2-node configuration, or will just nothing happen?

    Can I add a 3rd "virtual" node (installed in a VM hosted on one of the NAS, for instance), add it to the cluster together with the 2 physical nodes, and then make sure that no VM will migrate onto it?
     
  20. adamb

    adamb Member
    Proxmox VE Subscriber

    Joined:
    Mar 1, 2012
    Messages:
    955
    Likes Received:
    19
    I believe nothing will happen, as you don't have quorum. Would this third node be running as a VM on one of the two current hosts? If so, that wouldn't do much; if it's running on another host separate from these two, it would be OK, but definitely not anything I would rely on.
     