Cluster Node reboot vm mode freeze

Discussion in 'Proxmox VE: Installation and configuration' started by HFernandez, May 20, 2019.

  1. HFernandez

    HFernandez New Member
    Proxmox Subscriber

    May 20, 2019
    Hello everyone, I have a cluster of 3 nodes: h1, h3 and h6.
    Virtual Environment 5.4-5

    All VMs are on shared storage that every node can access.
    A few days ago node h1 was restarted but never came back.
    I waited for the VMs to be moved to the other nodes, but that did not happen.

    Node h1 kept the VMs and put them in freeze mode.

    I could not take control of the VMs on that node.

    In the HA section I set them to Ignored mode, but when I tried to start them from the console of node h3, it said that the VM or its .conf file did not exist.

    At the moment some VMs are in fence state.

    The syslog shows this error:
    May 20 09:26:34 h3 pve-ha-crm[2083]: recover service 'vm:117' from fenced node 'h1' to node 'h3'
    May 20 09:26:34 h3 pve-ha-crm[2083]: got unexpected error - Configuration file 'nodes/h1/qemu-server/117.conf' does not exist
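
    The error says the CRM expected the config under h1's directory and did not find it there. One way to see where the file actually lives is to look under every node's directory. The sketch below assumes the layout shown in the error message (nodes/<node>/qemu-server/<vmid>.conf) below /etc/pve; the function name find_vm_conf is made up for illustration.

```shell
# Print the path of every <vmid>.conf found under a nodes directory.
# Usage: find_vm_conf VMID [NODES_DIR]
# NODES_DIR defaults to /etc/pve/nodes (assumed location of the cluster filesystem).
find_vm_conf() {
    vmid=$1
    nodes_dir=${2:-/etc/pve/nodes}
    # One subdirectory per node; VM configs sit in its qemu-server folder.
    for conf in "$nodes_dir"/*/qemu-server/"$vmid".conf; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$conf" ] && printf '%s\n' "$conf"
    done
    return 0
}
```

    For the VM from the log this would be called as `find_vm_conf 117`. No output means no node directory holds a config for that ID.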

    I think the omping result is correct:
    unicast, xmt/rcv/%loss = 9983/9983/0%, min/avg/max/std-dev = 0.045/0.084/0.281/0.018
    multicast, xmt/rcv/%loss = 9983/9983/0%, min/avg/max/std-dev = 0.054/0.094/0.187/0.021
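
    To compare several omping runs quickly, the loss percentage can be pulled out of each summary line. This is a minimal sketch that assumes the summary format shown above; omping_loss is a hypothetical helper name, not part of omping.

```shell
# Read omping summary lines on stdin and print "<type> <loss-percent>"
# for every unicast/multicast line, e.g. "multicast 0".
omping_loss() {
    awk '{
        # Locate the "type, xmt/rcv/%loss = a/b/c%" part wherever it sits in the line.
        if (match($0, /(unicast|multicast), xmt\/rcv\/%loss = [0-9]+\/[0-9]+\/[0-9]+%/)) {
            s = substr($0, RSTART, RLENGTH)
            # Pieces: type, xmt, rcv, loss, sent, received, loss-percent
            split(s, a, "[ ,/=%]+")
            print a[1], a[7]
        }
    }'
}
```

    Usage would be to pipe the omping output into it, for example `omping ... | omping_loss`, with the same omping arguments as before.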

    IGMP is not enabled on my switches. Does it have to be enabled for HA to work?

    What could have happened? What do I need to configure or test?

    Attachment screenshot
  2. Richard

    Richard Proxmox Staff Member
    Staff Member

    Mar 6, 2015
    Try restarting HA as follows:
    systemctl restart pve-ha-lrm
    If this does not help, post a pvereport for all your nodes and check the output of:

    systemctl status pve-ha-lrm
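
    The part of that output that matters most is the LRM's latest state. A small sketch for extracting it, assuming the log format "status change <old> => <new>" that appears in the status output posted further down in this thread; lrm_state is a hypothetical helper name.

```shell
# Read pve-ha-lrm log lines on stdin and print the most recent state,
# taken from the last "status change <old> => <new>" line.
lrm_state() {
    awk '/status change .* => / { state = $NF }
         END { if (state != "") print state }'
}
```

    It can be fed from `systemctl status pve-ha-lrm | lrm_state` or from `journalctl -u pve-ha-lrm -b | lrm_state`. It prints nothing when no status change has been logged.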
  3. HFernandez

    HFernandez New Member
    Proxmox Subscriber

    May 20, 2019
    Hi Richard, thanks, everything is OK now!
  4. klowet

    klowet New Member

    Jun 22, 2018

    Same problem here. I restarted 1 of 3 nodes. That node is now in the status "wait_for_agent_lock".
    Restarting pve-ha-lrm did not work.

    root@anr1-a-pve02:/home/klowet# systemctl status pve-ha-lrm
    ● pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
       Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled; vendor preset: enabled)
       Active: active (running) since Tue 2019-05-28 12:13:09 CEST; 10min ago
      Process: 41801 ExecStop=/usr/sbin/pve-ha-lrm stop (code=exited, status=0/SUCCESS)
      Process: 41850 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
     Main PID: 41944 (pve-ha-lrm)
        Tasks: 1 (limit: 4915)
       Memory: 80.7M
          CPU: 459ms
       CGroup: /system.slice/pve-ha-lrm.service
               └─41944 pve-ha-lrm
    mei 28 12:13:08 anr1-a-pve02 systemd[1]: Starting PVE Local HA Ressource Manager Daemon...
    mei 28 12:13:09 anr1-a-pve02 pve-ha-lrm[41944]: starting server
    mei 28 12:13:09 anr1-a-pve02 pve-ha-lrm[41944]: status change startup => wait_for_agent_lock
    mei 28 12:13:09 anr1-a-pve02 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
    # pvecm nodes
    Membership information
        Nodeid      Votes Name
             1          1
             2          1 (local)
             3          1
    # pvecm status
    Quorum information
    Date:             Tue May 28 12:13:42 2019
    Quorum provider:  corosync_votequorum
    Nodes:            3
    Node ID:          0x00000002
    Ring ID:          1/25372
    Quorate:          Yes
    Votequorum information
    Expected votes:   3
    Highest expected: 3
    Total votes:      3
    Quorum:           2 
    Flags:            Quorate 
    Membership information
        Nodeid      Votes Name
    0x00000001          1
    0x00000002          1 (local)
    0x00000003          1
    # cat /etc/pve/corosync.conf 2>/dev/null
    logging {
      debug: off
      to_syslog: yes
    }
    nodelist {
      node {
        name: anr1-a-pve01
        nodeid: 1
        quorum_votes: 1
      }
      node {
        name: anr1-a-pve02
        nodeid: 2
        quorum_votes: 1
      }
      node {
        name: anr1-a-pve03
        nodeid: 3
        quorum_votes: 1
      }
    }
    quorum {
      provider: corosync_votequorum
    }
    totem {
      cluster_name: pveclu01
      config_version: 3
      interface {
        ringnumber: 0
      }
      ip_version: ipv4
      secauth: on
      version: 2
    }
    root@anr1-a-pve02:/home/klowet# pvereport 
    Process hostname...OK
    Process pveversion --verbose...OK
    Process cat /etc/hosts...OK
    Process top -b -n 1  | head -n 15...OK
    Process pvesubscription get...OK
    Process lscpu...OK
    Process pvesh get /cluster/resources --type node --output-format=yaml...OK
    Process cat /etc/pve/storage.cfg...OK
    Process pvesm status...OK
    Process cat /etc/fstab...OK
    Process findmnt --ascii...OK
    Process df --human...OK
    Process qm list...OK
    Process pct list...OK
    Process ip -details -statistics address...OK
    Process ip -details -4 route show...OK
    Process ip -details -6 route show...OK
    Process cat /etc/network/interfaces...OK
    Process cat /etc/pve/local/host.fw...OK
    Process iptables-save...OK
    Process pvecm nodes...OK
    Process pvecm status...OK
    Process cat /etc/pve/corosync.conf 2>/dev/null...OK
    Process dmidecode -t bios...OK
    Process lspci -nnk...OK
    Process lsblk --ascii...OK
    Process ls -l /dev/disk/by-*/...OK
    Process iscsiadm -m node...OK
    Process iscsiadm -m session...OK
    Process pvs...OK
    Process lvs...OK
    Process vgs...OK
    Process zpool status...OK
    Process zpool list -v...OK
    Process zfs list...OK
    Process ceph status...OK
    Process ceph osd status...OK
    Process ceph df...OK
    Process pveceph status...OK
    Process pveceph lspools...OK
    Process echo rbd-vms
    rbd ls rbd-vms
    ==== general system info ====
    # hostname
    # pveversion --verbose
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-14-pve)
    pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
    pve-kernel-4.15: 5.4-2
    pve-kernel-4.15.18-14-pve: 4.15.18-39
    pve-kernel-4.15.18-13-pve: 4.15.18-37
    pve-kernel-4.15.18-11-pve: 4.15.18-34
    pve-kernel-4.15.18-10-pve: 4.15.18-32
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    pve-kernel-4.15.18-8-pve: 4.15.18-28
    pve-kernel-4.15.17-1-pve: 4.15.17-9
    ceph: 12.2.12-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-10
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-52
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-43
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    proxmox-widget-toolkit: 1.0-28
    pve-cluster: 5.0-37
    pve-container: 2.0-39
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-21
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 3.0.1-2
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-51
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2
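
    The pvecm status output in this post reports "Quorate: Yes", so loss of quorum does not look like the reason the LRM stays in wait_for_agent_lock. For repeated checks, that field can be tested in a script. This is a sketch that assumes the "Quorate:" line format shown above; is_quorate is a hypothetical helper name.

```shell
# Read `pvecm status` output on stdin.
# Exit status 0 if the "Quorate:" field is "Yes", 1 otherwise
# (including when the field is missing).
is_quorate() {
    awk '$1 == "Quorate:" { found = ($2 == "Yes") }
         END { exit (found ? 0 : 1) }'
}
```

    A typical call would be `pvecm status | is_quorate && echo "cluster has quorum"`.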
  5. HFernandez

    HFernandez New Member
    Proxmox Subscriber

    May 20, 2019
    Hello, I rebooted the node that had problems, then started a VM on that node, and that solved it. Good luck.