High Availability

aneubau

I am seeing a strange phenomenon with HA in my Proxmox environment.
I am using three servers, called pve, pve02 and pve04, running Proxmox 4.4.
All three were reinstalled one at a time: I migrated all VMs to a remaining node, shut the node down, removed it from the cluster correctly, installed Proxmox from scratch and added it to the cluster again.
Before doing this I had a fourth server (pve03) in the cluster, which I removed permanently.
Everything is working correctly (see the pvecm output) and migration of VMs works, but HA shows the following output:

root@pve:~# ha-manager status

quorum OK
master pve03 (idle, Thu Oct 13 08:37:38 2016)
lrm pve (idle, Wed Jan 11 15:50:34 2017)
lrm pve02 (idle, Wed Jan 11 15:50:34 2017)
lrm pve03 (old timestamp - dead?, Mon Dec 19 12:33:59 2016)

root@pve:~# pvecm status

Quorum information
------------------
Date: Wed Jan 11 15:46:14 2017
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1/48072
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.31
0x00000003 1 192.168.1.61
0x00000002 1 192.168.1.75 (local)

root@pve:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 pve04
3 1 pve02
2 1 pve (local)

How can I reset the HA environment or change the master?
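
The symptom above is that `ha-manager status` still names pve03 although `pvecm nodes` no longer lists it. A minimal sketch for spotting this automatically, assuming the two outputs have the same text layout as shown in this post; the function name `stale_ha_nodes` and the idea of saving both outputs to files are illustrative, not part of Proxmox:

```shell
#!/bin/sh
# Sketch: print node names that appear in HA status text but are missing
# from the current cluster membership. Both inputs are plain text files,
# so this can be tried on saved output before using it on a live cluster.
#
# stale_ha_nodes HA_STATUS_FILE MEMBERS_FILE   (hypothetical helper)
#   HA_STATUS_FILE: saved output of `ha-manager status`
#   MEMBERS_FILE:   saved output of `pvecm nodes`
stale_ha_nodes() {
    awk '
        # First file (membership): rows look like "<id> <votes> <name> [(local)]"
        NR == FNR {
            if ($1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/) members[$3] = 1
            next
        }
        # Second file (HA status): "master <node> (...)" and "lrm <node> (...)"
        ($1 == "master" || $1 == "lrm") && !($2 in members) && !seen[$2]++ {
            print $2
        }
    ' "$2" "$1"
}
```

To use it, redirect `ha-manager status` and `pvecm nodes` into two files and pass them in that order. It only reports names; it does not change any HA state.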
 
What is the output of
Code:
systemctl status pve-ha-crm pve-ha-lrm

on all nodes?
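
To collect this from every node in one go, something like the following could be used. It is only a sketch: it assumes root SSH access between the nodes, and the function name `ha_service_status` and the `RUN` switch are made up for illustration. By default it just prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run the same service status query on each node over SSH.
# RUN is a hypothetical switch: leave it unset to only print the commands,
# set RUN=1 to actually execute them.
ha_service_status() {
    cmd="systemctl --no-pager status pve-ha-crm pve-ha-lrm"
    for node in "$@"; do
        if [ "${RUN:-0}" = "1" ]; then
            echo "== $node =="
            ssh "root@$node" "$cmd"
        else
            echo "ssh root@$node $cmd"
        fi
    done
}
```

Called as `ha_service_status pve pve02 pve04`, it lists one command per node; with `RUN=1` set it prints each node's status under a small header.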
 
Here is the output:
root@pve:~# systemctl status pve-ha-crm pve-ha-lrm
● pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 09:15:18 CET; 6 days ago
Process: 1205 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
Main PID: 1211 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1211 pve-ha-crm

Jan 05 09:15:18 pve pve-ha-crm[1211]: starting server
Jan 05 09:15:18 pve pve-ha-crm[1211]: status change startup => wait_for_quorum
Jan 05 09:15:18 pve systemd[1]: Started PVE Cluster Ressource Manager Daemon.

● pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 09:15:18 CET; 6 days ago
Process: 1212 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
Main PID: 1222 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1222 pve-ha-lrm

Jan 05 09:15:18 pve pve-ha-lrm[1222]: starting server
Jan 05 09:15:18 pve pve-ha-lrm[1222]: status change startup => wait_for_agent_lock
Jan 05 09:15:18 pve systemd[1]: Started PVE Local HA Ressource Manager Daemon.

root@pve02:~# systemctl status pve-ha-crm pve-ha-lrm
pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 16:11:59 CET; 6 days ago
Process: 1567 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
Main PID: 1571 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1571 pve-ha-crm

Jan 05 16:11:59 pve02 pve-ha-crm[1571]: starting server
Jan 05 16:11:59 pve02 pve-ha-crm[1571]: status change startup => wait_for_quorum
Jan 05 16:11:59 pve02 systemd[1]: Started PVE Cluster Ressource Manager Daemon.

pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 16:11:59 CET; 6 days ago
Process: 1572 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
Main PID: 1583 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1583 pve-ha-lrm

Jan 05 16:11:59 pve02 pve-ha-lrm[1583]: starting server
Jan 05 16:11:59 pve02 pve-ha-lrm[1583]: status change startup => wait_for_agent_lock
Jan 05 16:11:59 pve02 systemd[1]: Started PVE Local HA Ressource Manager Daemon.

root@pve04:~# systemctl status pve-ha-crm pve-ha-lrm
pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 14:33:34 CET; 6 days ago
Main PID: 1588 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1588 pve-ha-crm

Jan 05 14:33:34 pve04 pve-ha-crm[1588]: starting server
Jan 05 14:33:34 pve04 pve-ha-crm[1588]: status change startup => wait_for_quorum
Jan 05 14:33:34 pve04 systemd[1]: Started PVE Cluster Ressource Manager Daemon.

pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 14:33:34 CET; 6 days ago
Main PID: 1600 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1600 pve-ha-lrm

Jan 05 14:33:34 pve04 pve-ha-lrm[1600]: starting server
Jan 05 14:33:34 pve04 pve-ha-lrm[1600]: status change startup => wait_for_agent_lock
Jan 05 14:33:34 pve04 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
 
The problem resolved itself after I created an HA group with a resource and simulated a server failure.
The entry for the retired server (pve03) was removed and another server (pve02) replaced it as master.
 
