High Availability

aneubau

I am seeing a strange phenomenon with HA in my Proxmox environment.
I am using three servers, named pve, pve02 and pve04, running Proxmox 4.4.
All three have been reinstalled, one at a time: I migrated all VMs to a remaining node, shut the node down, removed it from the cluster correctly, installed Proxmox from scratch and added it to the cluster again.
Before doing this I had a fourth server (pve03) in the cluster, which I removed permanently.
Everything is working correctly (see the pvecm output) and migration of VMs works, but HA shows the following output:

root@pve:~# ha-manager status

quorum OK
master pve03 (idle, Thu Oct 13 08:37:38 2016)
lrm pve (idle, Wed Jan 11 15:50:34 2017)
lrm pve02 (idle, Wed Jan 11 15:50:34 2017)
lrm pve03 (old timestamp - dead?, Mon Dec 19 12:33:59 2016)

root@pve:~# pvecm status

Quorum information
------------------
Date: Wed Jan 11 15:46:14 2017
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1/48072
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.1.31
0x00000003 1 192.168.1.61
0x00000002 1 192.168.1.75 (local)

root@pve:~# pvecm nodes

Membership information
----------------------
Nodeid Votes Name
1 1 pve04
3 1 pve02
2 1 pve (local)

How can I reset the HA environment or change the master?
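
For anyone hitting the same thing: the mismatch is between the nodes that ha-manager status still lists (master/lrm lines) and the nodes that pvecm nodes reports as cluster members. Below is a minimal sketch of a check for that, assuming both outputs have the layout shown above. The function name stale_ha_nodes is hypothetical and not part of Proxmox; it only compares text and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not a Proxmox tool): print every node name that
# appears on a "master" or "lrm" line of `ha-manager status` but is not
# listed as a member by `pvecm nodes`.
# Assumes the output layouts shown in this thread.
stale_ha_nodes() {
    ha_status=$1   # full text of `ha-manager status`
    members=$2     # full text of `pvecm nodes`

    # Second field of "master <node> (...)" and "lrm <node> (...)" lines.
    ha_nodes=$(printf '%s\n' "$ha_status" |
        awk '$1 == "master" || $1 == "lrm" { print $2 }' | sort -u)

    # Membership rows look like "<nodeid> <votes> <name> [(local)]".
    cluster_nodes=$(printf '%s\n' "$members" |
        awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { print $3 }')

    for n in $ha_nodes; do
        # -x: whole-line match, -F: treat the node name as a fixed string.
        printf '%s\n' "$cluster_nodes" | grep -qxF -- "$n" ||
            printf '%s\n' "$n"
    done
}
```

On a cluster node it could be called as stale_ha_nodes "$(ha-manager status)" "$(pvecm nodes)". With the output pasted above, the only name it reports is pve03.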
 
What is the output of
Code:
systemctl status pve-ha-crm pve-ha-lrm

on all nodes?
 
Here is the output:
root@pve:~# systemctl status pve-ha-crm pve-ha-lrm
● pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 09:15:18 CET; 6 days ago
Process: 1205 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
Main PID: 1211 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1211 pve-ha-crm

Jan 05 09:15:18 pve pve-ha-crm[1211]: starting server
Jan 05 09:15:18 pve pve-ha-crm[1211]: status change startup => wait_for_quorum
Jan 05 09:15:18 pve systemd[1]: Started PVE Cluster Ressource Manager Daemon.

● pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 09:15:18 CET; 6 days ago
Process: 1212 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
Main PID: 1222 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1222 pve-ha-lrm

Jan 05 09:15:18 pve pve-ha-lrm[1222]: starting server
Jan 05 09:15:18 pve pve-ha-lrm[1222]: status change startup => wait_for_agent_lock
Jan 05 09:15:18 pve systemd[1]: Started PVE Local HA Ressource Manager Daemon.

root@pve02:~# systemctl status pve-ha-crm pve-ha-lrm
pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 16:11:59 CET; 6 days ago
Process: 1567 ExecStart=/usr/sbin/pve-ha-crm start (code=exited, status=0/SUCCESS)
Main PID: 1571 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1571 pve-ha-crm

Jan 05 16:11:59 pve02 pve-ha-crm[1571]: starting server
Jan 05 16:11:59 pve02 pve-ha-crm[1571]: status change startup => wait_for_quorum
Jan 05 16:11:59 pve02 systemd[1]: Started PVE Cluster Ressource Manager Daemon.

pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 16:11:59 CET; 6 days ago
Process: 1572 ExecStart=/usr/sbin/pve-ha-lrm start (code=exited, status=0/SUCCESS)
Main PID: 1583 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1583 pve-ha-lrm

Jan 05 16:11:59 pve02 pve-ha-lrm[1583]: starting server
Jan 05 16:11:59 pve02 pve-ha-lrm[1583]: status change startup => wait_for_agent_lock
Jan 05 16:11:59 pve02 systemd[1]: Started PVE Local HA Ressource Manager Daemon.

root@pve04:~# systemctl status pve-ha-crm pve-ha-lrm
pve-ha-crm.service - PVE Cluster Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-crm.service; enabled)
Active: active (running) since Thu 2017-01-05 14:33:34 CET; 6 days ago
Main PID: 1588 (pve-ha-crm)
CGroup: /system.slice/pve-ha-crm.service
└─1588 pve-ha-crm

Jan 05 14:33:34 pve04 pve-ha-crm[1588]: starting server
Jan 05 14:33:34 pve04 pve-ha-crm[1588]: status change startup => wait_for_quorum
Jan 05 14:33:34 pve04 systemd[1]: Started PVE Cluster Ressource Manager Daemon.

pve-ha-lrm.service - PVE Local HA Ressource Manager Daemon
Loaded: loaded (/lib/systemd/system/pve-ha-lrm.service; enabled)
Active: active (running) since Thu 2017-01-05 14:33:34 CET; 6 days ago
Main PID: 1600 (pve-ha-lrm)
CGroup: /system.slice/pve-ha-lrm.service
└─1600 pve-ha-lrm

Jan 05 14:33:34 pve04 pve-ha-lrm[1600]: starting server
Jan 05 14:33:34 pve04 pve-ha-lrm[1600]: status change startup => wait_for_agent_lock
Jan 05 14:33:34 pve04 systemd[1]: Started PVE Local HA Ressource Manager Daemon.
 
The problem resolved itself after I created an HA group with a resource and simulated a server failure.
The entry for the retired server (pve03) was removed and another server (pve02) replaced it as master.
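
For reference, creating an HA group and adding a resource to it can also be done from the command line. This is only a sketch of that part, not necessarily how it was done here: the group name, node list and resource ID (vm:100) are placeholders, and the run and setup_ha_test helpers are hypothetical. By default it only prints the ha-manager commands; set DRY_RUN=0 to actually execute them. The failure simulation is left out because the post does not say how it was done.

```shell
#!/bin/sh
# Sketch only: helper names and all arguments are illustrative.

# Print the command when DRY_RUN is unset or 1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Create an HA group, put one resource into it, then show the HA status.
setup_ha_test() {
    group=$1   # name for the new HA group (placeholder)
    nodes=$2   # comma-separated node list, e.g. pve,pve02,pve04
    sid=$3     # resource ID, e.g. vm:100 (placeholder)

    run ha-manager groupadd "$group" --nodes "$nodes"
    run ha-manager add "$sid" --group "$group"
    run ha-manager status
}
```

Example: setup_ha_test testgroup pve,pve02,pve04 vm:100 prints the three commands so they can be reviewed before running them for real.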