HA resources/groups test : OK

ghusson · Feb 25, 2016

Hello,

Here are my tests of the HA functions. Before getting it smooth I had some problems, probably because I didn't give the HA algorithm enough time to react. But this last test is OK. I paste it here as a reference for the "restricted" and "no failback" options. Feel free to point out any illogical sequence (I didn't reread what I wrote, and a mistake in a number identifying a node is easy to make).

storage : iSCSI Synology, target with 2 LUNs of 100G, "use directly" unticked, "shared" ticked
VMs are restored on LUN0

3 nodes : proxmox1, proxmox2, proxmox3
2 ethernet links on each server : 1 admin/services and 1 storage
fresh install of the nodes
via the PVE web interface, set the eth1 IP on each node
reboot the nodes until eth1 is active and OK

proxmox1
admin (vmbr0) : 192.168.250.51/24
data (eth1) : 192.168.210.51/24
proxmox2
admin (vmbr0) : 192.168.250.52/24
data (eth1) : 192.168.210.52/24
proxmox3
admin (vmbr0) : 192.168.250.53/24
data (eth1) : 192.168.210.53/24
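The addressing plan above can be double-checked with a short Python sketch using the standard `ipaddress` module. It only encodes the addresses listed here, plus the Synology portal 192.168.210.10 that shows up later in the syslog; the dictionary layout and the `check_plan` name are just illustrations, not anything Proxmox provides.

```python
import ipaddress

# Addressing plan from this test: admin network on vmbr0, storage network on eth1.
ADMIN_NET = ipaddress.ip_network("192.168.250.0/24")
DATA_NET = ipaddress.ip_network("192.168.210.0/24")

NODES = {
    "proxmox1": {"admin": "192.168.250.51/24", "data": "192.168.210.51/24"},
    "proxmox2": {"admin": "192.168.250.52/24", "data": "192.168.210.52/24"},
    "proxmox3": {"admin": "192.168.250.53/24", "data": "192.168.210.53/24"},
}
# Synology iSCSI portal, taken from the iscsid syslog line in this log.
ISCSI_TARGET = ipaddress.ip_address("192.168.210.10")


def check_plan(nodes):
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    seen = set()
    for name, links in nodes.items():
        admin = ipaddress.ip_interface(links["admin"])
        data = ipaddress.ip_interface(links["data"])
        if admin.network != ADMIN_NET:
            problems.append(f"{name}: admin {admin} not in {ADMIN_NET}")
        if data.network != DATA_NET:
            problems.append(f"{name}: data {data} not in {DATA_NET}")
        for ip in (admin.ip, data.ip):
            if ip in seen:
                problems.append(f"{name}: duplicate address {ip}")
            seen.add(ip)
    if ISCSI_TARGET not in DATA_NET:
        problems.append(f"iSCSI target {ISCSI_TARGET} not on the storage network")
    return problems


print(check_plan(NODES))  # an empty list for the plan above
```

Keeping storage traffic on its own subnet is what lets the nodes still be reached over eth1 later in the test, when the admin cable is unplugged.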

clustering :
proxmox1 : pvecm create pvecluster1
proxmox2 : pvecm add 192.168.250.51 and waited until "successfully added node 'proxmox2' to cluster."
proxmox3 : pvecm add 192.168.250.51 and waited until "successfully added node 'proxmox3' to cluster."

cluster OK

adding iSCSI storage
adding LVM over iSCSI : error, the VG already exists (mapped by the system)
removing the iSCSI storage
on Synology : destroy the target and LUNs
nodes : syslog shows errors such as : proxmox1 iscsid: connect to 192.168.210.10:3260 failed (Connection refused)
reboot each node, one by one, waiting each time for it to rejoin the cluster and stabilise before moving on to the next node
the nodes are "happy" and no longer try to reach the old target

on synology : recreate a LUN and a target
on the PVE web interface : create the iSCSI storage (not used directly) and LVM over it (active/shared)

pve web : enable the backup content type on the local storage
copy the Debian template backup with FileZilla (installed with standard BIOS, Debian 8.2 AMD64, default partitioning, LXDE desktop environment, SSH server, usual system tools, qemu-guest-agent)

pve web : restore debian template (VM100)
pve web : full clone of VM 100, storage LVM, new vmid : 101, name : vm1 - wait until finished
pve web : full clone of VM 100, storage LVM, new vmid : 102, name : vm2 - wait until finished
pve web : add the QEMU agent option to VM 100, VM 101, VM 102
pve web : full clone of VM 100, storage LVM, new vmid : 103, name : vm3 - wait until finished
pve web : start vm1, vm2, vm3
pve web : migrate vm1 to proxmox2 : live migration OK
pve web : migrate vm1 to proxmox3 : live migration OK
pve web : migrate vm1 to proxmox1 : live migration OK
pve web : migrate vm2 to proxmox2 : live migration OK
pve web : migrate vm3 to proxmox3 : live migration OK
pve web : migrate vm2 to proxmox1 : live migration OK
pve web : migrate vm3 to proxmox1 : live migration OK
(all vms back to proxmox1 node)

pve web : create 1 HA Group : g123 with proxmox1, proxmox2, proxmox3 nodes (no option ticked)
create resources : the 3 running VMs vm1, vm2, vm3 in group g123, enabled
wait 2 min

root@proxmox1:~# ha-manager status
quorum OK
master proxmox2 (active, Thu Feb 25 14:13:28 2016)
lrm proxmox1 (active, Thu Feb 25 14:13:28 2016)
lrm proxmox2 (active, Thu Feb 25 14:13:29 2016)
lrm proxmox3 (active, Thu Feb 25 14:13:29 2016)
service vm:101 (proxmox1, started)
service vm:102 (proxmox1, started)
service vm:103 (proxmox1, started)
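When repeating these tests, it helps to check service placement from a script instead of reading the output by eye. The sketch below parses `ha-manager status` text in exactly the shape shown above (lines starting with `quorum`, `master`, `lrm` and `service`); `parse_ha_status` is a hypothetical helper, and the output format may differ in other PVE versions.

```python
import re

# e.g. "service vm:101 (proxmox1, started)"
SERVICE_RE = re.compile(r"^service (\S+) \(([^,]+), ([^)]+)\)$")
# e.g. "master proxmox2 (active, ...)" or "lrm proxmox1 (active, ...)"
NODE_RE = re.compile(r"^(master|lrm) (\S+) \(([^,]+), (.+)\)$")


def parse_ha_status(text):
    """Turn `ha-manager status` output (format as shown above) into a dict."""
    status = {"quorum": None, "master": None, "lrm": {}, "services": {}}
    for raw in text.splitlines():
        line = raw.strip()
        service = SERVICE_RE.match(line)
        node = NODE_RE.match(line)
        if line.startswith("quorum "):
            # "OK", or the "No quorum on node ..." message on an isolated node
            status["quorum"] = line[len("quorum "):]
        elif service:
            sid, where, state = service.groups()
            status["services"][sid] = (where, state)
        elif node:
            kind, name, state, _timestamp = node.groups()
            if kind == "master":
                status["master"] = name
            else:
                status["lrm"][name] = state
    return status


SAMPLE = """\
quorum OK
master proxmox2 (active, Thu Feb 25 14:13:28 2016)
lrm proxmox1 (active, Thu Feb 25 14:13:28 2016)
lrm proxmox2 (active, Thu Feb 25 14:13:29 2016)
lrm proxmox3 (active, Thu Feb 25 14:13:29 2016)
service vm:101 (proxmox1, started)
service vm:102 (proxmox1, started)
service vm:103 (proxmox1, started)
"""

st = parse_ha_status(SAMPLE)
print(st["quorum"], st["master"])  # OK proxmox2
print(st["services"]["vm:101"])    # ('proxmox1', 'started')
```

On a node, the live text could be fed in with `subprocess.run(["ha-manager", "status"], capture_output=True, text=True).stdout`; comparing the `services` dict before and after each unplug test shows at a glance which VMs moved.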

[14:14:30] unplug admin interface on proxmox1 :
on PVE interface, the proxmox1 icon light changes from green to red
connecting back to proxmox1 through the storage network : ssh proxmox2 -> ssh 192.168.210.51
proxmox1 is rebooted after 60s (fence/watchdog)
VMs are NOT restarted on proxmox1 (quorum No quorum on node 'proxmox1'!)
[14:17:00] ==> VMs are restarted on other nodes (vm1 and vm2 on proxmox2 and vm3 on proxmox3 in this test)
[14:19:00] replug admin interface on proxmox1 (after proxmox1 has booted) :
proxmox1 rejoins the cluster, the proxmox1 icon light changes from red to green
VMs are kept on proxmox2 and proxmox3
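Why the isolated proxmox1 loses quorum while proxmox2 and proxmox3 keep it: with corosync's votequorum in its default setup, each node has one vote and a partition is quorate when it holds a strict majority of the expected votes, i.e. floor(n/2) + 1. A small Python sketch of that arithmetic, assuming one vote per node and no special votequorum options (the function names are illustrative):

```python
def quorum_threshold(expected_votes):
    """Votes needed for quorum with a simple majority (one vote per node assumed)."""
    return expected_votes // 2 + 1


def is_quorate(partition, all_nodes):
    """True if the nodes in `partition` hold a strict majority of `all_nodes`."""
    return len(partition) >= quorum_threshold(len(all_nodes))


NODES = ["proxmox1", "proxmox2", "proxmox3"]

# Unplugging the admin link of proxmox1 (the network the cluster was built on)
# splits the cluster into {proxmox1} and {proxmox2, proxmox3}.
print(quorum_threshold(len(NODES)))                 # 2
print(is_quorate(["proxmox1"], NODES))              # False: proxmox1 is fenced
print(is_quorate(["proxmox2", "proxmox3"], NODES))  # True: they recover the VMs
```

With a fourth node the threshold would rise to 3, so a 2+2 split would leave neither half quorate; an odd node count avoids that case.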

root@proxmox1:~# ha-manager status
quorum OK
master proxmox2 (active, Thu Feb 25 14:19:28 2016)
lrm proxmox1 (active, Thu Feb 25 14:19:33 2016)
lrm proxmox2 (active, Thu Feb 25 14:19:29 2016)
lrm proxmox3 (active, Thu Feb 25 14:19:29 2016)
service vm:101 (proxmox2, started)
service vm:102 (proxmox2, started)
service vm:103 (proxmox3, started)

pve web : migrate vm1 to proxmox1 : live migration OK
pve web : migrate vm2 to proxmox1 : live migration OK
pve web : migrate vm3 to proxmox1 : live migration OK

[14:22:00] unplug admin interface on proxmox1 :
on PVE interface, the proxmox1 icon light changes from green to red
connecting back to proxmox1 through the storage network : ssh proxmox2 -> ssh 192.168.210.51
proxmox1 is rebooted after 60s (fence/watchdog)
VMs are NOT restarted on proxmox1 (quorum No quorum on node 'proxmox1'!)
[14:24:18] ==> VMs are restarted on other nodes (vm1 and vm2 on proxmox2 and vm3 on proxmox3 in this test)
[14:26:32] replug admin interface on proxmox1 (after proxmox1 has booted) :
proxmox1 rejoins the cluster, the proxmox1 icon light changes from red to green
VMs are kept on proxmox2 and proxmox3
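The timestamps noted in these two runs give a rough idea of the recovery delay, from unplugging the cable to the VMs running again on the other nodes. The values below are copied from this log; they were noted by hand, so treat the result as approximate.

```python
from datetime import datetime

# (admin cable unplugged, VMs restarted on other nodes) for the two runs above.
RUNS = [
    ("14:14:30", "14:17:00"),
    ("14:22:00", "14:24:18"),
]


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


for unplug, recovered in RUNS:
    delay = seconds_between(unplug, recovered)
    print(f"{unplug} -> {recovered}: {delay} s ({delay // 60} min {delay % 60} s)")
```

Both runs land between 2 and 2.5 minutes (150 s and 138 s); the 60 s watchdog reboot of proxmox1 noted above happens within that window.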

pve web : migrate vm1 to proxmox1 : live migration OK
pve web : migrate vm2 to proxmox1 : live migration OK
pve web : migrate vm3 to proxmox1 : live migration OK

pve web : remove resources and groups

root@proxmox1:~# ha-manager status
quorum OK
master proxmox2 (active, Thu Feb 25 14:28:48 2016)
lrm proxmox1 (active, Thu Feb 25 14:28:50 2016)
lrm proxmox2 (active, Thu Feb 25 14:28:50 2016)
lrm proxmox3 (active, Thu Feb 25 14:28:49 2016)

pve web : create group : g1{proxmox1}, "restricted" ticked "no failback" unticked
pve web : create group : g2{proxmox2}, "restricted" unticked "no failback" ticked
pve web : create group : g3{proxmox3}, "restricted" ticked "no failback" ticked

pve web : create resource : vm1 in g1, enabled
pve web : create resource : vm2 in g2, enabled
pve web : create resource : vm3 in g3, enabled

vm3 is live migrated to proxmox3

[14:34:00] unplug admin interface on proxmox1 :
on PVE interface, the proxmox1 icon light changes from green to red
connecting back to proxmox1 through the storage network : ssh proxmox2 -> ssh 192.168.210.51
proxmox1 is rebooted after 60s (fence/watchdog)
VMs are NOT restarted on proxmox1 (quorum No quorum on node 'proxmox1'!)
[14:36:18] ==> vm2 is restarted on proxmox2
[14:38:00] replug admin interface on proxmox1 (after proxmox1 has booted) :
proxmox1 rejoins the cluster, the proxmox1 icon light changes from red to green
vm1 is stopped on proxmox1
vm2 is running on proxmox2
vm3 is running on proxmox3
[14:40:06] vm1 is started on proxmox1
==> the "restricted" option binds the VM to the nodes of its group. If no node of the group is available, the service stays stopped. Use it, for example, if the VM needs a USB dongle as a software licence key (no point restarting it on another node that has no dongle).
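The behaviour observed for vm1 and vm2 can be summarised with a deliberately simplified model of where a failed service may be recovered. This is not the actual ha-manager (CRM) code, which also weighs node priorities and per-node load; `recovery_candidates` is a hypothetical name, and the model assumes all nodes of a group share one priority, as in this test where none were set.

```python
def recovery_candidates(group_nodes, online_nodes, all_nodes, restricted):
    """Nodes a failed service may be recovered on (simplified model).

    - Online nodes of the service's group are always preferred.
    - If none is online, an unrestricted group falls back to any online node,
      while a restricted group yields no candidate: the service stays stopped.
    """
    in_group = [n for n in all_nodes if n in group_nodes and n in online_nodes]
    if in_group:
        return in_group
    if restricted:
        return []
    return [n for n in all_nodes if n in online_nodes]


ALL = ["proxmox1", "proxmox2", "proxmox3"]
ONLINE = ["proxmox2", "proxmox3"]  # proxmox1 has been fenced

# vm1 in g1 = {proxmox1}, restricted: nowhere to go, it stays stopped.
print(recovery_candidates({"proxmox1"}, ONLINE, ALL, restricted=True))   # []
# Same group without "restricted": it could restart on proxmox2 or proxmox3.
print(recovery_candidates({"proxmox1"}, ONLINE, ALL, restricted=False))  # ['proxmox2', 'proxmox3']
# vm2 in g2 = {proxmox2}, unrestricted: its own group node is online.
print(recovery_candidates({"proxmox2"}, ONLINE, ALL, restricted=False))  # ['proxmox2']
```

The last case matches vm2 restarting on proxmox2 at 14:36:18, while vm1 waited for proxmox1 to come back.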

[14:42:00] unplug admin interface on proxmox2 :
on PVE interface, the proxmox2 icon light changes from green to red
connecting back to proxmox2 : ssh proxmox3 -> ssh 192.168.210.52
proxmox2 is rebooted after 60s (fence/watchdog)
VMs are NOT restarted on proxmox2 (quorum No quorum on node 'proxmox2'!)
[14:45:27] ==> vm2 is restarted on proxmox1
[14:48:00] replug admin interface on proxmox2 (after proxmox2 has booted) :
proxmox2 rejoins the cluster, the proxmox2 icon light changes from red to green
vm1 is running on proxmox1
vm2 is running on proxmox1
vm3 is running on proxmox3
==> with "restricted" unticked, the VM may respawn on a node that is not in its group; with "no failback" ticked, once a node of the group is back, the VM is not migrated back to it. Use it, for example, if you prefer to migrate your VMs back by hand because you fear a second saturation of the infrastructure.
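"Restricted" decides where a service may run; "no failback" decides whether a running service moves back once a node of its group is online again. A simplified sketch of that second decision follows. It models what was observed in this log rather than the actual CRM code, `failback_target` is a hypothetical name, and it assumes all group nodes share one priority (none were set here), picking the first online group node by name.

```python
def failback_target(current_node, group_nodes, online_nodes, restricted, nofailback):
    """Node a running service should be migrated to, or None to leave it where it is."""
    if current_node in group_nodes:
        return None  # already on a node of its group
    if nofailback and not restricted:
        return None  # allowed outside the group and told not to fail back
    # A restricted service must leave a non-group node even with "no failback".
    candidates = sorted(n for n in group_nodes if n in online_nodes)
    return candidates[0] if candidates else None


ONLINE = {"proxmox1", "proxmox2", "proxmox3"}  # proxmox2 has rejoined

# vm2: g2 = {proxmox2}, unrestricted, "no failback": stays on proxmox1, as observed.
print(failback_target("proxmox1", {"proxmox2"}, ONLINE, restricted=False, nofailback=True))   # None
# Same group with "no failback" unticked: it would go back to proxmox2.
print(failback_target("proxmox1", {"proxmox2"}, ONLINE, restricted=False, nofailback=False))  # proxmox2
# vm3: g3 = {proxmox3}, restricted: it had to leave proxmox1 despite "no failback",
# which matches its live migration to proxmox3 right after the groups were created.
print(failback_target("proxmox1", {"proxmox3"}, ONLINE, restricted=True, nofailback=True))    # proxmox3
```

The third case is why the restricted groups behave the same whether or not "no failback" is ticked when the group has a single node.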

pve web : migrate vm2 to proxmox2 : live migration OK
vm1 is running on proxmox1
vm2 is running on proxmox2
vm3 is running on proxmox3

[15:06:00] unplug admin interface on proxmox3 :
on PVE interface, the proxmox3 icon light changes from green to red
connecting back to proxmox3 : ssh proxmox2 -> ssh 192.168.210.53
proxmox3 is rebooted after 60s (fence/watchdog)
VMs are NOT restarted on proxmox3 (quorum No quorum on node 'proxmox3'!)
vm3 is stopped
[15:11:52] replug admin interface on proxmox3 (after proxmox3 has booted) :
proxmox3 rejoins the cluster, the proxmox3 icon light changes from red to green
[15:11:57] vm3 is started on proxmox3
vm1 is running on proxmox1
vm2 is running on proxmox2
vm3 is running on proxmox3
==> I can't imagine a case where "restricted" ticked plus "no failback" ticked could be interesting (in this test, with the single-node group g3, it behaved exactly like g1).
