Hi,
I have set up a 3-node cluster that is working like a charm, meaning I can migrate any VM or CT from one node to another.
All nodes use shared storage provided by Ceph.
I followed the instructions and created HA groups and resources:
root@ld4257:~# more /etc/pve/ha/groups.cfg
group: web
comment PVE HA for Web Applications
nodes ld4465,ld4464
nofailback 0
restricted 0
group: lve
comment PVE HA for LVE Services
nodes ld4465,ld4257,ld4464
nofailback 0
restricted 0
root@ld4257:~# more /etc/pve/ha/resources.cfg
ct: 206
group lve
state started
ct: 204
group lve
state started
ct: 200
group lve
state started
vm: 113
group web
state started
vm: 114
group web
state started
vm: 115
group web
state started
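To double-check which nodes each HA resource is allowed to run on, the two files above can be cross-referenced with a small script. This is only a rough sketch: `parse_sections` and `allowed_nodes` are hypothetical helper names of my own, not part of any Proxmox tooling, and the parser assumes the simple "type: id" plus "key value" layout shown above. Only a subset of the entries is embedded as sample data.

```python
# Sample data copied from the configs above (subset of the resources).
GROUPS = """\
group: web
comment PVE HA for Web Applications
nodes ld4465,ld4464
nofailback 0
restricted 0
group: lve
comment PVE HA for LVE Services
nodes ld4465,ld4257,ld4464
nofailback 0
restricted 0
"""

RESOURCES = """\
ct: 206
group lve
state started
ct: 204
group lve
state started
vm: 113
group web
state started
"""


def parse_sections(text):
    """Parse 'type: id' headers followed by 'key value' lines.

    Returns {(type, id): {key: value}}. Hypothetical helper, assumes
    the layout shown in the post.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        head, _, rest = line.partition(" ")
        if head.endswith(":") and rest.strip():
            # A header such as "group: web" or "ct: 206" starts a new section.
            current = {}
            sections[(head[:-1], rest.strip())] = current
        elif current is not None:
            # Any other line is a property of the current section.
            current[head] = rest.strip()
    return sections


def allowed_nodes(groups, resources):
    """Map each HA resource to the node list of its group."""
    result = {}
    for (kind, rid), props in resources.items():
        group = groups.get(("group", props.get("group", "")), {})
        # Drop an optional ":priority" suffix from each node entry.
        nodes = [n.split(":")[0] for n in group.get("nodes", "").split(",") if n]
        result[f"{kind}:{rid}"] = nodes
    return result


if __name__ == "__main__":
    mapping = allowed_nodes(parse_sections(GROUPS), parse_sections(RESOURCES))
    for resource, nodes in mapping.items():
        print(resource, "->", ", ".join(nodes))
```

With the data above, CTs 204 and 206 belong to group lve, which lists all three nodes, so the configuration itself should not prevent them from moving to ld4464.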
For maintenance I triggered a reboot of node ld4465 and expected that all CTs (204 + 206) running on this node would be migrated to node ld4464.
However, this failover did not work.
Instead, all VMs were stopped, and the CTs were still running but not accessible.
Finally, node ld4465 rebooted and all VMs + CTs remained there.
Please check the attached screenshots documenting this.
Why did the HA failover not work?