Hello,
My current task is to set up a new HA cluster for my company, as the old one is not performing well anymore. Here is some data:
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS
It is supposed to be a 2-node HA cluster. All DRBD resources should be Primary on both nodes (dual-primary), and fencing should be enabled. The topology looks like this:
zvol -> drbd -> vm
I have now reached the point where most of it is set up and I can test the nodes for failover. This is the current cluster status:
Cluster Summary:
  * Stack: corosync
  * Current DC: s1 (version 2.1.2-ada5c3b36e2) - partition with quorum
  * Last updated: Tue Apr 11 15:18:12 2023
  * Last change: Tue Apr 11 15:09:25 2023 by root via crm_attribute on s0
  * 2 nodes configured
  * 30 resource instances configured
Node List:
  * Online: [ s0 s1 ]
Full List of Resources:
  * sto-ipmi-s0 (stonith:external/ipmi): Started s1
  * sto-ipmi-s1 (stonith:external/ipmi): Started s0
  * Clone Set: clo-pri-zfs-drbd_storage [pri-zfs-drbd_storage]:
    * Started: [ s0 s1 ]
  * Clone Set: mas-drbd-pluto [pri-drbd-pluto] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-poserver [pri-drbd-poserver] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-webserver [pri-drbd-webserver] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-dhcp [pri-drbd-dhcp] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-wawi [pri-drbd-wawi] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-wawius [pri-drbd-wawius] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-saturn [pri-drbd-saturn] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-openvpn [pri-drbd-openvpn] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-asterisk [pri-drbd-asterisk] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-alarmanlage [pri-drbd-alarmanlage] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-jabber [pri-drbd-jabber] (promotable):
    * Promoted: [ s0 s1 ]
  * Clone Set: mas-drbd-TESTOPTIXXX [pri-drbd-TESTOPTIXXX] (promotable):
    * Promoted: [ s0 s1 ]
  * pri-vm-jabber (ocf:heartbeat:VirtualDomain): Started s1
  * pri-vm-alarmanlage (ocf:heartbeat:VirtualDomain): Started s1
Here comes the problem:
I am able to migrate the VMs with virsh, and I am also able to migrate them with crm resource move. But when I put a node into standby, the cluster does not migrate the VMs; instead it shuts them down and starts them on the other node.
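For reference, these are the three operations compared above, shown for pri-vm-jabber moving from s1 to s0. This is a sketch: the resource and node names come from the cluster status, while the libvirt domain name jabber and the exact command flags are assumptions. The run helper is hypothetical; it only prints each command unless DRY_RUN=0 is set, because the three commands are alternatives and are not meant to be run one after another.

```shell
#!/bin/sh
# Three alternative ways of moving pri-vm-jabber from s1 to s0.
# Assumptions: libvirt domain name "jabber" and the flags shown below.
# run() is a hypothetical helper: it prints the command and only executes it
# when DRY_RUN=0 is set (default is print-only).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = 1 ] || "$@"; }

# 1. Live migration directly through libvirt (works)
run virsh migrate --live jabber qemu+ssh://s0/system

# 2. Move through Pacemaker (works)
run crm resource move pri-vm-jabber s0

# 3. Put the node into standby (VMs are stopped and restarted instead of migrated)
run crm node standby s1
```

Below is the log from putting s1 into standby.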
Apr 11 14:54:38 s1 pacemaker-controld[3245707]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: On loss of quorum: Ignore
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop sto-ipmi-s0 ( s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-zfs-drbd_storage:0 ( s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-pluto:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-poserver:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-webserver:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-dhcp:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-wawi:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-wawius:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-saturn:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-openvpn:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-asterisk:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-alarmanlage:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-jabber:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-TESTOPTIXXX:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Move pri-vm-jabber ( s1 -> s0 ) due to unrunnable mas-drbd-jabber demote
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Move pri-vm-alarmanlage ( s1 -> s0 ) due to unrunnable mas-drbd-alarmanlage demote
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Calculated transition 179, saving inputs in /var/lib/pacemaker/pengine/pe-input-2427.bz2
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation sto-ipmi-s0_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for sto-ipmi-s0 on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation pri-vm-jabber_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for pri-vm-jabber on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation pri-vm-alarmanlage_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for pri-vm-alarmanlage on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Result of stop operation for sto-ipmi-s0 on s1: ok
As far as I understand the documentation, this is not supposed to happen; please correct me if I am mistaken. Here are some rules from my Pacemaker (crm) configuration regarding jabber and alarmanlage:
primitive pri-vm-jabber VirtualDomain \
    params config="/etc/libvirt/qemu/jabber.xml" hypervisor="qemu:///system" migration_transport=ssh \
    meta allow-migrate=true target-role=Started is-managed=true \
    op monitor interval=0 timeout=30 \
    op start interval=0 timeout=120 \
    op stop interval=0 timeout=120 \
    op migrate_to interval=0 timeout=120 \
    op migrate_from interval=0 timeout=120 \
    utilization cpu=1 hv_memory=1024
colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_mas_drbd_alarmanlage_after_clo-zfs-drbd_storage Mandatory: clo-pri-zfs-drbd_storage mas-drbd-alarmanlage
order ord_pri-jabber-after-mas-drbd-jabber Mandatory: mas-drbd-jabber: promote pri-vm-jabber: start
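The scheduler saved its input for the standby transition (transition 179, pe-input-2427.bz2 in the log above), so the decision can be replayed offline with crm_simulate, which ships with Pacemaker. A minimal sketch, assuming the file is still present on s1 (the DC) and has not been rotated away; the run helper is hypothetical and only prints the command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Replay the saved scheduler input for transition 179 offline.
# Assumption: the file still exists on the DC (s1).
# run() is a hypothetical helper: it prints the command and only executes it
# when DRY_RUN=0 is set (default is print-only).
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = 1 ] || "$@"; }

PE_INPUT=/var/lib/pacemaker/pengine/pe-input-2427.bz2

# --simulate:    show the actions the scheduler would take for this input
# --show-scores: also print the allocation scores behind the placement
run crm_simulate --simulate --show-scores --xml-file "$PE_INPUT"
```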
I am thankful for any kind of help. If you need more information, please let me know.
Kind regards,
Philip