HA Problem when putting node in standby-mode

DonMugg

New Member
Apr 11, 2023
Hello,
my current task is to set up HA for my company, as the old cluster is not performing well anymore. Here is some data:
- Corosync Cluster Engine, version '3.1.6'
- Pacemaker 2.1.2
- Ubuntu 22.04.2 LTS
It's supposed to be an HA cluster with 2 nodes. All DRBD resources should be Primary on both nodes (dual-primary). Fencing should be enabled. The storage stack for each VM looks like this:
zvol -> drbd -> vm
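
For context, each DRBD device is managed as a promotable clone with two promoted instances (dual-primary). Roughly sketched in crm syntax, with jabber as an example (the monitor intervals/timeouts here are placeholders, my real values may differ slightly):

primitive pri-drbd-jabber ocf:linbit:drbd \
    params drbd_resource=jabber \
    op monitor interval=29s role=Master timeout=20s \
    op monitor interval=31s role=Slave timeout=20s
clone mas-drbd-jabber pri-drbd-jabber \
    meta promotable=true promoted-max=2 promoted-node-max=1 \
    clone-max=2 clone-node-max=1 notify=true interleave=true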

Right now I have reached the point where most of the setup is done and I can test the nodes for failover.

Cluster Summary:
* Stack: corosync
* Current DC: s1 (version 2.1.2-ada5c3b36e2) - partition with quorum
* Last updated: Tue Apr 11 15:18:12 2023
* Last change: Tue Apr 11 15:09:25 2023 by root via crm_attribute on s0
* 2 nodes configured
* 30 resource instances configured

Node List:
* Online: [ s0 s1 ]

Full List of Resources:
* sto-ipmi-s0 (stonith:external/ipmi): Started s1
* sto-ipmi-s1 (stonith:external/ipmi): Started s0
* Clone Set: clo-pri-zfs-drbd_storage [pri-zfs-drbd_storage]:
* Started: [ s0 s1 ]
* Clone Set: mas-drbd-pluto [pri-drbd-pluto] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-poserver [pri-drbd-poserver] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-webserver [pri-drbd-webserver] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-dhcp [pri-drbd-dhcp] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-wawi [pri-drbd-wawi] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-wawius [pri-drbd-wawius] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-saturn [pri-drbd-saturn] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-openvpn [pri-drbd-openvpn] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-asterisk [pri-drbd-asterisk] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-alarmanlage [pri-drbd-alarmanlage] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-jabber [pri-drbd-jabber] (promotable):
* Promoted: [ s0 s1 ]
* Clone Set: mas-drbd-TESTOPTIXXX [pri-drbd-TESTOPTIXXX] (promotable):
* Promoted: [ s0 s1 ]
* pri-vm-jabber (ocf:heartbeat:VirtualDomain): Started s1
* pri-vm-alarmanlage (ocf:heartbeat:VirtualDomain): Started s1

Here comes the problem:
I am able to live-migrate VMs with a virsh command, and I can also migrate them with a crm resource move command. But when I put a node into standby mode, the cluster does not perform a migration; instead it just shuts the VMs down and starts them on the other node.
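
For reference, these are the commands I am testing with (names as in the status output above; I am assuming here that the libvirt domain behind pri-vm-jabber is simply called "jabber"):

# live migration directly via libvirt -- works
virsh migrate --live jabber qemu+ssh://s0/system

# migration managed by the cluster -- works as well
crm resource move pri-vm-jabber s0

# putting a node into standby -- the VMs on it get stopped and restarted instead of migrated
crm node standby s1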

Apr 11 14:54:38 s1 pacemaker-controld[3245707]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: On loss of quorum: Ignore
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop sto-ipmi-s0 ( s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-zfs-drbd_storage:0 ( s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-pluto:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-poserver:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-webserver:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-dhcp:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-wawi:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-wawius:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-saturn:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-openvpn:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-asterisk:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-alarmanlage:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-jabber:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Stop pri-drbd-TESTOPTIXXX:0 ( Promoted s1 ) due to node availability
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Move pri-vm-jabber ( s1 -> s0 ) due to unrunnable mas-drbd-jabber demote
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Actions: Move pri-vm-alarmanlage ( s1 -> s0 ) due to unrunnable mas-drbd-alarmanlage demote
Apr 11 14:54:38 s1 pacemaker-schedulerd[3245706]: notice: Calculated transition 179, saving inputs in /var/lib/pacemaker/pengine/pe-input-2427.bz2
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation sto-ipmi-s0_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for sto-ipmi-s0 on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation pri-vm-jabber_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for pri-vm-jabber on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Initiating stop operation pri-vm-alarmanlage_stop_0 locally on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Requesting local execution of stop operation for pri-vm-alarmanlage on s1
Apr 11 14:54:39 s1 pacemaker-controld[3245707]: notice: Result of stop operation for sto-ipmi-s0 on s1: ok

From what I read in the documentation, this is not supposed to happen; please correct me if I am mistaken. Here are the relevant parts of my CRM configuration: the VM primitive for jabber plus some of the colocation/location/ordering rules (for jabber and alarmanlage).

primitive pri-vm-jabber VirtualDomain \
    params config="/etc/libvirt/qemu/jabber.xml" hypervisor="qemu:///system" migration_transport=ssh \
    meta allow-migrate=true target-role=Started is-managed=true \
    op monitor interval=0 timeout=30 \
    op start interval=0 timeout=120 \
    op stop interval=0 timeout=120 \
    op migrate_to interval=0 timeout=120 \
    op migrate_from interval=0 timeout=120 \
    utilization cpu=1 hv_memory=1024

colocation colo_mas_drbd_alarmanlage_with_clo_pri_zfs_drbd-storage inf: mas-drbd-alarmanlage clo-pri-zfs-drbd_storage
location location-pri-vm-alarmanlage-s0-200 pri-vm-alarmanlage 200: s1
order ord_mas_drbd_alarmanlage_after_clo-zfs-drbd_storage Mandatory: clo-pri-zfs-drbd_storage mas-drbd-alarmanlage
order ord_pri-jabber-after-mas-drbd-jabber Mandatory: mas-drbd-jabber: promote pri-vm-jabber: start
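
In case it helps to reproduce this, the transition in question can be replayed from the policy-engine input file that the log above points to:

# replay the saved transition on one of the nodes
crm_simulate --simulate --xml-file /var/lib/pacemaker/pengine/pe-input-2427.bz2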

I am thankful for any kind of help. If you need more information, please let me know.
Kind regards,
Philip