[SOLVED] HA with Open vSwitch

mfriedel

Member
Aug 11, 2014
Hello,
I'm testing out an HA Proxmox environment and seem to be having issues with the floating IP since switching the nodes to OVS. I have a working fence and cluster configuration and have also confirmed that multicast is working with ssmpingd/asmping. Host IPs are 10.22.2.101-103 and I'm trying to use 10.22.2.100 as the floating IP. I've searched the forum but haven't found any reports of floating-IP problems when fencing and clustering are both working. Does anyone have ideas on what I might look into next? Any help is much appreciated!
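
(For reference, the multicast check mentioned above can be done roughly like this with the ssmping package; the group 224.0.2.1 and the peer IP here are just examples, while the cluster's actual corosync group is 239.192.17.154 as shown in the pvecm status output further down.)

Code:
# on one node (e.g. 10.22.2.102), start the responder
ssmpingd

# from another node, probe it via ASM multicast; if the unicast
# replies come back but the multicast ones don't, multicast is
# being filtered somewhere on the path
asmping 224.0.2.1 10.22.2.102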
Thanks,
-Mike

Fence/PVE Output and cluster.conf contents below:

fence_tool ls output:
fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3

pvecm nodes output:
Node Sts Inc Joined Name
1 M 656 2014-12-05 13:27:08 host1
2 M 660 2014-12-05 13:27:23 host2
3 M 676 2014-12-05 16:12:23 host3


pvecm status output:
root@central1:~# pvecm status
Version: 6.2.0
Config Version: 44
Cluster Name: CLTest
Cluster Id: 4489
Cluster Member: Yes
Cluster Generation: 676
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 3
Flags:
Ports Bound: 0
Node name: host1
Node ID: 1
Multicast addresses: 239.192.17.154
Node addresses: 10.22.2.101



/etc/pve/cluster.conf:
<?xml version="1.0"?>
<cluster config_version="44" name="CLTest">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.11" login="CHANGEME" name="host1-ipmi" passwd="CHANGEME" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.12" login="CHANGEME" name="host2-ipmi" passwd="CHANGEME" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.13" login="CHANGEME" name="host3-ipmi" passwd="CHANGEME" power_wait="10"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="host1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="host1-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="host2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="host2-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="host3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="host3-ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="PVEIP1" recovery="relocate">
      <ip address="10.22.2.100"/>
    </service>
  </rm>
</cluster>
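
(For reference, the usual cycle after editing this file is roughly: bump config_version, validate, then push the new version to the running cluster. A minimal sketch assuming the standard cman tooling; check the man pages for the exact flags.)

Code:
# after bumping config_version in /etc/pve/cluster.conf,
# check the XML against the cluster schema
ccs_config_validate

# distribute/activate the new config version on the running cluster
cman_tool version -r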

/etc/network/interfaces:
# network interface settings
allow-vmbr0 zzz_core
iface zzz_core inet static
        address 10.22.2.101
        netmask 255.255.252.0
        gateway 10.22.0.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=722

allow-vmbr1 zzz_corestor
iface zzz_corestor inet static
        address 10.44.2.101
        netmask 255.255.252.0
        ovs_type OVSIntPort
        ovs_bridge vmbr1
        ovs_options tag=744

auto lo
iface lo inet loopback

allow-vmbr0 eth0
iface eth0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

allow-vmbr1 eth1
iface eth1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0 zzz_core

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports eth1 zzz_corestor

auto vmbr11
iface vmbr11 inet manual
        ovs_type OVSBridge

auto vmbr12
iface vmbr12 inet manual
        ovs_type OVSBridge
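
(For reference, the resulting OVS layout can be sanity-checked with ovs-vsctl; the bridge and port names below are the ones from this config.)

Code:
# dump bridges, ports and VLAN tags as OVS sees them
ovs-vsctl show

# list the ports attached to each bridge
ovs-vsctl list-ports vmbr0
ovs-vsctl list-ports vmbr1

# confirm the internal port carrying the host IP has the expected tag
ovs-vsctl get Port zzz_core tag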
 
Hello Mike


Looking at your cluster.conf, I think the "resources" tag is simply missing.

I verified the following (extremely simple) cluster.conf:

Code:
<?xml version="1.0"?>
<cluster name="holmes" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
    <clusternode name="holmes1" votes="1" nodeid="1"/>
    <clusternode name="holmes2" votes="1" nodeid="2"/>
    <clusternode name="holmes3" votes="1" nodeid="3"/>
 
  </clusternodes>

  <rm>
   <resources>
     <ip address="10.22.2.100" monitor_link="1"/>
   </resources>
   
   <service name="floatip" autostart="1" recovery="relocate">
       <ip ref="10.22.2.100"/>
   </service>

  </rm>

</cluster>

It works successfully! Note that the floating address is not shown by "ifconfig", only by "ip addr show".
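
For example, on the node currently holding the service, something like this should show the relocated address (on Mike's setup the host address lives on the OVS internal port zzz_core; adjust the device name to wherever your 10.22.2.0/22 subnet is configured):

Code:
# the floating 10.22.2.100 shows up as a secondary address here,
# even though ifconfig does not list it
ip -4 addr show dev zzz_core

# or simply scan all interfaces
ip addr show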

Kind regards

Mr.Holmes
 
Mr. Holmes: Thanks for the suggestion!

I tried it, but unfortunately things still aren't working correctly. I double-checked both configs with ccs_config_validate and everything still passes. I think I'm going to try forcing a node to be master and enabling some debug logging to track things down.
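
(A couple of things that may help with that, hedged since the exact options are in the rgmanager and clusvcadm man pages: rgmanager's verbosity can be raised by setting log_level="7" on the <rm> tag in cluster.conf, and the floating-IP service can be driven by hand while watching the logs.)

Code:
# after adding log_level="7" to <rm> and bumping config_version,
# reload the config on the running cluster
cman_tool version -r

# drive the service manually; the name comes from the
# <service name="..."> attribute in cluster.conf
clusvcadm -e service:floatip             # enable/start it
clusvcadm -r service:floatip -m host2    # relocate it to a specific node
clustat                                  # see which node owns it now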
 
I've noticed a few strange things that may be related. When I run clustat, no services are listed. When I run "service rgmanager stop", the process just hangs even though there are no VMs running on the host machine.
clustat:
Cluster Status for CLTest @ Fri Dec 19 02:51:41 2014
Member Status: Quorate


Member Name ID Status
------ ---- ---- ------
host1 1 Online
host2 2 Online
host3 3 Online, Local


As another test, I'm in the process of migrating VMs, killing rgmanager, and doing a "cman_tool leave force" on each host to exercise fencing and make sure each host rejoins the cluster successfully...
 
Things are working now after the (migrate VMs, killall -9 rgmanager, cman_tool leave force) cycle on each host (don't forget to wait for your clustered FS to sync!). One host was hanging things up. It looks like rgmanager just started logging on all hosts and brought the floating IP up after that host came back online! Thanks for the help, hopefully this will help someone else.
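
For anyone hitting the same thing, the per-node cycle described above looks roughly like this on PVE 3.x with cman/rgmanager (the VM ID and node names are just examples):

Code:
# move guests off the node first (online migration)
qm migrate 101 host2 --online

# rgmanager was hanging on a clean stop, so kill it and leave the cluster
killall -9 rgmanager
cman_tool leave force

# wait for the clustered filesystem to finish syncing, then rejoin
service cman start
service rgmanager start

# verify membership and service state
pvecm status
clustat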

FYI: once the offending host was reset, rgmanager shut down normally on the other hosts without having to kill -9 it. The rgmanager log output is similar on all hosts except for the node number; node/host 3, which is holding the floating IP, had some additional logging for starting the floating IP.

host1-3:
Dec 19 03:09:26 rgmanager I am node #X
Dec 19 03:09:26 rgmanager Resource Group Manager Starting
Dec 19 03:09:26 rgmanager Loading Service Data
Dec 19 03:09:28 rgmanager Initializing Services
Dec 19 03:09:28 rgmanager [ip] 10.22.2.100 is not configured
Dec 19 03:09:28 rgmanager Services Initialized
~cut~
host3 only:
Dec 19 03:09:28 rgmanager Starting stopped service service:pvefloat
Dec 19 03:09:28 rgmanager [ip] Link for zzz_core: Detected
Dec 19 03:09:28 rgmanager [ip] Adding IPv4 address 10.22.2.100/22 to zzz_core
Dec 19 03:09:28 rgmanager [ip] Pinging addr 10.22.2.100 from dev zzz_core
Dec 19 03:09:30 rgmanager [ip] Sending gratuitous ARP: 10.22.2.100 <MACADDRHERE> brd ff:ff:ff:ff:ff:ff
Dec 19 03:09:31 rgmanager Service service:pvefloat started
 
