[SOLVED] HA with Open vSwitch

mfriedel

Member
Aug 11, 2014
Hello,
I'm testing out an HA Proxmox environment and seem to be having issues with the floating IP since switching the nodes to OVS. I have a working fence and cluster configuration and have also confirmed that multicast is working with ssmpingd/asmping. Host IPs are 10.22.2.101-103 and I'm trying to use 10.22.2.100 as the floating IP. I've searched the forum but haven't found any reports of floating-IP problems when fencing and clustering are both working. Does anyone have ideas on what I might look into next? Any help is much appreciated!
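
(For reference, the multicast check mentioned above can be done roughly like this with the ssmping package; the group 224.0.2.1 and the peer IP here are just examples, while the cluster's actual corosync group is 239.192.17.154 as shown in the pvecm status output further down.)

Code:
# on one node (e.g. 10.22.2.102), start the responder
ssmpingd

# from another node, probe it via ASM multicast; if the unicast
# replies come back but the multicast ones don't, multicast is
# being filtered somewhere on the path
asmping 224.0.2.1 10.22.2.102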
Thanks,
-Mike

Fence/PVE Output and cluster.conf contents below:

fence_tool ls output:
fence domain
member count 3
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3

pvecm nodes output:
Node Sts Inc Joined Name
1 M 656 2014-12-05 13:27:08 host1
2 M 660 2014-12-05 13:27:23 host2
3 M 676 2014-12-05 16:12:23 host3


pvecm status output:
root@central1:~# pvecm status
Version: 6.2.0
Config Version: 44
Cluster Name: CLTest
Cluster Id: 4489
Cluster Member: Yes
Cluster Generation: 676
Membership state: Cluster-Member
Nodes: 3
Expected votes: 3
Total votes: 3
Node votes: 1
Quorum: 2
Active subsystems: 3
Flags:
Ports Bound: 0
Node name: host1
Node ID: 1
Multicast addresses: 239.192.17.154
Node addresses: 10.22.2.101



/etc/pve/cluster.conf:
<?xml version="1.0"?>
<cluster config_version="44" name="CLTest">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.11" login="CHANGEME" name="host1-ipmi" passwd="CHANGEME" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.12" login="CHANGEME" name="host2-ipmi" passwd="CHANGEME" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" ipaddr="10.77.1.13" login="CHANGEME" name="host3-ipmi" passwd="CHANGEME" power_wait="10"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="host1" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="host1-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="host2" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="host2-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="host3" nodeid="3" votes="1">
      <fence>
        <method name="1">
          <device name="host3-ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <service autostart="1" exclusive="0" name="PVEIP1" recovery="relocate">
      <ip address="10.22.2.100"/>
    </service>
  </rm>
</cluster>
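
(For reference, the usual cycle after editing this file is roughly: bump config_version, validate, then push the new version to the running cluster. A minimal sketch assuming the standard cman tooling; check the man pages for the exact flags.)

Code:
# after bumping config_version in /etc/pve/cluster.conf,
# check the XML against the cluster schema
ccs_config_validate

# distribute/activate the new config version on the running cluster
cman_tool version -r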

/etc/network/interfaces:
# network interface settings
allow-vmbr0 zzz_core
iface zzz_core inet static
        address 10.22.2.101
        netmask 255.255.252.0
        gateway 10.22.0.1
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=722

allow-vmbr1 zzz_corestor
iface zzz_corestor inet static
        address 10.44.2.101
        netmask 255.255.252.0
        ovs_type OVSIntPort
        ovs_bridge vmbr1
        ovs_options tag=744

auto lo
iface lo inet loopback

allow-vmbr0 eth0
iface eth0 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

allow-vmbr1 eth1
iface eth1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1

auto vmbr0
iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0 zzz_core

auto vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports eth1 zzz_corestor

auto vmbr11
iface vmbr11 inet manual
        ovs_type OVSBridge

auto vmbr12
iface vmbr12 inet manual
        ovs_type OVSBridge
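
(For reference, the resulting OVS layout can be sanity-checked with ovs-vsctl; the bridge and port names below are the ones from this config.)

Code:
# dump bridges, ports and VLAN tags as OVS sees them
ovs-vsctl show

# list the ports attached to each bridge
ovs-vsctl list-ports vmbr0
ovs-vsctl list-ports vmbr1

# confirm the internal port carrying the host IP has the expected tag
ovs-vsctl get Port zzz_core tag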
 
Hello Mike


Looking at your cluster.conf, I think the "resources" tag is simply missing.

I verified the following (extremely simple) cluster.conf:

Code:
<?xml version="1.0"?>
<cluster name="holmes" config_version="5">

  <cman keyfile="/var/lib/pve-cluster/corosync.authkey">
  </cman>

  <clusternodes>
    <clusternode name="holmes1" votes="1" nodeid="1"/>
    <clusternode name="holmes2" votes="1" nodeid="2"/>
    <clusternode name="holmes3" votes="1" nodeid="3"/>
 
  </clusternodes>

  <rm>
   <resources>
     <ip address="10.22.2.100" monitor_link="1"/>
   </resources>
   
   <service name="floatip" autostart="1" recovery="relocate">
       <ip ref="10.22.2.100"/>
   </service>

  </rm>

</cluster>

It works successfully! Note that the floating address is not shown by "ifconfig", only by "ip addr show".
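
For example, on the node currently holding the service, something like this should show the relocated address (on Mike's setup the host address lives on the OVS internal port zzz_core; adjust the device name to wherever your 10.22.2.0/22 subnet is configured):

Code:
# the floating 10.22.2.100 shows up as a secondary address here,
# even though ifconfig does not list it
ip -4 addr show dev zzz_core

# or simply scan all interfaces
ip addr show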

Kind regards

Mr.Holmes
 
Mr. Holmes: Thanks for the suggestion!

I tried it, but unfortunately things still aren't working correctly. I double-checked both configs with ccs_config_validate and everything still passes. I think I'm going to try forcing a node to be master and enabling some debug logging to track things down.
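
(A couple of things that may help with that, hedged since the exact options are in the rgmanager and clusvcadm man pages: rgmanager's verbosity can be raised by setting log_level="7" on the <rm> tag in cluster.conf, and the floating-IP service can be driven by hand while watching the logs.)

Code:
# after adding log_level="7" to <rm> and bumping config_version,
# reload the config on the running cluster
cman_tool version -r

# drive the service manually; the name comes from the
# <service name="..."> attribute in cluster.conf
clusvcadm -e service:floatip             # enable/start it
clusvcadm -r service:floatip -m host2    # relocate it to a specific node
clustat                                  # see which node owns it now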
 
I've noticed a few strange things that may be related. When I run clustat, no services are listed. When I run "service rgmanager stop", the process just hangs even though there are no VMs running on the host machine.
clustat:
Cluster Status for CLTest @ Fri Dec 19 02:51:41 2014
Member Status: Quorate


Member Name ID Status
------ ---- ---- ------
host1 1 Online
host2 2 Online
host3 3 Online, Local


As another test, I'm in the process of migrating VMs, killing rgmanager, and doing a "cman_tool leave force" on each host to exercise fencing and make sure each host rejoins the cluster successfully...
 
Things are working now after the (migrate VMs, killall -9 rgmanager, cman_tool leave force) cycle on each host (don't forget to wait for your clustered FS to sync!). One host was hanging things up. It looks like rgmanager just started logging on all hosts and brought the floating IP up after that host came back online! Thanks for the help, hopefully this will help someone else.
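
For anyone hitting the same thing, the per-node cycle described above looks roughly like this on PVE 3.x with cman/rgmanager (the VM ID and node names are just examples):

Code:
# move guests off the node first (online migration)
qm migrate 101 host2 --online

# rgmanager was hanging on a clean stop, so kill it and leave the cluster
killall -9 rgmanager
cman_tool leave force

# wait for the clustered filesystem to finish syncing, then rejoin
service cman start
service rgmanager start

# verify membership and service state
pvecm status
clustat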

FYI: once the offending host was reset, rgmanager shut down normally on the other hosts without having to kill -9 it. The rgmanager log output is similar on all hosts except for the node number; node/host 3, which is holding the floating IP, had some additional logging for starting the floating IP.

host1-3:
Dec 19 03:09:26 rgmanager I am node #X
Dec 19 03:09:26 rgmanager Resource Group Manager Starting
Dec 19 03:09:26 rgmanager Loading Service Data
Dec 19 03:09:28 rgmanager Initializing Services
Dec 19 03:09:28 rgmanager [ip] 10.22.2.100 is not configured
Dec 19 03:09:28 rgmanager Services Initialized
~cut~
host3 only:
Dec 19 03:09:28 rgmanager Starting stopped service service:pvefloat
Dec 19 03:09:28 rgmanager [ip] Link for zzz_core: Detected
Dec 19 03:09:28 rgmanager [ip] Adding IPv4 address 10.22.2.100/22 to zzz_core
Dec 19 03:09:28 rgmanager [ip] Pinging addr 10.22.2.100 from dev zzz_core
Dec 19 03:09:30 rgmanager [ip] Sending gratuitous ARP: 10.22.2.100 <MACADDRHERE> brd ff:ff:ff:ff:ff:ff
Dec 19 03:09:31 rgmanager Service service:pvefloat started
 
