Hello all. Below is what I have. My overall goal is for a module from my web site to still be able to communicate within the cluster if a node goes down, and by using DNAT I can still make effective use of the Proxmox firewall. I have already reached out to Zen load balancer support, but wanted to see if anyone here could offer anything else I might be missing to get this working 100%.
External block of IPs coming in through Charter (NATed to subinterfaces/VLANs)
Cisco 1841 with subinterfaces 10 and 11
Cisco 3560G with VLANs 10 and 11
4 Proxmox servers trunked to the 3560G for VLANs 10 and 11
prox1 (eth0 10.10.10.201 / no gateway mgmt network) (eth1.11 10.10.11.201 / gw 10.10.11.254 backend)
prox2 (eth0 10.10.10.202 / no gateway mgmt network) (eth1.11 10.10.11.202 / gw 10.10.11.254 backend)
prox3 (eth0 10.10.10.203 / no gateway mgmt network) (eth1.11 10.10.11.203 / gw 10.10.11.254 backend)
prox4 (eth0 10.10.10.204 / no gateway mgmt network) (eth1.11 10.10.11.204 / gw 10.10.11.254 backend)
2 Zen load balancers set up in cluster mode (virtual machines inside the Proxmox cluster)
proxlb1 10.10.10.211 eth0 mgmt IP
proxlb2 10.10.10.212 eth0 mgmt IP
10.10.10.213 VIP for cluster services
proxlb1 eth1 10.10.11.252
proxlb2 eth1 10.10.11.253
10.10.11.254 VIP used as the gateway for the backend machines
10.10.10.221 VIP for the lx4nat farm with DNAT
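To tie the node lines above to actual config, here is roughly what /etc/network/interfaces looks like on prox1, trimmed to the two interfaces that matter here (treat the /24 masks as placeholders; I have left out the vmbr bridges):

auto lo
iface lo inet loopback

# management network on VLAN 10, deliberately no default gateway
auto eth0
iface eth0 inet static
        address 10.10.10.201
        netmask 255.255.255.0

# backend network on VLAN 11, default gateway is the LB VIP
auto eth1.11
iface eth1.11 inet static
        address 10.10.11.201
        netmask 255.255.255.0
        gateway 10.10.11.254

The other three nodes are the same apart from the .202/.203/.204 addresses.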
I have tried using both 10.10.10.221 and 10.10.11.254 to reach my web GUI. Both pass me through to https://proxmoxip:8006 and I can log in and do things. The fact that I had to set the LB VIP (10.10.11.254) as the gateway on the Proxmox NICs has caused a lot of trouble: HA does not seem to be working at all, and when I simulate a failed server and bring it back up, it refuses to communicate with the cluster. Below is some command output for troubleshooting.
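To be clear about what the lx4nat farm is doing: with DNAT it is conceptually the equivalent of a rule like this on the active LB (a sketch only; the backend address and port here are examples, and ZenLB generates its own rules rather than these exact ones):

# rewrite traffic hitting the farm VIP to a node's GUI port, source IP left untouched
iptables -t nat -A PREROUTING -d 10.10.10.221 -p tcp --dport 8006 -j DNAT --to-destination 10.10.11.201:8006

Because the client's source address is preserved, the node can only answer back through the load balancer, which is why the nodes need the 10.10.11.254 VIP as their gateway in the first place.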
root@prox1:~# clustat
Cluster Status for StratoCluster1 @ Fri Nov 21 13:45:55 2014
Member Status: Quorate
 Member Name                ID   Status
 ------ ----                ---- ------
 prox1                         1 Online, Local, rgmanager
 prox2                         2 Online, rgmanager
 prox3                         3 Online, rgmanager
 prox4                         4 Online

 Service Name               Owner (Last)               State
 ------- ----               ----- ------               -----
 pvevm:100                  (prox4)                    stopped
 pvevm:103                  (prox4)                    stopped
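My understanding is that the stopped pvevm services can be started by hand with rgmanager's clusvcadm, something like the following (prox1 as the target is just an example):

clusvcadm -e pvevm:100 -m prox1
clusvcadm -e pvevm:103 -m prox1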
cluster.conf
<?xml version="1.0"?>
<cluster config_version="16" name="StratoCluster1">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <rm>
    <failoverdomains>
      <failoverdomain name="proxfailover" nofailback="0" ordered="0" restricted="1">
        <failoverdomainnode name="prox1"/>
        <failoverdomainnode name="prox2"/>
        <failoverdomainnode name="prox3"/>
        <failoverdomainnode name="prox4"/>
      </failoverdomain>
    </failoverdomains>
    <pvevm autostart="1" vmid="103" domain="proxfailover" recovery="relocate"/>
    <pvevm autostart="1" vmid="100" domain="proxfailover" recovery="relocate"/>
  </rm>
  <clusternodes>
    <clusternode name="prox1" nodeid="1" votes="1"/>
    <clusternode name="prox2" nodeid="2" votes="1"/>
    <clusternode name="prox3" nodeid="3" votes="1"/>
    <clusternode name="prox4" nodeid="4" votes="1"/>
  </clusternodes>
</cluster>
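Since all nodes are supposed to be running the same config_version="16", one sanity check I can run on each node is comparing what cman has actually loaded (stock cman/Proxmox tools, nothing custom):

cman_tool version   # shows the config version cman is currently running with
cman_tool nodes     # membership as cman sees it
pvecm status        # Proxmox's view of quorum and the nodes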
root@prox1:~# fence_tool ls
fence domain
member count 4
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4
root@prox4:~# service rgmanager status
rgmanager (pid 3388 3387) is running...
root@prox4:~# ps 3388
PID TTY STAT TIME COMMAND
3388 ? D<l 0:00 rgmanager
root@prox4:~# ps 3387
PID TTY STAT TIME COMMAND
3387 ? S<Ls 0:00 rgmanager
The machine prox4 has been rebooted multiple times. rgmanager shows as running on prox4 itself (service rgmanager status above), but clustat on prox1 does not list prox4 with the rgmanager flag. I can SSH to prox4 from prox1, but in the GUI I get "connection refused" for prox4.
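If it helps, I can post output from prox4 for checks like these (assuming this is Proxmox 3.x, pveproxy is what listens on :8006, so "connection refused" usually points at it not running or not reachable):

service pveproxy status
service pvedaemon status
netstat -tlnp | grep 8006    # is anything actually listening on 8006?
pvecm status                 # does prox4 itself think it is in the cluster and quorate?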
Any help is greatly appreciated!