Weird network problem running Proxmox with LACP in a switch failover scenario

offerlam

Hi all,

OK, so active followers of this forum know I have some major network problems with my 3-node Proxmox installation using two HP 1810-24G v2 switches.

All nodes are configured with LACP and connected to each switch with only one network cable per switch.

The switches are also trunked together with one network cable.

I have done some more troubleshooting.

My problem is this:

Say I need to reboot my switches to install new firmware.

I update switches 3 and 2, then reboot switch 3, and when it comes back online I reboot switch 2 and wait for it to come online.

When everything is up, I can ping all 3 nodes fine.

I log on to node 0 and ping the two other nodes.

I log on to node 2 and ping the two other nodes.

I log on to node 1 and can't ping anything. I didn't test whether I could ping my machine over the VPN, but I could connect to the node using PuTTY.

OK, so I decide to reboot node 1, but this does not take effect right away. So I try to ping again, and now I CAN ping node 0 and node 2. Shortly after, my reboot command takes effect and node 1 reboots.

When node 1 comes back online, I see major packet loss to it from node 0, node 2 AND my own machine.

I can connect over SSH from node 2 to node 1 and run the "service networking restart" command, but I can clearly feel the packet loss as I type the command and wait for it to take effect.

Immediately after the service restart, node 1 answers pings from all other nodes and from my machine over the VPN.

Does ANYONE know why that happens?

I have seen this happen many times during switch failover tests and network card failover tests on each node. Testing a network card means I unplug one network cable from one of the network cards and expect LACP to fail over correctly. During these tests this can happen to any number of nodes. I have seen it happen to two nodes at a time, but usually it only happens to one, and it doesn't have to be node 1.
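
A rough way to put numbers on the packet loss during such a failover test is to pull the loss percentage out of ping's summary line. This is only a sketch: it assumes the Linux iputils `ping` summary format ("N% packet loss"), and `loss_pct` is a hypothetical helper name, not something used in this thread.

```shell
# loss_pct: read ping output on stdin and print only the packet-loss percentage.
# Assumes a summary line such as:
#   20 packets transmitted, 15 received, 25% packet loss, time 19027ms
loss_pct() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# Example (run from another node or from the VPN client while a cable is unplugged):
#   ping -c 20 -q 10.10.99.20 | loss_pct
```

Running this against each node before, during and after pulling a cable would show which node loses traffic and for how long.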

HELP!!!!

I will gladly provide ANY logs if you guys tell me where they are :)
 
Are there any VMs with identical MACs? Since the MAC is part of the algorithm used to distribute IP packets, it is important that no two NICs share the same MAC.
The default route is also important. VMs on the same network must share the same default gateway.
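
The duplicate-MAC check suggested above could be sketched like this. It assumes the VM configs are plain-text files with the MAC on the `netN:` lines, and that they live under `/etc/pve/nodes/*/qemu-server/` (an assumption about the cluster layout, adjust as needed); `dup_macs` is a hypothetical helper name.

```shell
# dup_macs: print every MAC address that appears more than once
# in the given config files (case-insensitive match, upper-cased output).
dup_macs() {
    grep -h -o -i -E '([0-9a-f]{2}:){5}[0-9a-f]{2}' "$@" \
        | tr 'a-f' 'A-F' \
        | sort \
        | uniq -d
}

# Example (path is an assumption, see above):
#   dup_macs /etc/pve/nodes/*/qemu-server/*.conf
```

No output means no MAC is used twice in the files that were scanned.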
 
Hi Mir,

AND THANKS FOR ANSWERING!! .. this is a real problem for me.

mir wrote:

Forgot one thing: do the switches used for the bonds share the same default gateway on identical networks?

They are connected to two Cyberoam firewalls in an HA configuration. I have talked with Cyberoam, and they assured me that to other network devices, including the switches themselves, they would appear as
one IP
one MAC
and therefore one GW..

Even in failover scenarios..

mir wrote:

Are there any VMs with identical MACs? Since the MAC is part of the algorithm used to distribute IP packets, it is important that no two NICs share the same MAC.
The default route is also important. VMs on the same network must share the same default gateway.

I'm not really sure what you mean here ..

Here is a screenshot of my proxmox dashboard

https://www.flickr.com/photos/122448727@N03/13658906804/

As you can see, I only have VMs running on proxmox00.

I also have an issue with a VM existing on both proxmox01 and 02. If you have a quick fix for that it would be welcome. I haven't had time to look into it yet and it's not an important VM.. YET :P

As I don't really understand your question, I will provide

ifconfig details for proxmox00

Code:
root@proxmox00:~# ifconfig
bond0     Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:103654462 errors:0 dropped:0 overruns:0 frame:0
          TX packets:79952147 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:31817846362 (29.6 GiB)  TX bytes:21676458680 (20.1 GiB)

bond0.2   Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:9971843 errors:0 dropped:0 overruns:0 frame:0
          TX packets:57812 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2156669732 (2.0 GiB)  TX bytes:26658656 (25.4 MiB)

bond0.3   Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:28278 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:1866356 (1.7 MiB)

eth2      Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:80765401 errors:0 dropped:0 overruns:0 frame:0
          TX packets:78179308 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:23536764930 (21.9 GiB)  TX bytes:21456382176 (19.9 GiB)

eth3      Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          UP BROADCAST RUNNING SLAVE MULTICAST  MTU:9000  Metric:1
          RX packets:22889061 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1772839 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8281081432 (7.7 GiB)  TX bytes:220076504 (209.8 MiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:1462169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1462169 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:11530933099 (10.7 GiB)  TX bytes:11530933099 (10.7 GiB)

tap102i0  Link encap:Ethernet  HWaddr f2:90:32:42:fa:b7
          inet6 addr: fe80::f090:32ff:fe42:fab7/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap112i0  Link encap:Ethernet  HWaddr 56:c5:6d:d0:5c:64
          inet6 addr: fe80::54c5:6dff:fed0:5c64/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

tap113i0  Link encap:Ethernet  HWaddr e6:6d:d3:67:53:c8
          inet6 addr: fe80::e46d:d3ff:fe67:53c8/64 Scope:Link
          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
          RX packets:27586 errors:0 dropped:0 overruns:0 frame:0
          TX packets:76863 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:24586691 (23.4 MiB)  TX bytes:69837859 (66.6 MiB)

venet0    Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet6 addr: fe80::1/128 Scope:Link
          UP BROADCAST POINTOPOINT RUNNING NOARP  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:3 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

vmbr0     Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          inet addr:10.10.99.20  Bcast:10.10.99.255  Mask:255.255.255.0
          inet6 addr: fe80::82c1:6eff:fe64:8d3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:73836031 errors:0 dropped:0 overruns:0 frame:0
          TX packets:71290997 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20638545217 (19.2 GiB)  TX bytes:20879675396 (19.4 GiB)

vmbr1     Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          inet6 addr: fe80::82c1:6eff:fe64:8d3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:297409 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:13677802 (13.0 MiB)  TX bytes:750 (750.0 B)

vmbr3     Link encap:Ethernet  HWaddr 80:c1:6e:64:8d:3c
          inet6 addr: fe80::82c1:6eff:fe64:8d3c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:750 (750.0 B)

root@proxmox00:~#

and the content of the interfaces file on proxmox00

Code:
root@proxmox00:~# cat /etc/network/interfaces
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual

auto eth1
iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual

auto bond0
iface bond0 inet manual
        slaves eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad

auto vmbr0
iface vmbr0 inet static
        address  10.10.99.20
        netmask  255.255.255.0
        gateway  10.10.99.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
        bridge_maxage 0
        bridge_ageing 0
        bridge_maxwait 0
        pre-up ifconfig eth2 mtu 9000
        pre-up ifconfig eth3 mtu 9000

auto vmbr1
iface vmbr1 inet manual
        bridge_ports bond0.2
        bridge_stp off
        bridge_fd 0
        bridge_maxage 0
        bridge_ageing 0
        bridge_maxwait 0

auto vmbr3
iface vmbr3 inet manual
        bridge_ports bond0.3
        bridge_stp off
        bridge_fd 0
        bridge_maxage 0
        bridge_ageing 0
        bridge_maxwait 0

If none of this information gave you what you asked for, please elaborate :) sorry :)

Thanks for your help so far!!

Casper


EDIT:

As you can see, I only have VMs running on proxmox00, and my issue this time appeared on proxmox01, so I'm not sure it's VM MAC related. Also, what throws me about your question is why a VM MAC would be related, since this is a host problem? TEACH ME WISE ONE! :P
 
More information..

I noticed I hadn't put bond0.3 on all the nodes, so I did that and rebooted them one by one. This time it was proxmox00 that experienced packet loss after the reboot, and it was fixed with the "service networking restart" command. But I noticed this time (and it did this with proxmox01 when it had the problem too) that I get something that might look like an error message when I do it:

Code:
Linux proxmox00 2.6.32-23-pve #1 SMP Tue Aug 6 07:04:06 CEST 2013 x86_64

The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
Last login: Sun Apr  6 06:37:55 2014 from 10.81.234.7
root@proxmox00:~# service networking restart
Running /etc/init.d/networking restart is deprecated because it may not re-enable some interfaces ... (warning).
Reconfiguring network interfaces...Removed VLAN -:bond0.3:-
Removed VLAN -:bond0.2:-
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
Added VLAN with VID == 2 to IF -:bond0:-
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
Added VLAN with VID == 3 to IF -:bond0:-
grep: unrecognized option '--all'
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
done.

grep: unrecognized option '--all'

I don't know if that's important or not..
 
"Linux proxmox00 2.6.32-23-pve #1 SMP Tue Aug 6 07:04:06 CEST 2013 x86_64"

I can see that your installation is very old, so before doing any more debugging, upgrade to the latest version, since a lot has changed between your running version and the current version.
 
This is a typical install from the Proxmox ISO image?

Excuse me for perhaps asking a stupid question, but how would you expect me to upgrade? apt-get upgrade, or?

Thanks

Casper
 
Hi

All nodes are configured with LACP and connected to each switch with only one network cable per switch.


So you are doing LACP, each Proxmox node with 1 cable to switch 1 and 1 cable to switch 2, right?

The switches are also trunked together with one network cable.
What do you mean by trunked? Because if you want to do LACP with 2 switches, you need to have a stack (both switches are seen as one).
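
One way to see whether the Linux bonding driver treats both links as a single LACP group is to compare the aggregator ID reported for each slave in `/proc/net/bonding/bond0`. The sketch below assumes the usual layout of that file in 802.3ad mode ("Slave Interface:" sections, each with an "Aggregator ID:" line); `slave_aggregators` is a hypothetical helper name.

```shell
# slave_aggregators: print "<slave> <aggregator id>" for each slave of a bond.
# Reads the bonding status file given as $1 (default: /proc/net/bonding/bond0).
slave_aggregators() {
    awk '
        /^Slave Interface:/ { slave = $3 }
        # Only report aggregator IDs that belong to a slave section,
        # not the "Active Aggregator Info" block at the top of the file.
        $1 == "Aggregator" && $2 == "ID:" && slave != "" { print slave, $3 }
    ' "${1:-/proc/net/bonding/bond0}"
}

# Example:
#   slave_aggregators
```

If eth2 and eth3 report different aggregator IDs, the driver has placed them in separate aggregators, and only the links in the active aggregator carry traffic at any one time.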
 
Hi Spirit!

Actually that's not the case.

Yes, I'm doing LACP using only two network ports per server, and that is actually the key to why it's working.

My switches are HP 1810-24G v2, which are not stackable. I have trunked them together using LACP on port 24 of both switches. I'm not really sure this is even necessary.

It would NOT work if I had more than two interfaces in a LACP group (well, maybe if I had 3 switches), but for it to work with unstackable switches I'm only allowed one cable per server per switch.
 
But I'm puzzled why mir says my Proxmox is outdated. It's just a typical install from the ISO provided by the Proxmox site, installed in December 2013. I'm not sure what he means by my Linux being outdated, or even how to update it correctly.
 
You can't do LACP with 2 non-stackable switches; it's impossible.

Use active-backup; it should work out of the box.
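
For reference, a minimal sketch of what the active-backup variant of the bond0 stanza might look like, assuming the same slaves (eth2, eth3) and miimon value as in the interfaces file posted earlier in this thread. Only `bond_mode` changes; the switch ports facing the node would presumably also have to be taken out of their LACP trunks, which is not shown here.

```
auto bond0
iface bond0 inet manual
        slaves eth2 eth3
        bond_miimon 100
        # active-backup: one slave carries traffic, the other takes over on link failure
        bond_mode active-backup
```

The vmbr0, vmbr1 and vmbr3 stanzas that reference bond0, bond0.2 and bond0.3 would stay as they are.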
 
See this: https://pve.proxmox.com/wiki/Package_repositories

Current version of proxmox: dpkg -s proxmox-ve-2.6.32
Package: proxmox-ve-2.6.32
Status: install ok installed
Priority: optional
Section: admin
Maintainer: Proxmox Support Team <support@proxmox.com>
Architecture: all
Version: 3.2-121
Replaces: proxmox-ve, pve-kernel, proxmox-virtual-environment
Provides: proxmox-virtual-environment
Depends: libc6 (>= 2.7-18), pve-kernel-2.6.32-27-pve, pve-firmware, pve-manager, qemu-server, pve-qemu-kvm, openssh-client, openssh-server, apt, vncterm, vzctl (>= 3.0.29)
Conflicts: proxmox-ve, pve-kernel, proxmox-virtual-environment
Description: The Proxmox Virtual Environment
The Proxmox Virtual Environment is an easy to use Open Source
virtualization platform for running Virtual Appliances and Virtual
Machines. This is a virtual package which will install everything
needed. This package also depends on the latest available proxmox
kernel from the 2.6.32 series.
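
The package listing above depends on pve-kernel-2.6.32-27-pve, while the login banner earlier in the thread shows 2.6.32-23-pve running. A quick check for that kind of gap could look like the sketch below; it assumes GNU `sort` with the `-V` (version sort) option, and `kernel_is_older` is a hypothetical helper name.

```shell
# kernel_is_older A B: succeed if version string A sorts strictly before B.
kernel_is_older() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Example: compare the running kernel with the one the current package depends on.
#   kernel_is_older "$(uname -r)" 2.6.32-27-pve && echo "running kernel is older"
```

On a Debian-based Proxmox install the usual upgrade route is `apt-get update` followed by `apt-get dist-upgrade` (rather than plain `apt-get upgrade`), once the repositories from the linked wiki page are configured, and then a reboot into the new kernel.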
 
You can't do LACP with 2 non-stackable switches; it's impossible.

Use active-backup; it should work out of the box.

I was told by ProLiant second-level support that it would work. It was an unorthodox way of doing it, but it should work.

@Mir,

I have just updated to Proxmox 3.2. I will test tonight whether I'm a happy camper or not.