
Odd bridge/VLAN/bond behaviour having upgraded two of three cluster machines from 4.4 to 5.0

Discussion in 'Proxmox VE: Networking and Firewall' started by Piers Dawson-Damer, Jul 13, 2017.

  1. Piers Dawson-Damer

    Piers Dawson-Damer New Member
    Proxmox VE Subscriber

    Joined:
    Jul 13, 2017
    Messages:
    2
    Likes Received:
    0
    With 4.4, I could assign IP addresses to two bridges, on different VLAN interfaces, on the same bond. Something, possibly the bridge configuration in Debian Stretch, is not happy with my setup.

    All machines use bonded NICs with VLANs. In both cases, the last bridge to be configured is used for corosync cluster communication; it is on VLAN v1910. The first bridge interface carries the default gateway, 172.17.83.1/24.

    However, with Proxmox 5.0, only the most recently created bridge passes traffic (as far as I can see). So pvemanager (and the GUI running on the remaining 4.4 machine), which uses the bridge on v1901, cannot see the 5.0 machines, although they do still form part of the quorum via v1910, which is also how I have SSH access.

    If I move the iface vmbr01 inet static stanza in /etc/network/interfaces below the stanza for vmbr10, I get the opposite behaviour: pvemanager yes, corosync no. I can't figure it out. It's driving me crazy. :confused:

    root@lab2:~# brctl show
    bridge name bridge id STP enabled interfaces
    vmbr0 8000.d8d385b7a3c1 no bond1
    vmbr01 8000.d8d385b7a3c0 no bond0.1901
    vmbr10 8000.d8d385b7a3c0 no bond0.1910

    root@lab2:~# ip link show bond0.1901
    11: bond0.1901@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr01 state UP mode DEFAULT group default qlen 1000
    link/ether d8:d3:85:b7:a3:c0 brd ff:ff:ff:ff:ff:ff
    root@lab2:~# ip link show bond0.1910
    26: bond0.1910@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master vmbr10 state UP mode DEFAULT group default qlen 1000
    link/ether d8:d3:85:b7:a3:c0 brd ff:ff:ff:ff:ff:ff

    root@lab2:~# ip -4 address show vmbr01
    12: vmbr01: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.17.83.14/24 brd 172.17.83.255 scope global vmbr01
    valid_lft forever preferred_lft forever
    root@lab2:~# ip -4 address show vmbr10
    27: vmbr10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.20.33.100/29 brd 172.20.33.103 scope global vmbr10
    valid_lft forever preferred_lft forever

    root@lab2:~# ip route show
    default via 172.17.83.1 dev vmbr01 onlink
    172.17.83.0/24 dev vmbr01 proto kernel scope link src 172.17.83.14
    172.20.33.96/29 dev vmbr10 proto kernel scope link src 172.20.33.100

    root@lab2:~# ip neigh
    172.20.33.97 dev vmbr10 lladdr 00:04:96:97:b9:04 STALE
    172.17.83.251 dev vmbr01 FAILED
    172.20.33.98 dev vmbr10 lladdr 78:e3:b5:f6:22:20 REACHABLE
    172.20.33.99 dev vmbr10 lladdr d8:d3:85:bb:40:c0 REACHABLE
    172.17.83.1 dev vmbr01 FAILED

    root@lab2:~# ping 172.17.83.1
    PING 172.17.83.1 (172.17.83.1) 56(84) bytes of data.
    From 172.17.83.14 icmp_seq=1 Destination Host Unreachable


    root@lab1:~# cat /etc/network/interfaces
    auto lo
    iface lo inet loopback

    iface eth0 inet manual

    iface eth1 inet manual

    iface eth2 inet manual

    iface eth3 inet manual

    auto bond0
    iface bond0 inet manual
    slaves eth0 eth1
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4
    bond_lacp_rate 1
    bond_downdelay 200
    bond_updelay 200

    auto bond1
    iface bond1 inet manual
    slaves eth2 eth3
    bond_miimon 100
    bond_mode 802.3ad
    bond_xmit_hash_policy layer3+4
    bond_lacp_rate 1
    bond_downdelay 200
    bond_updelay 200

    auto bond0.1901
    iface bond0.1901 inet manual
    vlan-raw-device bond0

    auto bond0.1910
    iface bond0.1910 inet manual
    vlan-raw-device bond0

    auto vmbr0
    iface vmbr0 inet manual
    bridge_ports bond1
    bridge_stp off
    bridge_fd 0
    bridge_vlan_aware yes

    auto vmbr01
    iface vmbr01 inet static
    address 172.17.83.13
    netmask 255.255.255.0
    gateway 172.17.83.1
    bridge_ports bond0.1901
    bridge_stp off
    bridge_fd 0

    auto vmbr10
    iface vmbr10 inet static
    address 172.20.33.99
    netmask 255.255.255.248
    bridge_ports bond0.1910
    bridge_stp off
    bridge_fd 0
     
  2. manu

    manu Proxmox Staff Member
    Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    670
    Likes Received:
    36
    > However with Proxmox 5.0, only the most recently created bridge passes traffic (that I can see). So pvemanager (and GUI running on remaining 4.4 machine), which uses the bridge on v1901, cannot see the 5.0 machines, but they do form part of the quorum via v1910. Also how I have SSH access.

    If you are using this configuration, can you confirm that all VMs which have a NIC in v1901 can communicate with each other?

    Two more remarks regarding your config:
    * if you're using the VLAN bond0.1910 only for corosync communication, you do not need to put this interface in a bridge; you can directly assign an IP to bond0.1910 (see the example stanza below this list)
    * the only tested and supported bonding mode for a corosync network is active/passive ( bonding mode 1)
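
    For instance, a minimal sketch of such a stanza, reusing the address and netmask from your posted config (an illustration only, adjust to your own environment):

    auto bond0.1910
    iface bond0.1910 inet static
    vlan-raw-device bond0
    address 172.20.33.99
    netmask 255.255.255.248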

    Finally I would advise you to run tcpdump on the bond interface and have a look at the ARP traffic.

    With the command
    tcpdump -nnn -i BRIDGE_WITH_VM_NIC -e "arp and (host IP_OF_VM_IN_BRIDGE or host IP_OF_MACHINE_IN_LAN)"

    you can have a look at the who-has / reply traffic.
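
    For instance, using the bridge and gateway addresses from your output above (substitute a VM's IP for the first host if you want to watch a particular VM's ARP traffic):

    tcpdump -nnn -i vmbr01 -e "arp and (host 172.17.83.14 or host 172.17.83.1)"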
     
  3. Piers Dawson-Damer

    Piers Dawson-Damer New Member
    Proxmox VE Subscriber

    Joined:
    Jul 13, 2017
    Messages:
    2
    Likes Received:
    0
    Ah... I found a misconfigured port on my switching fabric. Proxmox 4.4 must have tolerated it, albeit with many a dropped packet, no doubt.
    One of the VLANs, on one of the physical interface mappings (out of the pair used for bonding) within HPE VirtualConnect, was incorrectly tagged. So I suppose that as long as the vmbr01 (v1901) interface was the last one addressed in /etc/network/interfaces, the LACP bonding code happened to keep things working.

    All good now.
    Also, I have taken your advice and moved the corosync traffic off the bridge, moving the L3 assignment down to the VLAN interface, and have removed vmbr10.

    Thank you very much for your assistance.
     
