Duplicate ARP problem

Discussion in 'Proxmox VE: Networking and Firewall' started by Stuart Howlette, Aug 13, 2019.

  1. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    Hi all,

    We appear to be hitting an issue where multiple Proxmox hosts generate duplicate ARP entries for their management IPs.

    The running version of each component is listed below:
    Code:
     
    pveversion -v
    proxmox-ve: 5.4-1 (running kernel: 4.15.18-15-pve)
    pve-manager: 5.4-6 (running version: 5.4-6/aa7856c5)
    pve-kernel-4.15: 5.4-3
    pve-kernel-4.15.18-15-pve: 4.15.18-40
    pve-kernel-4.15.18-7-pve: 4.15.18-27
    pve-kernel-4.15.17-1-pve: 4.15.17-9
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-10
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-52
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-13
    libpve-storage-perl: 5.0-43
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-3
    openvswitch-switch: 2.7.0-3
    proxmox-widget-toolkit: 1.0-28
    pve-cluster: 5.0-37
    pve-container: 2.0-39
    pve-docs: 5.4-2
    pve-edk2-firmware: 1.20190312-1
    pve-firewall: 3.0-22
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-9
    pve-i18n: 1.1-4
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 3.0.1-2
    pve-xtermjs: 3.12.0-1
    qemu-server: 5.0-52
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.13-pve1~bpo2
    
    The strange part is that two interfaces on each host respond to ARP, each with a different MAC, and interestingly only when the source address of the ARP request is 0.0.0.0.

    Code:
     ip a  | grep -EiA2 "vmbr0|vport0"
    vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether 18:66:da:51:b3:eb brd ff:ff:ff:ff:ff:ff
    
    vport0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
        link/ether aa:83:29:09:fa:bc brd ff:ff:ff:ff:ff:ff
        inet 10.21.0.15/24 brd 10.21.0.255 scope global vport0
    
    In the packet captures, we see ARP replies with source MAC addresses of both 18:66:da:51:b3:eb (vmbr0) and aa:83:29:09:fa:bc (vport0).
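    To summarise a capture like this, the ARP replies can be reduced to distinct IP/MAC pairs; two lines for the same IP means two MACs are answering for it. A minimal sketch, assuming the capture is read back with `tcpdump -n` and that ARP replies use tcpdump's usual "Reply <ip> is-at <mac>" wording. The helper name arp_reply_pairs and the file name capture.pcap are hypothetical:

    ```shell
    #!/bin/sh
    # arp_reply_pairs (hypothetical helper): read "tcpdump -n" text output on
    # stdin and print each distinct "<ip> <mac>" pair seen in ARP replies.
    # Assumes tcpdump's "Reply <ip> is-at <mac>" wording for ARP replies;
    # requests and other lines are ignored.
    arp_reply_pairs() {
        sed -n 's/.*Reply \([0-9.]*\) is-at \([0-9a-fA-F:]*\).*/\1 \2/p' |
            tr 'A-F' 'a-f' |
            LC_ALL=C sort -u
    }

    # Usage ("capture.pcap" is a placeholder for an existing capture file;
    # reading a saved capture does not need root):
    #   tcpdump -n -r capture.pcap arp | arp_reply_pairs
    ```

    If a capture matches what we describe above, this would print two lines for 10.21.0.15, one per MAC.
    
    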

    Previously this didn't cause any issues. However, we recently upgraded our Juniper switching, which now enables ARP suppression by default. This means that rather than both ARP replies being returned, sometimes only one gets through, and it happens to be the one that starts blackholing traffic.

    We can recreate the issue with arping by turning ARP suppression back on and sending ARP requests to the IP with a source IP of 0.0.0.0.

    Using 0.0.0.0 as the source IP is valid ARP usage; it is what duplicate address detection probes send. Ironically, this very detection is what triggers the duplicate ARP responses!

    We see this across 3 separate hosts, and the only way we can stop it is to disable ARP suppression on our switches. This isn't a good permanent fix, as the next version of JunOS will remove the ARP suppression feature.

    Does anyone have any insight into what could be causing this? I can provide PCAPs showing the behaviour (though not yet, as I can't post links yet!).

    I am more than happy to provide more details if it helps!
     
  2. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    We may have worked around this using the following sysctl option:

    net.ipv4.conf.vmbr0.arp_ignore=2
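    For reference, the kernel's ip-sysctl documentation describes the relevant arp_ignore values as: 0 (the default), reply for any local target IP address configured on any interface; 1, reply only if the target IP is configured on the incoming interface; 2, reply only if the target IP is configured on the incoming interface and the sender's IP is in the same subnet on that interface. The default of 0 would be consistent with vmbr0 answering for vport0's address. To keep the setting across reboots, a sysctl.d drop-in along these lines should work (the file name is an arbitrary choice):

    ```
    # /etc/sysctl.d/90-vmbr0-arp.conf (file name is an arbitrary choice)
    # Only answer ARP on vmbr0 if the target IP is configured on vmbr0 itself
    # and the sender IP is in the same subnet; vmbr0 carries no IP, so with
    # this value it should stay silent.
    net.ipv4.conf.vmbr0.arp_ignore = 2
    ```

    It can be applied without a reboot with `sysctl --system`. Since the per-interface key only exists once vmbr0 has been created, the setting may be skipped if vmbr0 is not there yet when sysctl.d files are applied at boot; a `post-up sysctl -w net.ipv4.conf.vmbr0.arp_ignore=2` line in the vmbr0 stanza of /etc/network/interfaces is one way around that.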

    So far all is good. If this holds up, I would have thought it prudent to make this part of the default Proxmox install.
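    Because the kernel documentation says the larger of net.ipv4.conf.all.arp_ignore and the per-interface value is what applies, it is worth checking the effective value on every bridge rather than just the per-interface key. A minimal sketch that only reads /proc/sys and changes nothing (the function names are hypothetical):

    ```shell
    #!/bin/sh
    # Print "<interface> <effective arp_ignore>" for every IPv4 interface.
    # Per the kernel's ip-sysctl docs, the value used on an interface is
    # max(conf/all/arp_ignore, conf/<iface>/arp_ignore).

    # effective_arp_ignore ALL IFACE: print the larger of the two values
    effective_arp_ignore() {
        if [ "$1" -gt "$2" ]; then
            echo "$1"
        else
            echo "$2"
        fi
    }

    # report_arp_ignore: walk /proc/sys/net/ipv4/conf and report each interface
    report_arp_ignore() {
        conf=/proc/sys/net/ipv4/conf
        all=$(cat "$conf/all/arp_ignore")
        for dir in "$conf"/*/; do
            iface=$(basename "$dir")
            # "all" and "default" are templates, not real interfaces
            case "$iface" in
                all|default) continue ;;
            esac
            printf '%s %s\n' "$iface" \
                "$(effective_arp_ignore "$all" "$(cat "$dir/arp_ignore")")"
        done
    }

    # Only run the report where the IPv4 sysctl tree is readable
    if [ -r /proc/sys/net/ipv4/conf/all/arp_ignore ]; then
        report_arp_ignore
    fi
    ```

    On a host with the workaround applied and conf.all left at its default of 0, vmbr0 should report 2 while vport0 reports 0.
    
    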
     
  3. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    3,370
    Likes Received:
    140
    This is strange; your bridge shouldn't reply to ARP requests, as there isn't any IP on it.

    What kind of interface is vport0?

    Can you send your full /etc/network/interfaces?
     
  4. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    Code:
    # Main interface
    allow-vmbr0 bond0
    iface bond0 inet manual
        ovs_bonds eno3 eno4
        ovs_type OVSBond
        ovs_bridge vmbr0
        ovs_options lacp=active bond_mode=balance-tcp

    auto lo
    iface lo inet loopback

    # Interface to secondary network
    allow-vmbr1 eno1
    iface eno1 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr1

    # Mirror to port capture server
    allow-vmbr0 eno2
    iface eno2 inet manual
        ovs_type OVSPort
        ovs_bridge vmbr0

    iface eno3 inet manual

    iface eno4 inet manual

    # Management interface
    allow-vmbr0 vport0
    iface vport0 inet static
        address 10.21.0.15
        netmask 255.255.255.0
        gateway 10.21.0.210
        ovs_type OVSIntPort
        ovs_bridge vmbr0

    # Secondary network
    allow-vmbr1 vport1
    iface vport1 inet static
        address 172.22.1.15
        netmask 255.255.255.0
        ovs_type OVSIntPort
        ovs_bridge vmbr1
        ovs_options tag=100

    auto vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 vport0 eno2

    auto vmbr1
    iface vmbr1 inet manual
        ovs_type OVSBridge
        ovs_ports eno1 vport1

    We're using Open vSwitch on these hosts.
     
  5. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    3,370
    Likes Received:
    140
    Hmm, strange, this seems OK.

    I'm just curious about this part:

    Code:
    # Mirror to port capture server
    allow-vmbr0 eno2
    iface eno2 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr0
    
    How do you mirror traffic?
     
  6. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    The mirror is done using OVS Mirrors.

    Code:
    ovs-vsctl -- set Bridge vmbr0 mirrors=@m \
     -- --id=@vm-01 get Port tap102i0 \
     -- --id=@vm-02 get Port tap158i0 \
     -- --id=@dest get Port eno2 \
     -- --id=@m create Mirror name=mirrorport select-src-port=@vm-01,@vm-02 output-port=@dest
    
    ovs-vsctl list Bridge vmbr0 | grep -i mirrors
    mirrors             : [f8c0b5f4-0649-4331-ae5c-cd0eabf622f8]
    
    While looking into issues with Linux bridging, I found that this isn't tied to Proxmox or Open vSwitch specifically; it happens with standard Linux bridges too, hence applying arp_ignore.
     
  7. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
  8. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    3,370
    Likes Received:
    140
    As you don't use a VLAN on vport0, have you tried removing it and setting up the IP on vmbr0 directly?
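    Something like this, reusing the addresses from your /etc/network/interfaces. It is an untested sketch: it assumes the OVS ifupdown integration accepts a static address directly on the OVSBridge stanza, and that the allow-vmbr0 vport0 stanza is removed.

    ```
    # Hypothetical sketch: management IP on the OVS bridge itself, vport0 removed
    auto vmbr0
    iface vmbr0 inet static
        address 10.21.0.15
        netmask 255.255.255.0
        gateway 10.21.0.210
        ovs_type OVSBridge
        ovs_ports bond0 eno2
    ```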
     
  9. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    We could, but that restricts us if we want to move to VLANs on vport0 in the future. If VLANs are passing through vmbr0, I would hope that it at least doesn't respond to ARP for them.
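    For reference, putting vport0 on a VLAN later would follow the same pattern we already use for vport1 (ovs_options tag=...). A sketch, where VLAN 21 is a hypothetical ID and the switch ports behind bond0 are assumed to carry it tagged:

    ```
    # Management interface on a VLAN (VLAN ID 21 is a hypothetical example)
    allow-vmbr0 vport0
    iface vport0 inet static
        address 10.21.0.15
        netmask 255.255.255.0
        gateway 10.21.0.210
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=21
    ```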

    More than anything, I'm wondering whether people have seen similar behaviour. We have our workaround, which is working well, which is why I'm suggesting it should probably be the default behaviour/parameter for bridges on Proxmox that have no IPs on them.
     
  10. spirit

    spirit Well-Known Member

    Joined:
    Apr 2, 2010
    Messages:
    3,370
    Likes Received:
    140
    What I'm not sure about is whether it will respond to ARP if the interface has a VLAN.
    (Maybe this behaviour only occurs when the interface has no VLAN and the IP is set up on the interface.)
     
  11. Stuart Howlette

    Stuart Howlette New Member

    Joined:
    Aug 13, 2019
    Messages:
    7
    Likes Received:
    0
    That is kind of my assumption, however I also assumed an interface without an IP would never respond to ARP! :D
     