OpenVZ Containers lose internet connection (VLAN/venet)

Discussion in 'Proxmox VE 1.x: Installation and configuration' started by cle.ram, Feb 8, 2012.

  1. cle.ram

    cle.ram New Member

    Joined:
    Jun 15, 2009
    Messages:
    17
    Likes Received:
    0
    I have rather weird issues with my OpenVZ containers using venet and running in Proxmox 1.9 (Kernel 2.6.32-6-pve, fully updated as of 2012-02-08) hosted at OVH.
    The Setup:
    - a private 172.16.0.0/12 and a public x.x.x.x/28 subnet are assigned to the VLAN interface
    - each container is assigned ONLY a public IP out of the /28 subnet on its venet interface

    Problem:
    At first, the connectivity of the containers seems fine for about four hours (the timeframe is reproducible).
    After that, they are no longer reachable from other machines in the public x.x.x.x/28 subnet or from the internet, and they cannot ping machines with public IPs either.
    The strange part is that the containers can ping machines in the private 172.16.0.0/12 subnet the whole time, but cannot be pinged from those machines.
    The containers' public IPs also respond when accessed directly from the HN.

    These issues disappear (for the next four hours) when the containers are rebooted.

    /etc/network/interfaces:
    Code:
    # The loopback network interface
    auto lo
    iface lo inet loopback
    
    # for Routing
    auto vmbr1
    iface vmbr1 inet manual
        post-up /etc/pve/kvm-networking.sh
        bridge_ports dummy0
        bridge_stp off
        bridge_fd 0
    
    # vmbr0: Bridging. Make sure to use only MAC addresses that were assigned to you.
    auto vmbr0
    iface vmbr0 inet static
        address xxx.xxx.xxx.xxx
        netmask xxx.xxx.xxx.xxx
        network xxx.xxx.xxx.xxx
        broadcast xxx.xxx.xxx.xxx
        gateway xxx.xxx.xxx.xxx
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
    
    auto eth0.XXXX
    iface eth0.XXXX inet static
            address 172.16.0.2 
            netmask 255.240.0.0 
            post-up ip r a 172.16.0.0/12 via 172.31.255.254 dev eth0.XXXX; true 
    
    auto eth0.XXXX:0
    iface eth0.XXXX:0 inet static 
            address xxx.xxx.xxx.2 
            network xxx.xxx.xxx.0 
            broadcast xxx.xxx.xxx.15 
            gateway xxx.xxx.xxx.14 
            netmask 255.255.255.240 
            post-up /sbin/ip route add default via xxx.xxx.xxx.14  dev eth0.XXXX table 125
            post-up /sbin/ip rule add from xxx.xxx.xxx.0/28 table 125 
            post-down /sbin/ip route del default via xxx.xxx.xxx.14 dev eth0.XXXX table 125
            post-down /sbin/ip rule del from xxx.xxx.xxx.0/28 table 125
    
    XXXX being the VLAN-ID


    /etc/sysctl.conf:
    Code:
    net.ipv4.ip_forward=1
    net.ipv4.conf.default.forwarding=1
    net.ipv4.conf.default.proxy_arp = 1
    net.ipv4.conf.eth0/XXXX.proxy_arp = 1
    


    route -n:
    Code:
     Kernel IP routing table
    Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
    xxx.xxx.xxx.7    0.0.0.0         255.255.255.255 UH    0      0        0 venet0
    xxx.xxx.xxx.9    0.0.0.0         255.255.255.255 UH    0      0        0 venet0
    xxx.xxx.xxx.8    0.0.0.0         255.255.255.255 UH    0      0        0 venet0
    xxx.xxx.xxx.11   0.0.0.0         255.255.255.255 UH    0      0        0 venet0
    xxx.xxx.xxx.10  0.0.0.0         255.255.255.255 UH    0      0        0 venet0
    xxx.xxx.xxx.0    0.0.0.0         255.255.255.240 U     0      0        0 eth0.XXXX
    xxx.xxx.xxx.0   0.0.0.0         255.255.255.0   U     0      0        0 vmbr0
    172.16.0.0        0.0.0.0         255.240.0.0     U     0      0        0 eth0.XXXX
    0.0.0.0         xxx.xxx.xxx.14   0.0.0.0         UG    0      0        0 eth0.XXXX
    0.0.0.0         xxx.xxx.xxx.254  0.0.0.0         UG    0      0        0 vmbr0
    
    I hope someone has an idea how I could solve these issues...
    Thanks in advance!



    EDIT:

    tcpdump -i eth0.XXXX shows that ARP requests for an affected IP are being received by the HN but are not answered.

    If I ping from inside a container, "tcpdump -i eth0.XXXX" on the HN shows that echo requests are being transmitted, but again: ARP requests that are not being answered, and no echo replies.

    If I ping a machine in 172.16.0.0/12, the echo replies do reach the container:
    Code:
    listening on eth0.XXXX, link-type EN10MB (Ethernet), capture size 96 bytes
    03:57:41.596299 IP PUBLIC-IP-OF-CONTAINER > 172.16.0.4: ICMP echo request, id 56072, seq 1, length 64
    03:57:41.599311 arp who-has PUBLIC-IP-OF-CONTAINER tell 172.16.0.4
    03:57:41.904644 IP 172.16.0.4 > PUBLIC-IP-OF-CONTAINER: ICMP echo reply, id 56072, seq 1, length 64
    
     
    #1 cle.ram, Feb 8, 2012
    Last edited: Feb 9, 2012
  2. cle.ram

    cle.ram New Member

    I want to document my findings (including a solution) here, in case anyone else runs into similar problems.

    To make it short: the configuration suggested by OVH does not work correctly on any Linux-based system with kernel version 2.6.31 or newer.

    Background:
    Beginning with 2.6.31, the "rp_filter=1" (reverse path filter) setting actually works as documented (in earlier kernels it effectively behaved like rp_filter=0, regardless of the configured value). If rp_filter is enabled (which is the default in most distributions nowadays), the kernel checks whether an incoming packet's source address is reachable via some route on that machine (it is basically checked against the FIB). There are two modes of rp_filter: strict mode (rp_filter=1) and loose mode (rp_filter=2). In strict mode, the kernel additionally checks whether the packet arrived exactly on the interface that would be used to send traffic back to its source; if such a packet arrives on any other interface, it is dropped. Loose mode (rp_filter=2) only matches against the FIB and accepts the packet regardless of the interface. Setting rp_filter=0 disables reverse path filtering completely, resulting in no checks at all, which is not recommended and rather dirty.
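    As a quick illustration (my own sketch, not part of the original OVH setup): the effective mode can be inspected per interface via /proc:

```shell
# Show the effective rp_filter mode for every interface the kernel knows
# about: 0 = no check, 1 = strict mode, 2 = loose mode.
# Note: for rp_filter the kernel applies the maximum of conf/all and
# conf/<interface>, so check both.
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
    printf '%s = %s\n' "$f" "$(cat "$f")"
done
```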

    To get back to the setup from OVH:
    As soon as the VLAN configuration is in place, the vmbr0 interface (and the associated IP) no longer responds to ARP requests for packets originating from the internet (because rp_filter=1 evaluates vmbr0 as an unsuitable interface for those packets). The VLAN IP of the HN itself remains reachable from the internet. Due to OVH's internal routing, all traffic that originates from the internet and is destined for the virtual nodes using venet arrives on vmbr0 (which does not respond to ARP requests), so the connection times out.
    Now I'll reconstruct why the connectivity to the VNs seems to work fine for four hours at first:
    When an OpenVZ container is restarted, the ARP information associating its IP with the HN seems to be broadcast over all interfaces (I'm not sure that is 100% correct, but it does at least something like that). The router of the OVH infrastructure receives these and puts them into its internal ARP table. Packets arriving for the public VLAN subnet are now processed correctly and the VN is reachable. But the OVH router (apparently a Cisco) has an ARP cache timeout of exactly four hours. After that timeframe the entries are deleted from the router's ARP table. The VNs then become unreachable, because the router no longer has any ARP/IP mappings for them and the HN's vmbr0 (again) does not answer the incoming ARP requests.
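    The four-hour expiry lives in the OVH router's ARP cache, not on the HN, but the same aging mechanism can be watched on any Linux box (a diagnostic sketch, not part of the fix):

```shell
# Base reachable time for neighbor (ARP) entries, in milliseconds;
# entries decay to STALE after a randomized multiple of this value.
cat /proc/sys/net/ipv4/neigh/default/base_reachable_time_ms

# The kernel's current ARP table (IP/MAC mappings per device).
cat /proc/net/arp
```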

    Fix: setting "net.ipv4.conf.vmbr0.rp_filter = 2" in sysctl.conf solves the problem (on a running HN I additionally had to run "echo 2 > /proc/sys/net/ipv4/conf/vmbr0/rp_filter" to make it active immediately).
    If anyone has an idea for a cleaner solution (probably by adding some crazy post-up routing rules to /etc/network/interfaces), I'd be happy to hear it.

    Other notes (possibly no longer relevant after setting "rp_filter=2"):
    Before I found the rp_filter issue, I was trying to get the VNs to ping each other within the "four-hour connectivity timeframe", which at first did not work because of the routing settings proposed by OVH in /etc/network/interfaces. I fixed it by setting the following routes/rules in the VLAN-specific part:
    Code:
    #VLAN Config:
    
    
    auto eth0.XXXX
    iface eth0.XXXX inet static
            address 172.16.0.2 
            netmask 255.240.0.0 
            post-up   ip route add 172.16.0.0/12 via 172.31.255.254 dev eth0.XXXX
            post-down ip route del 172.16.0.0/12 via 172.31.255.254 dev eth0.XXXX
    
    
    
    
    auto eth0.XXXX:0
    iface eth0.XXXX:0 inet static 
            address xxx.xxx.xxx.2 
            network xxx.xxx.xxx.0 
            broadcast xxx.xxx.xxx.15 
            gateway xxx.xxx.xxx.14 
            netmask 255.255.255.240 
            post-up    /sbin/ip rule add to xxx.xxx.xxx.0/28 lookup main pref 125              
            post-up    /sbin/ip rule add from xxx.xxx.xxx.0/28 iif venet0 table 125            
            post-up    /sbin/ip route add default via xxx.xxx.xxx.14 dev eth0.XXXX:0 table 125
            post-down  /sbin/ip rule del to xxx.xxx.xxx.0/28 lookup main pref 125                
            post-down  /sbin/ip rule del from xxx.xxx.xxx.0/28 iif venet0 table 125             
            post-down  /sbin/ip route del default via xxx.xxx.xxx.14 dev eth0.XXXX:0 table 125
    
    It would be nice if a mod could mark this thread as SOLVED.
    Thanks!
     
    #2 cle.ram, Feb 9, 2012
    Last edited: Feb 9, 2012