Network suddenly drops

Discussion in 'Proxmox VE: Networking and Firewall' started by Sergio Fernandez, Feb 7, 2019.

  1. Sergio Fernandez

    Sergio Fernandez New Member

    Joined:
    Feb 7, 2019
    Messages:
    3
    Likes Received:
    0
    Hi all,

    I'm seeing strange behaviour on my office Proxmox server. Periodically (sometimes a month apart, sometimes only days apart), the network interfaces go down and I lose connectivity on both the host and the VMs. Rebooting the host fixes it until the next time.
    I've been going mad looking for error messages in all the logs but found nothing. I also checked the other network elements, with no luck.
    My server has two bridges. One of them has a bond of two NICs connected to a switch using LACP. The other has one NIC connected directly to our ISP device for Internet connectivity. I thought it could be an issue with LACP, but in fact both bridges go down, sometimes at the same time, sometimes not.

    My environment is:

    Code:
    root@multivac:/var/log# pveversion --verbose
    proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
    pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
    pve-kernel-4.13.13-2-pve: 4.13.13-32
    libpve-http-server-perl: 2.0-8
    lvm2: 2.02.168-pve6
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-19
    qemu-server: 5.0-18
    pve-firmware: 2.0-3
    libpve-common-perl: 5.0-25
    libpve-guest-common-perl: 2.0-14
    libpve-access-control: 5.0-7
    libpve-storage-perl: 5.0-17
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-3
    pve-docs: 5.1-12
    pve-qemu-kvm: 2.9.1-5
    pve-container: 2.0-18
    pve-firewall: 3.0-5
    pve-ha-manager: 2.0-4
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.1.1-2
    lxcfs: 2.0.8-1
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.7.3-pve1~bpo9
    
    My network interfaces config is:

    Code:
    root@multivac:/var/log# cat /etc/network/interfaces
    # network interface settings; autogenerated
    # Please do NOT modify this file directly, unless you know what
    # you're doing.
    #
    # If you want to manage part of the network configuration manually,
    # please utilize the 'source' or 'source-directory' directives to do
    # so.
    # PVE will preserve these directives, but will NOT its network
    # configuration from sourced files, so do not attempt to move any of
    # the PVE managed interfaces into external files!
    
    auto lo
    iface lo inet loopback
    
    auto eno2
    iface eno2 inet static
        address  172.22.1.5
        netmask  255.255.255.0
        gateway  172.22.1.1
    #Management NIC
    
    iface eno1 inet manual
    
    auto enp6s0f0
    iface enp6s0f0 inet manual
    
    auto enp6s0f1
    iface enp6s0f1 inet manual
    
    auto bond0
    iface bond0 inet manual
        slaves enp6s0f0 enp6s0f1
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3
    #General LACP Bond for VMs
    
    auto vmbr1
    iface vmbr1 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
    #Internet Access for pfsense
    
    auto vmbr2
    iface vmbr2 inet manual
        bridge_ports bond0
        bridge_stp on
        bridge_fd 0
        bridge_vlan_aware yes
    #VM General Purpose Bridge
    
    Do you guys know where I can look for more logging? Is anybody else facing a similar issue?
    Thank you very much.
     
    #1 Sergio Fernandez, Feb 7, 2019
    Last edited: Feb 7, 2019
  2. wolfgang

    wolfgang Proxmox Staff Member

    Joined:
    Oct 1, 2014
    Messages:
    4,257
    Likes Received:
    269
    Hi,

    your installation is quite old, please update to the current version.
    What you describe sounds like a kernel problem, so the only way to get rid of it is to update your system.
    Or your switch has a problem with LACP; also check whether new firmware is available for it.
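
    To check the LACP side from the host, here is a rough sketch (not an official Proxmox tool). The kernel bonding driver exposes the bond state in /proc/net/bonding/<bond>. In 802.3ad mode each slave reports an "Aggregator ID", and slaves that negotiated into the same LAG should share one. The bond name bond0 is taken from the config above; the function names parse_bond_status and find_problems are made up for this example, and the field labels are assumed to match what the bonding driver prints on your kernel.

```python
from pathlib import Path

# Fields of interest inside each "Slave Interface" section (assumed labels).
WANTED = ("MII Status", "Link Failure Count", "Aggregator ID")


def parse_bond_status(text):
    """Return {slave_name: {field: value}} from /proc/net/bonding/<bond> text."""
    slaves = {}
    current = None  # dict of the slave section being read, if any
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is not None and key in WANTED:
            # Keep the first occurrence within each slave section.
            current.setdefault(key, value)
    return slaves


def find_problems(slaves):
    """Return human-readable warnings for down links or split aggregators."""
    problems = []
    for name, info in slaves.items():
        status = info.get("MII Status", "unknown")
        if status != "up":
            problems.append(f"{name}: MII status is {status}")
    agg_ids = {info.get("Aggregator ID", "unknown") for info in slaves.values()}
    if len(agg_ids) > 1:
        problems.append(f"slaves are in different aggregators: {sorted(agg_ids)}")
    return problems


if __name__ == "__main__":
    path = Path("/proc/net/bonding/bond0")  # bond name from the posted config
    if path.exists():
        found = find_problems(parse_bond_status(path.read_text()))
        for message in found or ["no obvious problems found"]:
            print(message)
```

    Running it from cron and logging the output would at least show whether a slave dropped out of the aggregator before the network went away. It says nothing about the bridge that is not on the bond.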
     
  3. Sergio Fernandez

    Sergio Fernandez New Member

    Hi @wolfgang, I had ruled out an LACP problem because sometimes only the network interface that is not attached to the bond goes down, but I'll review it again. I'll also try upgrading next week and report back. Thanks for your help.
     
  4. Sergio Fernandez

    Sergio Fernandez New Member

    Hi @wolfgang. I was finally able to upgrade to 5.3 successfully. Five minutes after the reboot, once the VMs were up and running, the network went down, but this time there was a kernel log message about some kind of address mix-up on the bond that runs LACP, which pointed me to the problem.

    After trying different LACP setups and configurations, it seems my switch does in fact have a problem with LACP, and there is no upgrade available for it. I wasn't able to get it working.

    In the end I've moved to a single-port setup with manual failover (as this is not a critical service), which is working nicely and reliably.
    Thank you very much!
     