Hi all,

I'm seeing strange behaviour on my office Proxmox server. Periodically (sometimes a month apart, sometimes only days), the network interfaces go down and I lose connectivity on both the host and the VMs. Rebooting the host fixes it until the next time. I've gone mad looking for error messages in all the logs but found nothing, and I also checked the other network elements with no luck.

My server has two bridges. One of them has a bond of two NICs to a switch using LACP. The other has one NIC connected directly to our ISP device for Internet connectivity. I thought it could be an issue with LACP, but in fact both bridges go down, sometimes at the same time, sometimes not.

My environment is:

Code:
root@multivac:/var/log# pveversion --verbose
proxmox-ve: 5.1-32 (running kernel: 4.13.13-2-pve)
pve-manager: 5.1-41 (running version: 5.1-41/0b958203)
pve-kernel-4.13.13-2-pve: 4.13.13-32
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-18
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.1-5
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9

My network interfaces config is:

Code:
root@multivac:/var/log# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage part of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

auto eno2
iface eno2 inet static
        address 172.22.1.5
        netmask 255.255.255.0
        gateway 172.22.1.1
#Management NIC

iface eno1 inet manual

auto enp6s0f0
iface enp6s0f0 inet manual

auto enp6s0f1
iface enp6s0f1 inet manual

auto bond0
iface bond0 inet manual
        slaves enp6s0f0 enp6s0f1
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3
#General LACP Bond for VMs

auto vmbr1
iface vmbr1 inet manual
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
#Internet Access for pfsense

auto vmbr2
iface vmbr2 inet manual
        bridge_ports bond0
        bridge_stp on
        bridge_fd 0
        bridge_vlan_aware yes
#VM General Purpose Bridge

Do you know where I can look for more logging? Is anybody facing a similar issue?

Thank you very much.
Hi, your installation is quite old; please update to the current version. What you describe sounds like a kernel problem, so the only way to get rid of it is to update your system. Alternatively, your switch may have a problem with LACP; also check whether new firmware is available for it.
Hi @wolfgang, I had ruled out an LACP problem because sometimes only the network interface that is not attached to that bond goes down, but I'll review it again. I'll also try upgrading next week and give you feedback. Thanks for your help.
Hi @wolfgang. I was finally able to upgrade to 5.3 successfully. Five minutes after the reboot, once the VMs were up and working, the network went down, but this time there was a kernel log message reporting a mess with the addresses on the bond that runs LACP, which pointed me to the problem. After trying different LACP setups and configurations, it seems my switch does in fact have a problem with LACP, and there is no upgrade available for it; I wasn't able to get it running. In the end I moved to a single-port setup with manual failover (this is not a critical service), which is working nicely and reliably. Thank you very much!
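For anyone who lands here with similar symptoms: besides the kernel log (`dmesg` or `journalctl -k`), the Linux bonding driver exposes its view of the LACP state in `/proc/net/bonding/<bond>`. Below is a minimal sketch of how that file could be summarised per slave. The function name `bond_lacp_summary` is made up for illustration, the default path assumes the bond is called `bond0` as in the config above, and the parsing assumes the usual `Key: value` layout of that file in 802.3ad mode. A slave whose aggregator ID differs from the active one, or whose MII status is not `up`, is flagged for a closer look; it is only a hint, not a diagnosis.

```shell
#!/bin/sh
# Sketch: summarise LACP state from a Linux bonding status file.
# bond_lacp_summary is a hypothetical helper name, not part of any tool.
# Default path assumes the bond is named bond0.

bond_lacp_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    awk -F': ' '
        # Strip leading whitespace so indented keys match as well.
        { sub(/^[ \t]+/, "") }
        /^Slave Interface:/ { slave = $2; next }
        # Before the first slave section, "Aggregator ID" is the active aggregator.
        /^Aggregator ID:/ && slave == "" { active = $2; next }
        /^MII Status:/ && slave != "" { mii[slave] = $2; next }
        # Inside a slave section, record its aggregator and remember the order.
        /^Aggregator ID:/ && slave != "" { agg[slave] = $2; order[++n] = slave; next }
        END {
            printf "active aggregator: %s\n", (active == "" ? "none" : active)
            for (i = 1; i <= n; i++) {
                s = order[i]
                state = (mii[s] == "up" && agg[s] == active) ? "ok" : "CHECK"
                printf "%s mii=%s aggregator=%s %s\n", s, mii[s], agg[s], state
            }
        }
    ' "$status_file"
}
```

On the host, this would be called as `bond_lacp_summary` (or with another status file as the first argument). Slaves that do not report an aggregator ID, for example when the bond is not in 802.3ad mode, are simply not listed.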