VM network freeze

Discussion in 'Proxmox VE: Networking and Firewall' started by Kiril, Sep 25, 2017.

  1. Kiril

    Kiril New Member

    Joined:
    Sep 25, 2017
    Messages:
    2
    Likes Received:
    0
    Hello,

    I'm using Proxmox 5.0. At random intervals my monitoring (Nagios) notifies me that one of my VMs is unreachable. I can't ping the machine from any workstation, and it can't reach anything outside either. The machine itself keeps running fine; only the network seems to stop working. As a workaround I reset the interface from the UI, and then the machine is fine again. What could be causing this?


    Thanks
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,512
    Likes Received:
    131
    What OS and config (qm config <vmid>) are you running?

    What do you mean by resetting the interface: a reset of the VM, or do you reset something inside the VM?
     
  3. martinb

    martinb New Member

    Joined:
    Apr 28, 2014
    Messages:
    8
    Likes Received:
    0
    I have a similar problem. I have a CentOS 7 VM on Proxmox 5.0 whose network stops working completely (it can't ping anything) under heavy traffic. When it happens, the only way I have found to restore connectivity, short of rebooting the VM, is to rmmod and modprobe the virtio-net driver inside the VM. There doesn't seem to be any useful information in the system log. I tried upgrading the VM's kernel to 4.13 from the ELRepo kernel-ml repo, but that doesn't fix the problem.
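Until the root cause is found, martinb's workaround (reloading virtio_net inside the guest) could be automated. The following is a minimal Python sketch, not a fix: it assumes the guest has iputils `ping` and `modprobe`, runs as root, and is given a placeholder gateway address of your choosing. It only triggers the reload after several consecutive failed probes, so a single dropped packet does not bounce the driver.

```python
import subprocess
import time
from typing import Callable


def ping_ok(host: str) -> bool:
    """Return True if one ICMP echo to `host` gets a reply (iputils ping, 2 s timeout)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reload_virtio_net() -> None:
    """Unload and reload the guest's virtio_net module (requires root)."""
    subprocess.run(["modprobe", "-r", "virtio_net"], check=True)
    subprocess.run(["modprobe", "virtio_net"], check=True)


class Watchdog:
    """Call `recover` once `threshold` probes in a row have failed."""

    def __init__(self, probe: Callable[[], bool], recover: Callable[[], None],
                 threshold: int = 3) -> None:
        self.probe = probe
        self.recover = recover
        self.threshold = threshold
        self.failures = 0

    def step(self) -> bool:
        """Run one probe; return True if this step triggered a recovery."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.threshold:
            return False
        self.recover()
        self.failures = 0  # start counting again after a recovery attempt
        return True


def run_forever(dog: Watchdog, interval: float = 10.0) -> None:
    """Probe every `interval` seconds until the process is stopped."""
    while True:
        dog.step()
        time.sleep(interval)
```

For example, `run_forever(Watchdog(lambda: ping_ok("192.0.2.1"), reload_virtio_net))`, where 192.0.2.1 is only a placeholder for your gateway. As later posts in this thread report, reloading the module alone can leave the interface unconfigured, so the recovery callback may also need to bring the interface down and up again.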
     
    #3 martinb, Sep 27, 2017
    Last edited: Sep 27, 2017
  4. helloworld

    helloworld New Member

    Joined:
    Aug 29, 2017
    Messages:
    11
    Likes Received:
    1
    Hello martinb,

    I had the same issue over the last two weeks: suddenly ARP packets were no longer received by the virtual machine. Can you give more information about your hardware, e.g. the NIC?

    I didn't find anything in the logs. After upgrading the NIC firmware and driver, the issue seems to be gone.
     
  5. martinb

    martinb New Member

    Joined:
    Apr 28, 2014
    Messages:
    8
    Likes Received:
    0
    Hi helloworld, I had the problem with both QLogic / Broadcom NetXtreme II (bnx2x) and Intel X710 (i40e) hardware. In both cases my host interfaces were using vlan tagging.
     
  6. helloworld

    helloworld New Member

    Joined:
    Aug 29, 2017
    Messages:
    11
    Likes Received:
    1
    Hello,

    Same here: I use i40e with an X710, but the new firmware and driver didn't help either.
     
  7. Kiril

    Kiril New Member

    Joined:
    Sep 25, 2017
    Messages:
    2
    Likes Received:
    0
    Hello Alwin,

    Here is the output of the config
    Code:
    root@compute9:~# qm config 175
    perl: warning: Setting locale failed.
    perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "UTF-8",
        LANG = "en_US.UTF-8"
        are supported and installed on your system.
    perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
    agent: 1
    balloon: 0
    boot: cdn
    bootdisk: virtio0
    cores: 2
    ide2: none,media=cdrom
    memory: 12288
    name: VMNAME
    net0: virtio=FE:03:1E:0F:44:92,bridge=vmbr405
    numa: 1
    ostype: l26
    scsihw: virtio-scsi-pci
    smbios1: uuid=44709ae7-4f4f-4f76-9fae-a059415c0d48
    sockets: 1
    virtio0: ssdpool2tb:vm-175-disk-1,iothread=1,size=50G
    The machine is running CentOS Linux release 7.3.1611 (Core).
    I reset the network device from the Proxmox web interface, using the Disconnect option.
    The same issue has happened again.

    Thanks,
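To compare NIC settings across the affected VMs (model, MAC, bridge, VLAN tag), the `netN:` lines of `qm config` output can be pulled apart mechanically. A small Python sketch, assuming the `<model>=<MAC>,key=value,...` layout seen in the configs posted in this thread; it ignores every line that isn't a `netN:` entry, such as the perl locale warnings above.

```python
import re
from typing import Dict, Optional

# Matches e.g. "net0: virtio=FE:03:1E:0F:44:92,bridge=vmbr405"
NET_LINE = re.compile(r"^(net\d+):\s*(\S+)$")


def parse_net_value(value: str) -> Dict[str, Optional[str]]:
    """Split e.g. 'virtio=FE:03:1E:0F:44:92,bridge=vmbr405' into its fields."""
    first, *rest = value.split(",")
    # Assumed layout: the first field is '<model>=<MAC>'.
    model, _, mac = first.partition("=")
    fields: Dict[str, Optional[str]] = {"model": model, "macaddr": mac or None}
    for part in rest:
        key, _, val = part.partition("=")
        fields[key] = val
    return fields


def find_nics(qm_config_output: str) -> Dict[str, Dict[str, Optional[str]]]:
    """Return {'net0': {...}, ...} for every netN line in `qm config` output."""
    nics = {}
    for line in qm_config_output.splitlines():
        match = NET_LINE.match(line.strip())
        if match:
            nics[match.group(1)] = parse_net_value(match.group(2))
    return nics
```

Run over each VM's config, this makes it easy to check whether the affected VMs share the virtio model and a VLAN-tagged bridge.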
     
  8. helloworld

    helloworld New Member

    Joined:
    Aug 29, 2017
    Messages:
    11
    Likes Received:
    1
    I'm using VLAN tagging as well, with exactly the same behavior: when I click Disconnect, the VM becomes reachable again.

    I'm using VLAN tagging with Open vSwitch:
    Code:
    root@:~# qm config asd
    agent: 1
    balloon: 0
    bootdisk: virtio0
    cores: 16
    cpu: host
    memory: 65536
    name: asd
    net0: virtio=8A:02:B6:42:69:EB,bridge=vmbr0,tag=asd
    numa: 0
    ostype: l26
    scsihw: virtio-scsi-pci
    smbios1: uuid=8417b003-ef01-4b9f-b886-1aa7e1ceb6a1
    sockets: 2
    virtio0: asd-asd-disk-1,size=128G

    with the latest driver and firmware:
    Code:
    root@:~# ethtool -i ens2f2
    driver: i40e
    version: 2.1.26
    firmware-version: 6.01 0x80003493 0.0.0
    expansion-rom-version:
    bus-info: 0000:02:00.2
    supports-statistics: yes
    supports-test: yes
    supports-eeprom-access: yes
    supports-register-dump: yes
    supports-priv-flags: yes
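helloworld and martinb are comparing NIC drivers and firmware across hosts. A Python sketch for collecting that information from `ethtool -i` so it can be compared or logged; it assumes `ethtool` is installed on the Proxmox host, and `ens2f2` from the post above would be replaced with your own interface name.

```python
import subprocess
from typing import Dict


def parse_ethtool_i(text: str) -> Dict[str, str]:
    """Turn `ethtool -i <iface>` output into a dict; empty values stay ''."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so 'bus-info: 0000:02:00.2' stays intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(iface: str) -> Dict[str, str]:
    """Run `ethtool -i` for `iface` on the host and parse the result."""
    out = subprocess.run(["ethtool", "-i", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_i(out)
```

For example, `driver_info("ens2f2")["firmware-version"]` on each host gives a quick side-by-side view of firmware levels.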
     
  9. jassmith87

    jassmith87 New Member

    Joined:
    Sep 29, 2017
    Messages:
    4
    Likes Received:
    0
    I get this issue too. For the time being I have switched my machines from virtio to the vmxnet3 network model. e1000 accumulates RX errors for some reason, while vmxnet3 doesn't. Unfortunately, some performance is lost.

    I would really like to figure out why virtio_net keeps falling over under load. Like the other poster, the only way I can fix my network is modprobe -r virtio_net followed by modprobe virtio_net. After that it comes right back up as if nothing had happened.
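Switching the NIC model, as jassmith87 did, means rewriting the VM's `netN` value. A Python sketch that swaps only the model and keeps the existing MAC address and bridge options, so DHCP reservations tied to the MAC are unaffected. It assumes `qm set` accepts the same `<model>=<MAC>` shorthand that `qm config` prints; on a running VM the change may only take effect after a restart, and the guest sees a different NIC type, so its network configuration may need adjusting.

```python
from typing import List

# A subset of the NIC models Proxmox offers: the three discussed here plus rtl8139.
KNOWN_MODELS = {"virtio", "e1000", "vmxnet3", "rtl8139"}


def with_model(net_value: str, new_model: str) -> str:
    """Swap the model in a 'model=MAC,key=value,...' string, keeping MAC and options."""
    if new_model not in KNOWN_MODELS:
        raise ValueError(f"unexpected NIC model: {new_model}")
    first, *rest = net_value.split(",")
    _, sep, mac = first.partition("=")
    head = f"{new_model}={mac}" if sep else new_model
    return ",".join([head, *rest])


def qm_set_command(vmid: int, net: str, net_value: str, new_model: str) -> List[str]:
    """Build (but do not run) the `qm set` call that changes the NIC model."""
    return ["qm", "set", str(vmid), f"--{net}", with_model(net_value, new_model)]
```

With Kiril's config, `qm_set_command(175, "net0", "virtio=FE:03:1E:0F:44:92,bridge=vmbr405", "vmxnet3")` yields the arguments for `qm set 175 --net0 vmxnet3=FE:03:1E:0F:44:92,bridge=vmbr405`.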
     
  10. manu

    manu Proxmox Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    806
    Likes Received:
    58
    @helloworld:
    > I had the same issue last two weeks, suddenly arp packets are not received by the virtual machine.

    * Can you see the ARP packets when running tcpdump against the VM's tap device?
    * Also, which kind of load makes the problem reproducible? Does the problem happen with Linux bridges too?
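For the first question, the capture point is the VM's tap device on the host. A Python sketch that builds the tcpdump command line, assuming Proxmox's usual `tap<vmid>i<index>` naming (e.g. `tap175i0` for net0 of VM 175 above). Run the resulting command as root on the host while the VM is unreachable.

```python
from typing import List


def tap_device(vmid: int, net_index: int = 0) -> str:
    """Tap interface name for netN of a VM, assuming the tap<vmid>i<index> scheme."""
    return f"tap{vmid}i{net_index}"


def tcpdump_arp(vmid: int, net_index: int = 0) -> List[str]:
    """tcpdump command line that shows ARP traffic on the VM's tap device.

    -n skips name resolution; -e prints link-level headers (MACs, VLAN tags).
    """
    return ["tcpdump", "-n", "-e", "-i", tap_device(vmid, net_index), "arp"]
```

`" ".join(tcpdump_arp(175))` gives `tcpdump -n -e -i tap175i0 arp`, which can be pasted into a root shell on the host.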
     
  11. helloworld

    helloworld New Member

    Joined:
    Aug 29, 2017
    Messages:
    11
    Likes Received:
    1
    @manu

    I tested this with both Open vSwitch and a Linux bridge; the issue persists with both. I see the ARP packets on the tap interface as well as inside the VM (without the VLAN tag, which is correct). The ARP requests are not answered until I rmmod virtio_net or disconnect/reconnect the interface via the GUI.

    I guess the main focus should be virtio.
     
  12. nowrap

    nowrap New Member

    Joined:
    Nov 2, 2016
    Messages:
    6
    Likes Received:
    0
    Hello,
    we are facing similar issues with two VMs under heavy network load after upgrading from Proxmox 4.x to 5.0.

    The first VM runs Debian 8 and Zen LB.
    The second VM runs Debian 9 and Icecast 2.

    Both VMs lost their network connectivity within the last 24 hours after the upgrade.
    The VMs could not ping any other VMs or even the gateway.
    The VMs could not be pinged from other VMs or the host system.
    The VMs were only accessible by the novnc console.
    A reboot of the VM restored network connectivity.
    An ifdown/ifup cycle or a network restart did not.

    We will try reloading virtio_net next time.

    The VMs are running in bridged mode with virtio.
    The bridge interfaces are controlled by openvswitch:
    Code:
    ovs-vsctl (Open vSwitch) 2.7.0
    DB Schema 7.14.0
    The pve-firewall is disabled.
     
  13. manu

    manu Proxmox Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    806
    Likes Received:
    58
  14. nowrap

    nowrap New Member

    Joined:
    Nov 2, 2016
    Messages:
    6
    Likes Received:
    0
  15. manu

    manu Proxmox Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    806
    Likes Received:
    58
    Oh yes, this documentation is actually available on every installed PVE host, so you can replace the intern.lab address with your PVE hostname to get the documentation that matches your installed PVE version.
     
  16. nowrap

    nowrap New Member

    Joined:
    Nov 2, 2016
    Messages:
    6
    Likes Received:
    0
    It just happened again on our Icecast VM.

    This alone didn't solve it:
    Code:
    rmmod virtio_net
    modprobe virtio_net
    I had to restart the network: ens18 was missing, and ifup didn't work (the interface was reported as already configured):
    Code:
    /etc/init.d/networking restart
    dmesg is showing some info:
    syslog isn't helping:
     
  17. manu

    manu Proxmox Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    806
    Likes Received:
    58
    > rmmod virtio_net
    > modprobe virtio_net
    > Had to restart the network. ens18 was missing. ifup didn't work (already configured):
    > /etc/init.d/networking restart
    Yes, in that case you have to run ifdown, then ifup.
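On Debian guests using ifupdown (as on nowrap's VMs), the whole recovery can be scripted. This Python sketch is a variation on the sequence discussed here: it runs `ifdown` before unloading the driver, on the assumption that deconfiguring the interface while it still exists keeps ifupdown's state consistent; the order manu describes (reload first, then ifdown and ifup) is the one suggested in the thread. `ens18` is the interface name from nowrap's post. Run it from the console (e.g. noVNC), since it drops any SSH session over that interface.

```python
import subprocess
from typing import Callable, List


def recovery_commands(iface: str) -> List[List[str]]:
    """Commands to bounce the virtio NIC, in order (iface is e.g. 'ens18')."""
    return [
        ["ifdown", iface],                 # clear ifupdown's "configured" state first
        ["modprobe", "-r", "virtio_net"],  # unload the driver...
        ["modprobe", "virtio_net"],        # ...and load it again
        ["ifup", iface],                   # reapply the interface configuration
    ]


def recover(iface: str, run: Callable[..., object] = subprocess.run) -> None:
    """Run the recovery commands, stopping at the first failure (requires root)."""
    for cmd in recovery_commands(iface):
        run(cmd, check=True)
```

The `run` parameter exists so the sequence can be dry-run, e.g. `recover("ens18", run=lambda cmd, check: print(" ".join(cmd)))` prints the commands instead of executing them.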

    @helloworld @nowrap

    Can you test whether the pve-kernel-4.10.17-4-pve kernel from the pvetest repository fixes the virtio connectivity issue for you?
    This kernel fixes a bug that occurs when virtio guests have to process a large number of connections.
     
  18. TwiX

    TwiX Member
    Proxmox VE Subscriber

    Joined:
    Feb 3, 2015
    Messages:
    100
    Likes Received:
    1
    Hi,

    We have also had this kind of issue for the last two weeks. It sometimes happens under average to high load (KVM, Debian 8 64-bit).
    We have to reboot the VM or disconnect/reconnect the virtio interface via GUI.

    Would you suggest switching the network device from virtio to e1000 if the latest kernel doesn't fix this issue?

    Code:
    proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
    pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
    pve-kernel-4.10.17-2-pve: 4.10.17-20
    pve-kernel-4.10.15-1-pve: 4.10.15-15
    libpve-http-server-perl: 2.0-6
    lvm2: 2.02.168-pve3
    corosync: 2.4.2-pve3
    libqb0: 1.0.1-1
    pve-cluster: 5.0-12
    qemu-server: 5.0-15
    pve-firmware: 2.0-2
    libpve-common-perl: 5.0-16
    libpve-guest-common-perl: 2.0-11
    libpve-access-control: 5.0-6
    libpve-storage-perl: 5.0-14
    pve-libspice-server1: 0.12.8-3
    vncterm: 1.5-2
    pve-docs: 5.0-9
    pve-qemu-kvm: 2.9.0-4
    pve-container: 2.0-15
    pve-firewall: 3.0-2
    pve-ha-manager: 2.0-2
    ksm-control-daemon: 1.2-2
    glusterfs-client: 3.8.8-1
    lxc-pve: 2.0.8-3
    lxcfs: 2.0.7-pve4
    criu: 2.11.1-1~bpo90
    novnc-pve: 0.6-4
    smartmontools: 6.5+svn4324-1
    zfsutils-linux: 0.6.5.11-pve17~bpo90
    ceph: 12.1.2-pve1
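The output above shows the host running 4.10.17-2-pve, which is older than the pve-kernel-4.10.17-4-pve that manu asked people to test. A Python sketch for checking this from `pveversion -v` output; it assumes the `(running kernel: X.Y.Z-N-pve)` format shown above and treats any higher version number as including the fix, which is an assumption. Note that what counts is the running kernel: a newly installed kernel only takes effect after a reboot.

```python
import re
from typing import Optional, Tuple

RUNNING = re.compile(r"running kernel: (\d+)\.(\d+)\.(\d+)-(\d+)-pve")

# Kernel manu asked testers to try (pvetest repository).
TEST_KERNEL = (4, 10, 17, 4)


def running_kernel(pveversion_output: str) -> Optional[Tuple[int, ...]]:
    """Extract e.g. (4, 10, 17, 2) from '... (running kernel: 4.10.17-2-pve)'."""
    match = RUNNING.search(pveversion_output)
    return tuple(int(g) for g in match.groups()) if match else None


def has_test_kernel(pveversion_output: str) -> bool:
    """True if the running kernel compares >= 4.10.17-4-pve by version number."""
    version = running_kernel(pveversion_output)
    return version is not None and version >= TEST_KERNEL
```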
    
     
  19. Svetozar Kolev

    Svetozar Kolev New Member

    Joined:
    Oct 12, 2017
    Messages:
    2
    Likes Received:
    0
    Hi, we have the issue too. It seems that when you disable GSO and TSO on the virtual servers, the issue goes away.
    It looks like a bug, and disabling segmentation offload works around it:
    Code:
    ethtool -K eth0 gso off           # disable generic segmentation offload
    ethtool -K eth0 tso off           # disable TCP segmentation offload
    ethtool -k eth0 | grep segment    # check the current offload settings

    It happens on machines with heavy traffic load.
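To confirm the workaround is actually in place on each guest, the `ethtool -k` output can be checked programmatically. A Python sketch, assuming the `feature: on|off [fixed]` line format that `ethtool -k` prints; `eth0` is the interface name from the post above.

```python
from typing import Dict, List


def parse_offloads(text: str) -> Dict[str, bool]:
    """Map each feature in `ethtool -k` output to True (on) or False (off)."""
    features = {}
    for line in text.splitlines():
        # 'Features for eth0:' has no ': ' separator and is skipped.
        name, sep, rest = line.strip().partition(": ")
        if sep and rest:
            features[name] = rest.split()[0] == "on"
    return features


def segmentation_offload_off(text: str) -> bool:
    """True if both TSO and GSO are reported as off."""
    features = parse_offloads(text)
    return (not features.get("tcp-segmentation-offload", True)
            and not features.get("generic-segmentation-offload", True))


def disable_command(iface: str) -> List[str]:
    """Single ethtool call equivalent to the two -K commands above."""
    return ["ethtool", "-K", iface, "gso", "off", "tso", "off"]
```

Settings changed with `ethtool -K` do not survive a reboot, so a check like this is also a quick way to spot a guest that came back up with offloads re-enabled.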
     
  20. manu

    manu Proxmox Staff Member

    Joined:
    Mar 3, 2015
    Messages:
    806
    Likes Received:
    58
    It would be interesting to get feedback on whether the pvetest kernel I mentioned fixes the issue. If it does, the workaround mentioned above would not be needed.
     