> 3000 mSec Ping and packet drops with VirtIO under load

Discussion in 'Proxmox VE: Installation and configuration' started by Andreas Piening, Sep 2, 2017.

  1. AlBundy

    AlBundy New Member

    Joined:
    Sep 26, 2017
    Messages:
    7
    Likes Received:
    0
    The load is better, but when you use the network a lot (like during remote backups) it still goes up again!

    I tried the newer kernel again and the numbers got worse again, so I'm rebooting to the older one tonight (and setting it as the default from now on).
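
    In case it helps anyone, this is roughly how I pinned the older kernel as the default. The exact menu entry string depends on what's installed, so check your grub.cfg first; the 4.10.17-3-pve version below is just an example:

        # list the boot entries grub knows about
        grep -E "submenu|menuentry " /boot/grub/grub.cfg

        # in /etc/default/grub, point GRUB_DEFAULT at the older entry, e.g.
        # GRUB_DEFAULT="Advanced options for Proxmox Virtual Environment GNU/Linux>Proxmox Virtual Environment GNU/Linux, with Linux 4.10.17-3-pve"
        # (the string must match your grub.cfg exactly)

        # regenerate the grub config and reboot
        update-grub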
     
  2. Phinitris

    Phinitris Member

    Joined:
    Jun 1, 2014
    Messages:
    83
    Likes Received:
    11
    Hello,
    I just installed kernel Linux 4.4.83-1-pve; however, the IO issue remains. I have switched a few VMs that did a lot of IO to IDE, and the load is much better now (20% -> 4%) and the packet drops are gone.

     
  3. joshin

    joshin Member
    Proxmox Subscriber

    Joined:
    Jul 23, 2013
    Messages:
    92
    Likes Received:
    8
    Changing from SCSI Virtio to IDE did not solve the problem for me.

    I'm seeing 900 ms ping times from the gateway to the KVM guest; it should be 0.36 ms. The VM is nothing special: Apache serving lots of static content.

    VM's drive is on local-ZFS storage. Latest Proxmox 5 with all updates, VirtIO net. Gentoo guest with 4.13 kernel.

    Until this is fixed, I'm kinda stuck on a half 4.4 & half 5.0 cluster.
     
  4. Andreas Piening

    Joined:
    Mar 11, 2017
    Messages:
    58
    Likes Received:
    7
    Have you checked whether the same VM (an exact clone) from the PVE 5.0 host runs fine on 4.4?
    If that is the case, I would rather downgrade my hosts to PVE 4.4, since IDE runs stably there but its disk performance is relatively poor.
     
  5. joshin

    joshin Member
    Proxmox Subscriber

    Joined:
    Jul 23, 2013
    Messages:
    92
    Likes Received:
    8
    Nope, but I could. Just not right now. With all the reboots to test, there's been enough downtime on that server for the day (week!).

    Going back to SCSI Virtio and changing to E1000 for the network is currently behaving decently. (I have CPU to burn.)
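
    For anyone wanting to try the same switch, it boils down to one command per VM (VM ID, MAC and bridge below are only examples; keep the VM's existing MAC so the guest doesn't see a new interface):

        # check the current NIC definition
        qm config 100 | grep net0

        # switch net0 from virtio to e1000, reusing the existing MAC address
        qm set 100 --net0 e1000=DE:AD:BE:EF:00:01,bridge=vmbr0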

     
  6. Andreas Piening

    Joined:
    Mar 11, 2017
    Messages:
    58
    Likes Received:
    7
    Since you are running ZFS, it should be easy to create a snapshot and send/receive it to a test VM on your PVE 4.4 host without any downtime, at least if you're running ZFS on the PVE 4.4 host as well and have some spare storage left there for the VM.
    It would be really interesting to find out whether this problem really is a PVE 5.0 thing.
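
    Something along these lines should do it (pool, dataset, VM IDs and host name are just examples, adjust them to your setup):

        # on the PVE 5.0 host: snapshot the VM's disk zvol
        zfs snapshot rpool/data/vm-100-disk-1@pve44-test

        # stream it to the PVE 4.4 host into a volume for the test VM
        zfs send rpool/data/vm-100-disk-1@pve44-test | \
            ssh pve44 zfs receive rpool/data/vm-999-disk-1

        # then attach vm-999-disk-1 to a test VM (ID 999) on the 4.4 host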

    I'm running a terminal server, so E1000 is probably not an option for me: the RDP protocol doesn't tolerate slow networks or high latency, and the users are picky about that.
     
  7. joshin

    joshin Member
    Proxmox Subscriber

    Joined:
    Jul 23, 2013
    Messages:
    92
    Likes Received:
    8
    Easy doesn't mean I have the time or inclination to do so on your schedule.

    Moving the VM between hosts on a mixed cluster is not the problem. Dealing with the downtime from failures caused by 700 ms+ ping times is. Not to mention my time constraints this week.

    Plus, the issue manifests under NETWORK LOAD, which cannot be properly replicated in a test VM.

    The evidence in this thread and the other related ones all points to something that changed between 4.4 and 5.0. A Google Doc or some other spreadsheet where folks can put their observations would be useful. If no one else has a chance to set that up before I can revisit this issue, I will.

    -J

     
  8. Tim H

    Tim H Member
    Proxmox Subscriber

    Joined:
    Jun 27, 2017
    Messages:
    30
    Likes Received:
    1
  9. Andreas Piening

    Joined:
    Mar 11, 2017
    Messages:
    58
    Likes Received:
    7
    OK, something is going on: on my test system (the software stack is the same as on my production system; one Windows Server 2016 VM was cloned to it so I could analyze this issue) I upgraded to the latest version (upgrade / dist-upgrade, now PVE 5.1.36) and then downloaded the newest VirtIO drivers (0.1.141), because I stumbled upon this bug report: https://bugzilla.redhat.com/show_bug.cgi?id=1451978
    The bug report is for a different version (not 0.1.126, which is still marked as stable), but since the issue sounds related and I don't have anything to lose, I gave it a try. I updated all VirtIO drivers and services (NetKVM, virtio-scsi, Balloon driver and service, guest agent ...) and did a reboot. Guess what? Problem gone!
    I did a ping test while running a backup of the machine and a complete DB check at the same time. When I did this before, I was kicked out of my RDP session for minutes and had huge packet loss and delays. Now: not a single glitch. I can use the browser and scroll web pages under heavy load without a single lost packet (ping max was 0.3 ms).
    I did a few more checks on my test system, and after 2 days I decided to apply the same updates to my production system with 2 VMs. The issue completely disappeared there too. In order to be able to switch from IDE to SCSI, I had to create an additional small (32 GB) disk and attach it to my SCSI bus so that the Red Hat VirtIO SCSI controller was active and I could install the needed drivers. After a shutdown I removed the small disk, changed my drives from IDE to SCSI and started the machine again. That worked well.
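
    For reference, roughly the steps on the PVE side (VM ID 100, the "local-zfs" storage and the disk name are placeholders; the GUI does the same thing, the commands below are just the CLI equivalent as far as I can tell):

        # 1. add a temporary 32 GB disk on the VirtIO SCSI controller, then
        #    boot Windows, install the VirtIO SCSI driver and shut down
        qm set 100 --scsihw virtio-scsi-pci --scsi1 local-zfs:32

        # 2. detach the temporary disk again (it shows up as an "unused"
        #    disk that can be removed afterwards)
        qm set 100 --delete scsi1

        # 3. reattach the existing system disk as SCSI instead of IDE,
        #    make it the boot disk and start the VM again
        qm set 100 --delete ide0
        qm set 100 --scsi0 local-zfs:vm-100-disk-1
        qm set 100 --boot c --bootdisk scsi0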

    I'm still not sure whether the latest PVE upgrade, the newer "testing" VirtIO drivers, or a combination of both brought the cure.
    However, I'm really happy that it is finally working with VirtIO SCSI, since the boot times and initial application startup times are much lower compared to IDE, for obvious reasons.
     
    udo likes this.
  10. Phinitris

    Phinitris Member

    Joined:
    Jun 1, 2014
    Messages:
    83
    Likes Received:
    11
    @Andreas Piening
    I can confirm that after upgrading to PVE 5.1 the issue is gone and the host is operating normally again with SCSI (virtio-scsi).
     
  11. Andreas Piening

    Joined:
    Mar 11, 2017
    Messages:
    58
    Likes Received:
    7
    Oh that's good news! So you did upgrade to PVE 5.1 but did not upgrade your virtIO drivers, correct?
     
  12. Phinitris

    Phinitris Member

    Joined:
    Jun 1, 2014
    Messages:
    83
    Likes Received:
    11
    Yes, correct. I have many Linux- and Windows-based VMs and they are all running fine now without upgraded VirtIO drivers.
     
  13. AlBundy

    AlBundy New Member

    Joined:
    Sep 26, 2017
    Messages:
    7
    Likes Received:
    0
    Yes, it's finally fixed! Even running backups didn't trigger high-load emails anymore!
     
  14. Andreas Piening

    Joined:
    Mar 11, 2017
    Messages:
    58
    Likes Received:
    7
    Great.
    So if anyone has background on this (Proxmox staff?), it would be interesting to know the reason behind this issue.
    I can't find anything related to it in the release notes.
     
  15. micro

    micro Member
    Proxmox Subscriber

    Joined:
    Nov 28, 2014
    Messages:
    58
    Likes Received:
    12
    Unfortunately for me, even after the upgrade to PVE 5.1 I'm experiencing high IO wait and a huge jump in load during write operations to VirtIO storage. I did a quick test: I added a new VirtIO disk to an existing Linux VM, and a simple dd if=/dev/zero of=test bs=1M count=500 bumped the load of the host to 18, and the IO wait was huge again.
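
    If someone wants to reproduce it, a slightly more controlled version of that quick test would bypass the guest page cache and watch the latency at the same time (the IP is a placeholder; iostat comes from the sysstat package):

        # inside the guest: write to the new VirtIO disk, bypassing the page cache
        dd if=/dev/zero of=test bs=1M count=500 oflag=direct conv=fsync

        # from another machine: see whether ping times to the VM spike meanwhile
        ping -i 0.2 192.0.2.10

        # on the PVE host: watch iowait and per-device latency during the run
        iostat -x 1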
     
  16. fabian

    fabian Proxmox Staff Member
    Staff Member

    Joined:
    Jan 7, 2016
    Messages:
    3,390
    Likes Received:
    523
    Most likely the newer kernel (PVE 5.0 had a 4.10-based kernel, 5.1 has a 4.13-based one).
     
  17. Phinitris

    Phinitris Member

    Joined:
    Jun 1, 2014
    Messages:
    83
    Likes Received:
    11
    What hardware do you use and how is your storage configured? What speed do you get with dd?
     
  18. micro

    micro Member
    Proxmox Subscriber

    Joined:
    Nov 28, 2014
    Messages:
    58
    Likes Received:
    12
    The hardware, the storage and dd speed aren't relevant here. The only difference is IDE vs VirtIO (and this was discussed multiple times in this thread).
     
  19. micro

    micro Member
    Proxmox Subscriber

    Joined:
    Nov 28, 2014
    Messages:
    58
    Likes Received:
    12
    Just to mention: 5.1 didn't solve the issue for me. I have since changed the storage and switched from FC to iSCSI, and the problems are gone. I don't know what resolved it; maybe it was something with OS support for directly attached FC, or the storage itself.
     