> 3000 mSec Ping and packet drops with VirtIO under load

The issue is not gone. It was mitigated by moving this particular VM to the local storage of one of the host nodes in the cluster. All the other VMs are running on the shared storage, so the local disk is otherwise idle and there is no chance of any I/O wait.
The thing is that this VM (a firewall, shaper and Suricata IPS (nfqueue)) is far more sensitive to the KVM/VirtIO I/O-wait problem than the other VMs. It was affected even by normal I/O from all the other VMs to the shared storage. After moving it to local storage it can be affected ONLY if I perform heavy writes to the local disk of that node, and of course I stay away from doing that. Since it was moved locally, this VM performs perfectly (absolutely no latency, jitter or held packets).

Still, if I do a big sequential write to the shared storage (with disk throttling at >~80% of the write speed of the SAN), the host node shows a huge load (50.0 and up) and all the KVM VMs with their hard disks on this shared storage stop responding. Most probably, if I did a big sequential write to the local storage of the node hosting this particular VM (firewall/shaper/IPS), its ping latency would hit the roof again.
Even worse, we can't control the speed of restoring backups. If I attempt to restore a backup to the SAN storage, all my VMs will suffer greatly again. Yes, I can restore it to some local storage, but then I need to copy the disk to the shared storage, so I would be hit by this defect again.

Right now I have disk throttling on all KVM VMs so that no VM can saturate the shared storage. Otherwise the host load hits the roof and anything can happen, including quorum loss.
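For reference, such a per-disk write cap can be set with the qm CLI. A sketch, assuming a hypothetical VM 101 with its disk on a storage named shared-lvm and an 80 MB/s write limit (all placeholders):
Code:
# cap the sequential write bandwidth of one VM disk so it cannot saturate the SAN
qm set 101 --virtio0 shared-lvm:vm-101-disk-1,mbps_wr=80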

I really think that something is very wrong with the I/O handling here. I don't think it is normal for a guest KVM virtual machine to run "dd if=/dev/zero of=test bs=1M" and drive the host node to a load >50, stalling all the KVM guests running on this shared storage...
Unfortunately, my previous setup was Proxmox 3.x with iSCSI, so I can't say whether the problem we are experiencing is specific to PVE 5 or was already happening in PVE 4.x.
 
I really think that something is very wrong with the I/O handling here. I don't think it is normal for a guest KVM virtual machine to run "dd if=/dev/zero of=test bs=1M" and drive the host node to a load >50, stalling all the KVM guests running on this shared storage...

My thoughts exactly!


Thank you. The more I read and think about it, the more confusing it gets.
I do backups from within VMs on PVE 4.4-17; one machine has 1.8 TB of data, and even during a full backup I have no packet drops and my ping responses are fine. I'm using ZFS + KVM + VirtIO + OpenVPN there, but it is a Linux guest instead.
So to me it looked like this must be a PVE 5 issue, but this bug report says the issue also existed in PVE 4.

We have indeed been experiencing this issue since PVE 3.x, when our VM disks were stored on ext4+LVM. We are currently running Proxmox 4 with ZFS local storage, but others have experienced it over NFS, so the issue is most likely unrelated to the storage backend; rather, it's an I/O scheduling issue in the Linux kernel with KVM + VirtIO.

As I have written in the bug report and in many threads, the best (although not completely effective) mitigation tactic is reconfiguring the Linux virtual memory subsystem on the Proxmox host. After tweaking these values, the network issues only happen once or twice a month, compared to every night before.

First you add these lines to /etc/sysctl.conf:
Code:
# keep swapping to an absolute minimum
vm.swappiness = 1
# keep at least 256 MB (262144 kB) of memory free for critical kernel allocations
vm.min_free_kbytes=262144
# block writers once dirty pages reach 3% of RAM...
vm.dirty_ratio=3
# ...and start background writeback already at 1%
vm.dirty_background_ratio=1

Then run the following command (or reboot):
Code:
# sysctl -p

Please report back with your experience!
 
Thank you gkovacs for your suggestions. I will definitely give the parameters you recommend a try and report back when I have results.
This problem is really weird to me. Are we all using something exotic in common? VirtIO disks on shared storage? I doubt it; that looks so common it should affect almost everyone, yet not many of us are complaining about it. I really wonder why this defect is not happening to more people (I doubt they simply wouldn't notice it).
 
@gkovacs Thank you for your comment.

I have just set and applied the settings you suggested, but they haven't changed anything for me: I still get kicked out and can't reconnect with RDP while a backup is running, and I get network delays whenever there is disk I/O.

As far as I understand the settings, they configure the swap behavior. I don't think this is a swap issue, since the VMs have a fixed amount of memory and, to my understanding, they cannot use more than the KVM config defines.

Here are my memory stats:
Code:
# free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        196G         52G        703M        3.1G         52G
Swap:          8.0G        1.6G        6.4G

I think the settings might help if you are low on RAM and the host only has a small amount to spare, but this is not true for me. However, I cannot explain why there is some swap usage even though there is enough RAM available.

I found this in the Proxmox Wiki: https://pve.proxmox.com/wiki/ZFS_on_Linux#_limit_zfs_memory_usage
So they kind of suggest "vm.swappiness = 10", while mine was at 60, which is the default.

I kept the settings you provided to monitor how the system behaves in the long run.
 
We have indeed been experiencing this issue since PVE 3.x, when our VM disks were stored on ext4+LVM. We are currently running Proxmox 4 with ZFS local storage, but others have experienced it over NFS, so the issue is most likely unrelated to the storage backend; rather, it's an I/O scheduling issue in the Linux kernel with KVM + VirtIO.

As I have written in the bug report and in many threads, the best (although not completely effective) mitigation tactic is reconfiguring the Linux virtual memory subsystem on the Proxmox host. After tweaking these values, the network issues only happen once or twice a month, compared to every night before.

First you add these lines to /etc/sysctl.conf:
Code:
# keep swapping to an absolute minimum
vm.swappiness = 1
# keep at least 256 MB (262144 kB) of memory free for critical kernel allocations
vm.min_free_kbytes=262144
# block writers once dirty pages reach 3% of RAM...
vm.dirty_ratio=3
# ...and start background writeback already at 1%
vm.dirty_background_ratio=1

Then run the following command (or reboot):
Code:
# sysctl -p

Please report back with your experience!

Hello gkovacs,
We've been using this for a while (I saw it when I added your settings). Do you happen to know if these are not good for PVE?
Code:
# https://lwn.net/Articles/616241/
# http://feeding.cloud.geek.nz/posts/usual-server-setup/
# To reduce the server's contribution to bufferbloat I change the default kernel queueing
# discipline (jessie or later) by putting the following ..
#
net.core.default_qdisc=fq_codel
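As far as I know, fq_codel only changes how outgoing packets are queued on the network interfaces, which is a different subsystem from the vm.* memory settings above, so the two shouldn't interfere. You can check what is actually in effect like this (the interface name is an example):
Code:
# show the configured default and what a given interface actually uses
sysctl net.core.default_qdisc
tc qdisc show dev eth0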
 
Changing the vm.* parameters didn't help here. I performed additional tests to check whether the fault lies in the HBA queue parameters or in the storage IOPS... but a simple test proved to me that it is entirely a VM VirtIO/I/O fault. Creating a new LV locally on one of the nodes, running mkfs.ext4, mounting it and running dd proved that this isn't a problem with the storage, FC, etc. It is the same shared storage, same layer (LVM), only not running from inside a VM, and the load stays normal. I noticed one big difference in the "top" I/O stats between performing dd from the host node and from inside one of the VMs:
- When dd is performed from the host node, only 2-3 (different) CPU cores/threads on the host show I/O wait... let's say 3-4 (different ones every second). So only 3-4 cores/threads on the host node are busy with some I/O wait.
- When dd is performed inside one VM, ALL cores/threads on the host node are very busy with I/O wait.
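For anyone who wants to reproduce the host-side test described above, it was roughly the following (the VG name, LV size and mount point are placeholders):
Code:
# create and format a test LV directly on the host, on the same LVM layer the VMs use
lvcreate -L 50G -n iotest sharedvg
mkfs.ext4 /dev/sharedvg/iotest
mkdir -p /mnt/iotest
mount /dev/sharedvg/iotest /mnt/iotest
# the same sequential write that cripples the cluster when run inside a guest
dd if=/dev/zero of=/mnt/iotest/test bs=1M count=10240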

Additional test results:
In addition to VirtIO, SCSI and SATA also cause huge load and I/O wait.
IDE works fine: when IDE is selected as the VM hard disk bus, dd running inside the VM does not cause such mayhem on the host node.

I'm thinking about switching the hard disks of all VMs on the nodes to IDE until this VirtIO mystery is resolved.
 
- When dd is performed from the host node, only 2-3 (different) CPU cores/threads on the host show I/O wait... let's say 3-4 (different ones every second). So only 3-4 cores/threads on the host node are busy with some I/O wait.
- When dd is performed inside one VM, ALL cores/threads on the host node are very busy with I/O wait.
I wonder if this depends on the number of cores assigned to the VM. I have 6 cores / 12 threads and assigned 6 cores to each of my two VMs; I overprovisioned this on purpose because the systems will probably never be fully utilized at the same time.
It would be interesting to see whether fewer cores end up in I/O wait if the number of cores assigned to the VM causing the load is one less than the physical core count.
So, for example, if 5 out of 6 cores are in I/O wait, there is still one left to handle network I/O. Probably I'm talking bullshit, but this stupid problem drives me crazy.

@micro What is your CPU core config? Have you assigned all physical cores?
 
I don't think it matters. The I/O wait shows up on the CPUs/cores/threads of all the other nodes as well (not only on the one running the dd test).
This node is currently (intentionally) left with only two VMs, each assigned 1 socket / 8 cores. The test VM generating the I/O wait on the same host node had only 1 CPU with 2 cores assigned. The host has 24 x Intel(R) Xeon(R) X5680 @ 3.33GHz (2 sockets).
 
OK, that would have been too easy a solution anyway.
I just checked how the network behaves under disk I/O stress on another system with PVE 3.4-16 and a Windows Server 2012 guest: doing a full backup gives me an average of 0.2 ms and a maximum of 8.015 ms in ping tests run over the whole process. There was not a single packet loss and the system was responsive at all times. I downloaded Windows updates and started apps during the backup; everything went smoothly, without issues.

Since I experienced this issue with PVE 5.0 and never had it before, I thought it had to be a bug in the new version. If this is not true, there must be something special about our configuration. But I have no clue what that might be.

I have no spare hardware at the moment, but I'll probably get one or two servers in the coming weeks so I can do some tests. I'm careful with this system since it is in production, and I don't want to risk making things even worse.
 
I have switched almost all of my VMs from VirtIO to IDE. Here is an example graph of the I/O wait difference from one of my nodes:
iowait.png
This node is still running the same VMs with the same workload on them. I suspect those red peaks (and maybe some green) in the IDE section of the graph are there because I still have 2-3 VMs on other nodes running with VirtIO; they still affect the I/O wait of the shared storage, so this node is affected too. I'm planning to migrate them to IDE today.
 
I would expect the CPU load to rise dramatically when switching from VirtIO to IDE, since IDE is fully emulated, right? Or is this not so much the case?
What about the R/W performance in your VMs with IDE?
Where did you get this I/O wait graph from? It's not from the Proxmox web interface, is it?

I also noticed that it is not only network I/O that suffers from this issue: while I was rebooting one of the VMs today, the stats from the QEMU agent were no longer delivered while the system was under load. Of course, while booting, and until the QEMU agent tools have started in the VM, the memory utilization is shown as 100%. But even after the agent begins delivering memory stats, the shown utilization goes back to 100% for a while, and it takes quite some time until everything has started and the agent delivers results reliably.
The agent uses a virtualized serial port to deliver the stats, right? So the system is not even able to write these stats to the serial port while there is I/O wait.
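A quick way to watch the agent stall from the host side, assuming the guest agent is enabled (the VM ID is a placeholder):
Code:
# returns promptly while the agent is responsive; hangs or times out while the guest is stuck in I/O wait
qm agent 100 ping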

While the VM I restarted was suffering from I/O wait and I couldn't even connect to its console for a few minutes, the other VM on the same host was still responsive. So the I/O wait seems to hit one of the VMs while the other one is not affected.
Is this different on your system?
 
Most probably there are more interrupts because of IDE. Yes, it has some impact on full disk write/read performance (maybe 10-30% lower throughput). Currently the (IDE) VM achieves about 50 megabytes/s of write performance, which is more than enough for me. With VirtIO it would probably be about 70-80 megabytes/s; I can't tell exactly, because if I tried to test it the whole cluster would go unstable again (huge load, huge I/O wait on all nodes). The VirtIO VMs are currently intentionally crippled (by disk throttling).
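In case someone wants to compare numbers: a simple in-guest sequential write test would be something like the following (conv=fdatasync makes dd include the final flush in the timing, so the page cache doesn't inflate the result):
Code:
# write 4 GB sequentially and include the final sync in the measured time
dd if=/dev/zero of=test bs=1M count=4096 conv=fdatasync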
The I/O wait graph is an RRD graph from Munin.

It is not only network I/O that suffers from this issue. Everything stops because of the I/O wait. You cannot expect host nodes to have 90-100% I/O delay on all cores and a system load of 50 and still have anything work normally, either on the nodes or inside the VMs.

The sad thing is that this VirtIO bug is so severe that it makes the whole Proxmox cluster setup completely unreliable. Every service on it can be taken down at any time by any VM, through nothing more than simple I/O inside the guest.

What is the difference in node CPU/load/iowait after migrating from VirtIO to IDE? You can see from the graph below that only the I/O delay is gone:

load.png
 
@micro So you have Linux guests, right? I wonder if I can shut down my Windows VMs, switch the storage from VirtIO SCSI to IDE and boot them up more or less safely, to check whether this works better.
I've never done this before, and I'm a little afraid I might bork the Windows VMs.
 
@Andreas Piening You have to change the bus from VirtIO to IDE on the hard disks, not the controller type (VirtIO SCSI).
You shouldn't have a problem booting after the change from VirtIO to IDE, I think. Going back shouldn't be a problem either, because you already have the VirtIO drivers installed. Of course, it's best to make a cold backup of the VM before your test, just to make sure you don't fsck up something with that Windows :)
It's even better to make a clone of the VM and test on that first; it may be faster and safer. The rough CLI steps are sketched below.
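On the CLI, the whole procedure would look roughly like this (the VM ID, storage and volume names are placeholders; the same can be done in the GUI by detaching the disk and re-attaching it on the IDE bus):
Code:
vzdump 100 --mode stop                     # cold backup first
qm set 100 --delete virtio0                # detach the disk; it reappears as "unused0"
qm set 100 --ide0 local-zfs:vm-100-disk-1  # re-attach the same volume on the IDE bus
qm set 100 --bootdisk ide0                 # make sure the VM still boots from this disk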
 
@micro Thank you, that helped a lot!

Since this issue bothered me so much, I took a new server (different hardware, fewer cores, less RAM) and installed PVE 5.0 freshly from the installer ISO. Then I created a new KVM VM with the same settings I used for my Windows machines (just with less memory assigned) and restored a current backup of one of the systems I experienced the problems with.

The problem remains the same: huge ping delays, packet drops and dropped network connections when the system is under disk I/O load.

I somehow hoped that I had done something stupidly wrong in my PVE install / config, but unless the VirtIO drivers I'm using are borked, there is a reproducible problem with VirtIO / KVM in the versions released with PVE 5.0.
I know others have had this issue with older PVE versions as well, but at least in my case the same setup works without issues on PVE 3.4.

Then I changed the bus for my hard disks to IDE and voilà: while doing a DB check or a backup I can still use the system and browse websites, and it feels responsive. The maximum ping response time was 300 ms while the average was 5 ms, which is OK for me under load. Not a single packet was dropped, there were no RDP connection issues, and everything is smooth.
Of course I lose the "discard" feature on my zvol, but this is a fair trade-off for a system that runs stably. I also expected the throughput to be lower compared to VirtIO, but in my subjective experience it was not that bad.

I think I'll do a few more tests and then switch my production system to IDE as well. It looks to me like the only way to get a reliable setup without moving to another platform. Thanks again micro!

I'm still interested in finding a solution for this problem so I can switch back to VirtIO once the issue is resolved. Since I have a test system now, I can do this without risking my production system.

So if there are suggestions that I may check out, please let me know.

Honestly, I'm a little concerned that there is not a single official statement from the Proxmox team regarding this. Not on the website, in the forum, or in the bug reports; at least I have not seen anything. This issue probably affects most PVE 5.0 users, since VirtIO is the go-to option for storage, even in the official howtos.
 
Honestly, I'm a little concerned that there is not a single official statement from the Proxmox team regarding this. Not on the website, in the forum, or in the bug reports; at least I have not seen anything. This issue probably affects most PVE 5.0 users, since VirtIO is the go-to option for storage, even in the official howtos.

My thoughts exactly.
 
@Andreas Piening and @micro
What hardware are you using? Brand of server and CPU model?

I'm experiencing a strange issue with QEMU since 2.6: all versions before behave correctly, but from 2.6.0 on I have problems. It's not the same as yours, but it might be related somehow, because IDE solves it for me too. The Proxmox team couldn't reproduce my case, while I can reproduce it on 3 different environments every time, all on Dell hardware. By running git bisect I have possibly found the git commit which causes this. I'm investigating further, and hopefully Proxmox will too. If something interesting comes up, I will get back to you. For now I'm very interested in your hardware.
 
Servers: IBM System x3850 X5 and IBM System x3650 M3
CPUs: Intel(R) Xeon(R) E7-4830 and Intel(R) Xeon(R) X5680
 
Servers: IBM System x3850 X5 and IBM System x3650 M3
CPUs: Intel(R) Xeon(R) E7-4830 and Intel(R) Xeon(R) X5680
Thanks. Then hardware is unrelated. I'm using Dell PowerEdge R310, R320, R420 and R610; in the R610 I use the Intel Xeon X5650.
My problem also doesn't occur when I change the SCSI controller type to the default (LSI 53C895A) and use SCSI as the hard disk bus. VMware PVSCSI works as well. My git bisect revealed a commit related to VirtIO; the controller types VirtIO SCSI and VirtIO SCSI Single give me the ability to crash my VMs.
As I mentioned in my previous post, the IDE bus for the hard disk works for me, even though I had VirtIO SCSI as the controller.
 
I have PVE 5.0 running on two systems experiencing the issues described before:
  • Intel® Xeon® E5-1650 v3 Hexa-Core with 256 GB DDR4 ECC RAM
  • Intel® Xeon® E3-1275 v5 Quad-Core with 64 GB DDR4 ECC RAM
Both systems have 2 x 4 TB SATA 6 Gb/s 7200 rpm enterprise-class HDDs. These are custom systems from a German hosting company (Hetzner), so no big brand like Dell, but the hardware is carefully selected, with excellent Linux compatibility. I have a second server with the Intel® Xeon® E3-1275 v5 Quad-Core running an older version of PVE, and it works perfectly fine.
 
