> 3000 mSec Ping and packet drops with VirtIO under load

aderumier

Active Member
May 14, 2013
Hi,
could you try disabling transparent hugepages on the host?
It was disabled by default in Proxmox 4 before January 2017 (on a 4.4.x kernel, I don't remember exactly which version), and is now set to madvise by default.

Code:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

(you need to stop/start the VM after this change)
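
If that helps and you want the setting to survive a host reboot, one way (a sketch, assuming a standard Debian/Proxmox GRUB setup) is to put it on the kernel command line instead:
Code:
# add transparent_hugepage=never to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, e.g.
# GRUB_CMDLINE_LINUX_DEFAULT="quiet transparent_hugepage=never"
nano /etc/default/grub
update-grub    # regenerate the GRUB config; takes effect on the next host reboot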
 

Andreas Piening

Active Member
Mar 11, 2017
Hi aderumier,

Both options were set to "madvise" on my system as well (whatever that means):

Code:
#cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
#cat /sys/kernel/mm/transparent_hugepage/defrag
always defer [madvise] never

I stopped my machine, set both options to never and changed my disks back to SCSI (they were IDE before). But the Windows guest now fails with "no boot device" after showing the spinning wheel on the blue background and booting for a while.
I even restored the VM to an earlier state, but I can't get it to boot anymore. It fails with a blue screen while collecting a memory dump and then reboots again.
Luckily I did this on my test system, so the production system is still on IDE and has been running fine since the switch.
I can't tell whether switching my disks from SCSI to IDE and back again did any harm, but for me it is not working right now.
 

hansm

Active Member
Feb 27, 2015
You can easily go back to the old values by:
Code:
echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
echo madvise > /sys/kernel/mm/transparent_hugepage/defrag
Or simply reboot your host.

My problem is solved with a patch from @wbumiller. I asked for an updated pve-qemu-kvm package for you to test. The fix applies to virtio, so maybe it also solves your problem.
 

aderumier

Active Member
May 14, 2013
Hi aderumier,

Both options were set to "madvise" on my system as well (whatever that means):

Code:
#cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
#cat /sys/kernel/mm/transparent_hugepage/defrag
always defer [madvise] never

I stopped my machine, set both options to never and changed my disks back to SCSI (they were IDE before). But the Windows guest now fails with "no boot device" after showing the spinning wheel on the blue background and booting for a while.
I even restored the VM to an earlier state, but I can't get it to boot anymore. It fails with a blue screen while collecting a memory dump and then reboots again.
Luckily I did this on my test system, so the production system is still on IDE and has been running fine since the switch.
I can't tell whether switching my disks from SCSI to IDE and back again did any harm, but for me it is not working right now.

It's 100% unrelated. Note that if you change a disk from IDE->SCSI or SCSI->IDE, you also need to change the boot drive in the VM options each time.
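
For example (a sketch, assuming VM ID 100 and that the disk is now attached as scsi0; adjust both to your setup):
Code:
qm set 100 --bootdisk scsi0          # make the SCSI disk the boot drive again
cat /etc/pve/qemu-server/100.conf    # double-check the bootdisk and disk lines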
 

Andreas Piening

Active Member
Mar 11, 2017
It's 100% unrelated. Note that if you change a disk from IDE->SCSI or SCSI->IDE, you also need to change the boot drive in the VM options each time.
Do you mean the transparent_hugepage setting is unrelated to the issue in general, or unrelated to the problem with my VM that doesn't boot anymore?
I can't rule out the second assumption; I haven't had time to investigate further yet.
 

aderumier

Active Member
May 14, 2013
Do you mean the transparent_hugepage setting is unrelated to the issue in general, or unrelated to the problem with my VM that doesn't boot anymore?
I can't rule out the second assumption; I haven't had time to investigate further yet.
I mean transparent hugepages can't impact boot (maybe Windows doesn't like the switch between IDE and SCSI, I really don't know).
Transparent hugepages could only impact performance.

BTW, I have built the latest pve-qemu-kvm with the patch for @hansm's bug (which is virtio related, so maybe it could improve performance too):

http://odisoweb1.odiso.net/pve-qemu-kvm_2.9.1-1_amd64.deb
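
For anyone who wants to test it, a minimal install sketch (stop/start the VMs afterwards so they pick up the new binary; a reboot from inside the guest is not enough):
Code:
wget http://odisoweb1.odiso.net/pve-qemu-kvm_2.9.1-1_amd64.deb
dpkg -i pve-qemu-kvm_2.9.1-1_amd64.deb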
 

RobFantini

Renowned Member
May 24, 2012
I have a question: is anyone seeing I/O issues when using LXC?

If not, we'll migrate some systems. We are noticing disk slowdown as time goes on, testing with the XFCE disk utility as per a forum suggestion.
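
If it helps to quantify that, a more repeatable check than the GUI disk utility would be something like fio (a sketch; the file path and size are arbitrary, and the fio package has to be installed first):
Code:
apt-get install fio
fio --name=seqwrite --rw=write --bs=1M --size=1G --direct=1 --filename=/tmp/fio-test
rm /tmp/fio-test    # clean up the test file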
 

RobFantini

Renowned Member
May 24, 2012
I mean transparent hugepages can't impact boot (maybe Windows doesn't like the switch between IDE and SCSI, I really don't know).
Transparent hugepages could only impact performance.

BTW, I have built the latest pve-qemu-kvm with the patch for @hansm's bug (which is virtio related, so maybe it could improve performance too):

http://odisoweb1.odiso.net/pve-qemu-kvm_2.9.1-1_amd64.deb

I just upgraded the PVE node and this was installed, then I restarted all KVM guests.
Code:
pve-qemu-kvm (2.9.1-1)

Have you tested pve-qemu-kvm (2.9.1-1)?
 

aderumier

Active Member
May 14, 2013
I just upgraded the PVE node and this was installed, then I restarted all KVM guests.
Code:
pve-qemu-kvm (2.9.1-1)

Have you tested pve-qemu-kvm (2.9.1-1)?

I can't reproduce the problem from this thread myself, so I'm currently flying blind.

Note that my pve-qemu-kvm package 2.9.1-1 is not the same as the 2.9.1-1 from the Proxmox repo (I have added the patch but not changed the version number).
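
Since the version string is identical, one way to check which build is actually installed (a sketch, assuming the QEMU binary is shipped as /usr/bin/kvm as on a stock Proxmox install) is to compare the running binary with the one inside the downloaded .deb:
Code:
dpkg-deb -x pve-qemu-kvm_2.9.1-1_amd64.deb /tmp/patched
md5sum /tmp/patched/usr/bin/kvm /usr/bin/kvm
# identical checksums = patched build, different = stock repo build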
 

micush

Active Member
Jul 18, 2015
This thread is very interesting to me.

I have a moderately used 5-node cluster (PVE version 5.0-31 on all nodes) using OVS with 10 Gbit Ethernet to a couple of NFS shared storage hosts.

All my VMs (~25) are configured with virtio disks using the 'virtio scsi single' controller type, and all the vNICs are virtio with multiqueue set to equal the number of vCPUs.

I see none of the symptoms mentioned in this thread.

Attached is a snapshot of my busiest host. It shows no IO delay, even though there is a lot of IO on that host.

What makes my setup any different from the people that are having issues?
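
For comparison, the multiqueue part of that setup corresponds to a vNIC line like this in the VM config (a sketch with a hypothetical VM ID 100, 4 vCPUs and a placeholder MAC):
Code:
grep ^net0 /etc/pve/qemu-server/100.conf
# net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0,queues=4    <- queues = number of vCPUs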
 

Attachments

  • Screenshot from 2017-09-20 15-13-17.png

Andreas Piening

Active Member
Mar 11, 2017
What makes my setup any different from the people that are having issues?
Interesting indeed. One difference I see now is that you are using "VirtIO SCSI single", which I haven't even tried; I was simply using "VirtIO SCSI".
Can you explain the difference? Is it a single port then?
 

aderumier

Active Member
May 14, 2013
All my VMs (~25) are configured with virtio disks using the 'virtio scsi single' controller type
The virtio-scsi and virtio-scsi-single controller types only apply to SCSI disks.

If you use virtio (block) disks, the controller setting does nothing.
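
In config terms (a sketch with a placeholder VM ID and volume names), the distinction is which disk key is used, not the controller line:
Code:
# VirtIO block disk: its own virtio-blk device, ignores the scsihw setting
qm set 100 --virtio0 local-lvm:vm-100-disk-1
# SCSI disk: sits behind the virtio-scsi controller selected by scsihw
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-lvm:vm-100-disk-2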
 

aderumier

Active Member
May 14, 2013
Interesting indeed. One difference I see now is that you are using "VirtIO SCSI single", which I haven't even tried; I was simply using "VirtIO SCSI".
Can you explain the difference? Is it a single port then?

virtio-scsi single means one virtio-scsi PCI controller per SCSI disk;
the normal virtio-scsi controller is one virtio-scsi controller for up to 7 disks.

The main use of virtio-scsi single is to enable iothreads, since an iothread is enabled per controller (so you get one iothread per disk/controller).
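
As a sketch (hypothetical VM ID and volume name), enabling that per-disk iothread looks like:
Code:
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-lvm:vm-100-disk-1,iothread=1    # one controller and one iothread for this disk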
 
