high io wait but ONLY in guest - no clue anymore

sigmarb

Well-Known Member
Nov 8, 2016
Dear Users,

I am seeing poor I/O performance in a Linux guest, but the host itself is not under I/O load:

Proxmox 4.2-18/158720b9

Guest:

Code:
root@ucs:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache  si  so    bi    bo   in    cs us sy id wa
 1  0  31464 219112 177856 2389736   0   1   122   450  347   926 13  3 49 33
 3  0  31464 217528 177856 2390068   0   0   328    64 1821  3704 35  8 37  1
 1  0  31464 210512 177856 2390076   0   0     4    88 2028  3694 39  6 42  3
 1  0  31464 147496 177868 2390220   0   0     0   124 2191  1992 38  5 44  9
 1  2  31464 182484 177872 2396688   0   0    24   276 1058  2545 50  2 33 12
 1  1  31464 206720 177876 2396716   0   0    48   240 1330  3816 45  5 18 28
 9  1  31464 215344 177892 2397464   0   0   656   420 1784  5563 71 11  2  1
 9  1  31464 213368 177924 2397448   0   0    60   384 2316  3937 66 11  9  3
 7  1  31464 214012 177944 2397520   0   0    12   516 2301  4646 55  5 23 13
 3  5  31464 206664 177944 2397592   0   0    28  4136 2721  9189 55  9 13 17
 9  1  31464 206560 177952 2397604   0   0    16  1428 1504  4140 58  5  2 31
 4  6  31464 211896 177952 2397616   0   0     4  1264 1756  6035 62  6  0 21
 1  4  31464 211760 177952 2397704   0   0    12   964 1162  3153 42  6  7 41
 3  0  31464 210168 177964 2397760   0   0    60   368 1172  4065 50  7  5 32

Host:

Code:
root@proxmox:/etc/pve/qemu-server# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b  swpd   free   buff  cache  si  so    bi    bo    in    cs us sy id wa st
 2  0  6692 140780  91572 105592   0   0   265   505   163   271 21  6 72  1  0
 1  0  6692 141556  91580 105592   0   0     0   138 11407 25441 52 16 32  0  0
 3  0  6692 141852  91580 105592   0   0    16   252  6246  8947 59  8 34  0  0
 1  0  6692 142488  91580 105592   0   0  3088   211  3952  5786 53  3 44  0  0
 3  0  6692 141876  91580 105592   0   0   108   517  5778  7384 77  6 17  0  0
 5  0  6692 140888  91596 105592   0   0   700   721  7951 13610 90 10  1  0  0
 2  0  6692 140700  91596 105592   0   0   140   562  7305 11937 77  9 14  0  0
 1  0  6692 139180  91596 105592   0   0   152  3942  5169  8913 48  4 48  0  0
 1  0  6692 138996  91596 105592   0   0   652   934  5568  6536 78  6 16  0  0
 1  0  6692 136484  91604 105592   0   0   336  2169  5509  6908 76  6 17  1  0
 2  0  6692 136088  91612 105596   0   0   964   721  4650  7814 54  5 40  0  0
 3  0  6692 137080  91628 105580   0   0  1240  1397  4295  6154 52  6 38  4  0
 3  0  6692 138496  91628 105596   0   0   388   890  6050  7138 84  7  9  0  0
 2  0  6692 138432  91628 105596   0   0  1924   558  6754  9311 87  6  7  0  0


Code:
root@proxmox:/etc/pve/qemu-server# more 100.conf
bootdisk: virtio0
cores: 2
ide2: local:iso/UCS-Installation-amd64.iso,media=cdrom
memory: 7000
name: ucs
net0: virtio=32:89:72:1E:06:3D,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=01a567fd-3ef2-4c2d-9c86-d6f59b961c63
sockets: 1
virtio0: local:100/vm-100-disk-1.qcow2,size=750G


Any ideas on this?

Thank you.

Sigmar
 
IMHO, there is something else you should be worried about: very high values for context switches (cs) and interrupts (in). The high io-wait might just be a consequence.

Check "cat /proc/interrupts" a few times to see which one is triggered so frequently. Also try "pidstat -wt" to find what is causing so many context switches...
 
Hi,
have you tried "raw" instead of qcow2?
And what about cache?

Markus
Not yet, as it's quite an effort to move a complete VM to different storage.
Cache is not set; that is what the Proxmox wiki recommends. Should I change it?
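
For reference, testing both suggestions would look roughly like this; the path assumes the default "local" directory storage, the cache mode is only an example, and the VM has to be shut down for the conversion:

Code:
# convert the qcow2 image to raw on the same storage (default dir storage path assumed)
cd /var/lib/vz/images/100
qemu-img convert -p -O raw vm-100-disk-1.qcow2 vm-100-disk-1.raw

# point the VM at the raw image and try an explicit cache mode, e.g. writeback
qm set 100 --virtio0 local:100/vm-100-disk-1.raw,size=750G,cache=writeback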
 
Thank you guys for your help.

Code:
root@ucs:~# cat /proc/interrupts
           CPU0       CPU1       
  0:         34          0   IO-APIC-edge      timer
  1:         10          0   IO-APIC-edge      i8042
  6:          3          0   IO-APIC-edge      floppy
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
10:      56599          0   IO-APIC-fasteoi   virtio0
11:         34          0   IO-APIC-fasteoi   uhci_hcd:usb1
12:        144          0   IO-APIC-edge      i8042
14:          0          0   IO-APIC-edge      ata_piix
15:        100          0   IO-APIC-edge      ata_piix
24:          0          0   PCI-MSI-edge      virtio1-config
25:    3864343          0   PCI-MSI-edge      virtio1-req.0
26:          0          0   PCI-MSI-edge      virtio2-config
27:   10457704          0   PCI-MSI-edge      virtio2-input.0
28:        154          0   PCI-MSI-edge      virtio2-output.0
NMI:          0          0   Non-maskable interrupts
LOC:   11228791   11714087   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          1          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    9579801   11840088   Rescheduling interrupts
CAL:        698    1938183   Function call interrupts
TLB:     149713     148759   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        383        383   Machine check polls
HYP:          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0

Sorted pidstat -wt output, showing only entries with values over 1:

Code:
(ucs)    09.11.2016    _x86_64_(2CPU)

cswch/s    nvcswch/s    Command
0,99    1,37    apache2
1,00    0,00    apache2
1,00    0,00    memcached
1,04    0,00    ntpd
1,04    0,99    apache2
1,10    1,18    apache2
12,19    0,00    ksoftirqd/1
14,30    0,02    ksoftirqd/0
14,70    0,10    nscd
1,73    0,02    python2.7
1,88    0,27    smbd
1,99    0,00    nrpe
1,99    0,00    python
2,00    0,00    rpc.gssd
2,51    0,00    runsvdir
2,59    0,09    runsv
2,82    0,00    kworker/1:1
4,29    0,00    kworker/0:1
4,57    0,00    kworker/0:1H
4,99    0,00    kworker/1:1H
60,93    1,81    kopano-server
6,27    9,61    python
66,74    0,00    rcu_sched
8,78    2,98    jbd2/vda1-8
9,34    9,68    python
9,97    0,01    univention-mana
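
(For reference, a numeric sort of that output could be done like this; treating column 5 as cswch/s assumes the default pidstat -wt column layout, which may differ between sysstat versions:)

Code:
# sample for 5 seconds, then sort tasks by voluntary context switches per second
pidstat -wt 5 1 | sort -k5 -nr | head -20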
 
Sorry for digging up an old thread, but I am seeing a similar issue.

On the host:
Code:
[root@pve ~]$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 4973316 47982220 148744 11946924    3    3   130   105    0    0  8  2 89  1  0
 0  0 4973316 47982000 148744 11946924    0    0  8064   771 1768 7646  1  0 98  1  0
 1  2 4973316 47981612 148744 11946924    0    0  7836   861 2021 9263  1  0 97  2  0
 1  0 4973316 47982236 148748 11946948    0    0  7276  2191 4548 13255  1  1 96  3  0
 0  1 4973316 47982368 148748 11946948    0    0  8992  3525 1952 7127  1  0 98  0  0
 2  1 4973316 47983068 148748 11946956    0    0  8948  4609 1442 5929  0  0 99  1  0
 0  0 4972292 47969028 148748 11946940  704    0 12732   650 1711 6678  1  1 98  1  0
 0  1 4972292 47966632 148748 11946960    0    0  6120  3081 2010 7583  0  0 98  1  0
 0  0 4972292 47966224 148756 11946980    0    0 14364  1365 4539 12566  0  1 96  3  0
 0  1 4972292 47966200 148756 11946976    0    0  8676  6449 2029 7366  1  0 98  1  0
 1  0 4972292 47965468 148756 11947064    0    0  8212  8995 2814 8610  2  1 96  2  0
 0  0 4972292 47966244 148756 11947064    0    0 18456   240 1500 6124  1  0 99  0  0
 0  0 4972292 47966212 148756 11947064    0    0 13076   210 1546 6344  1  0 99  0  0
 0  0 4972292 47966460 148764 11947096    0    0 13520   504 4706 14870  1  0 98  1  0
 0  0 4972292 47966492 148764 11947096    0    0 15960   200 1839 8243  0  0 99  0  0

On the guest:

Code:
root@polaris:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1  30220 285180   7316 9476376    0    0  8806     1   74  122  1  0 76 23  0
 0  1  30220 276400   7316 9485380    0    0  8904     0  197  349  0  0 75 25  0
 0  1  30220 268712   7316 9492972    0    0  7684     0  175  296  1  0 75 25  0
 0  1  30220 261148   7316 9500420    0    0  7616     0  197  403  0  0 75 24  0
 0  1  30220 254576   7316 9506936    0    0  6516     0  165  278  0  0 75 25  0
 0  1  30220 245928   7316 9515836    0    0  8872     0  176  322  0  0 75 25  0
 0  1  30220 237000   7316 9524888    0    0  8928     0  173  303  0  0 75 25  0
 0  1  30220 224848   7316 9536904    0    0 12000     0  201  385  0  0 75 24  0
 0  1  30220 219020   7316 9542660    0    0  5776     0  159  283  0  0 75 25  0
 0  1  30220 205164   7316 9556616    0    0 13936     0  239  413  1  0 75 24  0
 0  1  30220 197104   7316 9564456    0    0  8008     0  164  283  0  0 75 25  0
 0  1  30220 188388   7316 9573180    0    0  8556     0  170  323  0  0 75 25  0

So there is about 25% I/O wait in the guest while the host is barely breaking a sweat.


The guest is an Ubuntu Bionic install.

Any ideas on how to improve this?
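
One thing that might help narrow it down is running the same short fio job on the host and inside the guest and comparing the numbers; the file path, size, and block size below are just placeholders:

Code:
# sequential read test with direct I/O - run identically on host and guest
fio --name=seqread --filename=/root/fio-testfile --rw=read --bs=128k \
    --size=2G --direct=1 --ioengine=libaio --runtime=30 --time_based
rm /root/fio-testfile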
 
