high io wait but ONLY in guest - no clue anymore

sigmarb

Renowned Member
Nov 8, 2016
Dear Users,

I'm seeing poor I/O performance in a Linux guest, but the host itself is not under I/O load:

Proxmox 4.2-18/158720b9

Guest:

Code:
root@ucs:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r  b  swpd  free  buff  cache  si  so  bi  bo  in  cs us sy id wa
1  0  31464 219112 177856 2389736  0  1  122  450  347  926 13  3 49 33
3  0  31464 217528 177856 2390068  0  0  328  64 1821 3704 35  8 37  1
1  0  31464 210512 177856 2390076  0  0  4  88 2028 3694 39  6 42  3
1  0  31464 147496 177868 2390220  0  0  0  124 2191 1992 38  5 44  9
1  2  31464 182484 177872 2396688  0  0  24  276 1058 2545 50  2 33 12
1  1  31464 206720 177876 2396716  0  0  48  240 1330 3816 45  5 18 28
9  1  31464 215344 177892 2397464  0  0  656  420 1784 5563 71 11  2  1
9  1  31464 213368 177924 2397448  0  0  60  384 2316 3937 66 11  9  3
7  1  31464 214012 177944 2397520  0  0  12  516 2301 4646 55  5 23 13
3  5  31464 206664 177944 2397592  0  0  28  4136 2721 9189 55  9 13 17
9  1  31464 206560 177952 2397604  0  0  16  1428 1504 4140 58  5  2 31
4  6  31464 211896 177952 2397616  0  0  4  1264 1756 6035 62  6  0 21
1  4  31464 211760 177952 2397704  0  0  12  964 1162 3153 42  6  7 41
3  0  31464 210168 177964 2397760  0  0  60  368 1172 4065 50  7  5 32

Host:

Code:
root@proxmox:/etc/pve/qemu-server# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r  b  swpd  free  buff  cache  si  so  bi  bo  in  cs us sy id wa st
2  0  6692 140780  91572 105592  0  0  265  505  163  271 21  6 72  1  0
1  0  6692 141556  91580 105592  0  0  0  138 11407 25441 52 16 32  0  0
3  0  6692 141852  91580 105592  0  0  16  252 6246 8947 59  8 34  0  0
1  0  6692 142488  91580 105592  0  0  3088  211 3952 5786 53  3 44  0  0
3  0  6692 141876  91580 105592  0  0  108  517 5778 7384 77  6 17  0  0
5  0  6692 140888  91596 105592  0  0  700  721 7951 13610 90 10  1  0  0
2  0  6692 140700  91596 105592  0  0  140  562 7305 11937 77  9 14  0  0
1  0  6692 139180  91596 105592  0  0  152  3942 5169 8913 48  4 48  0  0
1  0  6692 138996  91596 105592  0  0  652  934 5568 6536 78  6 16  0  0
1  0  6692 136484  91604 105592  0  0  336  2169 5509 6908 76  6 17  1  0
2  0  6692 136088  91612 105596  0  0  964  721 4650 7814 54  5 40  0  0
3  0  6692 137080  91628 105580  0  0  1240  1397 4295 6154 52  6 38  4  0
3  0  6692 138496  91628 105596  0  0  388  890 6050 7138 84  7  9  0  0
2  0  6692 138432  91628 105596  0  0  1924  558 6754 9311 87  6  7  0  0


Code:
root@proxmox:/etc/pve/qemu-server# more 100.conf
bootdisk: virtio0
cores: 2
ide2: local:iso/UCS-Installation-amd64.iso,media=cdrom
memory: 7000
name: ucs
net0: virtio=32:89:72:1E:06:3D,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
smbios1: uuid=01a567fd-3ef2-4c2d-9c86-d6f59b961c63
sockets: 1
virtio0: local:100/vm-100-disk-1.qcow2,size=750G


Any idea on this?

Thank you.

Sigmar
 
IMHO, there is something else you should be worried about: the very high values for context switches (cs) and interrupts (in). The high io-wait might just be a consequence.

Check "cat /proc/interrupts" a few times to see which interrupt fires so frequently. Also try "pidstat -wt" to find out what is causing so many context switches...
 
Hi,
have you tried "raw" instead of qcow2?
And what about cache?

Markus
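Just to illustrate what I mean, roughly like this. This is only a sketch, not a tested procedure: it assumes the default "local" directory storage under /var/lib/vz, and the raw file name is made up here; keep a backup of the qcow2 until the VM boots cleanly.

Code:
# stop the VM, then convert the qcow2 image to raw next to it
qm shutdown 100
cd /var/lib/vz/images/100
qemu-img convert -p -f qcow2 -O raw vm-100-disk-1.qcow2 vm-100-disk-1.raw

# point the VM at the raw image and pick an explicit cache mode, e.g. cache=none
qm set 100 --virtio0 local:100/vm-100-disk-1.raw,cache=none
qm start 100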
Not yet, as it's quite an effort to move a complete VM to different storage.
Cache is not set; that is what the Proxmox wiki recommends. Should I change it?
 

Thank you guys for your help.

Code:
root@ucs:~# cat /proc/interrupts
           CPU0       CPU1       
  0:         34          0   IO-APIC-edge      timer
  1:         10          0   IO-APIC-edge      i8042
  6:          3          0   IO-APIC-edge      floppy
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
10:      56599          0   IO-APIC-fasteoi   virtio0
11:         34          0   IO-APIC-fasteoi   uhci_hcd:usb1
12:        144          0   IO-APIC-edge      i8042
14:          0          0   IO-APIC-edge      ata_piix
15:        100          0   IO-APIC-edge      ata_piix
24:          0          0   PCI-MSI-edge      virtio1-config
25:    3864343          0   PCI-MSI-edge      virtio1-req.0
26:          0          0   PCI-MSI-edge      virtio2-config
27:   10457704          0   PCI-MSI-edge      virtio2-input.0
28:        154          0   PCI-MSI-edge      virtio2-output.0
NMI:          0          0   Non-maskable interrupts
LOC:   11228791   11714087   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          1          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:    9579801   11840088   Rescheduling interrupts
CAL:        698    1938183   Function call interrupts
TLB:     149713     148759   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:        383        383   Machine check polls
HYP:          0          0   Hypervisor callback interrupts
ERR:          0
MIS:          0

pidstat -wt output, sorted and filtered to values above 1:

Code:
(ucs)    09.11.2016    _x86_64_    (2 CPU)

cswch/s    nvcswch/s    Command
0,99    1,37    apache2
1,00    0,00    apache2
1,00    0,00    memcached
1,04    0,00    ntpd
1,04    0,99    apache2
1,10    1,18    apache2
1,73    0,02    python2.7
1,88    0,27    smbd
1,99    0,00    nrpe
1,99    0,00    python
2,00    0,00    rpc.gssd
2,51    0,00    runsvdir
2,59    0,09    runsv
2,82    0,00    kworker/1:1
4,29    0,00    kworker/0:1
4,57    0,00    kworker/0:1H
4,99    0,00    kworker/1:1H
6,27    9,61    python
8,78    2,98    jbd2/vda1-8
9,34    9,68    python
9,97    0,01    univention-mana
12,19    0,00    ksoftirqd/1
14,30    0,02    ksoftirqd/0
14,70    0,10    nscd
60,93    1,81    kopano-server
66,74    0,00    rcu_sched
 
Sorry for digging up an old thread, but I am seeing a similar issue.

On the host:
Code:
[root@pve ~]$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  1 4973316 47982220 148744 11946924    3    3   130   105    0    0  8  2 89  1  0
 0  0 4973316 47982000 148744 11946924    0    0  8064   771 1768 7646  1  0 98  1  0
 1  2 4973316 47981612 148744 11946924    0    0  7836   861 2021 9263  1  0 97  2  0
 1  0 4973316 47982236 148748 11946948    0    0  7276  2191 4548 13255  1  1 96  3  0
 0  1 4973316 47982368 148748 11946948    0    0  8992  3525 1952 7127  1  0 98  0  0
 2  1 4973316 47983068 148748 11946956    0    0  8948  4609 1442 5929  0  0 99  1  0
 0  0 4972292 47969028 148748 11946940  704    0 12732   650 1711 6678  1  1 98  1  0
 0  1 4972292 47966632 148748 11946960    0    0  6120  3081 2010 7583  0  0 98  1  0
 0  0 4972292 47966224 148756 11946980    0    0 14364  1365 4539 12566  0  1 96  3  0
 0  1 4972292 47966200 148756 11946976    0    0  8676  6449 2029 7366  1  0 98  1  0
 1  0 4972292 47965468 148756 11947064    0    0  8212  8995 2814 8610  2  1 96  2  0
 0  0 4972292 47966244 148756 11947064    0    0 18456   240 1500 6124  1  0 99  0  0
 0  0 4972292 47966212 148756 11947064    0    0 13076   210 1546 6344  1  0 99  0  0
 0  0 4972292 47966460 148764 11947096    0    0 13520   504 4706 14870  1  0 98  1  0
 0  0 4972292 47966492 148764 11947096    0    0 15960   200 1839 8243  0  0 99  0  0

On the guest:

Code:
root@polaris:~# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  1  30220 285180   7316 9476376    0    0  8806     1   74  122  1  0 76 23  0
 0  1  30220 276400   7316 9485380    0    0  8904     0  197  349  0  0 75 25  0
 0  1  30220 268712   7316 9492972    0    0  7684     0  175  296  1  0 75 25  0
 0  1  30220 261148   7316 9500420    0    0  7616     0  197  403  0  0 75 24  0
 0  1  30220 254576   7316 9506936    0    0  6516     0  165  278  0  0 75 25  0
 0  1  30220 245928   7316 9515836    0    0  8872     0  176  322  0  0 75 25  0
 0  1  30220 237000   7316 9524888    0    0  8928     0  173  303  0  0 75 25  0
 0  1  30220 224848   7316 9536904    0    0 12000     0  201  385  0  0 75 24  0
 0  1  30220 219020   7316 9542660    0    0  5776     0  159  283  0  0 75 25  0
 0  1  30220 205164   7316 9556616    0    0 13936     0  239  413  1  0 75 24  0
 0  1  30220 197104   7316 9564456    0    0  8008     0  164  283  0  0 75 25  0
 0  1  30220 188388   7316 9573180    0    0  8556     0  170  323  0  0 75 25  0

So there is a 25% I/O wait in the guest while the host is barely breaking a sweat.


The guest is an Ubuntu Bionic install.

Any idea on how to improve on this?
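If it helps to narrow things down, running plain iostat (from the sysstat package) on host and guest at the same time while the workload is active should show where the time is actually being spent, e.g. comparing await and %util on the guest's vda against the host device backing the image:

Code:
# run simultaneously on host and guest while the workload is active
iostat -x 1 10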