Very high load on the node

docent (Renowned Member, joined Jul 23, 2009)
Hello,
Yesterday I upgraded from PVE 3.2 to 3.4, and today I have big problems. The load on the node suddenly grew at 12:00, and I can't find the cause. Some servers hang with the following messages (see attached screenshots: gw.png, load1.png, load2.png):

Code:
# iotop -d 10 -P
Total DISK READ:      21.17 K/s | Total DISK WRITE:       2.82 M/s
  PID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
10287 be/4 root        0.00 B/s    0.00 B/s  0.00 % 13.75 % kvm -id 104
 8691 be/4 root        6.33 K/s    8.61 K/s  0.00 % 13.54 % kvm -id 140
 9633 be/4 root        0.00 B/s    0.00 B/s  0.00 % 12.79 % kvm -id 111
10059 be/4 root        0.00 B/s    0.00 B/s  0.00 % 10.82 % kvm -id 156
 8895 be/4 root        0.00 B/s    0.00 B/s  0.00 %  7.17 % kvm -id 117
 9178 be/4 root        0.00 B/s    0.00 B/s  0.00 %  5.01 % kvm -id 119
 9277 be/4 root      405.13 B/s   44.31 K/s  0.00 %  2.08 % kvm -id 108
 7534 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.27 % [txg_sync]
10858 be/4 root      229.47 K/s  752.69 K/s  0.00 %  0.02 % kvm -id 113
12155 be/4 root       30.46 K/s  329.96 K/s  0.00 %  0.01 % kvm -id 116
 8423 be/4 root      810.26 B/s   25.72 K/s  0.00 %  0.01 % kvm -id 112
10481 be/4 root        0.00 B/s    4.35 K/s  0.00 %  0.01 % kvm -id 106
 1083 be/3 root        0.00 B/s 1215.38 B/s  0.00 %  0.00 % [jbd2/dm-0-8]
 2554 be/3 root        0.00 B/s 1620.51 B/s  0.00 %  0.00 % [jbd2/sda4-8]
10076 be/4 root        0.00 B/s    9.50 K/s  0.00 %  0.00 % kvm -id 110
30156 be/4 root        0.00 B/s   11.87 K/s  0.00 %  0.00 % kvm -id 109
29999 be/4 root      405.13 B/s  130.16 K/s  0.00 %  0.00 % kvm -id 153
 9801 be/4 root        0.00 B/s    4.75 K/s  0.00 %  0.00 % kvm -id 131
65437 be/4 root        0.00 B/s    2.77 K/s  0.00 %  0.00 % kvm -id 124
64509 be/4 root        0.00 B/s    3.17 K/s  0.00 %  0.00 % kvm -id 102
11121 be/4 root      810.26 B/s    5.54 K/s  0.00 %  0.00 % kvm -id 130
 9751 be/4 root        0.00 B/s    2.77 K/s  0.00 %  0.00 % kvm -id 129
11169 be/4 root      405.13 B/s  405.13 B/s  0.00 %  0.00 % kvm -id 127

Code:
# zpool iostat -v 10
...
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool2       2.20T   799G      0     66  32.7K  3.09M
  pve-csv2  2.20T   799G      0     66  32.7K  3.09M
cache           -      -      -      -      -      -
  sdb       55.9G  7.62M      0      1  21.0K   256K
----------  -----  -----  -----  -----  -----  -----
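Since several kvm processes show high I/O wait on this pool, the ZFS ARC hit rate is also worth checking; a cold or undersized ARC pushes reads to disk. A minimal sketch, assuming ZFS on Linux exposes its counters under /proc/spl/kstat/zfs/arcstats:

```shell
# Compute the ARC hit ratio from the kstat counters. Note that 'hits'
# and 'misses' are cumulative since boot, so sample twice a few minutes
# apart if you want the current rate rather than the lifetime average.
awk '/^hits / {h=$3} /^misses / {m=$3} END {printf "ARC hit ratio: %.2f\n", h/(h+m)}' \
    /proc/spl/kstat/zfs/arcstats
```

A ratio well below ~0.90 on a read-heavy pool suggests the cache device or ARC size limit is too small for the working set.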

Code:
# iostat -d -x 10
...
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     1.30    1.10  237.30     5.40  3124.20    26.26     0.02    0.07    6.36    0.04   0.06   1.33
sdc               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdb               0.00     0.00    2.10    0.00    75.75     0.00    72.14     0.00    0.67    0.67    0.00   0.67   0.14
dm-0              0.00     0.00    0.20    1.70     1.20    13.60    15.58     0.00    1.05   10.00    0.00   1.05   0.20
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-3              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-4              0.00     0.00    0.10   98.30     0.60  2755.00    56.01     0.00    0.05    7.00    0.04   0.05   0.45

Code:
# top
top - 16:11:27 up 1 day,  2:07,  2 users,  load average: 4.94, 5.70, 6.35
Tasks: 1087 total,   1 running, 1086 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.5 us,  1.4 sy,  0.0 ni, 95.7 id,  0.3 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem:    128840 total,    76732 used,    52107 free,       95 buffers
MiB Swap:    65535 total,        0 used,    65535 free,     4892 cached
 
   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 12155 root      20   0 5157m 3.5g 4116 S    16  2.8  12:09.68 kvm
 29999 root      20   0 9231m 7.8g 3900 S    15  6.2  49:53.35 kvm
  9801 root      20   0 4852m 4.1g 3960 S    15  3.3 465:40.06 kvm
 64509 root      20   0 10.8g  10g 3972 S    10  8.0 245:27.25 kvm
 11169 root      20   0 1406m 1.0g 3772 S     8  0.8 114:50.26 kvm
  8423 root      20   0 3676m 3.1g 3808 S     6  2.5 113:30.77 kvm
 10858 root      20   0 9313m 5.2g 3788 S     5  4.2  89:14.78 kvm

Code:
# pveversion --verbose
proxmox-ve-2.6.32: 3.3-147 (running kernel: 3.10.0-1-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-3.10.0-1-pve: 3.10.0-5
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-37-pve: 2.6.32-147
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Code:
# pveperf
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      921999
HD SIZE:           62.87 GB (/dev/mapper/pve-root)
BUFFERED READS:    499.85 MB/sec
AVERAGE SEEK TIME: 9.07 ms
FSYNCS/SECOND:     4271.21

Code:
# pveperf /pool2/VMs/images/
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      942748
HD SIZE:           2970.82 GB (pool2/VMs)
FSYNCS/SECOND:     4683.57

Code:
# pveperf /mnt/sda4/images/
CPU BOGOMIPS:      110201.04
REGEX/SECOND:      923666
HD SIZE:           3023.67 GB (/dev/sda4)
BUFFERED READS:    349.94 MB/sec
AVERAGE SEEK TIME: 9.98 ms
FSYNCS/SECOND:     2474.56

Code:
# qm list | grep -v stopped
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       102 server102            running    10240             50.00 64509
       104 server104            running    512                2.00 10287
       106 server106            running    2048             300.00 10481
       108 server108            running    2048               4.00 9277
       109 server109            running    4096              50.00 30156
       110 server110            running    1024             150.00 100765
       111 server111            running    2048             100.00 9633
       112 server112            running    3072              32.00 8423
       113 server113            running    8192              48.00 10858
       115 server115            running    1024              50.00 10631
       116 server116            running    4096              32.00 12155
       117 server117            running    2048               4.00 8895
       119 server119            running    1024              16.00 9178
       122 server122            running    1024              50.00 10779
       124 server124            running    3072              40.00 65437
       127 server127            running    1024              30.00 11169
       128 server128            running    2048               8.00 11283
       129 server129            running    4096              50.00 9751
       130 server130            running    1024              50.00 11121
       131 server131            running    4096              40.00 9801
       140 server140            running    1024             150.00 8691
       153 server153            running    8192             150.00 29999
       156 server156            running    2048             150.00 10059

Thanks.
 
I am using: proxmox-ve-2.6.32: 3.4-150 (running kernel: 3.10.0-8-pve)

And below a list of all our hypervisors:
hypervisor01 load average: 7.90, 6.79, 7.84 - VMs: 35
hypervisor02 load average: 3.37, 4.06, 4.23 - VMs: 31
hypervisor03 load average: 5.22, 5.32, 5.74 - VMs: 27
hypervisor04 load average: 9.37, 7.16, 6.20 - VMs: 37
hypervisor05 load average: 12.11, 14.66, 13.08 <-- Running only Windows VMs - VMs: 18
hypervisor06 load average: 2.55, 3.45, 3.20 - VMs: 28
hypervisor07 load average: 0.50, 0.69, 0.18 <- Running kernel pve-kernel-2.6.32-37-pve - VMs: 32
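A list like the one above can be collected in one loop. This is only a sketch: the host names are placeholders for your own hypervisors, and it assumes passwordless SSH access as root and the standard PVE qm tool on each node:

```shell
# Print 'host load average: x, y, z - VMs: n' for each hypervisor.
for h in hypervisor01 hypervisor02 hypervisor03; do
    # Strip everything before the load figures from uptime's output.
    load=$(ssh "root@$h" uptime | sed 's/.*load average: //')
    # Count running guests.
    vms=$(ssh "root@$h" 'qm list | grep -c running')
    echo "$h load average: $load - VMs: $vms"
done
```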

Quite a difference between hypervisor02 running kernel 3.10 and hypervisor07 running kernel 2.6 with nearly the same number of VMs.
I will migrate the VMs on hypervisor07 to hypervisor02, and the VMs of hypervisor02 to hypervisor07, to check what the load will be then.
 
The results after migrating VMs of hypervisor07 to hypervisor02, and vice versa:
hypervisor02 load average: 6.37, 7.22, 5.77 <- Running kernel pve-kernel-3.10.0-8-pve - VMs: 32
hypervisor07 load average: 0.68, 1.02, 0.26 <- Running kernel pve-kernel-2.6.32-37-pve - VMs: 31

It's evident...
 
But Unix 'load' is a totally artificial measure, and I don't think you can compare it between different kernel versions.
 
But Unix 'load' is a totally artificial measure, and I don't think you can compare it between different kernel versions.

How do you want us to measure the load then? In my opinion, seeing this kind of difference in Unix 'load' alone says something is not right.
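One concrete reason the figures can differ between kernels: on Linux, the load average counts not only runnable tasks but also tasks in uninterruptible (D-state) sleep, which are typically blocked on I/O, and kernels can differ in how often threads end up in that state. A quick local check, assuming a standard /proc and procps:

```shell
# The first three fields of loadavg are 1/5/15-minute averages over
# tasks that are runnable (R) or in uninterruptible sleep (D).
cat /proc/loadavg
# Count D-state tasks right now; many D-state kvm threads mean the
# 'load' reflects I/O wait rather than CPU demand.
ps -eo stat= | awk '$1 ~ /^D/ {n++} END {print n+0, "tasks in D state"}'
```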
 