High Load Average

scool (New Member), Sep 9, 2012
Hello all,
This is my first post here as a Proxmox user.

I need your assistance to sort out the high load average my server is experiencing.
I am a novice Linux user and I have been using Proxmox for about 4-5 months.

The host machine is as follows:
Hard disks: 2x 2000 GB SATA 3.5" 7,200 rpm
CPU: 2x AMD Opteron 6164 HE 12-Core
RAM: 24 GB

This host is used for production. I can't use KVM virtual machines due to network policy, so I run:
2 OpenVZ containers with Plesk panel (hosting about 200+ domains)
1 OpenVZ container with Centova Cast (SHOUTcast, Icecast, AutoDJ; 10 streaming accounts)
1 OpenVZ container with backup software
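
To narrow down which container is responsible, I was planning to check something like the following (not sure these are the best commands, corrections welcome):

# Per-container load averages (if this vzlist build supports the laverage field)
vzlist -a -o ctid,hostname,laverage

# Check whether any container is hitting its resource limits:
# a non-zero value in the failcnt (last) column means a limit was exceeded
cat /proc/user_beancounters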

Although this looks like light usage, since last week the server has had a high load average.
I can't find the cause of it. I've searched the forum for threads with similar issues (found a few), but compared to those servers mine is quite a bit smaller.
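
What I intend to check first is whether the load is CPU bound or disk bound; as far as I understand, something like this should show it (please correct me if there are better tools):

# Runnable (r) vs blocked-on-I/O (b) tasks; a high 'wa' column means the disks are the bottleneck
vmstat 1 10

# Per-disk utilisation (needs the sysstat package); %util close to 100 points at the disks
iostat -x 1 5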

My Proxmox version (output of pveversion -v):
Linux 2.6.32-23-pve #1 SMP Tue Aug 6 07:04:06 CEST 2013 x86_64 GNU/Linux
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-23-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-26-pve: 2.6.32-114
pve-kernel-2.6.32-23-pve: 2.6.32-109
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1


Current load average: 60.61 32.62 40.47
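
I still need to find out whether these are processes actually running or processes stuck waiting on the disks; as I understand it, the load average also counts tasks in uninterruptible (D state) sleep, so I guess something like this would list the stuck ones (hope the syntax is right):

# List processes in uninterruptible sleep (D state) and the kernel function they are waiting in
ps axo pid,stat,wchan:32,comm | awk '$2 ~ /D/'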
 
Following a disastrous upgrade from 3.0 to 3.1, during which very high load averages were accompanied by kernel errors, unusual process times, a progressive inability to start or kill processes and finally general system failure, I did a clean install of 3.1 (from the OVH Proxmox installer). Within minutes of the first boot it was clear that the system was exhibiting the same behaviour. This is most visible in syslog entries like:

Nov 21 09:09:01 host /USR/SBIN/CRON[6551]: (root) CMD (/usr/local/rtm/bin/rtm 30 > /dev/null 2> /dev/null)
Nov 21 09:09:04 host kernel: INFO: task rtm:6467 blocked for more than 120 seconds.
Nov 21 09:09:04 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 09:09:04 host kernel: rtm D ffff881073c2c340 0 6467 1 0 0x00000000
Nov 21 09:09:04 host kernel: ffff881075373d48 0000000000000086 ffff88002821ea40 ffff88002821ea40
Nov 21 09:09:04 host kernel: ffff881075373cf8 ffffffff8105c98e ffff881075373cf8 ffffffff8106412d
Nov 21 09:09:04 host kernel: ffff881075373d18 ffff88002821ea40 ffff881075373fd8 ffff881075373fd8
Nov 21 09:09:04 host kernel: Call Trace:
Nov 21 09:09:04 host kernel: [<ffffffff8105c98e>] ? pick_next_task_fair+0x15e/0x1c0
Nov 21 09:09:04 host kernel: [<ffffffff8106412d>] ? put_prev_task_fair+0xdd/0x370
Nov 21 09:09:04 host kernel: [<ffffffff81541374>] schedule_timeout+0x204/0x300
Nov 21 09:09:04 host kernel: [<ffffffff81055646>] ? enqueue_task+0x66/0x80
Nov 21 09:09:04 host kernel: [<ffffffff81540bb7>] wait_for_completion+0xd7/0x110
Nov 21 09:09:04 host kernel: [<ffffffff81064010>] ? default_wake_function+0x0/0x20
Nov 21 09:09:04 host kernel: [<ffffffff81066e20>] sched_exec+0xd0/0xe0
Nov 21 09:09:04 host kernel: [<ffffffff811a7a8b>] do_execve+0xdb/0x2c0
Nov 21 09:09:04 host kernel: [<ffffffff81009947>] sys_execve+0x47/0x70
Nov 21 09:09:04 host kernel: [<ffffffff8100b5da>] stub_execve+0x6a/0xc0
Nov 21 09:09:04 host kernel: INFO: task hddinfo.pl:6470 blocked for more than 120 seconds.
Nov 21 09:09:04 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 21 09:09:04 host kernel: hddinfo.pl D ffff8810730173e0 0 6470 6464 0 0x00000000
Nov 21 09:09:04 host kernel: ffff881074c6bd48 0000000000000082 ffff881074915080 ffff881079bb1150
Nov 21 09:09:04 host kernel: ffff881074c6bd08 ffffffff8119b657 ffff881074c6bcd8 ffff881079bb1150
Nov 21 09:09:04 host kernel: 0000000000000021 0000000000008020 ffff881074c6bfd8 ffff881074c6bfd8
Nov 21 09:09:04 host kernel: Call Trace:
Nov 21 09:09:04 host kernel: [<ffffffff8119b657>] ? __dentry_open.isra.10+0x137/0x350
Nov 21 09:09:04 host kernel: [<ffffffff8119b964>] ? nameidata_to_filp+0x44/0x60
Nov 21 09:09:04 host kernel: [<ffffffff81541374>] schedule_timeout+0x204/0x300
Nov 21 09:09:04 host kernel: [<ffffffff810586a8>] ? task_rq_lock+0x58/0xa0
Nov 21 09:09:04 host kernel: [<ffffffff81540bb7>] wait_for_completion+0xd7/0x110
Nov 21 09:09:04 host kernel: [<ffffffff81064010>] ? default_wake_function+0x0/0x20
Nov 21 09:09:04 host kernel: [<ffffffff81066e20>] sched_exec+0xd0/0xe0
Nov 21 09:09:04 host kernel: [<ffffffff811a7a8b>] do_execve+0xdb/0x2c0
Nov 21 09:09:04 host kernel: [<ffffffff81009947>] sys_execve+0x47/0x70
Nov 21 09:09:04 host kernel: [<ffffffff8100b5da>] stub_execve+0x6a/0xc0

This was accompanied by the same increasing system load, reaching values similar to those reported by the previous poster.
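
Next time it starts climbing I intend to capture the state of all blocked tasks immediately, instead of waiting for the 120-second hung task watchdog; assuming SysRq is enabled, something like this should dump them to the kernel log:

# Enable the magic SysRq interface (if it is not already)
echo 1 > /proc/sys/kernel/sysrq

# Dump a backtrace of every task in uninterruptible (blocked) state into the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100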

lspci -v output attached, snippets from dmesg attached.

Has anyone else experienced this on their host? (I've not yet tried a test kernel, and since I have to restore some VMs I will have to do a Proxmox 2 install for now.)

 
