I just started powertop in an ssh session to see what the CPU was doing.
My ssh session froze, so I started a ping to the host to see if it was still alive: yes, no problems.
So I logged in with a second ssh session. At that point the machine came back to life and powertop started as if nothing had happened. The load had risen to 34.
I found this in dmesg:
Code:
BUG: soft lockup - CPU#4 stuck for 11s! [kstopmachine:32366]
CPU 4:
Modules linked in: kvm_intel kvm vzethdev vznetdev simfs vzrst vzcpt tun vzdquota
vzmon vzdev xt_tcpudp xt_length ipt_ttl xt_tcpmss xt_TCPMSS iptable_mangle
iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables ac battery
ipv6 bridge joydev psmouse sg serio_raw e1000e evdev pcspkr button sr_mod cdrom xfs
dm_mirror dm_snapshot dm_mod raid10 raid1 md_mod ide_generic ide_core sd_mod usbhid
hid usb_storage libusual ahci ehci_hcd libata uhci_hcd scsi_mod usbcore thermal
processor fan
Pid: 32366, comm: kstopmachine Not tainted 2.6.24-2-pve #1 ovz005
RIP: 0010:[<ffffffff80281e78>] [<ffffffff80281e78>] stopmachine+0x68/0x100
RSP: 0018:ffff81025edcff30 EFLAGS: 00000202
RAX: 0000000000000001 RBX: 0000000000000001 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000202 RDI: 0000000000000000
RBP: ffffffff804a6387 R08: ffff81025edce000 R09: ffff81000105f810
R10: ffff810001065ee0 R11: 0000000000000001 R12: 0000000000000004
R13: ffffffff8024a550 R14: 0000000000000000 R15: ffff81025edcfeb0
FS: 0000000000000000(0000) GS:ffff81032f6c3300(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00000000b7ae4000 CR3: 0000000330551000 CR4: 00000000000026e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff8020d338>] child_rip+0xa/0x12
[<ffffffff80281e10>] stopmachine+0x0/0x100
[<ffffffff8020d32e>] child_rip+0x0/0x12
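For anyone who wants to check their own dmesg for the same message, here is a minimal sketch in Python that pulls out the soft lockup lines. It assumes the message has the same format as the "BUG: soft lockup" line above. The name find_soft_lockups and the script name in the usage note are my own, not part of any tool.

```python
import re
import sys

# Matches lines such as:
#   BUG: soft lockup - CPU#4 stuck for 11s! [kstopmachine:32366]
# Assumption: the format is the one shown in the dmesg output above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return a list of (cpu, seconds, comm, pid), one per soft lockup line."""
    hits = []
    for line in dmesg_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            hits.append(
                (int(m.group("cpu")), int(m.group("secs")),
                 m.group("comm"), int(m.group("pid")))
            )
    return hits


if __name__ == "__main__":
    # Read dmesg output from stdin and print one summary line per lockup.
    for cpu, secs, comm, pid in find_soft_lockups(sys.stdin.read()):
        print(f"CPU {cpu} stuck for {secs}s in {comm} (pid {pid})")
```

Saved as, say, lockups.py (a hypothetical file name), it could be run as `dmesg | python3 lockups.py`.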
At that moment 3 kvm guests and 1 openvz guest were running.
# pveversion -v
pve-manager: 1.1-4 (pve-manager/1.1/3746)
qemu-server: 1.0-10
pve-kernel: 2.6.24-5
pve-kvm: 83-1
pve-firmware: 1
vncterm: 0.9-1
vzctl: 3.0.23-1pve1
vzdump: 1.1-1
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1dso1
# pveperf
CPU BOGOMIPS: 42564.11
REGEX/SECOND: 219550
HD SIZE: 99.95 GB (/dev/mapper/pve-root)
BUFFERED READS: 105.55 MB/sec
AVERAGE SEEK TIME: 10.78 ms
FSYNCS/SECOND: 3548.39
DNS EXT: 70.03 ms
Long dmesg @ http://dth.net/pve/dmesg_cpu_lockup
And here the munin stats of this server: http://stats.bitsource.net/munin/la.ow.bitsource.net/vhost2.la.ow.bitsource.net.html
The spike can be clearly seen (if viewed not too long after I post this).
Does anybody have ideas or suggestions?