[1.4b2] Ubuntu Hardy KVM guest : soft lockup - CPU#0 stuck

luma

New Member
Sep 17, 2009
Hi,

I'm testing Proxmox VE and I'm very impressed by the product.

I'm running the host on a Dell R300 with the latest BIOS. Here is my cpuinfo (quad-core, so this entry repeats four times):

Code:
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           X3323  @ 2.50GHz
stepping        : 6
cpu MHz         : 2500.001
cache size      : 3072 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr dca sse4_1 lahf_lm
bogomips        : 5003.53
clflush size    : 64
cache_alignment : 64
address sizes   : 36 bits physical, 48 bits virtual
power management:
I'm evaluating PVE as a replacement for my VMware Server 2 setup.

As a first test, I migrated my monitoring server, which runs Nagios and Cacti. The migration went fine (vmdk conversion plus VMware Tools removal).
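For anyone following along, a vmdk-to-qcow2 conversion is typically done with `qemu-img convert`. The snippet below is only a sketch of that step, not necessarily the exact command used here: the source file name `ba-monitor.vmdk` is hypothetical, and the destination name is simply taken from the guest config further down.

```python
import shlex


def qemu_img_convert_cmd(src_vmdk, dst_qcow2):
    """Build the argv for converting a VMware vmdk image to qcow2."""
    # -f is the input format, -O the output format.
    return ["qemu-img", "convert", "-f", "vmdk", "-O", "qcow2", src_vmdk, dst_qcow2]


# Hypothetical source name; destination matches the ide0 entry in the guest config.
cmd = qemu_img_convert_cmd("ba-monitor.vmdk", "vm-102-disk-1.qcow2")

# Print the command instead of running it, so it can be reviewed first.
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where `qemu-img` is installed.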

From time to time my guest seems to freeze. Here is an extract from the guest's dmesg:
Code:
[    0.000000] BUG: soft lockup - CPU#0 stuck for 11s! [swapper:0]
[    0.000000] CPU 0:
[    0.000000] Modules linked in: iptable_filter ip_tables x_tables parport_pc lp parport loop ipv6 joydev serio_raw psmouse pcspkr evdev virtio_balloon ext3 jbd mbcache sg sr_mod cdrom usbhid hid sd_mod ata_generic pata_acpi ata_piix virtio_pci virtio_ring virtio uhci_hcd 8139cp libata scsi_mod usbcore 8139too mii fbcon tileblit font bitblit softcursor fuse
[    0.000000] Pid: 0, comm: swapper Not tainted 2.6.24-24-server #1
[    0.000000] RIP: 0010:[<ffffffff80244090>]  [<ffffffff80244090>] __do_softirq+0x60/0xe0
[    0.000000] RSP: 0018:ffffffff80689f20  EFLAGS: 00000206
[    0.000000] RAX: 0000000000000042 RBX: 0000000000000042 RCX: 0000000000000006
[    0.000000] RDX: 0000000000000000 RSI: 0000000000000005 RDI: ffffffff8020b3b9
[    0.000000] RBP: ffffffff80689ea0 R08: 0000000000000000 R09: 0000000000000010
[    0.000000] R10: ffff81008098a000 R11: ffffffff8025ca70 R12: ffffffff8020cfd6
[    0.000000] R13: ffffffff8067f120 R14: ffffffff805c8100 R15: 0000000000000287
[    0.000000] FS:  0000000000000000(0000) GS:ffffffff805c6000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[    0.000000] CR2: 00007f0e8e30d000 CR3: 000000001cc04000 CR4: 00000000000006e0
[    0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[    0.000000]
[    0.000000] Call Trace:
[    0.000000]  <IRQ>  [<ffffffff8020d52c>] call_softirq+0x1c/0x30
[    0.000000]  [<ffffffff8020ed45>] do_softirq+0x35/0x90
[    0.000000]  [<ffffffff80244028>] irq_exit+0x88/0x90
[    0.000000]  [<ffffffff80220b76>] smp_apic_timer_interrupt+0x46/0x70
[    0.000000]  [<ffffffff8020b390>] default_idle+0x0/0x40
[    0.000000]  [<ffffffff8020b390>] default_idle+0x0/0x40
[    0.000000]  [<ffffffff8020b390>] default_idle+0x0/0x40
[    0.000000]  [<ffffffff8020cfd6>] apic_timer_interrupt+0x66/0x70
[    0.000000]  <EOI>  [<ffffffff8020b3b9>] default_idle+0x29/0x40
[    0.000000]  [<ffffffff8020b418>] cpu_idle+0x48/0xe0
[    0.000000]  [<ffffffff80632885>] start_kernel+0x2c5/0x350
[    0.000000]  [<ffffffff8063212e>] _sinittext+0x12e/0x140
[    0.000000]
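To see how often this happens, the soft lockup lines can be pulled out of a saved dmesg dump. This is a minimal sketch that assumes the message format shown above (`BUG: soft lockup - CPU#N stuck for Ns! [task:pid]`); the function name is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Matches the "BUG: soft lockup" line format seen in the trace above.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def find_soft_lockups(dmesg_text):
    """Return one dict per soft lockup line found in the given dmesg text."""
    events = []
    for line in dmesg_text.splitlines():
        m = SOFT_LOCKUP.search(line)
        if m:
            events.append({
                "cpu": int(m["cpu"]),
                "seconds": int(m["secs"]),
                "task": m["task"],
                "pid": int(m["pid"]),
            })
    return events


sample = "[    0.000000] BUG: soft lockup - CPU#0 stuck for 11s! [swapper:0]"
print(find_soft_lockups(sample))
# prints [{'cpu': 0, 'seconds': 11, 'task': 'swapper', 'pid': 0}]
```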
This is not a load problem. The load average is pretty low; my last top reports:

  • 0.04, 0.21, 0.15 on guest
  • 0.02, 0.10, 0.09 on host

Here is my guest config :
Code:
name: ba-monitor
bootdisk: ide0
ostype: l26
memory: 512
sockets: 1
ide0: local:102/vm-102-disk-1.qcow2
boot: cad
freeze: 0
cpuunits: 1000
acpi: 0
kvm: 1
onboot: 1
cores: 1
vlan0: rtl8139=12:C8:52:3B:48:A8
I had to disable ACPI in the config to fix clock drift on the guest. With ACPI enabled, the guest clock ran 2 to 3 minutes late per hour; with ACPI disabled, the drift is gone. (Please note: I'm running NTP on the host.)

Do you have any tips for converting my guest successfully?

Keep up the good work, team!

Thanks and regards
 
Just a bit more information.
When the CPU locks up, top on my guest reports 99.1% software interrupt (si) usage:

Code:
top - 12:49:59 up 21:28,  1 user,  load average: 0.85, 0.49, 0.23
Tasks:  61 total,   4 running,  57 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.1%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi, 99.1%si,  0.0%st
Mem:    510920k total,   416836k used,    94084k free,   109468k buffers
Swap:   979956k total,        0k used,   979956k free,   204972k cached

Hope that helps
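In case it helps with catching the condition automatically, the `Cpu(s)` line above can be parsed and checked for a high `si` share. This is a rough sketch that assumes the older procps format shown in this post (`99.1%si`); newer versions of top print the fields differently, and the 50% threshold is an arbitrary assumption.

```python
import re


def parse_top_cpu_line(line):
    """Return a dict such as {'us': 0.0, 'sy': 0.1, ...} from a 'Cpu(s):' line."""
    # Each field looks like "99.1%si": a number, a percent sign, a short name.
    return {name: float(value) for value, name in re.findall(r"([\d.]+)%(\w+)", line)}


line = "Cpu(s):  0.0%us,  0.1%sy,  0.0%ni,  0.7%id,  0.0%wa,  0.0%hi, 99.1%si,  0.0%st"
cpu = parse_top_cpu_line(line)

SI_THRESHOLD = 50.0  # arbitrary cut-off, chosen only for illustration
if cpu["si"] > SI_THRESHOLD:
    print(f"softirq time is {cpu['si']}%, guest is probably stuck in softirq handling")
```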
 
I spent a day trying to resolve the problem...

But now I have 6 hours of uptime with no problem. It seems the "Demand-Based Power Management" option in the BIOS was the problem.
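The BIOS option itself cannot be read from inside Linux as far as I know, but the timing-related state the kernel exposes can be checked on both host and guest. The sketch below reads the current clocksource and the cpufreq governor from sysfs. The function names are made up for illustration, and the cpufreq file only exists when a frequency scaling driver is loaded, so a missing value is reported rather than treated as an error.

```python
from pathlib import Path


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if it is unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def timing_report(sysfs_root="/sys"):
    """Collect clocksource and CPU frequency scaling info, where available."""
    root = Path(sysfs_root)
    clock_dir = root / "devices/system/clocksource/clocksource0"
    return {
        "current_clocksource": read_sysfs(clock_dir / "current_clocksource"),
        "available_clocksource": read_sysfs(clock_dir / "available_clocksource"),
        "cpu0_scaling_governor": read_sysfs(
            root / "devices/system/cpu/cpu0/cpufreq/scaling_governor"
        ),
    }


for key, value in timing_report().items():
    print(f"{key}: {value if value is not None else 'not available'}")
```

Comparing the output before and after changing the BIOS setting may show whether frequency scaling was active on the host at all.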

:D
 
