CPU Stuck

yves.menge

Dec 9, 2009
Hello!

I have a problem running CentOS 5.4 under Proxmox 1.4! Directly after setup I got this error (multiple times):


Code:
Dec 16 09:16:59 webxwb101 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]
Dec 16 09:16:59 webxwb101 kernel: 
Dec 16 09:16:59 webxwb101 kernel: Pid: 0, comm:              swapper
Dec 16 09:16:59 webxwb101 kernel: EIP: 0060:[<c0428e9b>] CPU: 0
Dec 16 09:16:59 webxwb101 kernel: EIP is at __do_softirq+0x57/0x114
Dec 16 09:16:59 webxwb101 kernel:  EFLAGS: 00000286    Not tainted  (2.6.18-164.el5 #1)
Dec 16 09:16:59 webxwb101 kernel: EAX: c0738380 EBX: c06fcf90 ECX: 00000000 EDX: 010cdf00
Dec 16 09:16:59 webxwb101 kernel: ESI: 00000022 EDI: c06f2b00 EBP: 0000000a DS: 007b ES: 007b
Dec 16 09:16:59 webxwb101 kernel: CR0: 8005003b CR2: 08071800 CR3: 374f6000 CR4: 000006d0
Dec 16 09:16:59 webxwb101 kernel:  [<c04073cf>] do_softirq+0x52/0x9c
Dec 16 09:16:59 webxwb101 kernel:  [<c04059d7>] apic_timer_interrupt+0x1f/0x24
Dec 16 09:16:59 webxwb101 kernel:  [<c0403bb0>] default_idle+0x0/0x59
Dec 16 09:16:59 webxwb101 kernel:  [<c0403be1>] default_idle+0x31/0x59
Dec 16 09:16:59 webxwb101 kernel:  [<c0403ca8>] cpu_idle+0x9f/0xb9
Dec 16 09:16:59 webxwb101 kernel:  [<c07019f0>] start_kernel+0x37b/0x383
Dec 16 09:16:59 webxwb101 kernel:  =======================
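To spot how often these reports occur in a long syslog, the banner line is the useful part: it names the stuck CPU, how long the watchdog saw it stuck, and the task that was running. The following is only a minimal Python sketch; the regex assumes the exact banner wording seen in this thread, and the names `parse_soft_lockup` and `find_soft_lockups` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed banner format, as seen in this thread:
# "BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]"
SOFT_LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    cpu: int      # CPU the watchdog reported as stuck
    seconds: int  # how long it was stuck
    comm: str     # command name of the task running on that CPU
    pid: int      # PID of that task


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the lockup details if the line is a soft-lockup banner, else None."""
    m = SOFT_LOCKUP_RE.search(line)
    if m is None:
        return None
    return SoftLockup(int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))


def find_soft_lockups(lines: List[str]) -> List[SoftLockup]:
    """Collect every soft-lockup report from a list of syslog lines."""
    return [r for r in map(parse_soft_lockup, lines) if r is not None]


if __name__ == "__main__":
    sample = [
        "Dec 16 09:16:59 webxwb101 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]",
        "Dec 16 09:16:59 webxwb101 kernel: EIP is at __do_softirq+0x57/0x114",
    ]
    for report in find_soft_lockups(sample):
        print(report)  # SoftLockup(cpu=0, seconds=10, comm='swapper', pid=0)
```

A `[swapper:0]` task means the CPU was in the idle loop when the watchdog fired, which matches the `cpu_idle` / `default_idle` frames in the trace above.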

I get this error on both architectures (i386 and x86_64), even with all the latest CentOS updates installed.

I tried both IDE and SCSI for the disk, and both rtl8139 and e1000 as the network card. The disk format is qcow2.

Can someone help with this? I also tried Debian, which doesn't show this error, but I need CentOS!

Thx
Yves
 
CentOS 5.4 works here. What kernel do you use? (uname -a)

Also post your VMID.conf file (see /etc/qemu-server/VMID.conf).

And finally: what hardware platform do you use? Include all relevant details, especially if you did not install from the ISO.
 
I use this kernel in the VM:

Linux webxwb101.temp.net 2.6.18-164.6.1.el5.centos.plus #1 SMP Wed Nov 4 09:36:36 EST 2009 i686 i686 i386 GNU/Linux

(I tried the centos.plus version, but the standard kernel has the error too.)

The CONF File:

name: WEBXWB101
bootdisk: ide1
ostype: l26
memory: 1024
sockets: 1
onboot: 1
cores: 1
boot: dac
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
vlan1: e1000=4E:4C:C2:3D:01:A4
ide0: none,media=cdrom
ide1: local:101/vm-101-disk-1.qcow2
description: Primary Web Server
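The VMID.conf above is a flat list of `key: value` lines. As a minimal Python sketch (the helper names `parse_vm_conf` and `vcpu_count` are hypothetical, and it assumes every setting sits on one line and that `sockets` and `cores` each default to 1 when absent), the file can be read into a dict and the guest's vCPU count derived as sockets × cores, which matters once multi-vCPU guests come up in this thread.

```python
from typing import Dict


def parse_vm_conf(text: str) -> Dict[str, str]:
    """Parse the 'key: value' lines of a Proxmox VMID.conf into a dict."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and lines starting with '#' carry no settings.
        if not line or line.startswith("#"):
            continue
        # Split on the FIRST colon only: values such as "local:101/vm-101-disk-1.qcow2"
        # or "e1000=4E:4C:C2:3D:01:A4" contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def vcpu_count(conf: Dict[str, str]) -> int:
    """vCPUs seen by the guest: sockets x cores, each assumed to default to 1."""
    return int(conf.get("sockets", "1")) * int(conf.get("cores", "1"))


if __name__ == "__main__":
    webxwb101 = """\
name: WEBXWB101
bootdisk: ide1
memory: 1024
sockets: 1
cores: 1
vlan1: e1000=4E:4C:C2:3D:01:A4
ide1: local:101/vm-101-disk-1.qcow2
"""
    conf = parse_vm_conf(webxwb101)
    print(conf["ide1"], vcpu_count(conf))  # local:101/vm-101-disk-1.qcow2 1
```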

The Proxmox server is an MSI mainboard-based system with an Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz (shown as 8 CPUs), 12GB RAM and a 3ware 9650SE-2LP RAID controller.

I installed Proxmox with the ISO Installer.
 
The question is which kernel you use on the Proxmox VE host.
 
OOps ;)

Proxmox Server --> Linux webxvm101 2.6.24-8-pve #1 SMP PREEMPT Fri Oct 16 11:17:55 CEST 2009 x86_64 GNU/Linux
 
This is outdated. Run apt-get update and apt-get dist-upgrade to get the latest stable version and test again.
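Deciding whether one host kernel is older than another means comparing the numeric parts of the release string as numbers, not as text. A minimal Python sketch, assuming releases of the form `<major>.<minor>.<patch>-<abi>-pve` as in `2.6.24-8-pve`; the helpers `release_key` and `is_older` are hypothetical, and `2.6.9-1-pve` below is an invented release used only to show where plain string comparison goes wrong.

```python
import re
from typing import Tuple

# Assumed release format: "<major>.<minor>.<patch>-<abi>-pve", e.g. "2.6.24-8-pve".
RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)-(\d+)")


def release_key(release: str) -> Tuple[int, ...]:
    """Turn a kernel release string into a tuple of ints for numeric comparison."""
    m = RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def is_older(running: str, other: str) -> bool:
    """True if the running kernel release sorts before the other one."""
    return release_key(running) < release_key(other)


if __name__ == "__main__":
    print(is_older("2.6.24-8-pve", "2.6.32-4-pve"))  # True
    # Plain string comparison gets this one wrong ("9" > "2"); the numeric key does not.
    print("2.6.9-1-pve" < "2.6.24-8-pve", is_older("2.6.9-1-pve", "2.6.24-8-pve"))  # False True
```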
 
OK, I've now upgraded to the latest kernel and reinstalled the CentOS VM, but the same error still occurs!
 
Now that I have access to the Proxmox console, I can see an error there:

[Screenshot: 2009-12-16_183811.png]


It seems that Red Hat / CentOS uses an unsupported function!
 
You can ignore this error; it is not relevant to this issue.
 
Since I upgraded to Proxmox 1.7, I get the CPU stuck error again, and it kills my whole Proxmox host :(

Is there a solution for this now? I don't think I can ignore it when it brings down the whole host and every VM freezes, including the host system itself...

Syslog output on the server before the crash:
Dec 14 13:29:28 kernel [<ffffffff81177a77>] ? walk_page_buffers+0x6f/0x96
Dec 14 13:29:28 kernel [<ffffffff81179713>] ? journal_dirty_data_fn+0x0/0x1f
Dec 14 13:29:28 kernel [<ffffffff8117b2a2>] ? ext3_ordered_write_end+0x7d/0x11e
Dec 14 13:29:28 kernel [<ffffffff810d3bfc>] ? generic_file_buffered_write+0x17c/0x235
Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12
Dec 14 13:29:28 kernel [<ffffffff810d56e7>] ? __generic_file_aio_write+0x25e/0x292
Dec 14 13:29:28 kernel [<ffffffff810d577b>] ? generic_file_aio_write+0x60/0xa7
Dec 14 13:29:28 kernel [<ffffffff81118355>] ? do_sync_write+0xcc/0x112
Dec 14 13:29:28 kernel [<ffffffff81061e5f>] ? group_send_sig_info+0x39/0x41
Dec 14 13:29:28 kernel [<ffffffff810101ce>] ? read_tsc+0xe/0x25
Dec 14 13:29:28 kernel [<ffffffff81072b35>] ? ktime_get+0x6a/0xc9
Dec 14 13:29:28 kernel [<ffffffff811f76d1>] ? security_file_permission+0x16/0x18
Dec 14 13:29:28 kernel [<ffffffff81118d26>] ? vfs_write+0xb0/0x10a
Dec 14 13:29:28 kernel [<ffffffff81118de1>] ? sys_pwrite64+0x61/0x82
Dec 14 13:29:28 kernel [<ffffffff81009d32>] ? system_call_fastpath+0x16/0x1b
Dec 14 13:29:28 kernel Code: eb 11 0f b7 f6 0f b6 f8 4c 89 c9 4c 89 d2 e8 e4 fc ff ff c9 c3 90 90 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c9 c3 55 48 89 e5 0f b7 07 38 e0 8d 90 00 01 00 00
Dec 14 13:29:28 kernel Call Trace:
Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12
Dec 14 13:29:28 kernel [<ffffffff811c189d>] ? journal_dirty_data+0x73/0x1d5
Dec 14 13:29:28 kernel [<ffffffff811796eb>] ? ext3_journal_dirty_data+0x1d/0x45
Dec 14 13:29:28 kernel [<ffffffff8117972c>] ? journal_dirty_data_fn+0x19/0x1f
Dec 14 13:29:28 kernel [<ffffffff81177a77>] ? walk_page_buffers+0x6f/0x96
Dec 14 13:29:28 kernel [<ffffffff81179713>] ? journal_dirty_data_fn+0x0/0x1f
Dec 14 13:29:28 kernel [<ffffffff8117b2a2>] ? ext3_ordered_write_end+0x7d/0x11e
Dec 14 13:29:28 kernel [<ffffffff810d3bfc>] ? generic_file_buffered_write+0x17c/0x235
Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12
Dec 14 13:29:28 kernel [<ffffffff810d56e7>] ? __generic_file_aio_write+0x25e/0x292
Dec 14 13:29:28 kernel [<ffffffff810d577b>] ? generic_file_aio_write+0x60/0xa7
Dec 14 13:29:28 kernel [<ffffffff81118355>] ? do_sync_write+0xcc/0x112
Dec 14 13:29:28 kernel [<ffffffff81061e5f>] ? group_send_sig_info+0x39/0x41
Dec 14 13:29:28 kernel [<ffffffff810101ce>] ? read_tsc+0xe/0x25
Dec 14 13:29:28 kernel [<ffffffff81072b35>] ? ktime_get+0x6a/0xc9
Dec 14 13:29:28 kernel [<ffffffff811f76d1>] ? security_file_permission+0x16/0x18
Dec 14 13:29:28 kernel [<ffffffff81118d26>] ? vfs_write+0xb0/0x10a
Dec 14 13:29:28 kernel [<ffffffff81118de1>] ? sys_pwrite64+0x61/0x82
Dec 14 13:29:28 kernel [<ffffffff81009d32>] ? system_call_fastpath+0x16/0x1b
Dec 14 13:29:28 kernel BUG: soft lockup - CPU#5 stuck for 61s! [kvm:9239]
Dec 14 13:29:28 kernel Modules linked in: vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_hda_codec_realtek snd_hda_intel snd_hda_codec i7core_edac tpm_tis snd_hwdep i2c_i801 tpm tpm_bios snd_pcm edac_core snd_timer snd pcspkr soundcore snd_page_alloc firewire_ohci firewire_core crc_itu_t ohci1394 e1000 ieee1394 ahci r8169 libahci pata_jmicron mii 3w_9xxx [last unloaded: scsi_wait_scan]
Dec 14 13:29:28 kernel CPU 5
Dec 14 13:29:28 kernel Modules linked in: vhost_net kvm_intel kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_hda_codec_realtek snd_hda_intel snd_hda_codec i7core_edac tpm_tis snd_hwdep i2c_i801 tpm tpm_bios snd_pcm edac_core snd_timer snd pcspkr soundcore snd_page_alloc firewire_ohci firewire_core crc_itu_t ohci1394 e1000 ieee1394 ahci r8169 libahci pata_jmicron mii 3w_9xxx [last unloaded: scsi_wait_scan]
Dec 14 13:29:28 kernel
Dec 14 13:29:28 kernel Pid: 9239, comm: kvm Tainted: G D 2.6.35-1-pve #1 MSI X58 Pro-E (MS-7522)/MS-7522
Dec 14 13:29:28 kernel RIP: 0010:[<ffffffff8102e074>] [<ffffffff8102e074>] __ticket_spin_lock+0x14/0x1a
Dec 14 13:29:28 kernel RSP: 0018:ffff880300597a38 EFLAGS: 00000293
Dec 14 13:29:28 kernel RAX: 0000000000006b69 RBX: ffff880300597a38 RCX: 0000000000000034
Dec 14 13:29:28 kernel RDX: 0000000000000002 RSI: ffff88061e043c98 RDI: ffff8806242bb94c
Dec 14 13:29:28 kernel RBP: ffffffff8100a6ce R08: 0000000000000000 R09: 0000000000000000
Dec 14 13:29:28 kernel R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
Dec 14 13:29:28 kernel R13: 0000000000000400 R14: 0000000000000000 R15: ffffffff00000800
Dec 14 13:29:28 kernel FS: 00000000431cf950(0063) GS:ffff880001ea0000(0000) knlGS:0000000000000000
Dec 14 13:29:28 kernel CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Dec 14 13:29:28 kernel CR2: 00000000b7fba000 CR3: 000000061b30e000 CR4: 00000000000026e0
Dec 14 13:29:28 kernel DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
Dec 14 13:29:28 kernel DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Dec 14 13:29:28 kernel Process kvm (pid: 9239, threadinfo ffff880300596000, task ffff880624230000)
Dec 14 13:29:28 kernel Stack:
Dec 14 13:29:28 kernel ffff880300597a48 ffffffff814b2780 ffff880300597b48 ffffffff811c1e65
Dec 14 13:29:28 kernel <0> ffff88061e043c98 ffff8802e95a2138 ffff880300597ab8 ffff880300597ad0
Dec 14 13:29:28 kernel <0> ffff880624230000 ffff88028eeddc98 00000000a1fe0340 ffff880622b0be70
Dec 14 13:29:28 kernel Call Trace:
Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12
Dec 14 13:29:28 kernel [<ffffffff811c1e65>] ? do_get_write_access+0x34d/0x47d
Dec 14 13:29:28 kernel [<ffffffff811c1fbc>] ? journal_get_write_access+0x27/0x38
Dec 14 13:29:28 kernel [<ffffffff81185951>] ? __ext3_journal_get_write_access+0x24/0x4d
Dec 14 13:29:28 kernel [<ffffffff811784f5>] ? ext3_reserve_inode_write+0x44/0x7b
Dec 14 13:29:28 kernel [<ffffffff8117855b>] ? ext3_mark_inode_dirty+0x2f/0x4c
Dec 14 13:29:28 kernel [<ffffffff811786c6>] ? ext3_dirty_inode+0x71/0x88
Dec 14 13:29:28 kernel [<ffffffff811352e6>] ? __mark_inode_dirty+0x31/0x19d
Dec 14 13:29:28 kernel [<ffffffff8112b2b1>] ? file_update_time+0x119/0x140
Dec 14 13:29:28 kernel [<ffffffff810d40b4>] ? file_remove_suid+0x26/0x5c
Dec 14 13:29:28 kernel [<ffffffff810d55f6>] ? __generic_file_aio_write+0x16d/0x292
Dec 14 13:29:28 kernel [<ffffffff810d577b>] ? generic_file_aio_write+0x60/0xa7
Dec 14 13:29:28 kernel [<ffffffff81118355>] ? do_sync_write+0xcc/0x112
Dec 14 13:29:28 kernel [<ffffffff810101ce>] ? read_tsc+0xe/0x25
Dec 14 13:29:28 kernel [<ffffffff81072b35>] ? ktime_get+0x6a/0xc9
Dec 14 13:29:28 kernel [<ffffffff811f76d1>] ? security_file_permission+0x16/0x18
Dec 14 13:29:28 kernel [<ffffffff81118d26>] ? vfs_write+0xb0/0x10a
Dec 14 13:29:28 kernel [<ffffffff81118de1>] ? sys_pwrite64+0x61/0x82
Dec 14 13:29:28 kernel [<ffffffff81009d32>] ? system_call_fastpath+0x16/0x1b
Dec 14 13:29:28 kernel Code: eb 11 0f b7 f6 0f b6 f8 4c 89 c9 4c 89 d2 e8 e4 fc ff ff c9 c3 90 90 55 b8 00 01 00 00 48 89 e5 f0 66 0f c1 07 38 e0 74 06 f3 90 <8a> 07 eb f6 c9 c3 55 48 89 e5 0f b7 07 38 e0 8d 90 00 01 00 00
Dec 14 13:29:28 kernel Call Trace:
Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12
Dec 14 13:29:28 kernel [<ffffffff811c1e65>] ? do_get_write_access+0x34d/0x47d
Dec 14 13:29:28 kernel [<ffffffff811c1fbc>] ? journal_get_write_access+0x27/0x38
Dec 14 13:29:28 kernel [<ffffffff81185951>] ? __ext3_journal_get_write_access+0x24/0x4d
Dec 14 13:29:28 kernel [<ffffffff811784f5>] ? ext3_reserve_inode_write+0x44/0x7b
Dec 14 13:29:28 kernel [<ffffffff8117855b>] ? ext3_mark_inode_dirty+0x2f/0x4c
Dec 14 13:29:28 kernel [<ffffffff811786c6>] ? ext3_dirty_inode+0x71/0x88
Dec 14 13:29:28 kernel [<ffffffff811352e6>] ? __mark_inode_dirty+0x31/0x19d
Dec 14 13:29:28 kernel [<ffffffff8112b2b1>] ? file_update_time+0x119/0x140
Dec 14 13:29:28 kernel [<ffffffff810d40b4>] ? file_remove_suid+0x26/0x5c
Dec 14 13:29:28 kernel [<ffffffff810d55f6>] ? __generic_file_aio_write+0x16d/0x292
Dec 14 13:29:28 kernel [<ffffffff810d577b>] ? generic_file_aio_write+0x60/0xa7
Dec 14 13:29:28 kernel [<ffffffff81118355>] ? do_sync_write+0xcc/0x112
Dec 14 13:29:28 kernel [<ffffffff810101ce>] ? read_tsc+0xe/0x25
Dec 14 13:29:28 kernel [<ffffffff81072b35>] ? ktime_get+0x6a/0xc9
Dec 14 13:29:28 kernel [<ffffffff811f76d1>] ? security_file_permission+0x16/0x18
Dec 14 13:29:28 kernel [<ffffffff81118d26>] ? vfs_write+0xb0/0x10a
Dec 14 13:29:28 kernel [<ffffffff81118de1>] ? sys_pwrite64+0x61/0x82
Dec 14 13:29:28 kernel [<ffffffff81009d32>] ? system_call_fastpath+0x16/0x1b
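When reading a trace like this, the useful part is the chain of function names. The minimal Python sketch below pulls them out of the first `Call Trace:` block it sees; the function name `call_trace_functions` is hypothetical, and the regex assumes the `[<address>] ? symbol+0xoffset/0xsize` layout shown in this log. Frames prefixed with `?` are entries the kernel's stack unwinder could not confirm as reliable, so treat the list as a hint rather than a guaranteed call chain.

```python
import re
from typing import List

# Assumed frame layout, e.g. "[<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12"
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?:\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def call_trace_functions(lines: List[str]) -> List[str]:
    """Return function names from the first 'Call Trace:' block, innermost first."""
    funcs: List[str] = []
    in_trace = False
    for line in lines:
        if "Call Trace:" in line:
            if in_trace:
                break  # a second Call Trace block starts: keep only the first
            in_trace = True
            continue
        if not in_trace:
            continue
        m = FRAME_RE.search(line)
        if m is None:
            if funcs:
                break  # first non-frame line after the frames ends the block
            continue
        funcs.append(m["func"])
    return funcs


if __name__ == "__main__":
    sample = [
        "Dec 14 13:29:28 kernel Call Trace:",
        "Dec 14 13:29:28 kernel [<ffffffff814b2780>] ? _raw_spin_lock+0xe/0x12",
        "Dec 14 13:29:28 kernel [<ffffffff811c1e65>] ? do_get_write_access+0x34d/0x47d",
        "Dec 14 13:29:28 kernel [<ffffffff811c1fbc>] ? journal_get_write_access+0x27/0x38",
        "Dec 14 13:29:28 kernel Code: eb 11 0f b7 f6",
    ]
    print(call_trace_functions(sample))
    # ['_raw_spin_lock', 'do_get_write_access', 'journal_get_write_access']
```

Applied to the CPU#5 report (from its `Call Trace:` line onward), the innermost frames show the `kvm` process spinning on a lock inside the ext3 journaling code (`do_get_write_access`), consistent with the RIP being in `__ticket_spin_lock`.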

I use the latest Proxmox 1.7 with all updates (checked with pveversion) and kernel 2.6.35.

Thanks for your help.
 
I should mention that I upgraded directly from 1.5 to 1.7. Is that a problem?

I can reproduce it when I give a Windows-based VM more than one vCPU and then start the Windows installation: 99% of the time, the server crashes.
 
Please give all the details of your crash scenario and I will build the same setup in our lab. Post your host setup (pveversion -v), your VMID.conf file, and tell us which ISO you use.
 
Scenario:
6 VMs running different systems (Linux, Windows), some of them configured with 2 vCPUs, on Proxmox 1.7. I then tried to install Windows Server 2008 R2 from an MSDN ISO (MD5-checked to be sure the file is valid).

Usually, while Windows Setup is expanding the image to the disk, I first see some of the soft lockup messages, and later the host crashes before setup finishes. The only thing still working is the main Proxmox network interface, which still replies to pings, but all services are dead because of the kernel panic. After a reboot, Proxmox doesn't show any issue, not even in dmesg.

I've also checked my RAID status to make sure the RAID configuration and the controller are not producing errors.

Server Specification:
24GB RAM
Intel Core i7 920
3ware RAID with 1.35TB RAID 1

PVE Version:
webxvm101:/# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-28
pve-kernel-2.6.32-4-pve: 2.6.32-28
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-2
ksm-control-daemon: 1.0-4

VMID Configuration:
name: WEBXVS105
ide2: local:iso/en_windows_server_2008r2_datacenter_enterprise_standard_x64_dvd_X15-50365.iso,media=cdrom
bootdisk: ide0
vlan1: e1000=4A:3F:42:91:5B:38
ostype: w2k8
ide0: local:105/vm-105-disk-1.qcow2
memory: 2048
sockets: 2

I've checked my RAM, disks and mainboard, and all tests passed without errors, so I'm sure it's not a hardware issue.

Thanks for your work on my case.
 
We do such install tests regularly and just did it again: no issues here. I just installed the same setup on an Intel Modular Server.

BTW, why are you using desktop hardware? Keep in mind that the Core i7 does not support ECC memory.
 
I know the i7 doesn't support ECC, but I've been using this server since Proxmox release 1.4 without any issues, in the same hardware configuration. I think I'll do a fresh install of Proxmox 1.6 and hope Proxmox does its job then.
 
No, there is no backup running at that time! I've now replaced all 24GB of RAM, and since then I've had no CPU stuck errors and no freezes, and everything runs without any syslog entries indicating an issue ;)
 
