Nehalem Xeon E5520 and Intel S5520UR

bohansen · Nov 19, 2009

tdi said:
Got dual Intel(R) Xeon(R) CPU E5506 @ 2.13GHz server, centos hangs with CPU stuck 2-4 time per hour.

Are you running the latest 2.6.24-18 kernel from pvetest repository?

tdi · Nov 19, 2009

no,running Linux pve2 2.6.24-8-pve, if it will help ill migrate this night to the pvetest

bohansen · Nov 19, 2009

tdi said:
no,running Linux pve2 2.6.24-8-pve, if it will help ill migrate this night to the pvetest

The more testing the new kernel and kmod-kvm the better ;-). See http://www.proxmox.com/forum/showthread.php?t=2591 for the upgrade procedure.

bohansen · Nov 20, 2009

34 hours of compiling without a hiccup. I've added two more machines to the test and will let them run over the weekend.

tdi · Nov 20, 2009

tdi@pve2:~$ uname -a
Linux pve2 2.6.24-9-pve #1 SMP PREEMPT Tue Nov 17 09:34:41 CET 2009 x86_64 GNU/Linux

Running this one now.

Same problems.
on guest: BUG: soft lockup - CPU#0 stuck for 10s! [swapper:0]

on host:
ata1.00: qc timeout (cmd 0xa0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
ata1.00: status: { DRDY ERR }
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.01: NODEV after polling detection
ata1.00: configured for UDMA/33
ata1: EH complete
ata1.00: qc timeout (cmd 0xa0)
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
res 51/20:03:00:00:00/00:00:00:00:00/a0 Emask 0x5 (timeout)
ata1.00: status: { DRDY ERR }
ata1: port is slow to respond, please be patient (Status 0xd0)
ata1: device not ready (errno=-16), forcing hardreset
ata1: soft resetting link
ata1.01: NODEV after polling detection
ata1.00: configured for UDMA/33
ata1: EH complete

May it be the problem?

tdi · Nov 21, 2009

Ok i checked on vanilla 2.6.31.6 and ata problem does not occur. i did not start VMs, as did not compile openVZ into kernel.

How can I start KVM machines on proxmox on custom kernel without vz?

It says : unable to create fairsched node. I suspect it is some custom version of kvm userspace tool?

I just want to check whether it will run properly on 2.6.31.

My raid card is:
3ware Inc 9690SA SAS/SATA-II RAID PCIem module 3w_9xxx

dietmar · Nov 21, 2009

tdi said:
It says : unable to create fairsched node. I suspect it is some custom version of kvm userspace tool?

try to set 'cpuunits' to 0 - that should suppress fairsched code.

tdi · Nov 21, 2009

Still the same error, with setting in command line -cpuunits 0 or just not specyfing the cpuunits at all.

Please show me the way. I tried to use not promox kvm-quemu, but then i get Wrong number of CPUs.

dietmar · Nov 22, 2009

tdi said:
Still the same error, with setting in command line -cpuunits 0 or just not specyfing the cpuunits at all.

What version do you use? I will only work with latest 1.4

# pveversion -v

tdi · Nov 22, 2009

pve2:~# pveversion -v
pve-manager: 1.4-10 (pve-manager/1.4/4403)
qemu-server: 1.1-9
pve-kernel: 2.6.24-18
pve-qemu-kvm: 0.11.0-2
pve-firmware: 1
vncterm: 0.9-2
vzctl: 3.0.23-1pve3
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1

Maybe I have to not use -id ?

tdi · Nov 23, 2009

Ok. What i did was check the proxmox with kernel from 1.4, pvetest, my own 2.6.31 compilation. The centos guest still have CPU stuck for Xs! [XXXX] bugs logged and machines are absent for the amount of time.

I do not know what can be wrong with it any more.

/usr/bin/kvm -monitor unix:/var/run/qemu-server/101.mon,server,nowait -vnc unix:/var/run/qemu-server/101.vnc,password -pidfile /var/run/qemu-server/101.pid -daemonize -usbdevice tablet -name ihula1 -smp sockets=1,cores=4 -boot menu=on,order=c -vga std -tdf -localtime -k pl -drive file=/dev/cdrom,if=ide,index=2,media=cdrom -drive file=/var/lib/vz/images/101/vm-101-disk-1.qcow2,if=scsi,index=0,boot=on -m 8096 -net tap,vlan=0,ifname=vmtab101i0,script=/var/lib/qemu-server/bridge-vlan -net nic,vlan=0,model=e1000,macaddr=D6:2B:55:4E:9F:6D

This is how i run command. This are my 2 processors:
Intel(R) Xeon(R) CPU E5506 @ 2.13GHz

Nov 23 00:06:58 223010 kernel: BUG: soft model name : lockup - CPU#0 stuck for 10s! [events/0:14]
Nov 23 00:06:58 223010 kernel: CPU 0:
Nov 23 00:06:58 223010 kernel: Modules linked in: ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc dm_multipath scsi_dh video hwmon backlight sbs i2c_ec button battery asus_acpi acpi_memhotplug ac lp floppy joydev sg virtio_pci i2c_piix4 e1000 serio_raw ide_cd pcspkr virtio_ring i2c_core parport_pc cdrom parport virtio dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod ata_piix libata sym53c8xx scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Nov 23 00:06:58 223010 kernel: Pid: 14, comm: events/0 Not tainted 2.6.18-128.7.1.el5 #1
Nov 23 00:06:58 223010 kernel: RIP: 0010:[<ffffffff80011f8c>] [<ffffffff80011f8c>] __do_softirq+0x51/0x133
Nov 23 00:06:58 223010 kernel: RSP: 0018:ffffffff80425f60 EFLAGS: 00000206
Nov 23 00:06:58 223010 kernel: RAX: 0000000000000022 RBX: 0000000000000022 RCX: 7ffffffffffffffe
Nov 23 00:06:58 223010 kernel: RDX: ffff8102199ebfd8 RSI: 0000000000000000 RDI: ffff8102199e07a0
Nov 23 00:06:58 223010 kernel: RBP: ffffffff80425ee0 R08: 0000000000000001 R09: ffff810080be8000
Nov 23 00:06:58 223010 kernel: R10: ffffffff80425f98 R11: 0000000000000202 R12: ffffffff8005dc8e
Nov 23 00:06:58 223010 kernel: R13: 0000000000000046 R14: ffffffff80077533 R15: ffffffff80425ee0
Nov 23 00:06:58 223010 kernel: FS: 0000000000000000(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
Nov 23 00:06:58 223010 kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov 23 00:06:58 223010 kernel: CR2: 00002aaaaae04b78 CR3: 0000000000201000 CR4: 00000000000006e0
Nov 23 00:06:58 223010 kernel:
Nov 23 00:06:58 223010 kernel: Call Trace:
Nov 23 00:06:58 223010 kernel: <IRQ> [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
Nov 23 00:06:58 223010 kernel: [<ffffffff8006cada>] do_softirq+0x2c/0x85
Nov 23 00:06:58 223010 kernel: [<ffffffff8005dc8e>] apic_timer_interrupt+0x66/0x6c
Nov 23 00:06:58 223010 kernel: <EOI> [<ffffffff8823a972>] :e1000:e1000_watchdog_task+0x2e/0x65c
Nov 23 00:06:58 223010 kernel: [<ffffffff8004d1b2>] run_workqueue+0x94/0xe4
Nov 23 00:06:58 223010 kernel: [<ffffffff80049a2c>] worker_thread+0x0/0x122
Nov 23 00:06:58 223010 kernel: [<ffffffff80049b1c>] worker_thread+0xf0/0x122
Nov 23 00:06:58 223010 kernel: [<ffffffff8008a4b4>] default_wake_function+0x0/0xe
Nov 23 00:06:58 223010 kernel: [<ffffffff800323c9>] kthread+0xfe/0x132
Nov 23 00:06:58 223010 kernel: [<ffffffff8005dfb1>] child_rip+0xa/0x11
Nov 23 00:06:58 223010 kernel: [<ffffffff800322cb>] kthread+0x0/0x132
Nov 23 00:06:58 223010 kernel: [<ffffffff8005dfa7>] child_rip+0x0/0x11
Nov 23 00:06:58 223010 kernel:

dietmar · Nov 23, 2009

tdi said:
pve2:~#Maybe I have to not use -id ?

Yes, that may be the reason.

dietmar · Nov 23, 2009

tdi said:
I do not know what can be wrong with it any more.

Please can you try with another network emulation - the bug seems to be e1000 related.

bohansen · Nov 23, 2009

I finished testing compiling over the weekend on the following:
4 Ubuntu 8.04 server i386
2GB RAM, IDE, raw-file, rtl8139, 1 socket, 4 cores

My setup is:
Intel SR2600URBRB platform
2x Xeon 5520
Adaptec raid controller

The VMs have been running without a problem for nearly 5 days of stress testing, so I guess the new kernel solves the problem with multicore KVMs - at least for Ubuntu 8.04 .

However I noticed Proxmox was really slow this morning. Not the VMs - only proxmox itself. SSH was very slow, stalling at times and at the webinterface I was not able to do a login. I shutdown two machines to free some memory and afterwards proxmox gained speed again. Is this supposed behaviour if RAM is low? Top said I had 600MB free, but 180MB swap. After shutdown it is still around 180MB swap but 5GB free. Webinterface worked instantly after the two WMs were shutdown.

Before shutdown

Code:

top - 09:31:19 up 5 days, 15:15,  1 user,  load average: 1.13, 1.14, 0.98
Tasks: 250 total,   2 running, 247 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.6%us,  6.8%sy,  0.0%ni, 92.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12221400k total, 11617640k used,   603760k free,    54960k buffers
Swap: 11534328k total,   184344k used, 11349984k free,  1724008k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
16711 www-data  20   0  253m  31m 4144 R  100  0.3   2:26.21 apache2

After shutdown

Code:

top - 09:48:10 up 5 days, 15:32,  2 users,  load average: 0.11, 0.13, 0.40
Tasks: 250 total,   1 running, 248 sleeping,   0 stopped,   1 zombie
Cpu(s):  0.4%us,  0.4%sy,  0.0%ni, 99.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  12221400k total,  7185464k used,  5035936k free,    57484k buffers
Swap: 11534328k total,   174764k used, 11359564k free,  1740960k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                     
 9832 www-data  20   0  256m  35m 4336 S    1  0.3   0:00.30 apache2                                     
17697 root      20   0 94812  21m 2316 S    1  0.2   0:33.06 pvedaemon                                   
 7858 root      20   0 1129m 236m 1284 S    0  2.0  79:10.89 kvm                                         
29499 root      20   0 2183m 2.0g 1344 S    0 17.2   9427:19 kvm                                         
29616 root      20   0 2247m 2.0g 1344 S    0 17.2   9446:37 kvm

I don't know if it makes any difference but I had an ssh login running top all weekend also. This morning it showed something like above where sshd was using too much cpu time and the response time was slow until I did a new login.

Best regards,
Bo

cyberbootje · Apr 30, 2010

Sorry but i have to kick this thread.

I got a cpu stuck error again, i upgraded proxmox to 1.5 and kernel 2.6.32 to solve this issue and it has been working fine for about ~3 months.
This was on several debian 5 vm's.

My setup:
Nehalem Xeon 5520

# pveversion -v
pve-manager: 1.5-5 (pve-manager/1.5/4627)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-8
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-3

Anyone can confirm this is a Nehalem issue ?

tom · Apr 30, 2010

cyberbootje said:
Sorry but i have to kick this thread.

I got a cpu stuck error again, i upgraded proxmox to 1.5 and kernel 2.6.32 to solve this issue and it has been working fine for about ~3 months.
This was on several debian 5 vm's.

My setup:
Nehalem Xeon 5520

# pveversion -v
pve-manager: 1.5-5 (pve-manager/1.5/4627)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-4
pve-kernel-2.6.32-1-pve: 2.6.32-4
pve-kernel-2.6.24-9-pve: 2.6.24-18
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-11
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-8
vncterm: 0.9-2
vzctl: 3.0.23-1pve8
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-2
ksm-control-daemon: 1.0-3

Anyone can confirm this is a Nehalem issue ?

you can try the new 2.6.32 kernel, currently only in the pvetest repo.
see http://forum.proxmox.com/threads/37...d-2.6.32-including-KVM-0.12.3-and-gPXE-UPDATE

cyberbootje · Apr 30, 2010

tom said:
you can try the new 2.6.32 kernel, currently only in the pvetest repo.
see http://forum.proxmox.com/threads/37...d-2.6.32-including-KVM-0.12.3-and-gPXE-UPDATE

Is this kernel intended to fix this issue or is it just a wild guess?

tom · Apr 30, 2010

cyberbootje said:
Is this kernel intended to fix this issue or is it just a wild guess?

a guess.

cyberbootje · Apr 30, 2010

tom said:
a guess.

Well this is now a well known issue as fas as i know with this cpu, is there goining to be a fix anytime soon?
Or can you tell me wich cpu works fine with kvm?

cyberbootje · May 2, 2010

Maby someone knows how to replicate the processor stuck issue?
So i can test te other kernel....

Nehalem Xeon E5520 and Intel S5520UR

Active Member

tdi

Guest

Active Member

Active Member

tdi

Guest

tdi

Guest

Proxmox Staff Member

tdi

Guest

Proxmox Staff Member

tdi

Guest

tdi

Guest

Proxmox Staff Member

Proxmox Staff Member

Active Member

Member

Proxmox Staff Member

Member

Proxmox Staff Member

Member

Member

We value your privacy