Spontaneous reboots of the host system

Aleksey Makarenko

Hello!
I'm facing spontaneous reboots on a host with the following hardware:
Intel S5520HC, 2 x Xeon E5645, Adaptec RAID 5805

The problem is easy to reproduce by trying to run Ubuntu Server 12.04.1 as a KVM guest (even just booting the installer from CD). With previous versions everything is OK, and Windows guests are fine too. This host runs 12 KVM guests (Windows and Linux).

The system logs show no errors before the reboot. Sometimes the IPMI SEL contains the following (sometimes only the OEM System boot event):

Code:
 197 | 11/16/2012 | 04:29:57 | System Event #0x83 | OEM System boot event | Asserted
 198 | 11/28/2012 | 12:17:56 | Processor CATERR | State Asserted
 199 | 11/28/2012 | 12:18:01 | Power Unit Pwr Unit Status | Power off/down | Asserted
 19a | 11/28/2012 | 12:18:06 | Power Unit Pwr Unit Status | Power off/down | Deasserted
 19b | 11/28/2012 | 12:18:07 | Processor CATERR | State Asserted
 19c | 11/28/2012 | 12:18:30 | System Event #0x83 | Timestamp Clock Sync | Asserted
 19d | 11/28/2012 | 12:18:51 | System Event #0x83 | Timestamp Clock Sync | Asserted
 19e | 11/28/2012 | 12:20:07 | System Event #0x83 | OEM System boot event | Asserted
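For anyone comparing several such logs, here is a minimal sketch that picks the Processor CATERR entries out of SEL text. It assumes the log has been saved as plain text in the pipe-separated layout shown above; the function names `parse_sel` and `caterr_events` are made up for this illustration and are not part of any IPMI tool.

```python
from datetime import datetime

# The SEL excerpt from the post above, pasted verbatim.
SAMPLE_SEL = """\
 197 | 11/16/2012 | 04:29:57 | System Event #0x83 | OEM System boot event | Asserted
 198 | 11/28/2012 | 12:17:56 | Processor CATERR | State Asserted
 199 | 11/28/2012 | 12:18:01 | Power Unit Pwr Unit Status | Power off/down | Asserted
 19a | 11/28/2012 | 12:18:06 | Power Unit Pwr Unit Status | Power off/down | Deasserted
 19b | 11/28/2012 | 12:18:07 | Processor CATERR | State Asserted
 19c | 11/28/2012 | 12:18:30 | System Event #0x83 | Timestamp Clock Sync | Asserted
 19d | 11/28/2012 | 12:18:51 | System Event #0x83 | Timestamp Clock Sync | Asserted
 19e | 11/28/2012 | 12:20:07 | System Event #0x83 | OEM System boot event | Asserted
"""


def parse_sel(text):
    """Parse pipe-separated SEL lines into a list of dicts (hypothetical helper)."""
    events = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        # Assumption: dates are MM/DD/YYYY, as in the excerpt above.
        when = datetime.strptime(fields[1] + " " + fields[2], "%m/%d/%Y %H:%M:%S")
        events.append({
            "id": fields[0],
            "time": when,
            "sensor": fields[3],
            "event": " | ".join(fields[4:]),
        })
    return events


def caterr_events(events):
    """Return only the entries whose sensor field mentions CATERR."""
    return [e for e in events if "CATERR" in e["sensor"]]
```

Calling `caterr_events(parse_sel(SAMPLE_SEL))` gives the two CATERR entries, whose timestamps can then be lined up against guest start times.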

Code:
root@virtserver3:~# pveversion -v
pve-manager: 2.2-31 (pve-manager/2.2/e94e95e9)
running kernel: 2.6.32-16-pve
proxmox-ve-2.6.32: 2.2-82
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-12-pve: 2.6.32-68
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.4-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.93-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.9-1
pve-cluster: 1.0-33
qemu-server: 2.0-69
pve-firmware: 1.0-21
libpve-common-perl: 1.0-39
libpve-access-control: 1.0-25
libpve-storage-perl: 2.0-36
vncterm: 1.0-3
vzctl: 4.0-1pve2
vzprocps: 2.0.11-2
vzquota: 3.1-1
pve-qemu-kvm: 1.2-7
ksm-control-daemon: 1.1-1

Please help! Maybe I'm missing something?
Sorry for the bad English...
 
Hi,
This is not about host reboots, but Ubuntu has a separate kernel for running in a VM; with the standard kernel I also get reboots (of the guest).
In your case it looks more like an ACPI problem?!

Udo
 
Hmm... Soon we will be building a similar system, and I will have more freedom to experiment, since the problem was found on a production system. I will write about the results. Interestingly, I can't reproduce the problem on a host with ordinary desktop-class hardware... Thanks for the pointer!
 
Hi,
Especially for a production system, I would go for "linux-image-virtual" instead of "linux-image-server" on an Ubuntu guest.

Udo
 
After a long period of time the server spontaneously rebooted again... And now I'm pretty sure this is a hardware problem.
I found this article http://www.ingmarverheij.com/damn-you-c-states-unexpected-xenserver-reboot/

If I understand correctly, some Nehalem and Westmere processors have design defects that lead to reboots and lockups when a processor core switches to the C3-C6 states.
Sorry for the bad English. The author of the article above links to the Citrix knowledge base and writes that Microsoft Hyper-V can have the same issues with processor C-states.
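To see which idle states the host kernel actually exposes, a rough sketch like the one below may help. It assumes the kernel provides the cpuidle sysfs directory under `/sys/devices/system/cpu/cpu0/cpuidle` with `name` and `latency` files per state; the function name `list_cstates` is invented for this example, and the sketch only reads the states, it does not change them.

```python
from pathlib import Path


def list_cstates(cpu_dir="/sys/devices/system/cpu/cpu0"):
    """Return (state_dir, name, exit_latency_us) tuples for one CPU.

    Hypothetical helper: returns an empty list if the cpuidle
    directory is not present on this kernel.
    """
    idle_dir = Path(cpu_dir) / "cpuidle"
    if not idle_dir.is_dir():
        return []
    # Keep only directories named state<N> and sort them numerically.
    state_dirs = [p for p in idle_dir.glob("state*") if p.name[5:].isdigit()]
    state_dirs.sort(key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency = int((state_dir / "latency").read_text().strip())
        states.append((state_dir.name, name, latency))
    return states
```

Printing the result of `list_cstates()` on the affected host shows whether states deeper than C1 are offered to the kernel at all, which is worth knowing before changing the C-state settings in the BIOS.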
 
Maybe you can unload the IPMI modules to test whether an IPMI module is rebooting your system?
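Before unloading anything, it can be useful to check which IPMI modules are loaded at all. The sketch below assumes the usual `/proc/modules` layout, where the module name is the first whitespace-separated field, and that the IPMI driver modules have names starting with `ipmi_` (for example `ipmi_si` or `ipmi_watchdog`); the function name `loaded_ipmi_modules` is made up for this example.

```python
def loaded_ipmi_modules(proc_modules_text):
    """Return names of loaded modules that start with 'ipmi_'.

    Hypothetical helper: takes the text of /proc/modules as a string
    so that it can also be run against a saved copy.
    """
    names = []
    for line in proc_modules_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("ipmi_"):
            names.append(parts[0])
    return names
```

On the host this could be called as `loaded_ipmi_modules(open("/proc/modules").read())`. If `ipmi_watchdog` shows up in the list, a watchdog timeout would be one possible source of resets to rule out.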