Hi,
we got this running:
pveversion -v
pve-manager: 2.0-33 (pve-manager/2.0/c598d9e1)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 2.0-60
pve-kernel-2.6.32-6-pve: 2.6.32-55
pve-kernel-2.6.32-7-pve: 2.6.32-60
lvm2: 2.02.88-2pve1
clvm: 2.02.88-2pve1
corosync-pve: 1.4.1-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.8-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.7-1
pve-cluster: 1.0-23
qemu-server: 2.0-20
pve-firmware: 1.0-15
libpve-common-perl: 1.0-14
libpve-access-control: 1.0-16
libpve-storage-perl: 2.0-12
vncterm: 1.0-2
vzctl: 3.0.30-2pve1
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.0-4
ksm-control-daemon: 1.1-1
After running fine with one VE on this node, we moved 18 lightweight VEs onto the box to see how it would cope: we first live-migrated each VE, then corrected the wrong memory settings using the GUI and rebooted each migrated VE on the new host.
After finishing this everything worked fine, but three hours later the whole host went down in some sort of system hang (no remote KVM, only IPMI, so I can't say what was displayed on the screen). There are no entries in syslog; if anyone has hints on how to debug this, please let me know.
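One idea we're considering for catching the next hang (a sketch only; the IPs, interface and MAC below are placeholders for our network, not real values): load netconsole so kernel messages are streamed over UDP to another box even when the local disk or syslogd is already wedged:

# Load netconsole: send kernel messages from local port 6666 on eth0
# to UDP port 514 on 192.168.1.5 (placeholder addresses and MAC).
modprobe netconsole netconsole=6666@192.168.1.2/eth0,514@192.168.1.5/00:11:22:33:44:55
# Raise the console log level so all kernel messages get forwarded.
dmesg -n 8

On the receiving host something like "nc -l -u -p 514 | tee netconsole.log" should then capture whatever the kernel prints before it dies.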
After a remote reset things do seem fine again (and no, there was no backup running that might have nuked LVM). Any hints are appreciated; if the system kicks itself again, we'll move back to non-testing. Until then we'll probably arm the safety net sketched below.
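A minimal sketch of that safety net, assuming the hang is really a panic or soft lockup (plain kernel sysctls, nothing Proxmox-specific):

# Reboot automatically 10 seconds after a kernel panic instead of
# sitting at a dead console, and treat soft lockups as panics so the
# box resets itself and leaves a trace on the (net)console.
echo "kernel.panic = 10" >> /etc/sysctl.conf
echo "kernel.softlockup_panic = 1" >> /etc/sysctl.conf
sysctl -p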
regards
hk