Hi,
I've recently run into strange and very serious behaviour on one of my Proxmox servers: it sporadically stops. I've set the kernel panic restart timer to a nonzero value, so the server restarts after the failure (no hardware problem seems to keep it from rebooting). There is absolutely no trace of the cause in the logs: no kernel panic message, stack trace or register dump. Logging simply stops at the moment of the halt and resumes with the kernel boot messages after the automatic restart, so I can't even begin to diagnose the problem. It's a production server in a datacenter, so I can't watch the console when the problem occurs. I'm running kernel 2.6.24-12-pve and can't upgrade, because the Promise EX4650 hardware RAID controller in the machine is unstable under the 2.6.32 branch. This has already happened twice within a few days. After the first incident I upgraded to the latest stable Proxmox release, but it had no effect. The problem appears unrelated to the load on the server, and I can't spot any specific activity in the logs before the halt.
Please help me diagnose this problem. There's important customer data on this server, and I don't like the idea of being forced to move this installation to another node. Should I try downgrading to 2.6.18 or upgrading to a newer 2.6.3x kernel? I fear the RAID card's instability will be a problem again, and I'm not sure 2.6.18 supports this card at all. Should I replace the card with a different brand? (I was planning to do that anyway, since I've had problems with this specific model before.)
Proxmox info:
# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.24-12-pve
proxmox-ve-2.6.24: 1.6-26
pve-kernel-2.6.32-4-pve: 2.6.32-28
pve-kernel-2.6.24-12-pve: 2.6.24-25
qemu-server: 1.1-25
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-9
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-2
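For reference, the panic restart timer mentioned above is the kernel.panic sysctl. A minimal sketch of the setting, assuming it is kept in /etc/sysctl.conf; the value 10 is only an illustrative placeholder, not necessarily the value used on this server:

```
# /etc/sysctl.conf (assumed location of the setting)
# Reboot automatically N seconds after a kernel panic; 0 disables the reboot.
# The value 10 is a hypothetical example.
kernel.panic = 10
```

The value currently in effect can be read from /proc/sys/kernel/panic.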