Proxmox kernel-crash

tdo

Oct 8, 2011
Hello,
I keep getting the kernel crash that is attached to this post: more than 4 times yesterday, only once today. I don't know where it comes from. Maybe someone can help me fix it?
The system is very unstable. When this error appears, the system is completely offline and only a hard reset brings it back online.
I've already checked the log files, but there is no information about this problem, not even any log entries from the 1-2 minutes before the crash.

Technicians in the datacenter have already checked RAM, CPU and HDD but couldn't find any problems. They say it may come from VE 101 (a KVM-virtualized Windows 2008 R2 with 6 GB RAM and 4 CPU threads assigned) because it is using ~50-170% CPU, but they are not 100% sure. I've checked this, but most of the time this VE is only using 2-50%.
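To put the technicians' figure in perspective, here is a small arithmetic sketch. It assumes the percentages are top-style readings where 100% means one fully busy thread; the thread does not confirm how they were measured. The helper name normalize_cpu is made up for this illustration.

```python
def normalize_cpu(percent, threads):
    """Convert a top-style reading (100 = one busy thread) to a share of `threads`."""
    return percent / threads


VM_VCPUS = 4      # CPU threads assigned to VE 101
HOST_THREADS = 8  # i7-950: 4 cores, 8 threads (from the specs in this post)

if __name__ == "__main__":
    # Assumption: the readings count 100% per thread, as top does by default.
    for reading in (50, 170):
        vm_share = normalize_cpu(reading, VM_VCPUS)
        host_share = normalize_cpu(reading, HOST_THREADS)
        print(f"{reading}% -> {vm_share:.2f}% of the VM, {host_share:.2f}% of the host")
```

Under that assumption, even the 170% peak is well under half of what the VM is allowed to use and about a fifth of the host, so load from this VE alone would not saturate the machine.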

So I'm really lost as to how to solve this problem :(
I'm now trying to install a new VE and copy the user files from VE 101 to the new one, to check whether VE 101 is misconfigured.

My system specs are:

CPU: i7-950 @3.06 GHz (4 cores, 8 threads, 8MB)
RAM: 24 GB DDR3 1333
HDD: 2x 1.5 TB SATA3

Proxmox version: 1.9

Let me know if you need any more information. Thank you.

Kernel-Crash is attached as image.
 

Attachment: proxmox1_crash.jpg
...

Proxmox version: 1.9

Let me know if you need any more information. ..

Please always post the full output of 'pveversion -v'.
 
proxmox:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-47
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
qemu-server: 1.1-32
pve-firmware: 1.0-14
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-2pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
proxmox:~#
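
When comparing nodes, output like the above can be checked mechanically instead of by eye. A minimal Python sketch, assuming the "name: version" line format shown in this post; parse_pveversion and fallback_kernels are hypothetical helper names, not part of any Proxmox tool:

```python
def parse_pveversion(text):
    """Parse 'name: version' lines from `pveversion -v` output into a dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, shell prompts, and the echoed command itself.
        if not line or line.endswith("#") or ": " not in line:
            continue
        name, version = line.split(": ", 1)
        packages[name.strip()] = version.strip()
    return packages


def fallback_kernels(packages):
    """List installed pve-kernel-* packages other than the running kernel."""
    running = packages.get("running kernel", "")
    prefix = "pve-kernel-"
    return sorted(
        name[len(prefix):]
        for name in packages
        if name.startswith(prefix) and name[len(prefix):] != running
    )


# Shortened copy of the output posted above.
SAMPLE = """\
proxmox:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-47
proxmox:~#
"""

if __name__ == "__main__":
    pkgs = parse_pveversion(SAMPLE)
    print("running:", pkgs["running kernel"])
    print("other installed kernels:", fallback_kernels(pkgs))
```

Running the same parser on the output from both cluster nodes and diffing the two dicts would show quickly whether they really run identical package versions.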
 
No, I didn't change anything (that's why my first guess was that there is a hardware-problem). System was stable for more than 2 months. Then came the first crash last Thursday, and it has crashed every day since. The last crash was ~3 hours ago. I've already reinstalled VE 101, which I mentioned in my first post, but it still crashes :(
The system has been in a cluster with another system (different hardware specs) for ~1 month. That cluster node is stable.
 
I can't easily migrate the VM at the moment. It's in production use, and the other node is in another country and has different IPs. That means that if I migrate, I need to change the IP of the system. And it's not even certain that the crash is caused by a VM. I'm running 3 KVM-virtualized VMs and 1 OpenVZ VM on this host node. I've already reinstalled the 101 KVM VM and the crash still occurs. Maybe it's caused by some other KVM VM on this machine, but I don't know which one :(
For now, I've increased the video resolution of the Proxmox system. If the crash occurs again, the technicians will connect via remote console and take a screenshot. With a higher resolution, I have a better chance of capturing more of the first lines of the kernel crash (because I'm unable to scroll up once the crash has occurred).
 
No, I didn't change anything (that's why my first guess was that there is a hardware-problem). System was stable for more than 2 months.

Hm, you're running 2.6.32-47, which is not "2 months old", so I guess you probably did change the kernel at some point...
We have a very similar situation, so you could try booting into 2.6.32-4 (or -5, if you installed that earlier); on our testing platform, at least, this eliminates the kernel crashes.
Unfortunately, we have not been able to track this down so far...
 
The system has been running for 2 months without any failures. Sure, I have upgraded the kernel and Proxmox during these 2 months (and rebooted the system). Even after the upgrade, the system was stable. The crashes occurred for the first time last Thursday, then 1-3 times every day. The kernel upgrade was one or two weeks earlier. The cluster node is also using this kernel version without problems. But okay, I think I will switch to the old kernel version the next time the system crashes.
 
We also did not experience those kernel crashes immediately after the update; it turned out that the kernel crashes when the system is heavily loaded. Meanwhile we're testing with "phoronix-test-suite benchmark build-linux-kernel": 2.6.32-6 _always_ crashes a few seconds after start, while 2.6.32-5 completes successfully. We have another system running 2.6.32-6 without any issues, though...
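
Since the logs show nothing before a crash, it can help to wrap the load test so that each start and finish is written to disk right away. A rough Python sketch of that idea, not what we actually ran: the benchmark command is the one quoted above, while log_line, run_logged and the log path are arbitrary names chosen for this example.

```python
import os
import subprocess
import sys
import time


def log_line(path, message):
    """Append a timestamped line and ask the OS to flush it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(path, "a") as handle:
        handle.write(f"{stamp} {message}\n")
        handle.flush()
        os.fsync(handle.fileno())  # push the line out before the load starts


def run_logged(command, log_path, runs=1):
    """Run `command` `runs` times, logging the start and exit code of each run."""
    codes = []
    for attempt in range(1, runs + 1):
        log_line(log_path, f"run {attempt} start: {' '.join(command)}")
        result = subprocess.run(command)
        log_line(log_path, f"run {attempt} exit code {result.returncode}")
        codes.append(result.returncode)
    return codes


if __name__ == "__main__":
    # Command quoted from this thread; the log path is an assumption.
    cmd = ["phoronix-test-suite", "benchmark", "build-linux-kernel"]
    exit_codes = run_logged(cmd, "/root/stress-runs.log", runs=3)
    sys.exit(0 if all(code == 0 for code in exit_codes) else 1)
```

After a hard reset, a "start" line without a matching "exit code" line shows which run was active when the node went down. The benchmark may ask questions interactively; the sketch leaves stdin attached to the terminal for that reason.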
 
Okay, I've checked the system with "phoronix-test-suite benchmark build-linux-kernel" and it has been running for ~20 minutes now without any crashes or errors :(
I've just installed the new kernel and switched to "pvetest". So the next time the system crashes, I will boot the new kernel and see if it helps. Thanks for the help.
 
Still crashing, even with today's new kernel version.
I've now switched back to the old kernel version from May 2011. If this also ends in a crash, I think I will migrate all VMs to a third server, reinstall the first one and then migrate them back.
 

Attachment: proxmox_crash2.png
How many CPUs do you use for the VM? (You should use at most 4.) Try turning off hyperthreading.

...if you were addressing me: our crashes are independent of any VM. We even stopped all vz/kvm processes before running our "bench" on the node, just to be sure...

I will look into HT, but I am not sure whether it can be disabled in the BIOS on that machine... (I will try the kernel option 'noht' if not.)