Dell R520 Recurring Problem CPU locking

Mihai

Renowned Member
Dec 22, 2015
I've been working with a Dell R520 with dual Xeon(R) E5-2430 v2 CPUs @ 2.50GHz, and for multiple years I've had a recurring issue where, after various kernel updates, the virtual machines (both Windows and Linux) lock up. Previously, I was somehow able to rectify this by disabling as many CPU features as possible in the BIOS, but this no longer works. I would say this has happened about 3 times before.

This happened with kernel 5.15, and I upgraded to kernel 5.19 thinking it might solve it, but it has not.

All hardware tests are passing and everything seems to be working fine as far as I can tell, except the virtual machines are not able to run. Ceph is working correctly and the cluster is fine, except that I can't run any VMs on this node.

I don't see any errors in the syslog.

The cluster has 3 nodes and the storage backend is Ceph.

My only solution is to replace the whole server.

After a machine has been running for 5 or so minutes, past the grub loader, the following can be seen on the console:

virtual machine error 3.png

Does anyone know what the issue can be or how I can diagnose it?
 

Attachments

  • virtual machine error.png (11.4 KB)
  • virtual machine error.png (28 KB)
  • virtual machine error 2.png (27.2 KB)
I have an R520 in production with the same kind of processors, and I have never seen this error.

Is your BIOS up to date? Have you installed the intel-microcode package to get the latest CPU microcode?

Do you have any errors in your iDRAC Lifecycle logs?
 
I've got BIOS Version 2.9.0.

The only critical Lifecycle errors in the past 2 years are chassis opened, drives removed, and power supply redundancy lost (regular stuff). The warnings have to do with NIC ports going up and down.

I've never installed any intel-microcode package for a CPU update. I'm willing to give it a try. Do you recommend this method?
 
I installed the intel-microcode package and rebooted, then used all of the following commands to check for a microcode update. None of them returned anything.

Code:
dmesg | grep "microcode updated early to"
journalctl -b -k | grep "microcode updated early to"
zgrep "microcode updated early to" /var/log/kern.log*

I checked /proc/cpuinfo for the microcode and here's what I get:

Code:
microcode       : 0x42e

Looking at this microcode update guide, under Ivy Bridge Xeon Processor E5 v2, it says the production MCU revision is 0x42E, which is what I already have.
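
The manual check above only looks at one line of /proc/cpuinfo. As a rough sketch, the comparison can be scripted so that every logical CPU is checked against the expected revision. The function names `microcode_revisions` and `microcode_matches` are made up for illustration, and the optional file argument exists only so the check can be tried against a saved copy of /proc/cpuinfo.

```shell
#!/bin/sh
# Print the distinct microcode revisions found in a cpuinfo-style file,
# lowercased. Defaults to /proc/cpuinfo when no path is given.
microcode_revisions() {
    grep '^microcode' "${1:-/proc/cpuinfo}" \
        | awk -F': *' '{print $2}' \
        | tr '[:upper:]' '[:lower:]' \
        | sort -u
}

# Return 0 only if every logical CPU reports exactly the expected revision.
# Usage: microcode_matches 0x42e [cpuinfo-file]
microcode_matches() {
    expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    revs=$(microcode_revisions "${2:-/proc/cpuinfo}")
    [ "$revs" = "$expected" ]
}
```

For example, `microcode_matches 0x42E && echo "all cores at the expected revision"` would confirm that no core was left on an older revision. A non-zero return means either a mismatch or that no microcode field was found at all.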

I still have the same issue =(
 
Can you try adding the following to /etc/default/grub:

GRUB_CMDLINE_LINUX="idle=poll intel_idle.max_cstate=0 intel_pstate=disable processor.max_cstate=1"

then run "# update-grub" and reboot?

This should force the CPU to never sleep; maybe it will help.
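
After the reboot it may be worth confirming that the parameters actually reached the running kernel before judging whether they helped. A minimal sketch, assuming the four parameters suggested above: the helper names `has_kernel_param` and `report_idle_params` are hypothetical, and the optional argument lets the check run against a pasted command line instead of /proc/cmdline.

```shell
#!/bin/sh
# Return 0 if the exact parameter (e.g. "idle=poll") is present as a whole
# word in a kernel command line. Reads /proc/cmdline when none is given.
has_kernel_param() {
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report, for each parameter from the suggestion above, whether it is active.
report_idle_params() {
    for p in idle=poll intel_idle.max_cstate=0 intel_pstate=disable processor.max_cstate=1; do
        if has_kernel_param "$p" "$1"; then
            echo "$p: active"
        else
            echo "$p: MISSING"
        fi
    done
}
```

Running `report_idle_params` with no argument checks the booted kernel. Any line marked MISSING suggests the GRUB change was not picked up, for example because update-grub was not run or the host boots through a different loader.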
 
Thanks for the tip, but unfortunately it did not solve the issue. It also made the server fans run at 100%, so I reverted the change.
 
