Proxmox freezing

euler001

New Member
Jan 21, 2020
Hi,

I installed Proxmox 6.1 on two servers one month ago. Each server has eight 1.8T hard disks, which I combined into a RAID 50 array on each server. Both servers suffer from system freezes. When a freeze happens, SSH and HTTPS connections to the system fail, and even a monitor and keyboard physically attached to the server get no response. Power cycling the system fully restores it. Most recently, the system ran for almost 13 days before it froze. The freeze came soon after I ran the three commands vgs, pvs, and lvs (although I do not think they are related). This does not seem to be a hardware issue, because both servers have the same problem.

Could anyone lend me a hand with this? A direct root cause analysis and solution would be great. If you could point me to a workaround, that works too. This has become a big pain in my work. Please help!

thanks
 
Could you please provide more information about the systems you are using? Hardware specs, configuration, how many VMs / CTs, etc.? Is PCIe passthrough being used?
Do the log files (syslog, messages, etc.) show anything right before the freeze?

Your description is very generic, so it is hard to answer.
I would start investigating from a hardware perspective, including the power supply. I have found myself in strange situations several times when a PSU was aging or running out of power on a specific rail (for me it was 5V).
 
Hi tburger,

Thank you so much for your response.
Here is some hardware and VM info.
On each server, I used an LSI MegaRAID hardware RAID controller to group all eight 1.8T hard drives into one RAID disk. There are 30 VMs on each server. Each server runs in stand-alone mode and has 32 CPUs and 160G RAM.
The latest freeze happened last night at 21:27. From then on, the syslog contains nothing but a long run of "^@". I rebooted the system on Mar 2 at 08:10:46. Only after the reboot did the syslog start logging meaningful messages again.
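The "^@" characters are how many viewers display NUL bytes (0x00), so one way to pin down the freeze window is to look for long NUL runs in the raw log file. The following is only a rough sketch: the function name `find_nul_gaps` and the `min_run` threshold are made up for illustration and are not part of any Proxmox tool.

```python
import re


def find_nul_gaps(data: bytes, min_run: int = 16):
    """Locate long runs of NUL bytes in raw log data.

    Returns a list of (last_line_before, nul_count, first_line_after)
    tuples, one per run of at least `min_run` NUL bytes.
    `min_run` is an arbitrary threshold chosen for this sketch.
    """
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    gaps = []
    for match in pattern.finditer(data):
        # Last (possibly partial) line written before the NUL run.
        before = data[:match.start()].rstrip(b"\n").rsplit(b"\n", 1)[-1]
        # First line written after the NUL run, typically from the reboot.
        after = data[match.end():].lstrip(b"\n").split(b"\n", 1)[0]
        gaps.append((
            before.decode(errors="replace"),
            match.end() - match.start(),
            after.decode(errors="replace"),
        ))
    return gaps
```

It could be applied to a copy of the log with something like `find_nul_gaps(open("syslog.1", "rb").read())`. The line before each gap is the last message logged before the freeze, and the line after it should carry the timestamp of the following boot.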
I am attaching the syslog that includes the freezing event, the dmidecode output, the lshw output, and the output of some other commands for your review.
Should you need any more info, please let me know.

once again thank you,

Euler
 

Attachments

  • outputFromProxmoxFreezingDiagnose.txt
    143.8 KB · Views: 3
  • dmidecode_result.txt
    45 KB · Views: 2
  • lshw_result.txt
    123.9 KB · Views: 3
  • syslog.1.txt
    576.4 KB · Views: 3
I didn't find anything that jumps out at me.
You mentioned two systems. What does the second system report in such a situation?

These are points you could investigate further:
  1. At what exact points in time did your system crash? Does it happen
    1. After a certain amount of time running?
    2. At a specific time of day?
  2. What else happens at that time?
  3. Was the system experiencing a "special or unusual" load?
  4. Are you using the Proxmox VE replication service? If not, I would disable that service, because it produced the last line that was logged successfully. Maybe it triggers something in your situation.
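To answer the first question, it helps to put every freeze into a small table of uptime and time of day. Here is a minimal sketch of that bookkeeping; the function name `summarize_freezes` is hypothetical, and the boot and freeze timestamps have to be read manually from the logs (for example the last line before the "^@" run and the first line after the reboot).

```python
from datetime import datetime, timedelta


def summarize_freezes(events):
    """Summarize freezes given (boot_time, last_log_time) datetime pairs.

    Returns one dict per freeze with the uptime before the freeze and
    the time of day of the last log entry.
    """
    rows = []
    for boot_time, last_log_time in events:
        rows.append({
            "uptime": last_log_time - boot_time,
            "time_of_day": last_log_time.strftime("%H:%M"),
        })
    return rows


# Illustrative values only: the freeze time matches the 21:27 reported
# in this thread, but the boot time and the year are assumptions.
example = [(datetime(2020, 2, 17, 22, 0), datetime(2020, 3, 1, 21, 27))]
for row in summarize_freezes(example):
    print(row["uptime"], row["time_of_day"])
```

If the uptimes cluster around one value, that points toward a resource slowly running out. If the times of day cluster instead, a scheduled job is the more likely trigger.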
I have experienced similar symptoms in the past. The causes varied:
  • Unstable PSU (unlikely with two systems though)
  • Memory issues (also unlikely if two systems are affected)
  • Firmware issues in RAID controllers (e.g. if the RAID controller hangs, the system will behave erratically); have you checked for upgrades?
  • Disk firmware <-> RAID controller issue (same here: are upgrades available?)
  • BIOS issues (upgrade?)
  • BIOS settings: in my case, a high load triggered some odd behavior once a specific ECC setting was used on memory (basically we disabled ECC scrub to bump up memory performance. The system didn't like it, and after some minutes of testing it went BSOD; it was a Windows server).
You could try rebooting these systems on a regular basis (every week) and see if that solves the problem. If it does, that would indicate "something running out of something".

I know this is all guesswork... sorry for not being able to be more precise. There is just nothing I can point to and say: that's it...
 
