[SOLVED] Random crashes with Elitedesk 800 G4 after upgrade to 8.2

user_001

Sep 23, 2024
Hi all,

I was running into random crashes after updating Proxmox to 8.2 on my 3-node cluster (running on Elitedesk 800 G4 machines).
I was planning to ask a question on the forum, but after struggling with it for more than 2 days I have managed to solve it (hopefully).
So now I am just sharing my solution (with debugging steps) in the hope it will help some of you.

1. Problem description

I run a 3-node cluster in which every node is identical (HP Elitedesk 800 G4, 16GB memory).
After I upgraded to 8.2 (from 7.15), I ran into weird "crashes": some nodes were randomly restarting.
I searched the forum and elsewhere, but could not find a remedy.

2. Problem identification

I first thought it might be a power supply issue, but since it was happening randomly on all 3 nodes, 3 separate faulty transformers seemed unlikely.
To test whether power management was involved, I reduced the load on 1 server: I transferred the instances running on it to the 2 other nodes and watched what happened.

Result: this node never ran longer than 20 minutes before rebooting. Nothing in the logs, nothing in the kernel logs, nothing in journalctl...
I looked everywhere. But since the node kept rebooting while idle, it was clearly linked to power management.
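
For anyone repeating this kind of log search, here is a minimal sketch of how the previous boot can be inspected. It assumes journald's default Storage=auto behaviour, where logs survive a reboot only if /var/log/journal exists; the helper name journal_is_persistent is made up for this example. A hard reset may still leave nothing useful behind, which would match what was seen here.

```shell
#!/bin/sh
# journal_is_persistent [DIR]: succeed if the journal directory exists.
# (Hypothetical helper; DIR defaults to journald's persistent location.)
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

if command -v journalctl >/dev/null 2>&1; then
    if journal_is_persistent; then
        # List the boots known to the journal, then show the last kernel
        # messages of the boot before the current one.
        journalctl --list-boots --no-pager || true
        journalctl -k -b -1 -n 50 --no-pager || true
    else
        echo "journal is not persistent; logs from previous boots are not kept"
    fi
fi
```

If the last lines of the previous boot stop abruptly with no shutdown messages, that points to a hard reset rather than an orderly reboot.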


3. Problem solution

In my particular case, I had to set the max C-state to 7 in /etc/default/grub (nano /etc/default/grub):

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on intel_idle.max_cstate=7 i915.enable_dc=0 ahci.mobile_lpm_policy=1"

Then run update-grub, followed by proxmox-boot-tool refresh, and then reboot.
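
After the reboot it may be worth confirming that the parameters actually reached the running kernel. The following is only a sketch: has_param and check_cmdline are hypothetical helper names, and the parameter list is simply the one from the GRUB line above.

```shell
#!/bin/sh
# Apply step from the post (run as root, then reboot):
#   update-grub && proxmox-boot-tool refresh && reboot

# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_cmdline CMDLINE: report which of the thread's parameters are present.
check_cmdline() {
    for p in intel_idle.max_cstate=7 i915.enable_dc=0 ahci.mobile_lpm_policy=1; do
        if has_param "$1" "$p"; then
            echo "active: $p"
        else
            echo "missing: $p"
        fi
    done
}

# Compare against the command line of the currently running kernel.
if [ -r /proc/cmdline ]; then
    check_cmdline "$(cat /proc/cmdline)"
fi
```

If a parameter shows up as missing after the reboot, the bootloader configuration in use was probably not the one that got regenerated.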

In my case, the server that was rebooting every 20 minutes has now been online for more than 24 hours.

I hope it helps.
 

Attachments

  • chrome_iHtKm8GLDp.png (82.5 KB)
  • chrome_76SHjIM549.png (40.6 KB)
Hey man, just wanted to say that I really appreciate this post. I had exactly the same issue on a HP Elitedesk 800 G4 and spent hours trying to resolve the issue to no success. Then I did exactly what you described above and the problem was gone! Thanks a lot for making this guide on how to resolve the issue :cool:
 
No problem.
After I spent so many hours pulling my hair out, I figured that maybe somebody else could use this little bit of feedback.
 
I had the same issue with my G4. None of the GRUB boot options helped. It may not apply here, but if you've installed "lm-sensors" and/or Netdata, try removing both. My overheating and crashing issues immediately stopped once I uninstalled those 2 packages.
 
If you're like me and having this issue and want to know more about what these settings do, this thread has more info about the issue in it.

Looks like the suggested GRUB defaults above came from here in the Arch Wiki, and if you read the details for each setting, not all of them may apply to your system. All of those settings will result in more power usage, so it may be worth finding out if one of the more targeted fixes works for you.

Based on the thread it seems this would be a more targeted fix for our specific issue with the HP G4s.

GRUB_CMDLINE_LINUX_DEFAULT="quiet i915.enable_dc=0"
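
To see what value a driver actually ended up with after booting, the module parameters under /sys can be read back. This is a rough sketch, assuming the i915 and intel_idle drivers are in use and that it is run as root (some parameter files are readable by root only); module_param is a made-up helper name.

```shell
#!/bin/sh
# module_param MODULE PARAM [SYSFS_ROOT]: print a module parameter's current
# value, or "unavailable" if the file does not exist or cannot be read.
module_param() {
    f="${3:-/sys}/module/$1/parameters/$2"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}

echo "i915.enable_dc = $(module_param i915 enable_dc)"
echo "intel_idle.max_cstate = $(module_param intel_idle max_cstate)"
```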

There's also someone who claimed that disabling PCIe power management in the BIOS has the same effect. I'm going to test both this GRUB setting and the BIOS setting to see if they resolve it for me, as I have 2 out of 4 G4s with this issue popping up randomly.
 