Proxmox freeze on Intel NUC

jdruwe

New Member
Mar 13, 2020
Hello everyone, I am experiencing freezes (2 in the last 2 days) that require me to manually reboot the Intel NUC (https://www.intel.com/content/www/us/en/products/boards-kits/nuc/kits/nuc8i3beh.html). Under the summary of my node I can see that this time it happened at 20:20 in the evening. The Syslog only shows entries from after the manual reboot. I have no idea where to start; can I enable logging of some sort? I noticed that there is a new BIOS version for the NUC. Is it worth trying?


Screenshot 2020-04-10 at 20.53.13.png
Screenshot 2020-04-10 at 20.50.16.png
Screenshot 2020-04-10 at 20.50.04.png
Screenshot 2020-04-10 at 20.50.02.png
 
Some questions / pointers below.

(1) Was this a one-off or happening multiple times?
(2) Anything relevant in the Syslog before the freeze?
(3) What was on the screen when the NUC froze? Did you manage to capture a screenshot/photo?
(4) Have you tested the RAM in the NUC using memtest?
(5) I’d update the BIOS to the latest version
(6) I was experiencing some kernel panics / freezes on my NUC8i5beh last year, but recent BIOS / Proxmox / kernel updates have made the NUC very stable. Given that my NUC was headless, I wasn’t able to see what was on the screen at the point of the kernel panic, so I wrote a guide in the following thread on capturing the kernel panic logs on a remote server using netconsole:

https://forum.proxmox.com/threads/proxmox-host-random-freeze.54721/post-252772
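
For readers without a spare server: netconsole sends kernel messages as plain UDP datagrams, so the receiving side can be very small. Below is a minimal sketch of such a receiver in Python, not the setup from the linked guide. The port 6666 (netconsole's usual default target port) and the log file name are assumptions, and the sending side on the Proxmox node still has to be configured separately.

```python
import socket
from typing import Optional


def open_listener(port: int = 6666, host: str = "0.0.0.0") -> socket.socket:
    """Bind a UDP socket on which netconsole datagrams can be received.

    Port 6666 is an assumption; it must match the target port configured
    on the sending machine.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock: socket.socket, log_path: str,
            max_packets: Optional[int] = None) -> int:
    """Append every received datagram to log_path and return the packet count.

    With max_packets=None the loop runs until the process is stopped.
    """
    count = 0
    with open(log_path, "a", encoding="utf-8") as log:
        while max_packets is None or count < max_packets:
            data, (sender, _sender_port) = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip()
            log.write(f"[{sender}] {text}\n")
            log.flush()  # keep the file current if the listener is killed
            count += 1
    return count
```

Calling `capture(open_listener(), "netconsole.log")` blocks and records whatever arrives, so it has to be running before the freeze happens. The receiving machine's firewall also needs to allow inbound UDP on the chosen port.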
 
Thank you so much for your detailed response, here are my answers:

(1) Was this a one-off or happening multiple times?
It happened once before.

(2) Anything relevant in the Syslog before the freeze?
No, not really. I did see a lot of:

Code:
Started Proxmox VE replication runner.

So I masked that service as I have no replication set up at the moment.
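
One way to find the last entries written before a freeze is to look for the longest silence in the log, since a hang followed by a manual reboot leaves a gap between the final pre-freeze line and the first boot line. The following is a rough sketch under two assumptions: the lines start with the traditional `Mon DD HH:MM:SS` syslog stamp (which carries no year, so the year has to be passed in), and month names are in English.

```python
from datetime import datetime
from typing import Iterable, Optional, Tuple


def parse_timestamp(line: str, year: int) -> Optional[datetime]:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a traditional syslog line.

    Returns None for lines that do not start with such a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines: Iterable[str],
                year: int) -> Optional[Tuple[str, str, float]]:
    """Return (line_before, line_after, gap_seconds) for the longest silence.

    Returns None when fewer than two timestamped lines are found.
    """
    best: Optional[Tuple[str, str, float]] = None
    prev_line: Optional[str] = None
    prev_time: Optional[datetime] = None
    for line in lines:
        stamp = parse_timestamp(line, year)
        if stamp is None:
            continue  # skip continuation lines and anything without a stamp
        if prev_line is not None and prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if best is None or gap > best[2]:
                best = (prev_line, line, gap)
        prev_line, prev_time = line, stamp
    return best
```

Fed with the lines of a syslog file, the first element of the result is the last message logged before the longest pause, which is a reasonable place to start reading. It will not help if the kernel froze before anything reached the disk.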

(3) What was on the screen when the NUC froze? Did you manage to capture a screenshot/photo?
I did not manage to :( There is no screen connected; it's headless. I just connect to the admin interface from my MacBook.

(4) Have you tested the RAM in the NUC using memtest?
93129923_1633206743493437_561146041387188224_n.jpg

It's still running at the moment, but this is the result after 2 passes; it seems fine.

(5) I’d update the BIOS to the latest version
I just updated to the latest version and I haven't had any crash since (it has been running for 10 hours straight now, I think).

(6) I was experiencing some kernel panics / freezes on my NUC8i5beh last year, but recent BIOS / Proxmox / kernel updates have made the NUC very stable. Given that my NUC was headless, I wasn’t able to see what was on the screen at the point of the kernel panic, so I wrote a guide in the following thread on capturing the kernel panic logs on a remote server using netconsole:
https://forum.proxmox.com/threads/proxmox-host-random-freeze.54721/post-252772

I would like to do that if the error still occurs, but I don't have a server lying around. Would a MacBook or a Windows machine also work?

Is there a difference between Proxmox and kernel updates? Or by kernel do you mean those on the VMs?
 
Hi! You could run a VM on the Windows machine using VirtualBox and use that VM as the netconsole target, but obviously the VM needs to be running and waiting when a kernel panic occurs on the Proxmox node.

As for the difference between Proxmox and kernel updates: the kernel updates are just a subset of the Proxmox updates. But yes, I meant that clicking “Update” in the web UI and updating Proxmox generally may also have solved my previous issue.
 
Thanks for the information. I'll run a VM on my Windows machine if I experience such a crash again even after all the updates I did (BIOS and now the Proxmox-provided updates via the UI). Did you disable the C6 state when you were having issues, as @Dark26 mentioned, or did you keep it as it was and the issue resolved without touching it?
 
I can’t recall whether I changed anything with the cstate in the BIOS - I know I looked into it at the time but I don’t recall amending anything in the BIOS specifically for this.

If you can point me to the cstate settings in the BIOS, I can tell you what mine are set to...
 
I am unable to find the setting, but I did find this response from an Intel employee:

"There is no such an option in BIOS that explicitly says: C states or P States -disable or enable-. The only options available that could help are performance and power related tools but they won't necessarily disable any of these states.

I would recommend that you check at the operating system level. There are a couple of commands available for Linux* operating systems and for Windows* it usually depends on how "Power Options" → "Advanced Settings" → "Minimum and Maximum processor state" settings are configured.

There are also third-party tools that will show you disabled/enabled C and P states. This could be useful."
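
As a sketch of what "checking at the operating system level" can look like on Linux: the kernel's cpuidle framework exposes each idle state (C-state) as a directory in sysfs, with files such as `name`, `disable`, `usage` and `time`. The snippet below only reads them for one CPU. It assumes the standard path for `cpu0` and that the cpuidle driver is loaded; on a machine without that directory it simply returns an empty list.

```python
from pathlib import Path
from typing import Dict, List

# Standard cpuidle location for the first CPU (assumed to exist on the host).
CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def read_idle_states(cpuidle_dir: Path = CPUIDLE_DIR) -> List[Dict[str, str]]:
    """Return one dict per idle state with its name, disable flag and counters."""
    state_dirs = sorted(
        (p for p in cpuidle_dir.glob("state[0-9]*") if p.name[5:].isdigit()),
        key=lambda p: int(p.name[5:]),  # numeric order: state2 before state10
    )
    states = []
    for state_dir in state_dirs:
        entry = {"state": state_dir.name}
        for field in ("name", "disable", "usage", "time"):
            path = state_dir / field
            entry[field] = path.read_text().strip() if path.exists() else "n/a"
        states.append(entry)
    return states


def format_idle_states(states: List[Dict[str, str]]) -> str:
    """Render the states as one line each for quick inspection."""
    return "\n".join(
        f"{s['state']}: {s['name']} disabled={s['disable']} "
        f"entries={s['usage']} time_us={s['time']}"
        for s in states
    )
```

Running `print(format_idle_states(read_idle_states()))` on the node lists which states exist and how often each has been entered (`usage`) and for how long in microseconds (`time`). A `disable` value of 1 means the state is currently disabled for that CPU.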
 
Sorry about reopening the topic 3 years later, but I didn't expect to run into the SAME issues in 2023. Yet here we go... Maybe it helps somebody here:

I had freezes on an Intel NUC 11 Performance Lite (NUC11PAHi50Z02) every 1-3 days. That is a really terrible timespan for testing and debugging, since you have to wait several days to see whether a change helped. I tried a newer BIOS and different RAM, but with no success.

Strange thing: when the CPU load was non-idle (around 20% or higher), the system was 100% stable.
Anyway, what helped me: disable ANY mention of power savings in the BIOS.
I unchecked "balanced/low/max power" and set Intel dynamic power technology to OFF. I also disabled the failsafe watchdog.

As I said, it is hard to evaluate individual changes when you need to wait several days, so I disabled more stuff in "one go": hyper-threading and turbo boost, plus unused stuff like Bluetooth, Wi-Fi, wireless charging and infrared. I think the main issue, though, was the above-mentioned power saving or the watchdog doing crazy stuff.

Btw, I can't explain why, but when I disabled the card reader, Proxmox didn't have access to the network. So I left this one enabled :D

Now the system has been running for 2 months with 0 issues!
 
