How can I troubleshoot a freeze?

arboledax1

Oct 2, 2022
Hi,

I've got a NUC 12 running the latest stable community release. I've experienced about 5 freezes in the last week where the server becomes unresponsive. It won't respond to ping, and the video output is just black/blank. In these situations I've just pulled the power, restored it, and booted again. I've looked at the syslog exposed in the web interface, and I see no clues in the entries leading up to the freeze.

I'm hoping there might be some additional logging I could look at. I'm not sure whether I just don't know where to look for it or whether I need to enable it first.

Hoping I can resolve this in one of these ways:

A - with some community input to isolate the error

B - purchasing an additional NUC 12 but with different RAM/SSD, moving the VMs to the new machine, and then...? Maybe reinstall Proxmox on the first one and see whether the problem goes away. Perhaps swap parts between the two until I can deduce that (speculating here) it only happens with a particular RAM DIMM or something

C - purchase a support license as I'm sure they could help me get to the bottom of it

Any suggestions?
 
One other related question - I thought I'd run memtest, but when I choose memtest in the boot menu the display output just goes black/blank. Is there something obvious I'm missing to run a memtest? I could put memtest on a bootable USB drive or whatever, but I thought I'd ask first if I was missing something for running memtest from the Proxmox boot menu...
 
I’m leaning towards the following:

1 - I'll put memtest on a bootable thumb drive and run that.
2 - Regardless, I'm ordering an additional NUC 12 to help rule things out. It's nice to have a spare with the same specs anyway.

If anyone has tips on different logs I could enable and look at, please give some direction. Much appreciated!
 
Freezes are tough in my opinion.
You often don't get the real source of them into logs.
I'd start with memory pressure and see if you can find hints on that.
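For what it's worth, here is a rough sketch of how one might look for memory-pressure hints. It assumes the kernel exposes pressure stall information at /proc/pressure/memory and that the journal is persistent (for example, /var/log/journal exists), otherwise nothing from the previous boot survives a power pull. The helper names psi_avg10 and prev_boot_oom are made up for illustration.

```shell
# Print the 10-second average memory stall percentage.
# $1: "some" or "full"; $2: PSI file (defaults to /proc/pressure/memory).
psi_avg10() {
    awk -v kind="$1" '$1 == kind { sub(/^avg10=/, "", $2); print $2 }' \
        "${2:-/proc/pressure/memory}"
}

# Show OOM-killer messages from the kernel log of the previous boot.
# Only works if the journal was persistent when the freeze happened.
prev_boot_oom() {
    journalctl -k -b -1 --no-pager | grep -iE 'out of memory|oom-kill'
}
```

Running `psi_avg10 some` every minute or so from cron and appending the value to a file would at least show whether pressure was climbing before a freeze. After the next forced reboot, `prev_boot_oom` printing nothing means the kernel log recorded no OOM kill, which is not proof that memory was fine.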
 
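Try limiting the CPU idle C-states and temporarily disabling mitigations:
Code:
# If this is not a ZFS install (GRUB boot), edit the kernel command line:
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="intel_idle.max_cstate=1 processor.max_cstate=1 mitigations=off"
# Then apply the change and reboot:
update-grub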
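Once the boot configuration has been regenerated and the machine rebooted, it may be worth confirming that the parameters actually reached the running kernel. A minimal sketch, assuming the usual /proc/cmdline; the helper name has_kernel_param is hypothetical:

```shell
# Succeed if the exact parameter appears on the kernel command line.
# $1: parameter (e.g. mitigations=off); $2: file to check (defaults to /proc/cmdline).
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example check for the three parameters suggested above:
# for p in intel_idle.max_cstate=1 processor.max_cstate=1 mitigations=off; do
#     has_kernel_param "$p" && echo "$p active" || echo "$p MISSING"
# done
```

If a parameter shows up as missing, the edit probably went to a boot loader config that this install does not use.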
 
