Node goes offline all of a sudden - how to investigate

luizromagnoli

New Member
Sep 29, 2025
Hey everyone, this is the first time I'm posting to this forum, even though it has helped me a lot in the past few months.

I've been using Proxmox for a few months now. Almost every day, for no apparent reason, my node becomes unreachable. I can't access the web interface, I can't SSH into it or into any VMs, and when I plug in an external monitor I get nothing. It's running (the CPU cooler is spinning) but totally unresponsive.

I have no idea what's causing the issue. Maybe a SATA expansion PCI card, but I'm not sure. Is there any file or trace that I can check to investigate what's causing the issue? Any log file?

Any ideas on how to troubleshoot it would be much appreciated.

Thanks in advance
 
Any ideas on how to troubleshoot it would be much appreciated.
There are too many possibilities without additional information. Feel free to add some more technical details regarding your system. (Cluster? Server hardware? Mini-PC? Amount of RAM? ECC RAM? HDD? SSD? ZFS? Number of VMs? PCI-passthrough? ... ... ... )


Try to find hints regarding the crashes in the journal. If the crash happened during the previous boot, you can look at the end of the relevant journal like this:

Code:
journalctl -b -1 -p warning -e

For a description of "-b" etc. consult man journalctl. You might post the last few dozen lines or so (depending on your findings) here - in [code]...[/code]-tags, please.
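One caveat about "-b -1": it only works if the journal survives a reboot. With journald's default Storage=auto setting, logs are kept across boots only when the directory /var/log/journal exists. The following is a minimal sketch, not an official Proxmox procedure; the helper name journal_is_persistent is made up for illustration. It checks for that directory, then lists the recorded boots and prints the tail of the previous one.

```shell
#!/bin/sh
# Hypothetical helper: report whether journald keeps logs across reboots.
# Assumption: journald runs with the default Storage=auto, where logs are
# persistent only if the directory /var/log/journal exists.
journal_is_persistent() {
    dir="${1:-/var/log/journal}"
    if [ -d "$dir" ]; then
        echo "persistent"
    else
        echo "volatile"
    fi
}

if [ "$(journal_is_persistent)" = "volatile" ]; then
    echo "No /var/log/journal: logs from before the crash are lost on reboot."
    echo "To enable: mkdir -p /var/log/journal && systemctl restart systemd-journald"
fi

# List the boots journald knows about, then show the last 50 lines
# of the previous boot (the one that ended in the hang).
if command -v journalctl >/dev/null 2>&1; then
    journalctl --list-boots --no-pager || true
    journalctl -b -1 -n 50 --no-pager || true
fi
```

If the previous boot's log simply stops with no error before the hang, that by itself hints at a hard lockup (hardware, power, firmware) rather than something the kernel had time to log.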

Standard generic hints: check all cables; run memtest86 overnight; verify that RAM for your VMs is not overcommitted; ...
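For the overcommit check, a rough sketch like the one below can compare the memory assigned to running VMs with the host's total. It assumes that "qm list" prints the columns VMID, NAME, STATUS, MEM(MB), BOOTDISK(GB), PID in that order; the function names are made up for illustration. It ignores containers, ballooning and the ZFS ARC, so treat the result as a first approximation only.

```shell
#!/bin/sh
# Sum the MEM(MB) column for running VMs from `qm list`-style output on stdin.
# Assumed column order: VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
sum_vm_mem_mb() {
    awk 'NR > 1 && $3 == "running" { total += $4 } END { print total + 0 }'
}

# Total host memory in MB, read from /proc/meminfo (value there is in kB).
host_mem_mb() {
    awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo
}

# Only run the comparison on a machine that actually has the qm tool.
if command -v qm >/dev/null 2>&1; then
    assigned=$(qm list | sum_vm_mem_mb)
    host=$(host_mem_mb)
    echo "Running VMs: ${assigned} MB assigned, host has ${host} MB"
    if [ "$assigned" -gt "$host" ]; then
        echo "RAM looks overcommitted."
    fi
fi
```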
 