Regular crashes: Investigation

bolzerrr

Hi,
I have had regular crashes since I updated to PVE 8.1 and moved to newer hardware (n100m cpu). They happen every 2-7 days: too far apart to actively monitor, but often enough to be very annoying. When it happens, the system becomes unresponsive, doesn't react to the shutdown button, and the fans spin a bit louder than normal. I don't have a screen attached, so I can't say whether there is any output. I already ran a 6h memtest without any issues.

I would be thankful for any ideas on what the next steps to investigate would be.
Can I increase some log levels?
Which logs would be the best to investigate?
 
Hello,

Which logs would be the best to investigate?
You can export the journal with journalctl, using the following command:

Code:
journalctl --since 2024-04-18 | gzip > $(hostname)-syslog.txt.gz
You may need to adjust the date in the command above.
 
Thanks for the hint. I looked through the log; here is the moment it happened, followed by the boot after I hit the reset button. All earlier entries look unsuspicious.


Code:
Apr 17 16:59:40 pve qmeventd[472366]: Starting cleanup for 101
Apr 17 16:59:40 pve qmeventd[472366]: Finished cleanup for 101
Apr 17 16:59:40 pve systemd[1]: 101.scope: Deactivated successfully.
Apr 17 16:59:40 pve systemd[1]: 101.scope: Consumed 11min 40.500s CPU time.
-- Boot 1963fa51284d4f4a9988e08470f07bdb --
Apr 17 18:36:49 pve kernel: Linux version 6.5.13-5-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) ()
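For anyone who wants to find these moments without scrolling through the whole export, here is a minimal Python sketch (not something posted in this thread) that scans journal text for the boot separators and reports the last entry before each one and the first entry after it. It assumes the default short timestamp format shown above, English month names, and that you pass in the year yourself, because that format omits it. The names `find_boot_gaps` and `parse_stamp` are made up for this example.

```python
import re
from datetime import datetime

# Boot separators as they appear in the excerpts in this thread,
# e.g. "-- Boot 1963fa51284d4f4a9988e08470f07bdb --" or "-- Reboot --".
BOOT_MARKER = re.compile(r"^-- (Boot [0-9a-f]+|Reboot) --$")
# Default short timestamp prefix, e.g. "Apr 17 16:59:40".
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year):
    """Return the line's timestamp, or None if it has no such prefix."""
    match = STAMP.match(line)
    if match is None:
        return None
    # The short format has no year, so the caller has to supply one.
    return datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")


def find_boot_gaps(lines, year):
    """Return a list of (last_before, first_after, gap) per boot separator."""
    gaps = []
    last_seen = None  # timestamp of the most recent log entry
    pending = None    # last entry seen before a boot separator
    for raw in lines:
        line = raw.rstrip("\n")
        if BOOT_MARKER.match(line):
            pending = last_seen
            continue
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue
        if pending is not None:
            gaps.append((pending, stamp, stamp - pending))
            pending = None
        last_seen = stamp
    return gaps
```

To use it on the gzip export from the earlier post, you could open the file with `gzip.open(path, "rt")` and pass the file object as `lines`. For the excerpt above, the gap between the last entry (16:59:40) and the first kernel line of the next boot (18:36:49) comes out as 1:37:09. Keep in mind that the gap only tells you when logging stopped and when the machine was reset, not why it froze.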
 
n100m cpu
If that "m" after n100 isn't a typo, you're using this ASRock MB with the Intel N100 CPU.

AFAIK that MB uses a fanless design, so I'm assuming that when you say:
and the fans spin a bit louder than normal
you're referring to fans that you've added (DIY) to the case, the MB, or both.

I must tell you, the minute I started reading your post, I suspected thermal issues.
 
You are right about the MB, and yes, it is fanless. However, I prefer cooler hardware, which is why I added 2 system fans to the case. I also suspected the temperature first, so I added a watcher. I would not consider these temperatures critical; the CPU spec gives a critical maximum of 105 °C.

1713429486523.png

I was also thinking about voltages, but I am not sure what to look for:


Code:
nct6798-isa-02a0
Adapter: ISA adapter
in0:                     1.15 V  (min =  +0.00 V, max =  +1.74 V)
in1:                     1.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                     3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                     3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                   1000.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                   896.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                     1.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                     3.42 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                     3.22 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                     1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in10:                    1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                  992.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                    1.02 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                    1.26 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                    1.50 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
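On the question of what to look for in that output, here is a rough Python sketch that sorts the voltage lines into three groups. It rests on an assumption that is not confirmed anywhere in this thread: when both min and max read 0.00 V, the limits were probably never configured for this board, so the ALARM flag on those channels says little by itself. Only channels with a real limit range (like in0 above) are checked against it. The function name `classify` and the group labels are made up for this example.

```python
import re

# One voltage line of `sensors` output as pasted above, e.g.
# "in1:   1.66 V  (min =  +0.00 V, max =  +0.00 V)  ALARM"
LINE = re.compile(
    r"^(?P<name>in\d+):\s+(?P<value>[\d.]+) (?P<unit>m?V)\s+"
    r"\(min =\s+\+?(?P<min>[\d.]+) V, max =\s+\+?(?P<max>[\d.]+) V\)"
    r"(?P<alarm>\s+ALARM)?\s*$"
)


def to_volts(value, unit):
    """Normalise a reading to volts; `sensors` prints some values in mV."""
    return float(value) / 1000.0 if unit == "mV" else float(value)


def classify(text):
    """Map each channel to 'ok', 'unset-limits' or 'out-of-range'."""
    result = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # header lines such as "Adapter: ISA adapter"
        volts = to_volts(match["value"], match["unit"])
        low, high = float(match["min"]), float(match["max"])
        if low == 0.0 and high == 0.0:
            # Assumption: limits of 0/0 mean "not configured".
            result[match["name"]] = "unset-limits"
        elif low <= volts <= high:
            result[match["name"]] = "ok"
        else:
            result[match["name"]] = "out-of-range"
    return result
```

Run against the paste above, this would put in0 in the "ok" group and every other channel in "unset-limits". That does not prove the voltages are fine; it only means the ALARM column is not a reliable indicator here, and the raw values would have to be compared against what the board vendor documents for each input.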
 
I would not consider these temperatures critical
Assuming your image is not from the time of the crash, you cannot rule out that your issue is thermal.

and the fans spin a bit louder than normal
So I guess they are connected to the MB with some sort of temperature sensor/profile. This again is indicative of a thermal issue, but not proof. (A crashing system that keeps trying to operate will generate higher CPU activity etc.)

Is there any way you can find the temps from around the time of a crash?
(If the image is from the time of the crash, and I'd be surprised, ignore all of the above.)
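One way to capture temperatures right up to a freeze is a small logger that forces each sample to disk, so the last reading survives a hard reset. The following is a minimal Python sketch, not the watcher used in this thread. It assumes the kernel exposes a thermal zone under `/sys/class/thermal/` (values there are in millidegrees Celsius); which zone corresponds to the CPU on this board is not known, so the zone path is a parameter you would have to check. The log path and function names are made up for this example.

```python
import os
import time
from datetime import datetime
from pathlib import Path


def read_temp_c(zone_file):
    """Read one sysfs thermal zone file (millidegrees Celsius) as degrees."""
    return int(Path(zone_file).read_text().strip()) / 1000.0


def append_sample(log_path, zone_file, now=None):
    """Append one timestamped reading and force it onto the disk."""
    now = now or datetime.now()
    line = f"{now.isoformat(timespec='seconds')} {read_temp_c(zone_file):.1f}\n"
    with open(log_path, "a") as log:
        log.write(line)
        log.flush()
        # fsync so the sample is not lost in a cache if the box freezes.
        os.fsync(log.fileno())
    return line


def watch(log_path, zone_file, interval_s=10, samples=None):
    """Log every interval_s seconds; samples=None means run until stopped."""
    taken = 0
    while samples is None or taken < samples:
        append_sample(log_path, zone_file)
        taken += 1
        time.sleep(interval_s)
```

A call such as `watch("/root/temp-watch.log", "/sys/class/thermal/thermal_zone0/temp")` would run until interrupted. After the next crash, the last line of the log shows the final temperature that was recorded and when, which can then be compared with the last journal entry before the reboot.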

The next thing I would check is the PSU. What are you connecting/running in addition to the MB & CPU? You need to look at EVERYTHING: RAM, HDDs, NVMe drives, SSDs etc., and yes, also those fans. Then check what the PSU is capable of delivering. Finally, even if it is capable, it may be faulty.

Happy hunting.

Edit: Looking at the time of your crash in the logs provided, it occurred at approx. Apr 17 16:59:40, so I believe your graph does show temps around the time of the crash (assuming the date/time settings match).
 
I am using this PSU: PicoPSU-90 12V DC-DC ATX Mini-ITX 0-90W Netzteil Power Supply
PSU usage is reported:
1713433337727.png
This is the last temp reported just before the crash (the flatline).
1713433017915.png
The RAM was already tested. The hard drives will be complicated to test.
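The "look at everything" PSU check suggested earlier can be sketched as a simple power budget in Python. Only the 90 W rating comes from the PSU named above. Every component figure in the example is a hypothetical placeholder, not a measurement of this system, and the 80 % usable-load threshold is likewise just an assumption for illustration. The function name `power_headroom` is made up.

```python
def power_headroom(psu_watts, draws_watts, max_load=0.8):
    """Compare summed peak draws against a fraction of the PSU rating."""
    total = sum(draws_watts.values())
    budget = psu_watts * max_load
    return {
        "total_w": total,
        "budget_w": budget,
        "within_budget": total <= budget,
    }


# HYPOTHETICAL peak draws in watts. Replace every value with figures
# from the datasheets (or measurements) of the actual components.
example_draws = {
    "board_and_cpu": 25.0,
    "ram": 3.0,
    "hdd_spinup": 2 * 20.0,  # spin-up is usually the peak for spinning disks
    "nvme": 6.0,
    "fans": 2 * 2.0,
}

report = power_headroom(90.0, example_draws)
```

With real numbers filled in, `report["within_budget"]` gives a first indication of whether peak load (for example several disks spinning up at once) could exceed what the PSU comfortably delivers. It says nothing about whether the PSU or its DC power brick is faulty, which would still need a swap test.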
 
Here is the next crash.

Code:
Apr 19 20:17:01 pve CRON[148813]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Apr 19 20:17:01 pve CRON[148814]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Apr 19 20:17:01 pve CRON[148813]: pam_unix(cron:session): session closed for user root
-- Reboot --
Apr 20 07:48:42 pve kernel: Linux version 6.5.13-5-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) ()

The surroundings don't look suspicious:
1713600658521.png
 
