Proxmox shut down last night, again.

thusband

I have Proxmox running on an Intel NUC with just one VM, for Home Assistant, and last night it shut down. It has done this a couple of times before for what appears to be no reason. It had been several weeks since the last time, and I've never been able to determine the cause. The NUC's power seems to be on, but Proxmox has shut down, and I have to turn the NUC off and on to get Proxmox going again. Here's the syslog from the time of the shutdown. Can anything be determined from this?

Any hints greatly appreciated.
Code:
Jun 11 14:17:01 pve CRON[748312]: pam_unix(cron:session): session closed for user root
Jun 11 14:36:28 pve smartd[605]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 113 to 112
Jun 11 15:06:28 pve systemd[1]: Starting Cleanup of Temporary Directories...
Jun 11 15:06:28 pve systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jun 11 15:06:28 pve systemd[1]: Finished Cleanup of Temporary Directories.
Jun 11 15:17:01 pve CRON[757801]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 15:17:01 pve CRON[757802]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 15:17:01 pve CRON[757801]: pam_unix(cron:session): session closed for user root
Jun 11 16:17:01 pve CRON[767221]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 16:17:01 pve CRON[767222]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 16:17:01 pve CRON[767221]: pam_unix(cron:session): session closed for user root
Jun 11 17:17:01 pve CRON[776417]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 17:17:01 pve CRON[776418]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 17:17:01 pve CRON[776417]: pam_unix(cron:session): session closed for user root
Jun 11 17:36:28 pve smartd[605]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 112 to 111
Jun 11 18:17:01 pve CRON[785613]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 18:17:01 pve CRON[785614]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 18:17:01 pve CRON[785613]: pam_unix(cron:session): session closed for user root
Jun 11 19:17:01 pve CRON[794885]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 19:17:01 pve CRON[794886]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 19:17:01 pve CRON[794885]: pam_unix(cron:session): session closed for user root
Jun 11 20:17:01 pve CRON[804248]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 20:17:01 pve CRON[804249]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 20:17:01 pve CRON[804248]: pam_unix(cron:session): session closed for user root
Jun 11 21:17:01 pve CRON[813556]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jun 11 21:17:01 pve CRON[813557]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jun 11 21:17:01 pve CRON[813556]: pam_unix(cron:session): session closed for user root
-- Reboot --
Jun 12 04:52:00 pve kernel: Linux version 5.15.107-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) ()
Jun 12 04:52:00 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.107-2-pve root=/dev/mapper/pve-root ro quiet
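When the only clue is whatever was logged just before the `-- Reboot --` marker, it helps to pull exactly those lines out of a long log. The helper below is a hypothetical sketch (the name `lines_before_reboot` is made up for illustration): it reads `journalctl`-style text on stdin and prints the last N lines before each `-- Reboot --` marker, matching the marker format in the log above. Newer systemd versions may label boot boundaries differently, so treat the pattern as an assumption.

```shell
# Hypothetical helper: print the last N lines (default 20) that precede every
# "-- Reboot --" marker in journalctl-style text read from stdin.
# Lines after the final marker (the current boot) are not printed.
lines_before_reboot() {
    awk -v n="${1:-20}" '
        BEGIN { n = n + 0; if (n < 1) n = 1 }
        /^-- Reboot --$/ {
            # Emit the ring buffer oldest-first, then the marker itself
            start = (count > n) ? count - n : 0
            for (i = start; i < count; i++) print buf[i % n]
            print
            count = 0
            next
        }
        # Keep only the most recent n lines in a ring buffer
        { buf[count % n] = $0; count++ }
    '
}
```

For example, `journalctl --no-pager | lines_before_reboot 30`. If the journal is kept across boots, as the `-- Reboot --` marker suggests, `journalctl -b -1 -n 100` shows the tail of the previous boot directly.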
 
There is nothing in the log to go on, except that it was definitely not a normal, graceful shutdown.
Maybe Proxmox could not log the error because there was a problem with the drive (or its controller)? If so, you could try setting up remote logging to another system on your network.
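Remote logging means that messages emitted just before a crash can survive even if the local disk never receives them. A minimal sketch, assuming rsyslog is installed on the Proxmox host (`apt install rsyslog` if it is not) and that another machine on the LAN runs a syslog collector listening on TCP port 514. The function name and the drop-in file name `90-remote.conf` are hypothetical.

```shell
# Hypothetical helper: write an rsyslog drop-in that forwards all messages
# (*.*) to a remote collector over TCP ("@@"; a single "@" would mean UDP).
# Arguments: collector address, optional config path.
write_remote_syslog_conf() {
    host="$1"
    conf="${2:-/etc/rsyslog.d/90-remote.conf}"
    if [ -z "$host" ]; then
        echo "usage: write_remote_syslog_conf HOST [CONF_PATH]" >&2
        return 1
    fi
    printf '*.* @@%s:514\n' "$host" > "$conf"
}
```

Run it as root with the collector's address, e.g. `write_remote_syslog_conf 192.168.1.50` (placeholder IP), then `systemctl restart rsyslog`. A sudden power loss can still cut off the last few messages before they leave the host.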
It could be a hardware issue, but replacing parts to find out will take a very long time if it only happens every few weeks. Maybe run a memtest to make sure? Maybe update the BIOS to the latest version?
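Before looking for a BIOS update, it helps to know which board and BIOS version are installed. A small sketch that reads the kernel's DMI data from sysfs (`/sys/class/dmi/id`, present on most x86 systems; these particular fields are normally readable without root). `show_bios_info` is a hypothetical name; running `dmidecode -s bios-version` as root gives the same version string.

```shell
# Hypothetical helper: print board model and BIOS version/date from sysfs DMI data.
# An alternative base directory can be passed as $1 (useful for testing).
show_bios_info() {
    dmi_dir="${1:-/sys/class/dmi/id}"
    for field in board_name bios_version bios_date; do
        if [ -r "$dmi_dir/$field" ]; then
            printf '%s: %s\n' "$field" "$(cat "$dmi_dir/$field")"
        else
            printf '%s: (not available)\n' "$field"
        fi
    done
}
```

Compare the reported `bios_version` with the latest release listed for the same board on the vendor's support site.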
Could it be an external factor, like a very short power interruption or a mains voltage drop?
 
Thanks, that's what I was afraid of. Same as last time, when this happened a few months ago. I thought it was a one-off since it hadn't happened again until last night.

If it was a power issue, wouldn't I see that in other devices, like a few Raspberry Pis I have, or even the clock on the stove?

Good suggestions on the memtest and BIOS. I'm not sure how to do either, but I will investigate.

Again, thanks a lot.
 
Thanks, that's what I was afraid of. Same as last time, when this happened a few months ago. I thought it was a one-off since it hadn't happened again until last night.
Maybe temperature plays a part, as summer is approaching (in the Northern Hemisphere)? A crash once in a while might not seem like a big issue, but it can cause silent disk corruption (without ZFS or Btrfs), which can lead to more subtle problems.
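On the temperature question: the `smartd` lines in the first post (`Temperature_Celsius changed from 113 to 112`) most likely report SMART's normalized attribute value rather than degrees Celsius, since smartd tracks normalized values unless configured otherwise. The actual reading is in the RAW_VALUE column of `smartctl -A`. A minimal sketch, assuming smartmontools is installed (smartd running suggests it is) and a SATA drive that exposes attribute 194; NVMe drives report temperature differently. The function name is hypothetical.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print the raw
# value of attribute 194 (Temperature_Celsius). In smartctl's ATA attribute
# table RAW_VALUE is the 10th column; trailing text such as "(Min/Max 20/45)"
# is ignored. Exits non-zero if attribute 194 is not found.
drive_temp_from_smartctl() {
    awk '$1 == 194 { print $10; found = 1; exit } END { if (!found) exit 1 }'
}
```

Usage: `smartctl -A /dev/sda | drive_temp_from_smartctl`.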
If it was a power issue, wouldn't I see that in other devices, like a few Raspberry Pis I have, or even the clock on the stove?
Only if the power actually went off. If it was just a dip (or other poor power conditioning), lower-powered devices might not be affected. Anyway, I assume you do not use a UPS at the moment, which rules out a faulty UPS but not poor grid power (you can judge this best for your area).
Good suggestions on the memtest and BIOS. I'm not sure how to do either, but I will investigate.
The Arch Wiki has some tips on stress testing that might help find a weak system component. Also check the Intel website for a BIOS update and flashing instructions.
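As one possible starting point for stress testing, `stress-ng` can load every CPU core and a large share of RAM at once. A sketch, assuming the `stress-ng` package is installed and, ideally, that the Home Assistant VM is shut down first so the two do not compete for memory. The wrapper name and the 75% / 30-minute defaults are arbitrary choices, not recommendations from the thread.

```shell
# Hypothetical wrapper around stress-ng:
#   --cpu 0          one CPU stressor per online CPU
#   --vm 1           one memory stressor...
#   --vm-bytes 75%   ...using about 75% of available memory
#   --timeout        stop after the given duration (default 30m)
#   --metrics-brief  print a short summary at the end
run_stress_test() {
    stress-ng --cpu 0 --vm 1 --vm-bytes 75% --timeout "${1:-30m}" --metrics-brief
}
```

If the machine powers off during a run like `run_stress_test 2h`, that points towards hardware (cooling, power supply) rather than Proxmox itself.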
 
I have the exact same issue on a NUC12WSHi3. Really puzzling!

Sep 07 22:34:25 pve-nuc12-1 pmxcfs[1646]: [dcdb] notice: data verification successful
-- Reboot --
Sep 08 12:08:39 pve-nuc12-1 kernel: Linux version 6.2.16-12-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.2.16-12 (2023-09-04T13:21Z) ()

Have you found the root cause by any chance?

Cheers,

D.
 
Unfortunately I haven't. About a month ago someone on Reddit suggested it might be a power management thing and gave me some commands to run in the Proxmox shell:
Code:
# Stop systemd from ever entering sleep, suspend or hibernate states
systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target

# Reboot the host
shutdown -r now
I really thought it had solved the shutdowns, but about a week ago it shut down again. I ran Memtest86 for 9 passes without any failures, so it's not memory. I've started looking around for another device to install Proxmox on (maybe a Beelink), as I don't think I'll ever get to the bottom of the problem on this NUC.
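If the shutdowns continue after masking, it is worth confirming that the mask is still in place, for example after upgrades. A small sketch using `systemctl is-enabled`, which prints `masked` for a masked unit; the helper name is hypothetical.

```shell
# Hypothetical helper: print the enablement state of each sleep-related target.
# After `systemctl mask ...` every line should end in "masked".
check_sleep_masked() {
    for t in sleep.target suspend.target hibernate.target hybrid-sleep.target; do
        # is-enabled exits non-zero for masked units, so ignore the status
        state="$(systemctl is-enabled "$t" 2>/dev/null)" || true
        printf '%s: %s\n' "$t" "${state:-unknown}"
    done
}
```

If every target reports `masked` and the host still powers off, the sleep targets are unlikely to be the cause.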
 
Hmm, it does look like an electrical issue, like the PSU shutting down (a safety cutoff or something similar).

There is a heatwave in France at the moment, but temperatures are still far from the limits (ignore the silly NVMe values; my 990 Pro isn't recognised properly).

[Screenshot: temperature sensor readings]
 
It looks like it's finally resolved. I switched to a new USB-powered drive, and it hasn't shut down in a couple of months now.
So basically, what you are saying is that the errors had to do with your storage. Did you use ZFS? Did you have LXCs and VMs?
 
Well, I guess that's what I'm saying, but I can't specifically pinpoint the HDD. The old drive wasn't USB-powered; this one is. I don't use ZFS and only have one VM. No LXCs.
 
I'm having this problem too.
I have Proxmox installed on a NUC.
When multiple VMs are fired up, the thing overheats and turns off.

Are people saying a driver update for the NUC fixes this? It sounds like it might be a Proxmox software bug, or something driver-related?
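To check whether heat really is the trigger, one option is to record temperatures periodically so that the last entry before a shutdown shows how hot the machine was. A minimal sketch, assuming the kernel exposes thermal zones under `/sys/class/thermal` (each zone's `temp` file is in millidegrees Celsius). The function name and log path are hypothetical.

```shell
# Hypothetical helper: print one timestamped line listing every thermal zone's
# temperature in whole degrees Celsius. The sysfs base can be overridden via $1.
log_thermal_zones() {
    base="${1:-/sys/class/thermal}"
    line="$(date '+%Y-%m-%d %H:%M:%S')"
    for zone in "$base"/thermal_zone*; do
        # Skip the literal glob (no zones) and unreadable zones
        [ -r "$zone/temp" ] || continue
        zone_type="$(cat "$zone/type" 2>/dev/null || echo unknown)"
        milli="$(cat "$zone/temp")"
        line="$line $zone_type=$((milli / 1000))"
    done
    printf '%s\n' "$line"
}
```

For example, run `while true; do log_thermal_zones >> /root/thermal.log; sync; sleep 60; done` in a `tmux` or `screen` session; the `sync` gives each line a better chance of reaching the disk before a sudden power-off.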
 