Proxmox shuts down (almost) every night

JD_EV9

New Member
Oct 24, 2024
Hey guys,

Almost every night my Proxmox host appears to kill every process on the machine. What I mean is that the computer seems to shut down, but the power button stays lit and I have to force a shutdown before I can boot it up again.

I am using:
Dell OptiPlex 3060
Intel Core i5-8500
32 GB RAM
Intel UHD Graphics 630

This is my log output before shutting down:

Oct 19 04:05:44 prxmx kernel: EXT4-fs (loop1): unmounting filesystem e9b38ee5-b13c-4423-b57b-09666e4a7b11.
Oct 19 04:05:44 prxmx pvescheduler[602473]: INFO: Finished Backup of VM 107 (00:01:22)
Oct 19 04:05:44 prxmx pvescheduler[602473]: INFO: Starting Backup of VM 112 (lxc)
Oct 19 04:06:35 prxmx pvescheduler[602473]: INFO: Finished Backup of VM 112 (00:00:51)
Oct 19 04:06:35 prxmx pvescheduler[602473]: INFO: Backup job finished successfully
Oct 19 04:17:01 prxmx CRON[616304]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 04:17:01 prxmx CRON[616305]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 04:17:01 prxmx CRON[616304]: pam_unix(cron:session): session closed for user root
Oct 19 04:28:56 prxmx smartd[649]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 59 to 63
Oct 19 05:17:01 prxmx CRON[633866]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 05:17:01 prxmx CRON[633867]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 05:17:01 prxmx CRON[633866]: pam_unix(cron:session): session closed for user root
Oct 19 05:58:56 prxmx smartd[649]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 104 to 105
Oct 19 06:17:01 prxmx CRON[651507]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 06:17:01 prxmx CRON[651508]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 06:17:01 prxmx CRON[651507]: pam_unix(cron:session): session closed for user root
Oct 19 06:22:01 prxmx systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Oct 19 06:22:01 prxmx systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Oct 19 06:22:01 prxmx systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Oct 19 06:25:01 prxmx CRON[653931]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 06:25:01 prxmx CRON[653932]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Oct 19 06:25:01 prxmx CRON[653931]: pam_unix(cron:session): session closed for user root
Oct 19 07:17:01 prxmx CRON[669190]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 07:17:01 prxmx CRON[669191]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 07:17:01 prxmx CRON[669190]: pam_unix(cron:session): session closed for user root
Oct 19 08:17:01 prxmx CRON[686776]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 08:17:01 prxmx CRON[686777]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 08:17:01 prxmx CRON[686776]: pam_unix(cron:session): session closed for user root
Oct 19 08:28:56 prxmx smartd[649]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 63 to 62
Oct 19 09:17:01 prxmx CRON[704360]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 09:17:01 prxmx CRON[704361]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 09:17:01 prxmx CRON[704360]: pam_unix(cron:session): session closed for user root
Oct 19 09:28:56 prxmx smartd[649]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 62 to 63
Oct 19 09:40:15 prxmx systemd[1]: Starting man-db.service - Daily man-db regeneration...
Oct 19 09:40:15 prxmx systemd[1]: man-db.service: Deactivated successfully.
Oct 19 09:40:15 prxmx systemd[1]: Finished man-db.service - Daily man-db regeneration.
Oct 19 10:17:01 prxmx CRON[721889]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 10:17:01 prxmx CRON[721890]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 10:17:01 prxmx CRON[721889]: pam_unix(cron:session): session closed for user root
Oct 19 11:17:01 prxmx CRON[739422]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 19 11:17:01 prxmx CRON[739423]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 11:17:01 prxmx CRON[739422]: pam_unix(cron:session): session closed for user root
-- Reboot --
 
Oct 19 05:58:56 prxmx smartd[649]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 104 to 105
105 °C ??
Check your cooling.
With such temperatures the BIOS might shut down something, if it monitors the disk temps. Or other parts of the mainboard are too hot.
 
Thanks for your reply!
I thought this was a strange value as well, but the log prints the normalised value instead of the actual (raw) value.
See the SMART values:
 

Attachment: smart.PNG
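To make it easier to see which numbers smartd actually logged per disk, here is a minimal Python sketch that pulls the attribute 194 changes out of log lines like the ones above. The function name `attribute_changes` and the regular expression are my own, written against the two sample lines copied from the log in this thread; other smartd message variants are not handled. The numbers are whatever smartd printed, so they are not necessarily degrees Celsius.

```python
import re

# Matches smartd "SMART Usage Attribute" change messages in the form seen in this thread.
PATTERN = re.compile(
    r"Device: (?P<dev>\S+) .*SMART Usage Attribute: (?P<id>\d+) (?P<name>\S+) "
    r"changed from (?P<old>\d+) to (?P<new>\d+)"
)


def attribute_changes(log_text, attr_id=194):
    """Return (device, old, new) tuples for one SMART attribute ID."""
    changes = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m and int(m.group("id")) == attr_id:
            changes.append((m.group("dev"), int(m.group("old")), int(m.group("new"))))
    return changes


# Two lines copied from the log posted above.
SAMPLE = """\
Oct 19 04:28:56 prxmx smartd[649]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 59 to 63
Oct 19 05:58:56 prxmx smartd[649]: Device: /dev/sdb [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 104 to 105
"""

for dev, old, new in attribute_changes(SAMPLE):
    # The value is printed as logged; compare it with the raw value in the SMART table.
    print(f"{dev}: {old} -> {new} (as logged by smartd)")
```

Comparing these logged numbers with the raw column of the SMART table for the same disk shows whether a reading like 105 is a real temperature or only the normalised figure.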
Looking at the log, the shutdown is indistinguishable from a (wall) power failure and gives no clue. Overclocking or a weak/old PSU can cause a power dip and a freeze like the one you describe. Make sure the BIOS settings are conservative/defaults, and maybe try a different/newer PSU?
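As a rough illustration of what "indistinguishable from a power failure" means in the log, the following Python sketch splits saved journal text at the `-- Reboot --` marker and reports every boot whose last lines contain no shutdown-related message. The function name `boots_without_clean_shutdown` and the strings in `CLEAN_HINTS` are assumptions on my part; check what your own system logs during a normal shutdown and adjust them.

```python
REBOOT_MARKER = "-- Reboot --"

# Assumption: a clean shutdown leaves at least one of these strings near the end
# of the boot's log. Verify against a known-good shutdown on your own host.
CLEAN_HINTS = ("Journal stopped", "Shutting down", "systemd-shutdown")


def boots_without_clean_shutdown(journal_text, tail_lines=20):
    """Return the last log line of every boot that ends without a shutdown hint."""
    boots, current = [], []
    for line in journal_text.splitlines():
        if line.strip() == REBOOT_MARKER:
            boots.append(current)
            current = []
        elif line.strip():
            current.append(line)
    # Lines after the final marker belong to the running boot and are not checked.
    suspicious = []
    for boot in boots:
        tail = boot[-tail_lines:]
        if boot and not any(hint in entry for entry in tail for hint in CLEAN_HINTS):
            suspicious.append(boot[-1])
    return suspicious


# Last lines before the reboot marker, copied from the log in the first post.
SAMPLE = """\
Oct 19 11:17:01 prxmx CRON[739423]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Oct 19 11:17:01 prxmx CRON[739422]: pam_unix(cron:session): session closed for user root
-- Reboot --
"""

for last_line in boots_without_clean_shutdown(SAMPLE):
    print("No shutdown messages before reboot; last entry:", last_line)
```

A boot that shows up here simply stopped logging, which fits a power loss or a hard freeze but does not say which of the two it was.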
 
@leesteken, thanks for your response.
I could look into that.
I am going to swap the HDD for an SSD one of these days to rule out that the system is getting too hot, and upgrade the BIOS (I checked and saw it wasn't up to date).

The weird thing is that I had this issue before, but not for a few months. And now it is crashing almost every night.
 
Yeah, good question, I didn't change anything.
I will check whether a different power circuit makes any difference, and maybe go looking for a newer PSU.
 
Update:
Eventually I found a log line that recurs around the time of the crashes:
Oct 30 07:34:31 prxmx kernel: x86/cpu: SGX disabled by BIOS.

So I enabled this option (instead of software-defined) in the BIOS, and for now I will have to wait and see whether that was the solution.
 
SGX has lots of vulnerabilities and might be best left disabled. I cannot imagine that it's related to the crashes or will change anything, but please let us know if it helps.
I would test with less and/or different RAM, a different PSU, a different motherboard, and a different CPU, in that order. Do you see anything on the display just before it crashes (that might not be in the logs)?
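One way to check whether a recurring line such as the SGX message comes before a crash or just belongs to the boot that follows it is to see where it sits within each boot. The Python sketch below assumes the journal text uses the `-- Reboot --` marker seen in the first post; the function name `position_in_boot` is hypothetical.

```python
REBOOT_MARKER = "-- Reboot --"


def position_in_boot(journal_text, needle):
    """For each line containing `needle`, return its 0-based index within its boot."""
    positions = []
    index = 0
    for line in journal_text.splitlines():
        if line.strip() == REBOOT_MARKER:
            index = 0  # a new boot starts; restart the line counter
            continue
        if not line.strip():
            continue  # blank lines do not count as log entries
        if needle in line:
            positions.append(index)
        index += 1
    return positions
```

Called on saved journal text with `needle="SGX disabled by BIOS"`, consistently small indexes would suggest the line is printed during startup, that is, after the crash rather than leading up to it.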