Why did my Proxmox server reboot?

KirikParty

New Member
Apr 8, 2024
I have a single node running 3 VMs and 2 CTs. It's been mostly stable so far. However, this morning the server rebooted itself.
PVE version: 8.2.2
Kernel Version: 6.8.4-3

I have pasted the syslog from around the time it rebooted. Why did it reboot?

Code:
May 09 04:17:01 pve-prod2 CRON[1118133]: pam_unix(cron:session): session closed for user root
May 09 04:27:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 70 to 71
May 09 05:17:01 pve-prod2 CRON[1130434]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 05:17:01 pve-prod2 CRON[1130435]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 09 05:17:01 pve-prod2 CRON[1130434]: pam_unix(cron:session): session closed for user root
May 09 06:17:01 pve-prod2 CRON[1142598]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 06:17:01 pve-prod2 CRON[1142599]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 09 06:17:01 pve-prod2 CRON[1142598]: pam_unix(cron:session): session closed for user root
May 09 06:20:10 pve-prod2 systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
May 09 06:20:10 pve-prod2 systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
May 09 06:20:10 pve-prod2 systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
May 09 06:25:01 pve-prod2 CRON[1144192]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 06:25:01 pve-prod2 CRON[1144193]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
May 09 06:25:01 pve-prod2 CRON[1144192]: pam_unix(cron:session): session closed for user root
May 09 06:27:49 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 71 to 67
May 09 06:57:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 67 to 70
May 09 07:17:01 pve-prod2 CRON[1154644]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 07:17:01 pve-prod2 CRON[1154645]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 09 07:17:01 pve-prod2 CRON[1154644]: pam_unix(cron:session): session closed for user root
May 09 07:57:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 70 to 69
May 09 08:17:01 pve-prod2 CRON[1166734]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 08:17:01 pve-prod2 CRON[1166735]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 09 08:17:01 pve-prod2 CRON[1166734]: pam_unix(cron:session): session closed for user root
May 09 08:27:49 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 69 to 67
May 09 08:57:48 pve-prod2 systemd[1]: Starting apt-daily.service - Daily apt download activities...
May 09 08:57:48 pve-prod2 systemd[1]: apt-daily.service: Deactivated successfully.
May 09 08:57:48 pve-prod2 systemd[1]: Finished apt-daily.service - Daily apt download activities.
May 09 09:17:01 pve-prod2 CRON[1178892]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
May 09 09:17:01 pve-prod2 CRON[1178893]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 09 09:17:01 pve-prod2 CRON[1178892]: pam_unix(cron:session): session closed for user root
May 09 09:57:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 67 to 69
-- Reboot --
May 09 10:06:03 pve-prod2 kernel: Linux version 6.8.4-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) ()
May 09 10:06:03 pve-prod2 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.4-3-pve root=/dev/mapper/pve-root ro quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction vfio_iommu_type1.allow_unsafe_interrupts=1
May 09 10:06:03 pve-prod2 kernel: KERNEL supported cpus:
May 09 10:06:03 pve-prod2 kernel:   Intel GenuineIntel
May 09 10:06:03 pve-prod2 kernel:   AMD AuthenticAMD
May 09 10:06:03 pve-prod2 kernel:   Hygon HygonGenuine
May 09 10:06:03 pve-prod2 kernel:   Centaur CentaurHauls
May 09 10:06:03 pve-prod2 kernel:   zhaoxin   Shanghai 
May 09 10:06:03 pve-prod2 kernel: BIOS-provided physical RAM map:
May 09 10:06:03 pve-prod2 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
May 09 10:06:03 pve-prod2 kernel: BIOS-e820: [mem 0x00000000000a0000-0x00000000000fffff] reserved
May 09 10:06:03 pve-prod2 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000009bfefff] usable
May 09 10:06:03 pve-prod2 kernel: BIOS-e820: [mem 0x0000000009bff000-0x0000000009ffffff] reserved
May 09 10:06:03 pve-prod2 kernel: BIOS-e820: [mem 0x000000000a000000-0x000000000a1fffff] usable
 
There is no clue in the log, which is not uncommon if the power failed or the disk that holds the log (temporarily) disconnected.
Often it's a (temporary) hardware issue: rare memory corruption, overheating, or otherwise stressing the hardware a little too much.
Since you use PCI passthrough, it could also be caused by a VM with passthrough, so check the logs of those VMs.
You could start replacing hardware parts to see if it makes a difference, but this usually only works when the issue is (easily) reproducible.
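
To put numbers on "no clue in the log", here is a minimal Python sketch that reads a journal dump like the one above and reports two things: the window between the last entry before the `-- Reboot --` marker and the first entry after it, and the highest drive temperature smartd logged. The function names (`reboot_gaps`, `peak_temperatures`) are made up for this example, the sample lines are copied from the first post, and the year 2024 is an assumption because the log timestamps do not include one.

```python
import re
from datetime import datetime

# A few lines copied from the log in the first post.
LOG = """\
May 09 04:27:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 70 to 71
May 09 09:17:01 pve-prod2 CRON[1178892]: pam_unix(cron:session): session closed for user root
May 09 09:57:48 pve-prod2 smartd[1509]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 67 to 69
-- Reboot --
May 09 10:06:03 pve-prod2 kernel: KERNEL supported cpus:
"""

STAMP = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) ")
TEMP = re.compile(r"Device: (\S+) .*Temperature_Celsius changed from (\d+) to (\d+)")


def parse_stamp(line, year=2024):
    """Parse the leading 'May 09 10:06:03' timestamp; the year is an assumption."""
    m = STAMP.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def reboot_gaps(text):
    """Return (last_before, first_after, gap_seconds) for each reboot marker."""
    gaps = []
    last = None      # timestamp of the most recent log line seen
    pending = None   # last timestamp before a reboot marker, awaiting its pair
    for line in text.splitlines():
        if line.strip() == "-- Reboot --":
            pending = last
            continue
        ts = parse_stamp(line)
        if ts is None:
            continue
        if pending is not None:
            gaps.append((pending, ts, (ts - pending).total_seconds()))
            pending = None
        last = ts
    return gaps


def peak_temperatures(text):
    """Return the highest temperature smartd reported for each device."""
    peaks = {}
    for m in TEMP.finditer(text):
        dev = m.group(1)
        peaks[dev] = max(peaks.get(dev, 0), int(m.group(2)), int(m.group(3)))
    return peaks


if __name__ == "__main__":
    for before, after, seconds in reboot_gaps(LOG):
        print(f"last entry {before}, first boot entry {after}, gap {seconds:.0f} s")
    print(peak_temperatures(LOG))
```

The gap only bounds when the machine went down: nothing was logged between 09:57:48 and the new boot at 10:06:03, so there was no clean shutdown sequence in the journal. The temperature peak is worth comparing against the drive's datasheet, since overheating is one of the candidates listed above.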
 
Thank you. It's not the first two reasons for sure.

I have been trying to lower the power consumption and have played around with powertop a bit on this machine. Would that cause a reboot without any logs as well?
 
Undervolting the CPU could easily cause a reset because it then has less margin. As with overclocking, you might have made the system unstable under some conditions (which might have occurred and triggered the reboot). Maybe stress-test your setup with a Linux Live USB that specializes in this (which I assume exists) to make sure the hardware is stable?

Please be aware that Proxmox is intended for enterprise hardware, not small form-factor energy efficiency, but there are some threads from people having success in this area.
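
To illustrate what a stability test looks for, here is a rough Python sketch, not a replacement for a dedicated stress tool. It runs the same deterministic SHA-256 hash chain on several threads and counts results that disagree; on stable hardware every chain gives the same digest, while a marginal CPU tends either to reset under the load or to produce a mismatch. The function names and default sizes are invented for this example. It relies on `hashlib` releasing the GIL for large buffers, so the threads can load several cores at once.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_chain(rounds, size=1 << 20):
    """Hash a fixed buffer repeatedly, feeding each digest into the next round."""
    buf = bytearray(b"\xa5" * size)
    digest = b""
    for _ in range(rounds):
        # Overwrite the start of the buffer with the previous digest so that
        # an error in any round changes every later result.
        buf[:len(digest)] = digest
        digest = hashlib.sha256(buf).digest()
    return digest.hex()


def consistency_check(workers=4, rounds=50, size=1 << 20):
    """Run identical chains in parallel; return how many differ from the first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda _: hash_chain(rounds, size), range(workers)))
    reference = results[0]
    return sum(1 for r in results if r != reference)


if __name__ == "__main__":
    # Raise rounds and workers (hypothetical values) for a longer, heavier run.
    mismatches = consistency_check(workers=8, rounds=2000)
    print("mismatching results:", mismatches)
```

A result of zero here only means this particular load did not expose a problem; it does not exercise the iGPU or memory the way transcoding does.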
 
Thank you.
I was testing transcoding a video and it crashed as well, so the system may well not be stable under stress. I will look into the Live USB.
I haven't really undervolted the CPU/iGPU. I use an AMD system and have just used the Curve Optimizer in PBO.
I did, however, have Deep Sleep enabled in the BIOS. I have disabled it and will see if it stays stable.

Transcoding a video worked after this. I still need to stress test the CPU and GPU to see if it holds up.
I will report back.
 
I haven't really undervolted the CPU/iGPU. I use an AMD system and have just used the Curve Optimizer in PBO.
Curve Optimizer gives more performance with the same power, which is similar to undervolting or overclocking w.r.t. stability.
I did, however, have Deep Sleep enabled in the BIOS. I have disabled it and will see if it stays stable.
I would not expect that to make any difference, but maybe Deep Sleep is not what I think it is.
I will report back.
None of this is Proxmox or even Linux specific, but other people might be interested.
 