mystery reboots

boxieee

Sep 02 06:03:57 Prox1 systemd[1]: Starting apt-daily-upgrade.service - Daily apt upgrade and clean activities...
Sep 02 06:03:57 Prox1 systemd[1]: apt-daily-upgrade.service: Deactivated successfully.
Sep 02 06:03:57 Prox1 systemd[1]: Finished apt-daily-upgrade.service - Daily apt upgrade and clean activities.
Sep 02 06:17:01 Prox1 CRON[410326]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 02 06:17:01 Prox1 CRON[410327]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 02 06:17:01 Prox1 CRON[410326]: pam_unix(cron:session): session closed for user root
Sep 02 06:25:01 Prox1 CRON[411831]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 02 06:25:01 Prox1 CRON[411832]: (root) CMD (test -x /usr/sbin/anacron || { cd / && run-parts --report /etc/cron.daily; })
Sep 02 06:25:01 Prox1 CRON[411831]: pam_unix(cron:session): session closed for user root
Sep 02 07:17:01 Prox1 CRON[421577]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 02 07:17:01 Prox1 CRON[421578]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 02 07:17:01 Prox1 CRON[421577]: pam_unix(cron:session): session closed for user root
Sep 02 08:17:01 Prox1 CRON[432831]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 02 08:17:01 Prox1 CRON[432832]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 02 08:17:01 Prox1 CRON[432831]: pam_unix(cron:session): session closed for user root
-- Reboot --
Sep 02 08:21:55 Prox1 kernel: Linux version 6.8.4-2-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-2 (2024-04-10T17:36Z) ()




Sep 03 15:26:46 Prox1 smartd[1213]: Device: /dev/sdd [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 107 to 108
Sep 03 16:12:45 Prox1 systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Sep 03 16:12:45 Prox1 systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Sep 03 16:12:45 Prox1 systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Sep 03 16:12:45 Prox1 systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
Sep 03 16:17:01 Prox1 CRON[306019]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 03 16:17:01 Prox1 CRON[306020]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 03 16:17:01 Prox1 CRON[306019]: pam_unix(cron:session): session closed for user root
-- Reboot --
Sep 03 16:22:19 Prox1 kernel: Linux version 6.8.4-2-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-2 (2024-04-10T17:36Z) ()
Sep 03 16:22:19 Prox1 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.4-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on initcall_blacklist=sysfb_init pcie_aspm=off
Sep 03 16:22:19 Prox1 kernel: KERNEL supported cpus:
 
The issue with running beta software is that you can never tell whether the newest software or your hardware is causing the problem (nothing obvious got flushed to the drives in the logs).

The typical pointers would be whether the issues started only with, e.g., the latest upgrade, or whether this same hardware previously ran a non-PVE install just fine for quite a while.

It might be helpful to mention what hardware this runs on, post the entire log of the previous boot (i.e. journalctl -b -1 > attach.log), run a memtest (https://www.memtest.org/) and, most importantly, try running e.g. a stable Debian kernel (https://tracker.debian.org/pkg/linux) on the same configuration. Or, if you want to do hit and miss, simply download an older PVE ISO (currently v8.1) and start with that:
https://enterprise.proxmox.com/iso/
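
To look at what was logged right before each unexpected restart, a minimal sketch along these lines may help. It only assumes that the journal output contains the same `-- Reboot --` marker lines as the excerpt above; the function name `lines_before_reboot` and its defaults are made up for illustration.

```shell
# Print the last N lines logged before each reboot marker found on stdin.
# $1: number of context lines (default 20)
# $2: marker text (default matches the "-- Reboot --" lines in the excerpt above)
lines_before_reboot() {
    n="${1:-20}"
    marker="${2:-}"
    [ -n "$marker" ] || marker="-- Reboot --"
    # -F: treat the marker as a fixed string; -B: include N lines of leading context
    grep -F -B "$n" -e "$marker"
}
```

On the host this could be fed from the journal, e.g. `journalctl --no-pager | lines_before_reboot 30`, while `journalctl --list-boots` shows which boots are recorded at all.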
 

CPU(s): 20 x 13th Gen Intel(R) Core(TM) i5-13600K (1 Socket)
Kernel Version: Linux 6.8.4-2-pve (2024-04-10T17:36Z)
Boot Mode: EFI
Manager Version: pve-manager/8.2.2/9355359cd7afbae4
 
ASUS Z790 motherboard, 64 GB DDR5 (memtested in loops for a full day, no issues)
3 simple Realtek LAN cards
APC UPS
Cooler Master 600 W power supply
All SATA cables checked; NVMe drives are a 1 TB and a 500 GB Samsung 980 Pro, good so far
Power cable replaced
 

Do you have some kind of monitoring enabled on your server? I remember having periodic VM reboots caused by monitoring that I had installed via a bash script. Uninstalling or reconfiguring it helped me back then.
 
OK, I have been having these random reboots for a while now. The longest this server has stayed up is 7 days. I reinstalled Proxmox recently and it is better now, but before that it rebooted 4 times a day. I removed the NVIDIA card; I have 3 TP-Link Realtek LAN cards. I removed all LXCs and turned them into VMs. All VMs are switched from host CPU to the default CPU type now, with no PCIe or USB passthrough. I have one APC UPS monitor built on Ubuntu: on battery it turns off some VMs, and off battery it turns them back on. The APC has an NMC card, and its log shows no switch to battery at the time of the reboots. I replaced the APC UPS as I had a spare, and checked the neutral, relay chatter, etc.
 
root@Prox1:~# last | grep reboot
reboot system boot 6.8.4-2-pve Sat Sep 7 16:55 still running
reboot system boot 6.8.4-2-pve Sat Sep 7 15:41 still running
reboot system boot 6.8.4-2-pve Sat Sep 7 05:29 still running
reboot system boot 6.8.4-2-pve Tue Sep 3 18:37 still running
reboot system boot 6.8.4-2-pve Tue Sep 3 16:22 - 18:36 (02:14)
reboot system boot 6.8.4-2-pve Mon Sep 2 15:56 - 18:36 (1+02:40)
reboot system boot 6.8.4-2-pve Mon Sep 2 12:05 - 15:56 (03:50)
reboot system boot 6.8.4-2-pve Mon Sep 2 10:24 - 12:05 (01:40)
reboot system boot 6.8.4-2-pve Mon Sep 2 08:21 - 10:24 (02:02)
reboot system boot 6.8.4-2-pve Sat Aug 31 18:39 - 10:24 (1+15:44)
reboot system boot 6.8.4-2-pve Wed Aug 28 04:38 - 18:38 (3+13:59)
reboot system boot 6.8.4-2-pve Wed Aug 28 00:23 - 18:38 (3+18:15)
reboot system boot 6.8.4-2-pve Sat Aug 24 13:27 - 18:38 (7+05:10)
reboot system boot 6.8.4-2-pve Fri Aug 23 17:27 - 13:13 (19:45)
reboot system boot 6.8.4-2-pve Fri Aug 23 16:24 - 17:27 (01:03)
reboot system boot 6.8.4-2-pve Fri Aug 23 14:33 - 16:23 (01:50)
reboot system boot 6.8.4-2-pve Wed Aug 21 12:15 - 14:32 (2+02:17)
reboot system boot 6.8.4-2-pve Tue Aug 20 18:29 - 12:14 (17:44)
reboot system boot 6.8.4-2-pve Tue Aug 20 18:07 - 18:23 (00:16)
reboot system boot 6.8.4-2-pve Tue Aug 20 16:33 - 18:03 (01:29)
reboot system boot 6.8.4-2-pve Tue Aug 20 14:59 - 18:03 (03:04)
reboot system boot 6.8.4-2-pve Tue Aug 20 14:18 - 14:58 (00:40)
reboot system boot 6.8.4-2-pve Tue Aug 20 14:12 - 14:15 (00:02)
reboot system boot 6.8.4-2-pve Tue Aug 20 13:24 - 13:55 (00:31)
reboot system boot 6.8.4-2-pve Mon Aug 19 20:05 - 13:23 (17:17)
reboot system boot 6.8.4-2-pve Mon Aug 19 19:31 - 20:03 (00:32)
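
To see at a glance how the reboots cluster, the listing above can be condensed into a per-day count. This is only a sketch: the function name `reboots_per_day` is made up, and the field positions (month in field 6, day in field 7) assume the `last` layout shown above.

```shell
# Count "reboot" entries per calendar day from `last` output on stdin.
# Output: one line per day, "<month> <day> <count>", in order of first appearance.
reboots_per_day() {
    awk '$1 == "reboot" {
             day = $6 " " $7
             if (!(day in count)) order[++k] = day   # remember first-seen order
             count[day]++
         }
         END {
             for (i = 1; i <= k; i++) print order[i], count[order[i]]
         }'
}
```

It would be used as `last | reboots_per_day`; several restarts on the same day stand out immediately.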
 
 
Code:
Sep 07 16:37:08 Prox1 kernel: x86/split lock detection: #AC: CPU 0/KVM/2767 took a split_lock trap at address: 0xfffff80047f179f7

NOT that this is causing your reboots, but you might want to try split_lock_detect=off on your cmdline:
https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_edit_kernel_cmdline

Otherwise, nothing obvious in the log for me.
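
After changing the kernel command line and rebooting, it is worth confirming that the parameter actually took effect. A small sketch, assuming nothing beyond `/proc/cmdline`; the helper name `cmdline_has` is made up for illustration.

```shell
# Return 0 if a parameter appears as a whole word on a kernel command line.
# $1: parameter to look for, e.g. split_lock_detect=off
# $2: command line to search (default: the running kernel's /proc/cmdline)
cmdline_has() {
    param="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    # Pad with spaces so that only complete parameters match
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `cmdline_has split_lock_detect=off && echo active`. The `BOOT_IMAGE=` entry in the posted command line suggests this host boots via GRUB, in which case the parameter would normally go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `update-grub`; the wiki page linked above also covers the systemd-boot case.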
Good idea. I have been seeing this since 8 on this hardware, but won't that just stop the reporting, and that's all?
 
I have gone through the logs and couldn't find anything. I hope it's not the 13th gen bug lol, it is a 13600K. The last thing I can try is the power supply.
My NVR and other home automation run through this, so to test this fully I will need a spare PC with at least 8 cores. Even my MikroTik CHR and TrueNAS SCALE are on this lol
 
I did put Proxmox on a Surface Pro 7 with a USB-to-Ethernet adapter, but if I run even 4 cores at full load the poor thing gets hot enough to cook an egg overnight lol
 
Have you tested the RAM? You can use memtest86+ from the Proxmox ISO. I just mention this as a few years back I had a similar issue with a Linux box (not running proxmox) and finally decided to test the RAM. Turns out one module was bad even though it was bought brand new a few months earlier.
 
