Hello,
I did a fresh install of PVE 7.4 on a mini PC, and I am experiencing random reboots. Here is an example from last night:
Code:
> last -xF reboot shutdown | head
reboot system boot 5.15.102-1-pve Wed Mar 29 06:25:50 2023 still running
reboot system boot 5.15.102-1-pve Wed Mar 29 06:15:02 2023 still running
reboot system boot 5.15.102-1-pve Wed Mar 29 04:00:29 2023 still running
reboot system boot 5.15.102-1-pve Tue Mar 28 23:13:16 2023 still running
shutdown system down 5.15.102-1-pve Tue Mar 28 23:13:00 2023 - Tue Mar 28 23:13:16 2023 (00:00)
[more lines cut]
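I manually rebooted the machine yesterday at 23:13, but then it rebooted itself three times during the night (at 04:00, 06:15 and 06:25). Only the 23:13 reboot is preceded by a shutdown record; the other three are not.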
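To make that pattern easier to spot in longer listings, here is a small sketch that flags every reboot record not directly followed (going back in time) by a shutdown record. That usually means the machine went down without a clean shutdown (crash, hard reset or power loss). It assumes the default `last -xF` layout shown above, newest entries first; `flag_unclean_reboots` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: flag reboots in `last -xF reboot shutdown` output (newest first)
# that have no shutdown record directly before them in time.
# Assumes the default `last -F` layout; flag_unclean_reboots is a hypothetical name.
flag_unclean_reboots() {
  awk '
    $1 == "reboot" {
      # The newer boot held in "pending" is followed (older) by another boot
      # with no shutdown record in between, so the system went down uncleanly
      # before that newer boot.
      if (pending != "") print "unclean: " pending
      pending = $0
      next
    }
    # A shutdown record right before the pending boot: that boot was clean.
    $1 == "shutdown" { pending = ""; next }
    # Oldest boot in the listing: nothing older to compare against.
    END { if (pending != "") print "unknown (no older record): " pending }
  '
}
```

Usage: `last -xF reboot shutdown | flag_unclean_reboots`. On the excerpt above it should flag the 06:25, 06:15 and 04:00 boots and leave the 23:13 one alone. If the listing is cut with `head`, the oldest boot shown may be reported as unknown.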
Here is an excerpt of /var/log/syslog:
Code:
Mar 29 06:15:08 mars systemd[1]: Startup finished in 4.159s (firmware) + 6.842s (loader) + 3.684s (kernel) + 6.008s (userspace) = 20.695s.
Mar 29 06:15:14 mars chronyd[888]: Selected source 37.247.53.178 (2.debian.pool.ntp.org)
Mar 29 06:15:14 mars chronyd[888]: System clock TAI offset set to 37 seconds
Mar 29 06:15:33 mars systemd[1]: systemd-fsckd.service: Succeeded.
Mar 29 06:17:01 mars CRON[1387]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 29 06:25:51 mars kernel: [ 0.000000] Linux version 5.15.102-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z) ()
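I do not see anything odd: the last entry before the 06:25 reboot is the routine hourly cron job at 06:17, and the next line is already the kernel banner of the new boot.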
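Rather than eyeballing the syslog after every crash, a sketch like the following prints, for each kernel boot banner in a log file, the line logged just before it, i.e. the last thing the previous boot managed to write. It assumes the classic syslog line format shown above; `last_lines_before_boot` is a hypothetical helper name. If the systemd journal is persistent on this install, `journalctl --list-boots` and `journalctl -b -1 -n 50` give a similar view of the end of the previous boot.

```shell
#!/bin/sh
# Sketch: for every line containing " kernel: ... Linux version " (the kernel
# boot banner), print the log line that came right before it.
# Assumes classic syslog lines ("Mon DD HH:MM:SS host ..."); the default file is
# /var/log/syslog, pass another path (e.g. /var/log/syslog.1) as $1.
last_lines_before_boot() {
  awk '
    / kernel: .* Linux version / {
      if (prev != "") print "before boot: " prev
      print "boot:        " $1 " " $2 " " $3
    }
    { prev = $0 }
  ' "${1:-/var/log/syslog}"
}
```

On the excerpt above it would report the 06:17:01 CRON line as the last entry before the 06:25:51 boot.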
Things I have tried/noticed so far:
I have a Samsung SSD which apparently might not play nice with Linux, spewing errors like "failed command: READ FPDMA QUEUED". I found a "fix":
echo 1 > /sys/block/sda/device/queue_depth
which limits the SSD's queue depth to 1 and so effectively turns off NCQ (native command queuing). This is applied at each boot via cron, and since I enabled it the SSD errors have disappeared, but the reboots have not.
I suspected a PSU/CPU-related hardware failure, so I ran a few 60-second stress tests with
stress-ng -a 0 --class cpu --metrics --timeout 60
but the machine barely gets up to 60 °C and does not reboot.
One thing I am leaving out for the moment is that I get ACPI errors at boot. As I understand it, this can happen, and since all the hardware I need is working, I do not think it is worth pursuing. An example:
Code:
ACPI BIOS Error (bug): Could not resolve symbol [\_SB.UBTC.RUCC], AE_NOT_FOUND (20210730/psargs-330)
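Coming back to the SSD workaround: since it is re-applied by cron at boot, a quick sketch like this can confirm after each unexpected reboot that every sd* disk really ended up with a queue depth of 1 (adjust the glob if only sda needs it). It only reads sysfs; `check_queue_depth` is a hypothetical helper, and `SYSFS_ROOT` is a made-up override used only to test it against a fake directory tree.

```shell
#!/bin/sh
# Sketch: report queue_depth for every /sys/block/sd* disk and return non-zero
# if any of them is not 1 (i.e. the boot-time workaround did not apply).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
check_queue_depth() {
  root="${SYSFS_ROOT:-/sys}"
  rc=0
  for qd in "$root"/block/sd*/device/queue_depth; do
    [ -r "$qd" ] || continue            # glob matched nothing, or unreadable
    # .../block/sda/device/queue_depth -> sda
    disk=$(basename "$(dirname "$(dirname "$qd")")")
    depth=$(cat "$qd")
    if [ "$depth" = "1" ]; then
      echo "$disk: queue_depth=1 (NCQ effectively off)"
    else
      echo "$disk: queue_depth=$depth (workaround NOT applied)"
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_queue_depth` after boot should print one line per disk and exit non-zero if any disk still has a queue depth other than 1.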
I guess my next step is running a memtest; I will do that and report back. In the meantime, is there anything else anyone would suggest?
Thanks!
V