Crash random proxmox

frenk970 · Mar 20, 2022

Hello,
Proxmox reboots at random and I don't understand what causes it. This is the log before the crash: https://pastebin.com/k8mHhtTa
Can you help me understand?

Proxmox always reboots right after log lines of that kind are written.

Dunuin · Mar 20, 2022

I don't see anything special. It just runs the hourly cron jobs before the reboot. You could check if there is anything in "/etc/cron.hourly" that might cause trouble.
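
A minimal sketch of that check, in Python: it lists the files in a cron directory that `run-parts` would most likely execute. It assumes the default Debian `run-parts` naming rule (names made only of ASCII letters, digits, underscores, and hyphens) and that a script must be executable to run. The function name `runnable_cron_scripts` is made up for illustration.

```python
import os
import re
from pathlib import Path

# Assumption: default Debian run-parts naming rule (no --lsbsysinit or --regex).
_RUN_PARTS_NAME = re.compile(r"[A-Za-z0-9_-]+")


def runnable_cron_scripts(directory="/etc/cron.hourly"):
    """Return the sorted names of files that run-parts would likely execute."""
    path = Path(directory)
    if not path.is_dir():
        return []
    names = []
    for entry in path.iterdir():
        if not entry.is_file():
            continue
        if not _RUN_PARTS_NAME.fullmatch(entry.name):
            continue  # e.g. ".placeholder" or "job.bak" are skipped
        if not os.access(entry, os.X_OK):
            continue  # not executable, so it would not be run
        names.append(entry.name)
    return sorted(names)


if __name__ == "__main__":
    for name in runnable_cron_scripts():
        print(name)
```

Anything this prints is worth opening and reading. An empty result suggests the hourly cron run itself is probably not what triggers the reboot.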

frenk970 · Mar 20, 2022

Dunuin said:
I don't see anything special. It just runs the hourly cron jobs before the reboot. You could check if there is anything in "/etc/cron.hourly" that might cause trouble.

https://pastebin.com/Z0PFVkWL

frenk970 · Mar 20, 2022

Dunuin said:
I don't see anything special. It just runs the hourly cron jobs before the reboot. You could check if there is anything in "/etc/cron.hourly" that might cause trouble.

But the reboot always occurs right after those log lines.

frenk970 · Mar 20, 2022

I have a VM with GPU passthrough. Could it be that the GPU is requested while it is busy, and that crashes the system?
However, I don't see any problems of this type in the log, nor any sign that Proxmox is running low on RAM (18 GB of 32 GB is in use).

Dunuin · Mar 20, 2022

Did you check with memtest86+ if your RAM is healthy?
Did you check if the PSU is healthy? For example, does running a GPU + CPU benchmark that draws a lot of power cause reboots?
Most of the time when a system is unstable, it's one of those two.

frenk970 · Mar 20, 2022

Dunuin said:
Did you check with memtest86+ if your RAM is healthy?
Did you check if the PSU is healthy? For example, does running a GPU + CPU benchmark that draws a lot of power cause reboots?
Most of the time when a system is unstable, it's one of those two.

No, I hadn't thought about it. I'll run the recommended tests right away.

Dunuin · Mar 20, 2022

You could also run a long SMART self-test (smartctl -t long /dev/yourDisk) to check if your disks are fine. Updating the BIOS/UEFI might help too if there was a known firmware problem that has been fixed in the meantime.
In case you are using ZFS, you could start a scrub (zpool scrub rpool) to see if some files of the OS got corrupted.
And you could try the 5.15 kernel instead of 5.13 in case you are using very recent hardware. That sometimes fixes problems caused by new hardware.
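
To see which of the two kernel series a host is currently running, a small Python sketch like the following could help. It only parses the release string (the same value `uname -r` prints). The helper name `kernel_series` and the example string `5.13.19-6-pve` are illustrative assumptions, not output from this system.

```python
import platform
import re


def kernel_series(release):
    """Return (major, minor) parsed from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    if kernel_series(running) < (5, 15):
        print(f"{running}: older than the 5.15 series")
    else:
        print(f"{running}: 5.15 series or newer")
```

For instance, `kernel_series("5.13.19-6-pve")` yields `(5, 13)`, which compares as older than `(5, 15)`.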

frenk970 · Mar 21, 2022

Dunuin said:
You could also run a long SMART self-test (smartctl -t long /dev/yourDisk) to check if your disks are fine. Updating the BIOS/UEFI might help too if there was a known firmware problem that has been fixed in the meantime.
In case you are using ZFS, you could start a scrub (zpool scrub rpool) to see if some files of the OS got corrupted.
And you could try the 5.15 kernel instead of 5.13 in case you are using very recent hardware. That sometimes fixes problems caused by new hardware.

I am running the disk tests now. I have checked and there are no BIOS/UEFI updates; the firmware is already at the latest version.

frenk970 · Mar 21, 2022

Dunuin said:
You could also run a long SMART self-test (smartctl -t long /dev/yourDisk) to check if your disks are fine. Updating the BIOS/UEFI might help too if there was a known firmware problem that has been fixed in the meantime.
In case you are using ZFS, you could start a scrub (zpool scrub rpool) to see if some files of the OS got corrupted.
And you could try the 5.15 kernel instead of 5.13 in case you are using very recent hardware. That sometimes fixes problems caused by new hardware.

zpool scrub rpool reported no errors

frenk970 · Mar 21, 2022

The memtest86+ test passed, so I can rule out the RAM. Hopefully it's not the power supply.
The disks passed all their tests.
The server looks perfect.

frenk970 · Mar 21, 2022

This is the configuration. It is my old gaming PC and I only use it for testing at home. Could it be that the power supply can't keep up?

Malvada · Dec 28, 2022

I have this exact same issue. The log looks exactly the same as well.
My machine crashes every night at the same time, and the last entries are the cron entries mentioned above.
I will try to clear my hourly cron jobs and see if that results in any improvement.

mustava · Mar 4, 2023

I believe I have the same issue. The log looks very similar. I never get any error messages, but the syslog before the crash/reboot seems the same every time.

Code:

Mar  4 03:03:46 pve pvedaemon[468000]: <root@pam> successful auth for user 'root@pam'
Mar  4 03:05:54 pve pveproxy[481032]: worker exit
Mar  4 03:05:54 pve pveproxy[2988]: worker 481032 finished
Mar  4 03:05:54 pve pveproxy[2988]: starting 1 worker(s)
Mar  4 03:05:54 pve pveproxy[2988]: worker 489205 started
Mar  4 03:06:07 pve pveproxy[2988]: worker 480114 finished
Mar  4 03:06:07 pve pveproxy[2988]: starting 1 worker(s)
Mar  4 03:06:07 pve pveproxy[2988]: worker 489239 started
Mar  4 03:06:10 pve pveproxy[489238]: got inotify poll request in wrong process - disabling inotify
Mar  4 03:06:10 pve pveproxy[489238]: worker exit
Mar  4 03:10:01 pve CRON[489947]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Mar  4 03:12:17 pve pvedaemon[462470]: <root@pam> successful auth for user 'root@pam'
Mar  4 03:14:34 pve pveproxy[482304]: worker exit
Mar  4 03:14:34 pve pveproxy[2988]: worker 482304 finished
Mar  4 03:14:34 pve pveproxy[2988]: starting 1 worker(s)
Mar  4 03:14:34 pve pveproxy[2988]: worker 490743 started
Mar  4 03:17:01 pve CRON[491174]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Mar  4 03:18:30 pve postfix/qmgr[2939]: 425A91F2E3: from=<root@pve.>, size=1223, nrcpt=1 (queue active)
Mar  4 03:18:30 pve postfix/qmgr[2939]: 440A3B0081: from=<root@pve.>, size=1223, nrcpt=1 (queue active)
Mar  4 03:18:31 pve postfix/smtp[491440]: connect to gmail-smtp-in.l.google.com[2404:6800:4003:c05::1a]:25: Network is unreachable
Mar  4 03:18:46 pve pvedaemon[468454]: <root@pam> successful auth for user 'root@pam'
Mar  4 03:19:01 pve postfix/smtp[491440]: connect to gmail-smtp-in.l.google.com[172.253.118.27]:25: Connection timed out
Mar  4 03:19:01 pve postfix/smtp[491439]: connect to gmail-smtp-in.l.google.com[172.253.118.27]:25: Connection timed out
Mar  4 03:19:01 pve postfix/smtp[491439]: connect to gmail-smtp-in.l.google.com[2404:6800:4003:c05::1a]:25: Network is unreachable
Mar  4 03:19:01 pve postfix/smtp[491439]: connect to alt1.gmail-smtp-in.l.google.com[2607:f8b0:400e:c00::1b]:25: Network is unreachable
Mar  4 03:19:31 pve postfix/smtp[491440]: connect to alt1.gmail-smtp-in.l.google.com[173.194.202.26]:25: Connection timed out
Mar  4 03:19:31 pve postfix/smtp[491440]: connect to alt1.gmail-smtp-in.l.google.com[2607:f8b0:400e:c00::1b]:25: Network is unreachable
Mar  4 03:19:31 pve postfix/smtp[491440]: connect to alt2.gmail-smtp-in.l.google.com[2607:f8b0:4023:c0b::1a]:25: Network is unreachable
Mar  4 03:19:31 pve postfix/smtp[491439]: connect to alt1.gmail-smtp-in.l.google.com[173.194.202.26]:25: Connection timed out
Mar  4 03:20:01 pve postfix/smtp[491439]: connect to alt2.gmail-smtp-in.l.google.com[142.250.141.27]:25: Connection timed out
<Crash/Reboot>
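
Since several posters report the same pattern, it may help to compare the last entries before each crash across several nights. The sketch below, in Python, reads traditional syslog lines like the excerpt above and returns the last few lines written before each boot. It assumes a boot shows up as a `kernel:` line containing `Linux version`, which is typical but worth verifying on your own system. The function name `lines_before_boots` is made up for illustration.

```python
from collections import deque


def lines_before_boots(lines, context=5):
    """Return one list per detected boot, holding the `context` lines before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    results = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Assumption: the kernel's first message at boot contains "Linux version".
        if " kernel: " in line and "Linux version" in line:
            if recent:
                results.append(list(recent))
            recent.clear()  # start a fresh window for the new boot
        else:
            recent.append(line)
    return results
```

One possible use is to open the syslog file (often `/var/log/syslog` on Debian-based systems, an assumption here) and pass the file object to `lines_before_boots`. If the same program or the same time of day shows up before every boot, that is a lead; if the entries are unrelated each time, a hardware cause such as the PSU becomes more plausible.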
