How to check (which) logs after crash? And hopefully fix the crash :)

Aussi · Jul 30, 2023

Hi all,

Apparently my Proxmox server crashed during my holidays ( being away from home ... ).
My home automations runs on it aswell ... whoops

I've checked the journal logs of that period but that only seems to log the stuff during power on.....

See here a small part of it where you can see it most have crashed the 19th of july and soem familiy of my rebooted the PC on the 21th
Sadly enough later that week it did crash again.....

So any tips on finding out what cause the crash would be really appreciated.

journalctl --since "2023-07-15 00:00:00" --until "2023-07-28 00:00:00":

Code:

Jul 19 00:00:25 sanderspve pvefw-logger[3977078]: starting pvefw logger
Jul 19 00:00:25 sanderspve systemd[1]: Started pvefw-logger.service - Proxmox VE firewall logger.
Jul 19 00:00:25 sanderspve systemd[1]: logrotate.service: Deactivated successfully.
Jul 19 00:00:25 sanderspve systemd[1]: Finished logrotate.service - Rotate log files.
Jul 19 00:17:01 sanderspve CRON[3981497]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 00:17:01 sanderspve CRON[3981498]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 00:17:01 sanderspve CRON[3981497]: pam_unix(cron:session): session closed for user root
Jul 19 00:44:33 sanderspve smartd[575]: Device: /dev/sda [USB JMicron], SMART Usage Attribute: 194 Temperature_Celsius changed from 153 to 157
Jul 19 01:17:01 sanderspve CRON[3997378]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 01:17:01 sanderspve CRON[3997379]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 01:17:01 sanderspve CRON[3997378]: pam_unix(cron:session): session closed for user root
Jul 19 01:44:33 sanderspve smartd[575]: Device: /dev/sda [USB JMicron], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 153
Jul 19 02:17:01 sanderspve CRON[4013297]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 02:17:01 sanderspve CRON[4013298]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 02:17:01 sanderspve CRON[4013297]: pam_unix(cron:session): session closed for user root
Jul 19 02:44:33 sanderspve smartd[575]: Device: /dev/sda [USB JMicron], SMART Usage Attribute: 194 Temperature_Celsius changed from 153 to 157
Jul 19 03:10:01 sanderspve CRON[4027384]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 03:10:01 sanderspve CRON[4027385]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Jul 19 03:10:01 sanderspve CRON[4027384]: pam_unix(cron:session): session closed for user root
Jul 19 03:17:01 sanderspve CRON[4029239]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 03:17:01 sanderspve CRON[4029240]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 03:17:01 sanderspve CRON[4029239]: pam_unix(cron:session): session closed for user root
Jul 19 04:17:01 sanderspve CRON[4045163]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 04:17:01 sanderspve CRON[4045164]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 04:17:01 sanderspve CRON[4045163]: pam_unix(cron:session): session closed for user root
Jul 19 05:17:01 sanderspve CRON[4061055]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 19 05:17:01 sanderspve CRON[4061056]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Jul 19 05:17:01 sanderspve CRON[4061055]: pam_unix(cron:session): session closed for user root
-- Boot acb3e26728244b67904a0e01af8ee0ed --
Jul 21 18:55:41 sanderspve kernel: Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC>
Jul 21 18:55:41 sanderspve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-3-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
Jul 21 18:55:41 sanderspve kernel: KERNEL supported cpus:
Jul 21 18:55:41 sanderspve kernel:   Intel GenuineIntel
Jul 21 18:55:41 sanderspve kernel:   AMD AuthenticAMD
Jul 21 18:55:41 sanderspve kernel:   Hygon HygonGenuine
Jul 21 18:55:41 sanderspve kernel:   Centaur CentaurHauls
Jul 21 18:55:41 sanderspve kernel:   zhaoxin   Shanghai 
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
Jul 21 18:55:41 sanderspve kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Jul 21 18:55:41 sanderspve kernel: x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
Jul 21 18:55:41 sanderspve kernel: x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
Jul 21 18:55:41 sanderspve kernel: x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
Jul 21 18:55:41 sanderspve kernel: signal: max sigframe size: 2032
Jul 21 18:55:41 sanderspve kernel: BIOS-provided physical RAM map:
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000079e09fff] usable
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x0000000079e0a000-0x000000007a26bfff] reserved
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x000000007a26c000-0x000000007a2e8fff] ACPI data
Jul 21 18:55:41 sanderspve kernel: BIOS-e820: [mem 0x000000007a2e9000-0x000000007a371fff] ACPI NVS

leesteken · Jul 30, 2023

Looks like there are no logs on Proxmox about the occurence. Maybe power interruption prevented the logs from being saved to disks. Does your server have IPMI which might have some information? Otherwise, you'll could setup remote logging to another server or watch the physical display (maybe with a webcam) when/if it happens again. Have a look at other threads here about unexplained reboots, maybe they have some other tips and tricks.

Aussi · Jul 30, 2023

Sadly enough I dont have IPMI. Just have a simply Asus main router ....
I only use a headless NUC to run as proxmox server....

I didnt hear any other problems regarding power problems , so I think that wasnt the call.
So there are no other options to be checked , other logs for example?

Other question would be is there a way to auto reboot it when it crashed?

Aussi · Aug 24, 2023

leesteken said:
Looks like there are no logs on Proxmox about the occurence. Maybe power interruption prevented the logs from being saved to disks. Does your server have IPMI which might have some information? Otherwise, you'll could setup remote logging to another server or watch the physical display (maybe with a webcam) when/if it happens again. Have a look at other threads here about unexplained reboots, maybe they have some other tips and tricks.

Leesteken , I've been monitoring now the server for a couple of weeks.

And ofcourse a real crash or hang up hasn't happened since I hook up the physical monitor....

I did see some weird log notification about some hardware being detached or something.

Do you have any idea if this might cause the hang up ?
Attached a photo of the log

leesteken · Aug 24, 2023

Aussi said:
I did see some weird log notification about some hardware being detached or something.

Do you have any idea if this might cause the hang up ?
Attached a photo of the log

I don't know and I cannot see an image. I doubt such unrelated things are connected.

Neobin · Aug 25, 2023

Aussi said:
I did see some weird log notification about some hardware being detached or something.

The e1000e hardware unit hang is a known hardware bug:
https://forum.proxmox.com/threads/help-eno1-detected-hardware-unit-hang.117305/post-507518

Aussi · Aug 25, 2023

leesteken said:
I don't know and I cannot see an image. I doubt such unrelated things are connected.

Correct it was too big at the start. Fixed it later on the day, Sorry

Aussi · Aug 25, 2023

Neobin said:
The e1000e hardware unit hang is a known hardware bug:
https://forum.proxmox.com/threads/help-eno1-detected-hardware-unit-hang.117305/post-507518

Thx ! I'll have a check on that topci !

Aussi · Aug 28, 2023

Ofcourse after removing the physical display to my Proxmox server the server starts crashing again .... pffff

Do you guys maybe see something strange in the syslog

Code:

Aug 28 08:24:00 sanderspve kernel: EXT4-fs (loop1): mounted filesystem 76ed2bee-cd03-40fd-9920-d7d9dbb46918 with ordered data mode. Quota mode: none.
Aug 28 08:24:00 sanderspve audit[1851]: AVC apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1851 comm="apparmor_parser"
Aug 28 08:24:00 sanderspve kernel: audit: type=1400 audit(1693203840.982:26): apparmor="STATUS" operation="profile_load" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=1851 comm="apparmor_parser"
Aug 28 08:24:01 sanderspve kernel: vmbr0: port 4(fwpr102p0) entered blocking state
Aug 28 08:24:01 sanderspve kernel: vmbr0: port 4(fwpr102p0) entered disabled state
Aug 28 08:24:01 sanderspve kernel: device fwpr102p0 entered promiscuous mode
Aug 28 08:24:01 sanderspve kernel: vmbr0: port 4(fwpr102p0) entered blocking state
Aug 28 08:24:01 sanderspve kernel: vmbr0: port 4(fwpr102p0) entered forwarding state
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 1(fwln102i0) entered disabled state
Aug 28 08:24:01 sanderspve kernel: device fwln102i0 entered promiscuous mode
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 1(fwln102i0) entered blocking state
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 1(fwln102i0) entered forwarding state
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 2(veth102i0) entered blocking state
Aug 28 08:24:01 sanderspve kernel: fwbr102i0: port 2(veth102i0) entered disabled state
Aug 28 08:24:01 sanderspve kernel: device veth102i0 entered promiscuous mode
Aug 28 08:24:01 sanderspve kernel: eth0: renamed from vethlLQzjY
Aug 28 08:24:01 sanderspve pve-guests[984]: <root@pam> end task UPID:sanderspve:000003D9:000004A0:64EC3D1D:startall::root@pam: OK
Aug 28 08:24:01 sanderspve pvestatd[940]: modified cpu set for lxc/101: 2-3
Aug 28 08:24:02 sanderspve systemd[1]: Finished pve-guests.service - PVE guests.
Aug 28 08:24:02 sanderspve systemd[1]: Starting pvescheduler.service - Proxmox VE scheduler...
Aug 28 08:24:02 sanderspve pvestatd[940]: status update time (37.596 seconds)
Aug 28 08:24:02 sanderspve pmxcfs[834]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sanderspve/local-lvm: -1
Aug 28 08:24:02 sanderspve pmxcfs[834]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sanderspve/local: -1
Aug 28 08:24:02 sanderspve pmxcfs[834]: [status] notice: RRDC update error /var/lib/rrdcached/db/pve2-storage/sanderspve/Backup: -1
Aug 28 08:24:02 sanderspve pvescheduler[1952]: starting server
Aug 28 08:24:02 sanderspve systemd[1]: Started pvescheduler.service - Proxmox VE scheduler.
Aug 28 08:24:02 sanderspve systemd[1]: Reached target multi-user.target - Multi-User System.
Aug 28 08:24:02 sanderspve systemd[1]: Reached target graphical.target - Graphical Interface.
Aug 28 08:24:02 sanderspve systemd[1]: Starting systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP...
Aug 28 08:24:02 sanderspve systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Aug 28 08:24:02 sanderspve systemd[1]: Finished systemd-update-utmp-runlevel.service - Record Runlevel Change in UTMP.
Aug 28 08:24:02 sanderspve systemd[1]: Startup finished in 11.367s (firmware) + 5.513s (loader) + 2.903s (kernel) + 1min 49.851s (userspace) = 2min 9.636s.
Aug 28 08:24:02 sanderspve chronyd[769]: Selected source 158.101.216.150 (2.debian.pool.ntp.org)
Aug 28 08:24:06 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Aug 28 08:24:06 sanderspve kernel: fwbr102i0: port 2(veth102i0) entered blocking state
Aug 28 08:24:06 sanderspve kernel: fwbr102i0: port 2(veth102i0) entered forwarding state
Aug 28 08:24:11 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/check-overlayfs-support594583561/lower2' does not support file handles, falling back to xino=off.
Aug 28 08:24:12 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/metacopy-check4055778659/l1' does not support file handles, falling back to xino=off.
Aug 28 08:24:13 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/AIUYLNCGUREYXWFUQTBBXWAPAF' does not support file handles, falling back to xino=off.
Aug 28 08:24:13 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/4YDCK6W6KN7QCTVXXB5KBHTLUB' does not support file handles, falling back to xino=off.
Aug 28 08:24:14 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/HMWSMZN4BVACECFXFWE2X2PTMY' does not support file handles, falling back to xino=off.
Aug 28 08:24:14 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/GMJSTKJTYMKWJF277BVDKELBZ2' does not support file handles, falling back to xino=off.
Aug 28 08:24:14 sanderspve kernel: Initializing XFRM netlink socket
Aug 28 08:24:16 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/4YDCK6W6KN7QCTVXXB5KBHTLUB' does not support file handles, falling back to xino=off.
Aug 28 08:24:16 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/HMWSMZN4BVACECFXFWE2X2PTMY' does not support file handles, falling back to xino=off.
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered disabled state
Aug 28 08:24:16 sanderspve kernel: device vethd1a468c entered promiscuous mode
Aug 28 08:24:16 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/GMJSTKJTYMKWJF277BVDKELBZ2' does not support file handles, falling back to xino=off.
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered forwarding state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered disabled state
Aug 28 08:24:16 sanderspve kernel: overlayfs: fs on '/var/lib/docker/overlay2/l/AIUYLNCGUREYXWFUQTBBXWAPAF' does not support file handles, falling back to xino=off.
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered disabled state
Aug 28 08:24:16 sanderspve kernel: device veth95066de entered promiscuous mode
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered forwarding state
Aug 28 08:24:16 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): br-7f65d358e342: link becomes ready
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered disabled state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered disabled state
Aug 28 08:24:16 sanderspve kernel: device veth4abfc55 entered promiscuous mode
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered forwarding state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered disabled state
Aug 28 08:24:16 sanderspve kernel: device vethc07df50 entered promiscuous mode
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered blocking state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered forwarding state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered disabled state
Aug 28 08:24:16 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered disabled state
Aug 28 08:24:18 sanderspve kernel: eth0: renamed from vethd04178c
Aug 28 08:24:18 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethd1a468c: link becomes ready
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered blocking state
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 1(vethd1a468c) entered forwarding state
Aug 28 08:24:18 sanderspve kernel: eth0: renamed from veth63c9608
Aug 28 08:24:18 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth95066de: link becomes ready
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered blocking state
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 2(veth95066de) entered forwarding state
Aug 28 08:24:18 sanderspve kernel: eth0: renamed from vetha95f46a
Aug 28 08:24:18 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): veth4abfc55: link becomes ready
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered blocking state
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 3(veth4abfc55) entered forwarding state
Aug 28 08:24:18 sanderspve kernel: eth0: renamed from veth819b205
Aug 28 08:24:18 sanderspve kernel: IPv6: ADDRCONF(NETDEV_CHANGE): vethc07df50: link becomes ready
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered blocking state
Aug 28 08:24:18 sanderspve kernel: br-7f65d358e342: port 4(vethc07df50) entered forwarding state
Aug 28 08:26:11 sanderspve chronyd[769]: Selected source 45.159.204.28 (2.debian.pool.ntp.org)
Aug 28 08:37:51 sanderspve systemd[1]: Starting systemd-tmpfiles-clean.service - Cleanup of Temporary Directories...
Aug 28 08:37:51 sanderspve systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
Aug 28 08:37:51 sanderspve systemd[1]: Finished systemd-tmpfiles-clean.service - Cleanup of Temporary Directories.
Aug 28 08:37:51 sanderspve systemd[1]: run-credentials-systemd\x2dtmpfiles\x2dclean.service.mount: Deactivated successfully.
-- Reboot --
Aug 28 17:58:22 sanderspve kernel: Linux version 6.2.16-3-pve (tom@sbuild) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z) ()
Aug 28 17:58:22 sanderspve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.2.16-3-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on
Aug 28 17:58:22 sanderspve kernel: KERNEL supported cpus:
Aug 28 17:58:22 sanderspve kernel:   Intel GenuineIntel
Aug 28 17:58:22 sanderspve kernel:   AMD AuthenticAMD
Aug 28 17:58:22 sanderspve kernel:   Hygon HygonGenuine
Aug 28 17:58:22 sanderspve kernel:   Centaur CentaurHauls
Aug 28 17:58:22 sanderspve kernel:   zhaoxin   Shanghai 
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers'
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR'
Aug 28 17:58:22 sanderspve kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Aug 28 17:58:22 sanderspve kernel: x86/fpu: xstate_offset[3]:  832, xstate_sizes[3]:   64
Aug 28 17:58:22 sanderspve kernel: x86/fpu: xstate_offset[4]:  896, xstate_sizes[4]:   64
Aug 28 17:58:22 sanderspve kernel: x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format.
Aug 28 17:58:22 sanderspve kernel: signal: max sigframe size: 2032
Aug 28 17:58:22 sanderspve kernel: BIOS-provided physical RAM map:

Neobin · Aug 29, 2023

Aussi said:
I've been monitoring now the server for a couple of weeks.

And ofcourse a real crash or hang up hasn't happened since I hook up the physical monitor....

Aussi said:
Ofcourse after removing the physical display to my Proxmox server the server starts crashing again .... pffff

Maybe a/the similar/same problem?:
https://forum.proxmox.com/threads/p...rdware-transcoding-in-plex.132187/post-581509 and following

Aussi · Jul 19, 2024

Darn it. I have to revive this topic sadly enough.
Again on holidays and my PVE server keeps crashing after 12 hours or more.

And sadly enough I cant find anything back with journalctl --since "2024-07-13 00:00:00" --until "2024-07-19 00:00:00" command.

My first thoughts were that one of the LXC containers is crashing but also in that container I cant find any errors about that.

the strangest thing is that when it happens it partially crashes.

I have 1 Docker LXC , 1 PiHole LXC and HASSIO VM.

When it happens I can still access the Hassio VM for a while but my Docker stuff seems to be unavailable.
And also trying to access the PVE server with tailscale doesn't seem to work when it partially crashes. I have tailscaled installed on the PVE Host.

Does any one have any thoughts on how to look up why it hangs it self after 12 hours or more?

Aussi · Jul 21, 2024

Any one ?

leesteken · Jul 21, 2024

Aussi said:
Any one ?

Maybe it's your hardware or maybe it's because of running Docker in a container: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pct
I may have missed your hardware information? There are known stability issues with 13th and 14th gen Intel consumer CPUs.
Try running for a long time without Docker?
Try booting the system with a Ubuntu 22.04 installer and run some benchmark software, to see if it crashes (when Proxmox and VMs/CTs are not involved)?

Aussi · Jul 21, 2024

leesteken said:
Maybe it's your hardware or maybe it's because of running Docker in a container: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pct
I may have missed your hardware information? There are known stability issues with 13th and 14th gen Intel consumer CPUs.
Try running for a long time without Docker?
Try booting the system with a Ubuntu 22.04 installer and run some benchmark software, to see if it crashes (when Proxmox and VMs/CTs are not involved)?

Thx for the reply leesteken!

Strangest thing is that it started happening direct when I went on holiday.

I trigger some stuff I Home Assistant when on holiday.
So there will definitely be more activity on the Docker LXC container due to the fact that Frigate , Double Take and compreface will be activated

Edit: reading your link points out that is wise to be running it on a VM. That I didn't know. Can I easily 'migrated' from LXC to a VM ?

sw-omit · Jul 24, 2024

No real experience with HA, but probably both the easiest and problem-proof method would be to install HA "clean" on a VM, and then make and restore a backup from the container (for example like [1], do check that you have HA-Core though)
That way you still have your old setup completely intact to revert to if something is broken.

[1] https://community.home-assistant.io/t/guide-home-assistant-core-restoring-a-backup/626347

Aussi · Jul 27, 2024

Thanks for the replies guys ! HA , Home Assistant is already running on a VM.... and seems to survive the "horror"
So my guess is that the Docker LXC is crashing hard and also taking a little bit of the PVE host with it ?

So it surprises me that I can access my VM ( Home Assistant in this case ).
But the Docker LXC crashed and i am also unable to access PVE host website.....

Aussi · Jul 27, 2024

So Holidays are over sadly enough....

So time to investigate some more because it happened again just 30 minutes ago...

I have attached a monitor to the Intel NUC to check what's going on.
But I can't really tell what's going wrong to crash partially.....

It kinda looks like my external hard drive is giving errors?

https://ibb.co/t2sJW8G

Aussi · Jul 28, 2024

leesteken said:
Maybe it's your hardware or maybe it's because of running Docker in a container: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_pct
I may have missed your hardware information? There are known stability issues with 13th and 14th gen Intel consumer CPUs.
Try running for a long time without Docker?
Try booting the system with a Ubuntu 22.04 installer and run some benchmark software, to see if it crashes (when Proxmox and VMs/CTs are not involved)?

@leesteken maybe you have a clue if the picture ( https://ibb.co/t2sJW8G ) is the reason why proxmox partially crashes?

leesteken · Jul 28, 2024

Aussi said:
@leesteken maybe you have a clue if the picture ( https://ibb.co/t2sJW8G ) is the reason why proxmox partially crashes?

Looks like your drive is broken? (And I think dmraid is not supported.)

Aussi · Jul 28, 2024

I am doing back ups to that drive. (I know it's not the best way) And it looks like it still does that?

And the drive still passes the S.M.A.R.T.

EDIT:
I ran the backup session again to the HDD and all went well.

If I look at the picture it seems to be having trouble with the PiHole Journal part. Not really know what that is.

How to check (which) logs after crash? And hopefully fix the crash :)

Member

Distinguished Member

Member

Member

Attachments

Distinguished Member

Distinguished Member

Member

Member

Member

Distinguished Member

Member

Member

Distinguished Member

Member

Active Member

Member

Member

Member

Distinguished Member

Member