System Crash Triggered by Update

pveuser42

Dec 15, 2021
Hello,

Periodically, my Proxmox server crashes. Afterwards, the task list at the bottom of the web interface shows that "Update package database" was started, followed directly by the system booting and starting VMs. I can trigger the crash on purpose by opening the node's Updates panel and clicking "Refresh" a few times until the server goes down. The logs contain no useful information because the crash is immediate: they jump straight from the update-started entries to the boot entries.
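Because the crash can be triggered on demand, it may help to script the trigger so that every attempt leaves a trace. Below is a minimal Python sketch. It assumes that a plain `apt-get update` (run as root) roughly approximates what the GUI "Refresh" does. The `refresh_repeatedly` name and the `refresh-test` tag are hypothetical. Journal entries that have not yet been written to disk can still be lost when power drops, so the markers narrow down the failing attempt rather than pinpoint it.

```python
import subprocess
import time

# Assumption: `apt-get update` approximates the GUI "Refresh" button
# (the "Update package database" task). It must be run as root.
UPDATE_CMD = ["apt-get", "update"]
TAG = "refresh-test"  # hypothetical syslog tag for finding the markers later


def refresh_repeatedly(runs: int, pause_s: float = 5.0, runner=subprocess.run) -> list[int]:
    """Refresh the package database `runs` times and return each exit code.

    Before every run, a marker is sent to the journal via `logger`, so that
    after a crash `journalctl -b -1 -t refresh-test` shows the last run started.
    """
    codes = []
    for i in range(1, runs + 1):
        runner(["logger", "-t", TAG, f"starting refresh run {i}/{runs}"])
        # Flush immediately so the progress line reaches the terminal before a crash.
        print(f"run {i}/{runs} ...", flush=True)
        result = runner(UPDATE_CMD, capture_output=True, text=True)
        codes.append(result.returncode)
        time.sleep(pause_s)
    return codes
```

Call it as, for example, `refresh_repeatedly(runs=10)`. Running it over SSH from another machine keeps the last printed line visible on that machine after the host goes down.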

Any ideas on how I can troubleshoot or investigate this further? So far I have tested the hardware, tweaked BIOS settings, followed advice from a few forums about high I/O causing crashes, and even reinstalled the server. I am currently running Proxmox VE 7.1-8.

Thank you.
 
Thanks for that; it's possible that it's related. I have had this issue on both Proxmox VE 6 (latest) and now 7 (latest). In the meantime, I have disabled my regularly scheduled update.

I also looked at journalctl around the time of the restart, and there are no entries that seem relevant. A few lines from before and after the crash are below.

Code:
Dec 14 04:58:20 vm01 pveproxy[1752]: starting 1 worker(s)
Dec 14 04:58:20 vm01 pveproxy[1752]: worker 2430303 started
Dec 14 05:02:24 vm01 pveproxy[2415378]: worker exit
Dec 14 05:02:24 vm01 pveproxy[1752]: worker 2415378 finished
Dec 14 05:02:24 vm01 pveproxy[1752]: starting 1 worker(s)
Dec 14 05:02:24 vm01 pveproxy[1752]: worker 2433179 started
-- Boot 5a046e35f4114da1af57ddb0e26d1b25 --
Dec 14 05:10:31 vm01 kernel: Linux version 5.13.19-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.13.19-4 (Mon, 29 Nov 2021 12:10:09 +0100) ()
Dec 14 05:10:31 vm01 kernel: Command line: initrd=\EFI\proxmox\5.13.19-2-pve\initrd.img-5.13.19-2-pve root=ZFS=rpool/ROOT/pve-1 boot=zfs
Dec 14 05:10:31 vm01 kernel: KERNEL supported cpus:
Dec 14 05:10:31 vm01 kernel:   Intel GenuineIntel
Dec 14 05:10:31 vm01 kernel:   AMD AuthenticAMD
Dec 14 05:10:31 vm01 kernel:   Hygon HygonGenuine
Dec 14 05:10:31 vm01 kernel:   Centaur CentaurHauls
Dec 14 05:10:31 vm01 kernel:   zhaoxin   Shanghai 
Dec 14 05:10:31 vm01 kernel: x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
Dec 14 05:10:31 vm01 kernel: x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
Dec 14 05:10:31 vm01 kernel: x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
Dec 14 05:10:31 vm01 kernel: x86/fpu: Supporting XSAVE feature 0x200: 'Protection Keys User registers'
Dec 14 05:10:31 vm01 kernel: x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
Dec 14 05:10:31 vm01 kernel: x86/fpu: xstate_offset[9]:  832, xstate_sizes[9]:    8
Dec 14 05:10:31 vm01 kernel: x86/fpu: Enabled xstate features 0x207, context size is 840 bytes, using 'compacted' format.
Dec 14 05:10:31 vm01 kernel: BIOS-provided physical RAM map:
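To quantify what an excerpt like this shows, one can locate each `-- Boot <id> --` marker and compare the last timestamp logged before it with the first one after it. Below is a minimal Python sketch. It assumes journalctl's short `Mon DD HH:MM:SS` timestamp format shown above. Because that format has no year, the year is passed in. The `boot_gaps` helper is hypothetical. The gap only bounds the crash time, because entries that were never written to disk before the failure may be missing.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# journalctl short timestamp prefix, e.g. "Dec 14 05:02:24 "
TS_RE = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) ")
BOOT_RE = re.compile(r"^-- Boot ([0-9a-f]+) --$")


def parse_ts(line: str, year: int) -> Optional[datetime]:
    """Return the line's timestamp, or None for continuation/marker lines."""
    m = TS_RE.match(line)
    if not m:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def boot_gaps(lines, year: int):
    """For each boot marker, return (boot_id, last_before, first_after, gap)."""
    results = []
    last_ts = None
    pending = None  # (boot_id, last timestamp before the marker)
    for raw in lines:
        boot = BOOT_RE.match(raw.strip())
        if boot:
            pending = (boot.group(1), last_ts)
            continue
        ts = parse_ts(raw, year)
        if ts is None:
            continue  # e.g. a wrapped continuation line
        if pending is not None:
            boot_id, before = pending
            gap = ts - before if before is not None else None
            results.append((boot_id, before, ts, gap))
            pending = None
        last_ts = ts
    return results


if __name__ == "__main__":
    excerpt = [
        "Dec 14 05:02:24 vm01 pveproxy[1752]: worker 2433179 started",
        "-- Boot 5a046e35f4114da1af57ddb0e26d1b25 --",
        "Dec 14 05:10:31 vm01 kernel: Linux version 5.13.19-2-pve",
    ]
    for boot_id, before, after, gap in boot_gaps(excerpt, year=2021):
        # prints: boot 5a046e35: last entry 2021-12-14 05:02:24, next boot 2021-12-14 05:10:31, gap 0:08:07
        print(f"boot {boot_id[:8]}: last entry {before}, next boot {after}, gap {gap}")
```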
 
Just to update this: I may have tracked the problem down to the PSU. I installed a new one, and I haven't been able to reproduce the issue *yet*.

Who would have thought that the update process would happen to trigger a PSU issue while the machine otherwise runs fine... I will report back if it happens again.