My PVE node (version 8.2.2) has restarted several times today. Can you help me check the logs to see what caused it?

pp58412345

Apr 21, 2025
This is the journalctl log:
root@pve:~# journalctl
Apr 02 21:34:32 pve kernel: Linux version 6.8.4-2-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for D>
Apr 02 21:34:32 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.4-2-pve root=/dev/mapper/pve-root ro quiet
Apr 02 21:34:32 pve kernel: KERNEL supported cpus:
Apr 02 21:34:32 pve kernel: Intel GenuineIntel
Apr 02 21:34:32 pve kernel: AMD AuthenticAMD
Apr 02 21:34:32 pve kernel: Hygon HygonGenuine
Apr 02 21:34:32 pve kernel: Centaur CentaurHauls
Apr 02 21:34:32 pve kernel: zhaoxin Shanghai
Apr 02 21:34:32 pve kernel: BIOS-provided physical RAM map:
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x0000000000100000-0x000000007932cfff] usable
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007932d000-0x000000007a08afff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007a08b000-0x000000007a0e6fff] ACPI data
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007a0e7000-0x000000007aef7fff] ACPI NVS
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007aef8000-0x000000007b543fff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007b544000-0x000000007b591fff] type 20
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007b592000-0x000000007b592fff] usable
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007b593000-0x000000007b618fff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007b619000-0x000000007bffffff] usable
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x000000007c000000-0x000000008fffffff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed44fff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved
Apr 02 21:34:32 pve kernel: BIOS-e820: [mem 0x0000000100000000-0x000000107fffffff] usable
Apr 02 21:34:32 pve kernel: NX (Execute Disable) protection: active
Apr 02 21:34:32 pve kernel: APIC: Static calls initialized
Apr 02 21:34:32 pve kernel: efi: EFI v2.4 by American Megatrends
Apr 02 21:34:32 pve kernel: efi: ESRT=0x7b540318 ACPI=0x7a09d000 ACPI 2.0=0x7a09d000 SMBIOS=0xf05e0 SMBIOS 3.0=0x7b3fe000 MOKvar>
Apr 02 21:34:32 pve kernel: efi: Remove mem32: MMIO range=[0x80000000-0x8fffffff] (256MB) from e820 map
Apr 02 21:34:32 pve kernel: e820: remove [mem 0x80000000-0x8fffffff] reserved
Apr 02 21:34:32 pve kernel: efi: Not removing mem33: MMIO range=[0xfed1c000-0xfed44fff] (164KB) from e820 map
Apr 02 21:34:32 pve kernel: efi: Remove mem34: MMIO range=[0xff000000-0xffffffff] (16MB) from e820 map
Apr 02 21:34:32 pve kernel: e820: remove [mem 0xff000000-0xffffffff] reserved
Apr 02 21:34:32 pve kernel: secureboot: Secure boot disabled
Apr 02 21:34:32 pve kernel: SMBIOS 3.0.0 present.
Apr 02 21:34:32 pve kernel: DMI: HUANANZHI /X99-F8 GAMING, BIOS 5.11 09/15/2021
Apr 02 21:34:32 pve kernel: tsc: Fast TSC calibration using PIT
Apr 02 21:34:32 pve kernel: tsc: Detected 2594.015 MHz processor
Apr 02 21:34:32 pve kernel: e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
Apr 02 21:34:32 pve kernel: e820: remove [mem 0x000a0000-0x000fffff] usable
Apr 02 21:34:32 pve kernel: last_pfn = 0x1080000 max_arch_pfn = 0x400000000
Apr 02 21:34:32 pve kernel: MTRR map: 5 entries (3 fixed + 2 variable; max 23), built from 10 variable MTRRs
Apr 02 21:34:32 pve kernel: x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
Apr 02 21:34:32 pve kernel: last_pfn = 0x7c000 max_arch_pfn = 0x400000000
Apr 02 21:34:32 pve kernel: found SMP MP-table at [mem 0x000fcd90-0x000fcd9f]
Apr 02 21:34:32 pve kernel: esrt: Reserving ESRT space from 0x000000007b540318 to 0x000000007b540350.
Apr 02 21:34:32 pve kernel: Using GB pages for direct mapping
Apr 02 21:34:32 pve kernel: secureboot: Secure boot disabled
Apr 02 21:34:32 pve kernel: RAMDISK: [mem 0x30b01000-0x34577fff]
Apr 02 21:34:32 pve kernel: ACPI: Early table checksum verification disabled
Apr 02 21:34:32 pve kernel: ACPI: RSDP 0x000000007A09D000 000024 (v02 ALASKA)
Apr 02 21:34:32 pve kernel: ACPI: XSDT 0x000000007A09D098 0000AC (v01 ALASKA A M I 01072009 AMI 00010013)
Apr 02 21:34:32 pve kernel: ACPI: FACP 0x000000007A0CFE08 00010C (v05 ALASKA A M I 01072009 AMI 00010013)
Apr 02 21:34:32 pve kernel: ACPI: DSDT 0x000000007A09D1D0 032C38 (v02 ALASKA A M I 01072009 INTL 20091013)
Apr 02 21:34:32 pve kernel: ACPI: FACS 0x000000007AEF6F80 000040
Apr 02 21:34:32 pve kernel: ACPI: APIC 0x000000007A0CFF18 0001E0 (v03 ALASKA A M I 01072009 AMI 00010013)
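The excerpt above is only the start of the journal (kernel messages from the Apr 02 boot), so it does not show what was logged just before each restart. A minimal sketch, assuming the full journal has been exported to text with `journalctl --no-pager > journal.txt` (the file name is hypothetical): it splits the text at each kernel `Linux version` banner, which starts every boot, and returns the last few lines logged before each reboot. A clean shutdown (for example one triggered by apcupsd) normally leaves orderly shutdown messages there, while a power loss or hard reset typically does not. On the node itself, `journalctl --list-boots` and `journalctl -b -1 -e` show the end of the previous boot directly.

```python
import sys
from typing import List

# The kernel prints this banner as one of the first messages of every boot.
BOOT_MARKER = "kernel: Linux version"


def lines_before_reboots(journal_text: str, tail: int = 10) -> List[List[str]]:
    """For every boot after the first, return up to `tail` lines that were
    logged in the previous boot, right before the new boot banner."""
    lines = journal_text.splitlines()
    boot_starts = [i for i, line in enumerate(lines) if BOOT_MARKER in line]
    result = []
    # Pair each boot banner with the next one; never reach back past the
    # start of the previous boot.
    for prev, start in zip(boot_starts, boot_starts[1:]):
        result.append(lines[max(prev, start - tail):start])
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 last_lines.py journal.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        text = f.read()
    for n, chunk in enumerate(lines_before_reboots(text), start=1):
        print(f"=== last lines before reboot #{n} ===")
        print("\n".join(chunk))
```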
 
One possible cause I suspect is that the NVMe drive's SMART health check has failed, reporting that the "NVM subsystem reliability has been degraded".
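That message corresponds to bit 2 of the "Critical Warning" byte in the NVMe SMART/Health information log, which `smartctl -a /dev/nvme0` prints as `Critical Warning: 0x..` (the device name `/dev/nvme0` is an assumption). A small sketch that decodes the byte using the bit assignments of the NVMe specification (only bits 0 to 4 are covered here), useful for checking whether other flags, such as read-only media, are set as well:

```python
from typing import List

# Bits 0-4 of the NVMe SMART/Health "Critical Warning" field.
# Higher bits are intentionally left out of this sketch.
CRITICAL_WARNING_BITS = {
    0: "available spare has fallen below threshold",
    1: "temperature is above or below threshold",
    2: "NVM subsystem reliability has been degraded",
    3: "media has been placed in read-only mode",
    4: "volatile memory backup device has failed",
}


def decode_critical_warning(value: int) -> List[str]:
    """Return a description for every known bit set in the Critical Warning byte."""
    return [text for bit, text in sorted(CRITICAL_WARNING_BITS.items())
            if value & (1 << bit)]


# Example: smartctl reporting "Critical Warning: 0x04".
print(decode_critical_warning(int("0x04", 16)))
```

If bit 3 is also set, the drive has switched itself to read-only mode.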
Another possible cause is an incorrect UPS configuration, but I have already changed it to the following:


## apcupsd.conf v1.1 ##

UPSNAME ups
#Seconds on UPS power before shutting down the VMs and then the host (600 here); 0 disables this timer. Setting it is recommended
TIMEOUT 600
#Write the UPS status to the log every 5 s
STATTIME 5
#Enable logging; the log file is /var/log/apcupsd.status
LOGSTATS on
#Cable type is USB
UPSCABLE usb

#USB interface, detected automatically
UPSTYPE usb
DEVICE

#Also make sure the line below stays commented out, otherwise the USB UPS will not be detected automatically
# DEVICE /dev/ttyS0

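#Only treat the UPS as running on battery after this many seconds without mains power (60 here; the stock default is 6), so brief outages do not cause errors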
ONBATTERYDELAY 60

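#Shut down the host when the battery charge falls below this percentage (30 here; the stock default is 5); raising it, e.g. to 95, shuts down earlier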
BATTERYLEVEL 30

#Shut down the host when the estimated remaining runtime is under 3 minutes; 60 or 600 is recommended to shut down earlier
MINUTES 3
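During a power failure, apcupsd starts a shutdown as soon as the first of three limits is reached: the charge drops to `BATTERYLEVEL` percent, the estimated runtime drops to `MINUTES`, or the UPS has been on battery for `TIMEOUT` seconds (0 disables that timer). A minimal sketch, assuming a simplified model of that rule: it parses the directives above (comments stripped) and evaluates the limits for a hypothetical UPS reading. The function names and sample values are illustrative only, and apcupsd's real logic (for example how `ONBATTERYDELAY` interacts with the timer) may differ in detail. If apcupsd did shut the host down, its actions should also appear in its events file (by default `/var/log/apcupsd.events`).

```python
from typing import Dict


def parse_apcupsd_conf(text: str) -> Dict[str, str]:
    """Parse 'DIRECTIVE value' lines, skipping comments and blank lines.
    A directive without a value (such as a bare DEVICE) maps to ''."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def should_shutdown(conf: Dict[str, str], charge_pct: float,
                    runtime_min: float, seconds_on_battery: float) -> bool:
    """Simplified model: True if any of the three shutdown limits is reached.
    Fallbacks mirror the stock apcupsd.conf values (5 %, 3 min, timer off)."""
    battery_level = float(conf.get("BATTERYLEVEL", "5"))
    minutes = float(conf.get("MINUTES", "3"))
    timeout = float(conf.get("TIMEOUT", "0"))  # 0 disables the timer
    if charge_pct <= battery_level:
        return True
    if runtime_min <= minutes:
        return True
    return timeout > 0 and seconds_on_battery >= timeout


# The configuration above, without its comments.
CONF = """
TIMEOUT 600
STATTIME 5
LOGSTATS on
UPSCABLE usb
UPSTYPE usb
DEVICE
ONBATTERYDELAY 60
BATTERYLEVEL 30
MINUTES 3
"""

conf = parse_apcupsd_conf(CONF)
# Hypothetical reading: 80 % charge, 20 min runtime left, 11 min on battery.
print(should_shutdown(conf, charge_pct=80, runtime_min=20, seconds_on_battery=660))
```

In this model, `TIMEOUT 600` shuts the host down after ten minutes on battery even while the charge and runtime are still well above their limits.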