[SOLVED] PVE backups have been crashing the host system for two days (external + internal backup targets)

Hi Fiona,

Thanks for the tip. Here is the log:

[ 519.233988] mce: [Hardware Error]: CPU 16: Machine Check Exception: 4 Bank 1: bc800800060c0859
[ 519.234330] mce: [Hardware Error]: TSC 1ed356b2f76 ADDR 12fd3e500 MISC d01a000000000000 IPID 100b000000000
[ 519.234583] mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1686319070 SOCKET 0 APIC 9 microcode a20120a
[ 519.234828] mce: [Hardware Error]: Machine check: Uncorrected unrecoverable error in kernel context
[ 519.235074] Kernel panic - not syncing: Fatal local machine check
[ 519.236289] Kernel Offset: 0x26000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 519.259097] pstore: backend (efi_pstore) writing error (-5)
[ 519.259347] Rebooting in 30 seconds..

The host was booted with the 6.2 kernel.
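
For anyone reading the log above: the PROCESSOR and TIME fields can be decoded by hand. Below is a minimal Python sketch, assuming the value after the vendor code in "PROCESSOR 2:a20f12" is the raw CPUID signature and that vendor code 2 is the kernel's internal number for AMD. The function name is made up for illustration, and the family/model arithmetic follows the usual x86 convention, so treat it as a reading aid rather than an authoritative decoder.

```python
from datetime import datetime, timezone


def decode_cpuid_signature(sig):
    """Split a CPUID signature into (family, model, stepping).

    Hypothetical helper for illustration; it follows the common x86
    convention for combining base and extended family/model fields.
    """
    stepping = sig & 0xF
    model = (sig >> 4) & 0xF
    family = (sig >> 8) & 0xF
    if family == 0xF:
        # The extended family is added only when the base family is 0xF.
        family += (sig >> 20) & 0xFF
    if family >= 0x6:
        # The extended model supplies the upper four bits of the model.
        model |= ((sig >> 16) & 0xF) << 4
    return family, model, stepping


# Values copied from the "PROCESSOR 2:a20f12 TIME 1686319070" log line.
family, model, stepping = decode_cpuid_signature(0xA20F12)
print(f"family {family:x}h, model {model:x}h, stepping {stepping}")

# TIME is assumed to be a Unix timestamp in seconds.
print(datetime.fromtimestamp(1686319070, tz=timezone.utc).isoformat())
```

The decoded family, model and stepping can be compared with the "(19:21:2)" triple that the kernel prints once the more detailed decoding is active.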
 
Hi,
I haven't seen this before either, but from a quick search it sounds like you can install mcelog to get more detailed logs, and that it might be a CPU-related hardware issue. Can you check whether you have the latest BIOS update and microcode installed?
 
More detailed logs after installing rasdaemon:

[ 1707.303702] mce: Uncorrected hardware memory error in user-access at 22b239600
[ 1707.303711] mce: [Hardware Error]: Machine check events logged
[ 1707.304284] [Hardware Error]: Uncorrected, software restartable error.
[ 1707.304513] [Hardware Error]: CPU:17 (19:21:2) MC0_STATUS[-|UE|MiscV|AddrV|-|-|-|-|Poison|-]: 0xbc00080001010135
[ 1707.304750] [Hardware Error]: Error Addr: 0x000000022b239600
[ 1707.304981] [Hardware Error]: IPID: 0x001000b000000000
[ 1707.305211] [Hardware Error]: Load Store Unit Ext. Error Code: 1, An ECC error or L2 poison was detected on a data cache read by a load.
[ 1707.305693] [Hardware Error]: cache level: L1, tx: DATA, mem-tx: DRD
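
The status words in both logs can be cross-checked against the flag list the kernel prints (UE, MiscV, AddrV, Poison). The following is a rough Python sketch, not an official tool. The bit positions are an assumption based on the flag definitions in the Linux kernel's x86 MCE code and should be verified against the processor documentation. The table and function names are made up for illustration.

```python
# Assumed bit positions of MCA status flags (architectural bits 63..57,
# plus the AMD-specific ones as named in the Linux kernel sources).
STATUS_FLAGS = {
    63: "Val",       # register contents are valid
    62: "Over",      # an earlier error was overwritten
    61: "UC",        # uncorrected error
    60: "En",        # error reporting enabled
    59: "MiscV",     # MISC register valid
    58: "AddrV",     # ADDR register valid
    57: "PCC",       # processor context corrupt
    55: "TCC",       # task context corrupt
    53: "SyndV",     # syndrome register valid
    46: "CECC",      # correctable ECC error
    45: "UECC",      # uncorrectable ECC error
    44: "Deferred",  # deferred error
    43: "Poison",    # poisoned data was consumed
    40: "Scrub",     # error found during scrubbing
}


def decode_mca_status(status):
    """Return the names of all flags set in a 64-bit MCA status value."""
    return [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


# Status values copied from the two logs in this thread.
for value in (0xBC800800060C0859, 0xBC00080001010135):
    print(f"{value:#018x}: {' '.join(decode_mca_status(value))}")
```

Under these assumptions, both values have the uncorrected-error and poison bits set and the PCC bit clear. The first value additionally has bit 55 set, which this sketch labels TCC.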