[SOLVED] Hypervisor kernel panic during backup to PBS

May 9, 2017
During backups to PBS the hypervisor crashes hard, and the point at which it does so is not consistent.
Sometimes a backup succeeds and sometimes it does not, but after a few backups one will fail and fully crash the hypervisor.

Does anyone have an idea where I can start debugging this?
A live journalctl tail from the moment of the crash and the backup task log are attached.

AMD Ryzen 5 7600 3.8 GHz 6-Core Processor
MSI PRO X670-P WIFI ATX AM5 Motherboard
Corsair Vengeance 64 GB (2 x 32 GB) DDR5-6400 CL32 Memory
Seagate Exos 7E10 512e/4Kn 8 TB 3.5" 7200 RPM Internal Hard Drive x3 (raidz1)
Samsung PM893 1.92 TB 2.5" Solid State Drive x2 (OS, mirror)
Cooler Master MasterBox NR600 (w/o ODD) ATX Mid Tower Case
Cooler Master MWE GOLD 750 V2 FULL MODULAR 750 W 80+ Gold Certified Fully Modular ATX Power Supply
Code:
proxmox-ve: 8.1.0 (running kernel: 6.5.13-1-pve)
pve-manager: 8.1.4 (running version: 8.1.4/ec5affc9e41f1d79)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5.13-1-pve-signed: 6.5.13-1
proxmox-kernel-6.5: 6.5.13-1
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
proxmox-kernel-6.5.11-7-pve-signed: 6.5.11-7
proxmox-kernel-6.5.11-6-pve-signed: 6.5.11-6
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 17.2.7-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.2
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.5
libpve-network-perl: 0.9.5
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.0
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.4-1
proxmox-backup-file-restore: 3.1.4-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.4
pve-cluster: 8.0.5
pve-container: 5.0.8
pve-docs: 8.1.4
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-3
pve-xtermjs: 5.3.0-3
qemu-server: 8.0.10
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.2-pve2
 

Could be a memory failure. Maybe run memtest and/or replace/remove memory modules.
Ran Memtest86 and found that it was indeed memory instability, caused by running the DDR5 with EXPO enabled.
I'm not pulling any real load or pushing for maximum performance, so it's no issue to disable EXPO and live with the small performance hit.

Thanks for pointing me in the right direction, @leesteken!
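
For anyone landing here with similar random crashes: before (or alongside) a memtest run, it can help to skim the kernel log of the previous boot for hardware-error messages. Below is a minimal sketch of that idea in Python. The function name `find_suspicious_lines` and the pattern list are my own assumptions for illustration, not something from the logs in this thread, and the list is not exhaustive.

```python
import re

# Substrings the Linux kernel commonly prints around hardware faults and crashes.
# This list is an illustrative assumption, not an exhaustive or official set.
PATTERNS = [
    r"\[Hardware Error\]",
    r"Machine check",
    r"\bmce:",
    r"\bEDAC\b",
    r"Kernel panic",
    r"general protection fault",
    r"BUG: unable to handle",
]
_MATCHER = re.compile("|".join(PATTERNS), re.IGNORECASE)


def find_suspicious_lines(log_text):
    """Return (line_number, line) pairs for lines matching any pattern."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if _MATCHER.search(line):
            hits.append((number, line.strip()))
    return hits
```

One way to use it: save the previous boot's kernel messages with `journalctl -k -b -1 > prev-boot.log` (this needs a persistent journal), read the file, and pass its contents to `find_suspicious_lines`. Keep in mind that a hard crash often leaves nothing in the log at all, so an empty result does not rule out bad or unstable RAM.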
 
