Proxmox crashes under heavy load

git192prox

Proxmox Virtual Environment 8.2.2
RAM: 8GB

pve-manager/8.2.2/ (running kernel: 6.8.4-3-pve)

Linux prox 6.8.4-3-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.4-3 (2024-05-02T11:55Z) x86_64 GNU/Linux

BOOT_IMAGE=/vmlinuz-6.8.4-3-pve root=ZFS=/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
---------------------------------------------------------------------------------------------------------------------------------------------------------


VM running during the crash: Oracle Linux Server release 8.10

I had the same issue on other hardware, so I fully replaced the mainboard, power supply and memory: everything.

But again, when the system (Proxmox) is under heavy load (CPU near 100%), it crashes randomly.

I could not find any relevant details via "dmesg".

Voltages and temperatures seem to be normal:

Vcore Voltage: 1.25 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.14 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.12 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.26 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.12 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.26 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.11 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.26 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.25 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.26 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.23 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.25 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.90 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.26 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.24 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.25 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.90 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.17 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.25 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.92 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.26 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.23 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.25 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.92 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)
Vcore Voltage: 1.25 V (min = +0.85 V, max = +1.60 V)
+3.3 Voltage: 3.25 V (min = +2.97 V, max = +3.63 V)
+5 Voltage: 4.94 V (min = +4.50 V, max = +5.50 V)
+12 Voltage: 12.21 V (min = +10.20 V, max = +13.80 V)

temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +53.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +52.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +53.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +54.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +54.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +54.0°C (high = +95.0°C, hyst = +3.0°C)
temp1: +53.0°C (high = +95.0°C, hyst = +3.0°C)
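All of the readings above are well inside their limits. To check a longer capture taken while the host is under load, rather than by eye, a small awk filter over the "sensors" output could flag anything out of range. This is a minimal sketch that assumes the "label: value (min = ..., max = ...)" and "label: value (high = ...)" line formats shown above; the function name check_sensors is hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): flag lm-sensors readings outside their limits.
# Assumes the "label: value unit (min = x, max = y)" and
# "label: value unit (high = x, ...)" formats shown above.
check_sensors() {
    awk '
    # First signed decimal number in s, e.g. "+12.21" -> 12.21
    function num(s) {
        if (match(s, /[-+]?[0-9]+(\.[0-9]+)?/))
            return substr(s, RSTART, RLENGTH) + 0
        return 0
    }
    {
        colon = index($0, ":")
        if (colon == 0) next              # chip names, blank lines
        label = substr($0, 1, colon - 1)
        rest  = substr($0, colon + 1)     # skip the label so "+12" is not read as the value
        val   = num(rest)
    }
    index(rest, "min =") && index(rest, "max =") {
        lo = num(substr(rest, index(rest, "min =")))
        hi = num(substr(rest, index(rest, "max =")))
        if (val < lo || val > hi) {
            print "ALERT: " label " = " val " (allowed " lo ".." hi ")"
            bad++
        }
        next
    }
    index(rest, "high =") {
        hi = num(substr(rest, index(rest, "high =")))
        if (val >= hi) {
            print "ALERT: " label " = " val " (high = " hi ")"
            bad++
        }
    }
    END {
        if (!bad) print "all readings within limits"
        exit (bad ? 1 : 0)
    }'
}
```

Usage would be something like "sensors | check_sensors", run repeatedly while the CPU is near 100%; the exit status is 1 as soon as one reading is out of range, so it can be wired into an existing alerting script.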
 
proxmox:

description: SERVER
product: P5K-E
width: 64 bits
capabilities: smbios-2.4 dmi-2.4 smp vsyscall32
RAM: 8GB

configuration: boot=normal chassis=desktop family=To Be Filled By O.E.M. sku=To Be Filled By O.E.M. uuid=xxxxxxx
description: BIOS
vendor: American Megatrends Inc.
version: 1305
date: 06/19/2009
size: 64KiB
capacity: 2MiB

capabilities: isa pci pnp apm upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot zipboot biosbootspecification

product: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz
vendor: Intel Corp.
physical id: 4
bus info: cpu@0
version: 6.15.7

capabilities: fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm pbe syscall nx x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti tpr_shadow dtherm cpufreq

configuration: microcode=104
description: DIMM DDR Synchronous 800 MHz (1.2 ns)
description: PCI bridge
product: 82G33/G31/P35/P31 Express PCI Express Root Port
vendor: Intel Corporation
description: SATA controller
product: 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
vendor: Intel Corporation


VM:
------
agent: 1
balloon: 1000
boot: order=scsi0;ide2;net0
cores: 4
cpu: host
description: SERVER
ide2: none,media=cdrom
machine: pc,viommu=virtio
memory: 1850
name: bacula
net0: virtio=xx:xx:xx:xx:xx,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: snap_xxx
protection: 1
scsi0: local-zfs:vm-101-disk-0,discard=on,replicate=0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=xxx-xxx-3443-4333-yyyyyy
sockets: 1
vcpus: 4
virtio0: zpool1:vm-101-disk-2,backup=0,discard=on,replicate=0,size=6500G
virtio1: zpool2:vm-101-disk-0,backup=0,discard=on,replicate=0,size=2700G
virtio2: local-zfs:vm-101-disk-1,backup=0,replicate=0,size=500G
 
I have automatic crash monitoring/reporting in place, and today it reported:

!!! ATTENTION: PROXMOX CRASHED !!!

So setting numa: 0 did not help.
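The post doesn't say how the crash monitoring works. One minimal way to get a similar alert, offered purely as a hypothetical sketch (the marker path and function names are made up for illustration), is to write a marker file on every clean shutdown and check for it at boot: if it is missing, the previous boot ended in a crash, hard reset or power loss.

```shell
#!/bin/sh
# Hypothetical crash detector (not the tool used in this thread).
# A marker file is written during a clean shutdown and consumed at the next boot.
# If the marker is missing at boot, the previous boot most likely ended in a
# crash, hard reset or power loss. Note: the very first run also reports a crash.
MARKER="${MARKER:-/var/lib/crash-check/clean-shutdown}"   # assumed path

# Run during shutdown, e.g. from the ExecStop= of a systemd unit.
mark_clean_shutdown() {
    mkdir -p "$(dirname "$MARKER")" && : > "$MARKER"
}

# Run once at boot. Returns 0 if the previous shutdown was clean, 1 otherwise.
check_previous_boot() {
    if [ -e "$MARKER" ]; then
        rm -f "$MARKER"   # consume the marker so the next boot starts fresh
        return 0
    fi
    echo "previous boot did not end with a clean shutdown"
    return 1
}
```

One way to wire this up would be a oneshot systemd unit with RemainAfterExit=yes whose ExecStart calls check_previous_boot (and sends the report) and whose ExecStop calls mark_clean_shutdown.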
 
Yes, that's right. I also read that there can be issues with swap on top of ZFS.

My config:

~# zfs list rpool/swap

NAME         USED  AVAIL  REFER  MOUNTPOINT
rpool/swap  31.6G   385G  11.0G  -
 
Having swap on the same zpool as your root disk can cause exactly this behaviour: under memory pressure the kernel tries to swap pages out to a zvol, but ZFS itself needs to allocate memory to write them, which can lock up the system. If you don't need swap, disable it completely when using ZFS. If you do need swap, put it on a separate partition outside ZFS (this can be free space on the same disks) instead of on a zvol.

I'd recommend disabling swap, or moving it off ZFS if you need it, and then checking whether your server still crashes under heavy load.
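A quick way to see whether a host is affected is to list the active swap devices and look for zvols (on Linux these show up as /dev/zd* or under /dev/zvol/). The sketch below assumes the output of "swapon --show=NAME --noheadings" as input; the function name find_zvol_swap is hypothetical.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): report swap devices that are ZFS zvols.
# Input: one device per line, as printed by "swapon --show=NAME --noheadings".
# Exit status 0 if at least one zvol swap device was found (like grep), 1 otherwise.
find_zvol_swap() {
    rc=1
    while IFS= read -r dev; do
        [ -n "$dev" ] || continue
        # Resolve symlinks such as /dev/zvol/rpool/swap -> /dev/zd0 where possible.
        real=$(readlink -f -- "$dev" 2>/dev/null) || real=$dev
        case "$real" in
            /dev/zd*|/dev/zvol/*)
                echo "swap on zvol: $dev"
                rc=0
                ;;
        esac
    done
    return "$rc"
}
```

Usage: "swapon --show=NAME --noheadings | find_zvol_swap". On the config above, with rpool/swap active, it would be expected to report the zvol behind /dev/zvol/rpool/swap.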
 
Yes, mgabriel, I read about it.

Here's what I did:
  • unpinned the kernel (I had previously pinned '6.5.13-5-pve' via /etc/kernel/proxmox-boot-pin)
  • destroyed rpool/swap
  • created a new ext4 partition (20G)
  • activated swap on this ext4 partition (20G)
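The steps above can be sketched as a dry-run script. This is a minimal sketch under stated assumptions: the new partition already exists, its device name (e.g. /dev/sdb3) is a hypothetical placeholder, and it is formatted directly as swap with mkswap (a swap file on an ext4 filesystem would be the alternative reading of "ext4 partition"). Nothing runs unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry-run sketch of moving swap off rpool. Commands are only printed unless APPLY=1.
# The new swap partition must already exist; its device name is a placeholder.
run() {
    printf '+ %s\n' "$*"
    if [ "${APPLY:-0}" = 1 ]; then
        "$@"
    fi
}

move_swap_off_zfs() {
    new_dev=$1
    if [ -z "$new_dev" ]; then
        echo "usage: move_swap_off_zfs <new-swap-partition>" >&2
        return 2
    fi
    run proxmox-boot-tool kernel unpin &&   # drop the earlier 6.5.13-5-pve pin
    run swapoff /dev/zvol/rpool/swap &&     # stop swapping to the zvol first
    run zfs destroy rpool/swap &&           # then remove the zvol from rpool
    run mkswap "$new_dev" &&                # assumes the partition is used directly as swap
    run swapon "$new_dev"
}
```

Running "move_swap_off_zfs /dev/sdb3" prints the five commands; "APPLY=1 move_swap_off_zfs /dev/sdb3" executes them and stops at the first failure. The old rpool/swap entry in /etc/fstab also has to be replaced with one for the new partition (e.g. "<partition> none swap sw 0 0") so the change survives a reboot.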

Let's see if it works.
 
OK gents, so far no more crashes.

So I think moving swap off the ZFS pool (rpool) to a non-ZFS (ext4) partition solved the issue.

I'll keep an eye on it for a few more weeks, then I think I can close the case.

Thanks so far to everyone who helped on this topic :)
 
So, after monitoring the environment for a while, I can confirm that the issue has been fixed by moving the swap area from ZFS (rpool) to a non-ZFS (ext4) file system.

Thanks, guys, for the help and support. I really appreciate it.

Cheers
Mike
 