Proxmox crashed twice in 2 days

FlorinMarian

Active Member
Nov 13, 2017
69
2
28
27
Hi, guys!
I have a physical server colocated somewhere in Romania, an HP Cloudline CL3100 G3.

It has 2x new SSD disks in RAID1 and 4x refurbished SATA disks in RAIDZ-2; Proxmox is installed on the RAID1.

I use this system with a WHMCS module, and on both occasions the host node froze, seemingly at random, when the server received a command to create new KVM instances.

Package versions:
Code:
proxmox-ve: 7.0-2 (running kernel: 5.11.22-6-pve)
pve-manager: 7.0-13 (running version: 7.0-13/7aa7e488)
pve-kernel-helper: 7.1-3
pve-kernel-5.11: 7.0-9
pve-kernel-5.11.22-6-pve: 5.11.22-11
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 15.2.13-pve1
corosync: 3.1.5-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve1
libproxmox-acme-perl: 1.4.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-11
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-3
libpve-storage-perl: 7.0-13
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.11-1
proxmox-backup-file-restore: 2.0.11-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.1-1
pve-docs: 7.0-5
pve-edk2-firmware: 3.20210831-1
pve-firewall: 4.2-4
pve-firmware: 3.3-2
pve-ha-manager: 3.3-1
pve-i18n: 2.5-1
pve-qemu-kvm: 6.1.0-1
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-17
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.1.1-pve1

I'll be happy to attach any other output.

Best regards, Florin.
 

FlorinMarian

What's in "syslog" and "syslog.1" at the time of the crash?
Just these lines before the freeze:
Code:
Nov  2 22:19:00 sv2 systemd[1]: Starting Proxmox VE replication runner...
Nov  2 22:19:01 sv2 systemd[1]: pvesr.service: Succeeded.
Nov  2 22:19:01 sv2 systemd[1]: Finished Proxmox VE replication runner.
Nov  2 22:19:03 sv2 dhcpd[2091]: DHCPDISCOVER from e4:8d:8c:de:09:74 via vmbr0: network 0.0.0.0/0: no free leases
Nov  2 22:19:05 sv2 pvedaemon[590782]: <root@pam> successful auth for user 'root@pam'
Nov  2 22:19:06 sv2 pvedaemon[590782]: <root@pam> successful auth for user 'root@pam'
Nov  2 22:19:07 sv2 dhcpd[2091]: DHCPDISCOVER from e4:8d:8c:de:09:74 via vmbr0: network 0.0.0.0/0: no free leases
Nov  2 22:19:07 sv2 pvedaemon[463928]: <root@pam> successful auth for user 'root@pam'
Nov  2 22:19:13 sv2 dhcpd[2091]: DHCPDISCOVER from e4:8d:8c:de:09:74 via vmbr0: network 0.0.0.0/0: no free leases
Nov  2 22:19:19 sv2 dhcpd[2091]: DHCPDISCOVER from e4:8d:8c:de:09:74 via vmbr0: network 0.0.0.0/0: no free leases
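
One way to confirm where the host froze is to look for the longest silence between consecutive syslog timestamps, since a hard freeze usually shows up as a gap followed by boot messages. A minimal sketch, assuming the classic syslog timestamp format shown above and an English/C locale for month names. The format carries no year, so `year=2021` is an assumed default, and `parse_ts` and `find_largest_gap` are hypothetical helper names, not Proxmox tools.

```python
from datetime import datetime


def parse_ts(line, year=2021):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    The syslog format has no year, so `year` is an assumption supplied by
    the caller. Returns None for lines that do not start with a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_largest_gap(lines, year=2021):
    """Return (gap_seconds, line_before, line_after) for the longest silence.

    Returns None if fewer than two timestamped lines are found.
    """
    best = None
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_ts(line, year)
        if ts is None:
            continue  # skip continuation lines or other unparsable input
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

To use it, read `/var/log/syslog` (and `/var/log/syslog.1` if the crash is older) into a list of lines and pass that to `find_largest_gap`. The line before the gap is the last thing the host managed to log.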
 

itNGO

Well-Known Member
Jun 12, 2020
524
114
48
44
Germany
it-ngo.com
Hi,
when there is no kernel crash or anything else in syslog, consider that the mainboard or system just died or hung. Possibly a hardware failure with a full freeze or power loss?
 

FlorinMarian

Hi,
when there is no kernel crash or anything else in syslog, consider that the mainboard or system just died or hung. Possibly a hardware failure with a full freeze or power loss?
It shouldn't be hardware-related, because the host only freezes when it receives a VM creation request from WHMCS.
There is nothing in kern.log at the time of the crash.
 

FlorinMarian

I have an update.
IPMI reported an "ECC uncorrectable" error, so I started testing the DIMMs manually, one by one. So far, 4 of the 8 DIMMs have tested fine.
What else should I check if Memtest86 reports all DIMMs as working normally?
Thank you!
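
Since IPMI already logged an uncorrectable ECC event, the kernel's EDAC counters may be a useful data point alongside Memtest86, because they count errors seen under the real workload. A minimal sketch, assuming an EDAC driver for this platform is loaded and exposes the usual sysfs layout under `/sys/devices/system/edac/mc`. `read_edac_counts` is a hypothetical helper name.

```python
from pathlib import Path

# Standard sysfs location of the EDAC memory controllers (assumes EDAC is loaded).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} for each memory controller.

    ce = corrected errors, ue = uncorrected errors, as counted by the kernel
    since the EDAC driver was loaded. Returns {} if EDAC is not available.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # controller directory without readable counters
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts
```

Calling `read_edac_counts()` before and after a batch of VM creations would show whether the error counters move under that specific load. An empty result only means EDAC is not exposed on this host, not that the memory is healthy.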
 

leesteken

Famous Member
May 31, 2020
1,709
351
88
Maybe it only happens in specific conditions, which are not triggered by the memory tests.
It could be a voltage drop or spike, which might be the power supply or even a wall power issue. Is there a UPS?
Or maybe a timing issue. Check the memory speeds and timings against what the CPU supports with 8 DIMMs.
 
