Proxmox VE 4.1 crashes often

MisterFantastico
Apr 5, 2016
Hello,

I bought a new Supermicro server and installed Debian 8.2 (jessie) with Proxmox VE 4.1.
This system is now in production use.
After 2 days the server "crashed": it no longer replies to ping or any other packets, and only a hard reset helps. There aren't any noticeable entries in the log; the server seems to simply stop working at that point.
The crashes occur at irregular intervals.
I don't have a KVM to see what happens on the console.

Before Proxmox, I had Windows Server installed on this hardware (it ran for 1 month without a crash or freeze).

Linux XXXXXX 4.2.8-1-pve #1 SMP Sat Mar 19 10:44:29 CET 2016 x86_64 GNU/Linux

Hardware Info:
Intel(R) Xeon(R) CPU E3-1230 v5
Supermicro X11SSL-F Mainboard
2 x 3 TB HDD @ RAID 1 (Onboard LSI RAID Controller)
32 Gbyte DDR4 RAM

root@XXXX:~# pveversion -v
proxmox-ve: 4.1-41 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-22 (running version: 4.1-22/aca130cf)
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-36
qemu-server: 4.0-64
pve-firmware: 1.1-7
libpve-common-perl: 4.0-54
libpve-access-control: 4.0-13
libpve-storage-perl: 4.0-45
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-52
pve-firewall: 2.0-22
pve-ha-manager: 1.0-25
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1

Can someone help me?
 
Hi,

Do you see anything in the logs?
 
Hello,

No, there aren't any entries.
Just normal entries like cron ... then the server crashes. There are no further log entries until the reboot.
 
Like this:
Apr 3 21:40:01 skylake CRON[17990]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 21:44:43 skylake systemd-timesyncd[488]: interval/delta/delay/jitter/drift 2048s/-0.000s/0.026s/0.000s/-15ppm
Apr 3 21:45:01 skylake CRON[21226]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 21:45:01 skylake CRON[21228]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 21:50:01 skylake CRON[24524]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 21:50:01 skylake CRON[24526]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 21:54:43 skylake rrdcached[982]: flushing old values
Apr 3 21:54:43 skylake rrdcached[982]: rotating journals
Apr 3 21:54:43 skylake rrdcached[982]: started new journal /var/lib/rrdcached/journal/rrd.journal.1459713283.604914
Apr 3 21:54:43 skylake rrdcached[982]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1459706083.604858
Apr 3 21:55:01 skylake CRON[27819]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 21:55:01 skylake CRON[27821]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:00:01 skylake CRON[31150]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 22:00:01 skylake CRON[31152]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:05:01 skylake CRON[2018]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 22:05:01 skylake CRON[2020]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:10:01 skylake CRON[5532]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 22:10:01 skylake CRON[5534]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:15:01 skylake CRON[8905]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 22:15:01 skylake CRON[8907]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:17:01 skylake CRON[11363]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 3 22:18:51 skylake systemd-timesyncd[488]: interval/delta/delay/jitter/drift 2048s/+0.000s/0.027s/0.000s/-15ppm
Apr 3 22:20:01 skylake CRON[12175]: (root) CMD (/usr/local/bin/live_monitoring_system)
Apr 3 22:20:01 skylake CRON[12177]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
Apr 3 22:49:46 skylake rsyslogd: [origin software="rsyslogd" swVersion="8.4.2" x-pid="1029" x-info="http://www.rsyslog.com"] start
Apr 3 22:49:46 skylake kernel: [ 0.000000] Initializing cgroup subsys cpuset
Apr 3 22:49:46 skylake kernel: [ 0.000000] Initializing cgroup subsys cpu
Apr 3 22:49:46 skylake kernel: [ 0.000000] Initializing cgroup subsys cpuacct
Apr 3 22:49:46 skylake kernel: [ 0.000000] Linux version 4.2.8-1-pve (root@elsa) (gcc version 4.9.2 (Debian 4.9.2-10) ) #1 SMP Sat Mar 19 10:44:29 CET 2016 ()
Apr 3 22:49:46 skylake kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.2.8-1-pve root=UUID=ae349422-bd63-4566-9bde-8dbfb6cad3d4 ro
Apr 3 22:49:46 skylake kernel: [ 0.000000] KERNEL supported cpus:
Apr 3 22:49:46 skylake systemd[1]: Started udev Coldplug all Devices.
Apr 3 22:49:46 skylake systemd[1]: Starting udev Wait for Complete Device Initialization...
Apr 3 22:49:46 skylake systemd-modules-load[252]: Module 'fuse' is builtin
Apr 3 22:49:46 skylake systemd-modules-load[252]: Inserted module 'vhost_net'
Apr 3 22:49:46 skylake systemd[1]: Started Load Kernel Modules.
Apr 3 22:49:46 skylake systemd[1]: Mounting FUSE Control File System...
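In the excerpt above, the last entry before the hang is the 22:20:01 cron run, and the next line is rsyslogd starting again at 22:49:46 after the hard reset. A minimal sketch for pulling that "last sign of life" out of a full syslog, assuming Debian's default rsyslog setup (messages in /var/log/syslog, one "... start" line from rsyslogd at every boot, as seen above); the function name last_before_boot is hypothetical:

```shell
# Hypothetical helper: print the last syslog line written before each
# rsyslogd start, i.e. the last thing logged before a hang or reboot.
# Assumes rsyslogd logs "rsyslogd: [origin ...] start" at boot.
last_before_boot() {
    log="${1:-/var/log/syslog}"
    awk '/rsyslogd: \[origin .* start$/ {
             if (prev != "") print prev   # last line before the restart
             print "  -> " $0             # the restart itself
         }
         { prev = $0 }' "$log"
}
```

Usage: last_before_boot /var/log/syslog (and again on /var/log/syslog.1 if the log has been rotated since). This only shows where logging stopped; if the machine froze hard, the cause may never have reached the disk.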
 
Have you disabled all power saving in the BIOS?
Is the latest BIOS update installed?
Is the CPU set to performance mode in the BIOS?
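These settings live in the BIOS setup itself, but some of the resulting state is visible from Linux. A small sketch, assuming a cpufreq driver is loaded (on this Skylake CPU probably intel_pstate, which is an assumption) and that the intel_idle driver exposes its max_cstate parameter; the function name show_power_settings is hypothetical, and its sysfs-root argument exists only so it can be tried against a fake tree:

```shell
# Hypothetical helper: show the CPU frequency governor per core and the
# deepest C-state intel_idle may use. $1 defaults to the real /sys.
show_power_settings() {
    sysfs="${1:-/sys}"
    for gov in "$sysfs"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$gov" ] || continue              # no cpufreq driver loaded
        cpu=${gov#"$sysfs"/devices/system/cpu/}
        cpu=${cpu%%/*}                         # e.g. "cpu0"
        echo "$cpu governor: $(cat "$gov")"
    done
    cstate="$sysfs/module/intel_idle/parameters/max_cstate"
    if [ -r "$cstate" ]; then
        echo "intel_idle max_cstate: $(cat "$cstate")"
    fi
}
```

Usage: show_power_settings. To see which BIOS is installed before looking for an update, dmidecode -s bios-version and dmidecode -s bios-release-date (as root) print the firmware version and date.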
 
Are munin and live_monitoring doing anything special? Any special restrictions on snmp/xinetd or iptables?
Hi, no. live_monitoring checks the server load (I configured it later, so the crashes occurred before live_monitoring was installed). munin is the standard munin agent without any plugins (the crashes also occurred before munin was installed). I didn't configure any iptables rules.
 
Look at the attachment: this is the log from my monitoring system. Proxmox crashes very, very often.
 

Attachments

  • crashes.PNG
How about your memory usage? Is this node running VMs, or is only Proxmox VE installed?
Yes, this server is running some VMs; it's a production system.

root@XXXX:~# free -m
             total       used       free     shared    buffers     cached
Mem:         32089      31900        188        128        432      14625
-/+ buffers/cache:      16842      15247
Swap:         9487          0       9487

This is a brand-new server. I ran memtest86+ for a weekend before installing Proxmox, and everything was OK. Before Proxmox I was using Microsoft Hyper-V and there weren't any crashes, so I don't think this is a hardware problem.
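The -/+ buffers/cache line is arithmetic on the Mem: line: 31900 MB used minus 432 MB buffers minus 14625 MB cache leaves roughly 16.8 GB actually held by processes and VMs, and 188 + 432 + 14625 gives roughly 15.2 GB that is free or cache (free rounds each column separately, hence the small differences). A sketch that recomputes the same figures from /proc/meminfo, assuming the classic formula of older procps free (cache = the Cached: field only); the function name mem_summary is hypothetical. Kernels since 3.14, including this 4.2 kernel, also report MemAvailable, the kernel's own estimate of memory usable without swapping:

```shell
# Hypothetical helper: recompute the "-/+ buffers/cache" figures of
# jessie's free -m from /proc/meminfo (all fields there are in kB).
mem_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemFree:/      { free = $2 }
        /^Buffers:/      { buffers = $2 }
        /^Cached:/       { cached = $2 }    # anchored: skips SwapCached
        /^MemAvailable:/ { avail = $2 }
        END {
            printf "used by processes: %d MB\n", (total - free - buffers - cached) / 1024
            printf "free + buffers/cache: %d MB\n", (free + buffers + cached) / 1024
            if (avail != "") printf "MemAvailable: %d MB\n", avail / 1024
        }' "$meminfo"
}
```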
 
It seems you are low on free memory.
You might test the impact of forcing the system to flush the cache to disk.

I'm not sure what the real impact would be on a Proxmox system, so I would probably start by running a cron job that monitors free memory, to see what the last value is before a crash and check whether there is a pattern.

You might create the shell script below and run it from cron every 10 minutes:

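#!/bin/sh
# Append a timestamped sample of free memory (MB) to a CSV file.

myfile=/root/memstat.csv

# Line 2 of "free -m" is the Mem: line; column 4 is "free"
memfree=$(free -m | awk 'NR==2 {print $4}')
echo "$(date +'%Y-%m-%d %H:%M');$memfree" >> "$myfile"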
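To run the script every 10 minutes, one option on Debian is a file in /etc/cron.d. A minimal sketch, assuming the script above was saved as /root/memstat.sh (a hypothetical path) and made executable; install_memstat_cron is likewise a hypothetical name, and its directory argument exists only for testing:

```shell
# Hypothetical installer: schedule the memstat script every 10 minutes.
# /root/memstat.sh is an assumed path; pass another one as the 2nd argument.
install_memstat_cron() {
    cron_dir="${1:-/etc/cron.d}"
    script="${2:-/root/memstat.sh}"
    # /etc/cron.d entries need a user field (here: root); Debian's cron
    # ignores file names containing dots, so the file is called "memstat".
    printf '*/10 * * * * root %s\n' "$script" > "$cron_dir/memstat"
    chmod 644 "$cron_dir/memstat"
}
```

Usage (as root): chmod +x /root/memstat.sh && install_memstat_cron. Adding the same */10 line without the "root" field via crontab -e works just as well.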
 
Thank you. I've created the cron job. But I think this is just normal Linux RAM caching.
http://www.linuxatemyram.com/

Look at my attachment: the Proxmox web interface shows the real RAM load.
 

Attachments

  • Unbenannt.PNG
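Once memstat.csv spans a few crashes, the "last value before a crash" suggested earlier can be read off where the 10-minute samples stop. A sketch, assuming the "YYYY-MM-DD HH:MM;free" lines written by the memstat script and treating any gap longer than 20 minutes as a crash or reboot (both the format handling and the threshold are assumptions); memstat_gaps is a hypothetical name, and the date arithmetic is done in plain awk so it also runs with Debian's default mawk:

```shell
# Hypothetical helper: report the last memstat.csv sample before each gap.
# Assumes lines like "2016-04-03 22:20;188" and a 10-minute interval.
memstat_gaps() {
    csv="${1:-/root/memstat.csv}"
    max_gap="${2:-20}"   # minutes without a sample that count as a gap
    awk -F';' -v max_gap="$max_gap" '
    # Days since 1970-01-01 for a Gregorian date (days-from-civil formula)
    function days(y, m, d,    era, yoe, doy, doe) {
        y -= (m <= 2)
        era = int(y / 400)
        yoe = y - era * 400
        doy = int((153 * (m > 2 ? m - 3 : m + 9) + 2) / 5) + d - 1
        doe = yoe * 365 + int(yoe / 4) - int(yoe / 100) + doy
        return era * 146097 + doe - 719468
    }
    {
        split($1, t, /[- :]/)   # year, month, day, hour, minute
        now = days(t[1] + 0, t[2] + 0, t[3] + 0) * 1440 + t[4] * 60 + t[5]
        if (NR > 1 && now - prev > max_gap)
            printf "gap of %d min after %s (last free: %s MB)\n", now - prev, last_ts, last_free
        prev = now; last_ts = $1; last_free = $2
    }' "$csv"
}
```

Usage: memstat_gaps /root/memstat.csv. If the free values just before each gap are not consistently low, that would fit the view that the low "free" column is ordinary cache use rather than memory exhaustion.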
