Proxmox host crashed? Found in a powered down state...

den

Feb 19, 2015
Hi guys, I found my Proxmox server turned off. It had powered down on its own, and I don't know how.

I suspect a crash, but after a crash I would expect it to be hung or to reboot, not to be powered down.

How can I check what happened?
Do you want the syslog?
Can I check anywhere else?
What else do you need to see?

I'm running v3.4 with ZFS software RAID (RAID1, mirror), set up via your build process.
 
Hi, thanks for the reply.

Below is the syslog. I can't find anything in it, though I don't really understand how to read it. I'd really appreciate any help.

http://pastebin.com/raw.php?i=ui3J2Mpy


The last logged item before the issue was at 12:17:01
And the entry at 12:39:57 must be the start of the server coming back up.
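
To pull this kind of gap out of a long syslog automatically, a small script can look for the logger start-up line and report the last entry written before it. This is only a sketch under assumptions: it assumes a boot shows up as the `kernel: imklog ... started.` line that rsyslog 5.8 writes on start-up, and, because syslog timestamps carry no year, that every entry is from 2015. `parse_stamp` and `find_outages` are names made up for this example.

```python
import re
from datetime import datetime

# Assumption: rsyslog writes this kernel line when it starts after a boot.
BOOT_MARKER = re.compile(r"kernel: imklog .* started")
# Classic syslog timestamp, e.g. "Apr 24 17:23:56"
STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) ")


def parse_stamp(line, year=2015):
    """Return the line's timestamp, or None if the line has none.

    The year is an assumption because syslog does not record it.
    """
    match = STAMP.match(line)
    if not match:
        return None
    return datetime.strptime("%d %s" % (year, match.group(1)),
                             "%Y %b %d %H:%M:%S")


def find_outages(lines, year=2015):
    """Return (last_line_before_boot, silent_gap) for every boot marker."""
    outages = []
    prev_line, prev_time = None, None
    for line in lines:
        line = line.rstrip("\n")
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip continuation lines without a timestamp
        if BOOT_MARKER.search(line) and prev_time is not None:
            outages.append((prev_line, stamp - prev_time))
        prev_line, prev_time = line, stamp
    return outages


# Example use on a real log file:
# with open("/var/log/syslog") as handle:
#     for last_line, gap in find_outages(handle):
#         print(gap, "|", last_line)
```

The gap is only an upper bound on how long the box was down: the machine may have kept running for a while after its last log line. A plain restart of the logging daemon would also match the marker, so very short gaps are probably not crashes.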



 
Sorry, can't help. I had the exact same issue a few days ago and thought it was a sudden hardware fluke. RAM and disk checks brought nothing to light. My Proxmox logs show nothing either; it looks like a complete freeze, with no error announcing the sudden death.

But this box is running the latest from testing. None of my stable / subscription machines have ever shown anything like it.
 
Interesting, thanks for the reply.
I'm using Proxmox VE 3.4. Is that not the stable version?
 
Hi,
I have Proxmox VE 3.4 and exactly the same problem. I found nothing abnormal in my syslog, but my server crashes every 3-4 days. Can anyone help?
 
Try sticking to the original build from the ISO without upgrading / updating. That ISO build seems to be more stable. Only upgrade when a new image is out.
Or else get a subscription and upgrade.
 
How about some computer specs? (CPU, memory - ecc or non-ecc, motherboard, hard drives w/ age, etc.)
 
Hi, I have a Dell PowerEdge R220 server with 1 Intel Xeon CPU (8 cores), 32 GB of ECC memory, and 2 x 1 TB SATA3 disks. It runs Proxmox VE 3.4, upgraded from 3.3, with 4 KVM machines.
My syslog reports no problems. The server went offline at about 17:23 and I rebooted it at 17:42. Here is my syslog:
Apr 24 16:07:28 sv391 kernel: tg3 0000:01:00.1: eth1: Link is down
Apr 24 16:09:30 sv391 kernel: tg3 0000:01:00.1: eth1: Link is up at 100 Mbps, full duplex
Apr 24 16:09:30 sv391 kernel: tg3 0000:01:00.1: eth1: Flow control is on for TX and on for RX
Apr 24 16:09:30 sv391 kernel: tg3 0000:01:00.1: eth1: EEE is disabled
Apr 24 16:17:01 sv391 /USR/SBIN/CRON[439567]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 24 16:23:56 sv391 rrdcached[2665]: flushing old values
Apr 24 16:23:56 sv391 rrdcached[2665]: rotating journals
Apr 24 16:23:56 sv391 rrdcached[2665]: started new journal /var/lib/rrdcached/journal/rrd.journal.1429885436.244686
Apr 24 16:23:56 sv391 rrdcached[2665]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1429878236.244697
Apr 24 16:33:50 sv391 kernel: tg3 0000:01:00.1: eth1: Link is down
Apr 24 16:35:52 sv391 kernel: tg3 0000:01:00.1: eth1: Link is up at 100 Mbps, full duplex
Apr 24 16:35:52 sv391 kernel: tg3 0000:01:00.1: eth1: Flow control is on for TX and on for RX
Apr 24 16:35:52 sv391 kernel: tg3 0000:01:00.1: eth1: EEE is disabled
Apr 24 16:50:47 sv391 kernel: tg3 0000:01:00.1: eth1: Link is down
Apr 24 16:52:48 sv391 kernel: tg3 0000:01:00.1: eth1: Link is up at 100 Mbps, full duplex
Apr 24 16:52:48 sv391 kernel: tg3 0000:01:00.1: eth1: Flow control is on for TX and on for RX
Apr 24 16:52:48 sv391 kernel: tg3 0000:01:00.1: eth1: EEE is disabled
Apr 24 17:17:01 sv391 /USR/SBIN/CRON[447538]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Apr 24 17:23:56 sv391 rrdcached[2665]: flushing old values
Apr 24 17:23:56 sv391 rrdcached[2665]: rotating journals
Apr 24 17:23:56 sv391 rrdcached[2665]: started new journal /var/lib/rrdcached/journal/rrd.journal.1429889036.244711
Apr 24 17:23:56 sv391 rrdcached[2665]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1429881836.244704
Apr 24 17:42:29 sv391 kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr 24 17:42:29 sv391 rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2553" x-info="http://www.rsyslog.com"] start
Apr 24 17:42:29 sv391 kernel: Initializing cgroup subsys cpuset
Apr 24 17:42:29 sv391 kernel: Initializing cgroup subsys cpu
Apr 24 17:42:29 sv391 kernel: Linux version 2.6.32-37-pve (root@lola) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Wed Mar 18 08:19:56 CET 2015
Apr 24 17:42:29 sv391 kernel: Command line: BOOT_IMAGE=/vmlinuz-2.6.32-37-pve root=UUID=83340bac-8016-4e54-be1a-4f6d73b7d5cf ro quiet
Apr 24 17:42:29 sv391 kernel: KERNEL supported cpus:
Apr 24 17:42:29 sv391 kernel: Intel GenuineIntel
Apr 24 17:42:29 sv391 kernel: AMD AuthenticAMD
Apr 24 17:42:29 sv391 kernel: Centaur CentaurHauls
Apr 24 17:42:29 sv391 kernel: BIOS-provided physical RAM map:

Thanks
 
So it looks like I spoke too soon. It went for months without a crash, and then all of a sudden it looks like it froze.
The server summary shows that monitoring stopped as well.

The last event before it was powered back on was at Apr 27 22:17:01.

Can anyone help? I have the full log if anyone needs it.

SYSLOG
Code:
Apr 27 20:10:03 lion rrdcached[2848]: flushing old values
Apr 27 20:10:03 lion rrdcached[2848]: rotating journals
Apr 27 20:10:03 lion rrdcached[2848]: started new journal /var/lib/rrdcached/journal/rrd.journal.1430129403.008559
Apr 27 20:10:03 lion rrdcached[2848]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1430122203.008555
Apr 27 20:17:01 lion /USR/SBIN/CRON[480300]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 27 21:10:03 lion rrdcached[2848]: flushing old values
Apr 27 21:10:03 lion rrdcached[2848]: rotating journals
Apr 27 21:10:03 lion rrdcached[2848]: started new journal /var/lib/rrdcached/journal/rrd.journal.1430133003.008564
Apr 27 21:10:03 lion rrdcached[2848]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1430125803.008547
Apr 27 21:13:28 lion pvedaemon[438615]: <root@pam> successful auth for user 'dilan.jayawardana@pve'
Apr 27 21:13:55 lion pvedaemon[442108]: <dilan.jayawardana@pve> starting task UPID:lion:00077706:19E123A2:553E19F3:vncproxy:313:dilan.jayawardana@pve:
Apr 27 21:13:55 lion pvedaemon[489222]: starting vnc proxy UPID:lion:00077706:19E123A2:553E19F3:vncproxy:313:dilan.jayawardana@pve:
Apr 27 21:17:02 lion /USR/SBIN/CRON[489715]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 27 21:27:21 lion kernel: vmbr0: port 5(tap313i0) entering disabled state
Apr 27 21:27:21 lion halevt: Running: halevt-umount -u /org/freedesktop/Hal/devices/net_0a_89_d6_a1_bd_78; halevt-umount -s
Apr 27 21:27:22 lion pvedaemon[491374]: starting vnc proxy UPID:lion:00077F6E:19E25EC1:553E1D1A:vncproxy:313:dilan.jayawardana@pve:
Apr 27 21:27:22 lion pvedaemon[438615]: <dilan.jayawardana@pve> starting task UPID:lion:00077F6E:19E25EC1:553E1D1A:vncproxy:313:dilan.jayawardana@pve:
Apr 27 21:27:23 lion ntpd[2777]: Deleting interface #190 tap313i0, fe80::889:d6ff:fea1:bd78#123, interface stats: received=0, sent=0, dropped=0, active_time=44653 secs
Apr 27 21:27:23 lion ntpd[2777]: peers refreshed
Apr 27 21:27:23 lion qm[491376]: VM 313 qmp command failed - VM 313 not running
Apr 27 21:27:23 lion pvedaemon[491374]: command '/bin/nc -l -p 5902 -w 10 -c '/usr/sbin/qm vncproxy 313 2>/dev/null'' failed: exit code 255
Apr 27 22:10:03 lion rrdcached[2848]: flushing old values
Apr 27 22:10:03 lion rrdcached[2848]: rotating journals
Apr 27 22:10:03 lion rrdcached[2848]: started new journal /var/lib/rrdcached/journal/rrd.journal.1430136603.008637
Apr 27 22:10:03 lion rrdcached[2848]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1430129403.008559
Apr 27 22:17:01 lion /USR/SBIN/CRON[499119]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Apr 28 09:27:48 lion kernel: imklog 5.8.11, log source = /proc/kmsg started.
Apr 28 09:27:48 lion rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2685" x-info="http://www.rsyslog.com"] start
Apr 28 09:27:48 lion kernel: Initializing cgroup subsys cpuset
Apr 28 09:27:48 lion kernel: Initializing cgroup subsys cpu
Apr 28 09:27:48 lion kernel: Linux version 2.6.32-37-pve (root@lola) (gcc version 4.7.2 (Debian 4.7.2-5) ) #1 SMP Wed Feb 11 10:00:27 CET 2015
Apr 28 09:27:48 lion kernel: Command line: BOOT_IMAGE=/ROOT/pve-1@/boot/vmlinuz-2.6.32-37-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs quiet
Apr 28 09:27:48 lion kernel: KERNEL supported cpus:
Apr 28 09:27:48 lion kernel:  Intel GenuineIntel
Apr 28 09:27:48 lion kernel:  AMD AuthenticAMD
Apr 28 09:27:48 lion kernel:  Centaur CentaurHauls
Apr 28 09:27:48 lion kernel: BIOS-provided physical RAM map:
...


VERSION
Code:
proxmox-ve-2.6.32: 3.3-147 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-1 (running version: 3.4-1/3f2d890e)
pve-kernel-2.6.32-37-pve: 2.6.32-147
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.3-20
pve-firmware: 1.1-3
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-31
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-12
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
 
Hmm... looks like these servers crashed after the same event.

Apr 24 17:23:56 sv391 rrdcached[2665]: started new journal /var/lib/rrdcached/journal/rrd.journal.1429889036.244711
Apr 24 17:23:56 sv391 rrdcached[2665]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1429881836.244704

den said:
Apr 27 22:10:03 lion rrdcached[2848]: started new journal /var/lib/rrdcached/journal/rrd.journal.1430136603.008637
Apr 27 22:10:03 lion rrdcached[2848]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1430129403.008559
 
Hello, I have the same problem: the server restarts on its own. Has anyone solved it?
 
Not solved. Well, I've rebuilt mine on 4.2 now, so I hope the problem goes away.
 
I'm seeing what I think are these same hangs/crashes every 24 to 48 hours or so too.
My machine stays powered on but everything is hung and unresponsive.
Neither syslog nor the BMC logs show anything interesting.
This is running 4.2...

I originally thought this was a memory issue, but I swapped to different sticks (ecc reg) and ran memtest for quite a few passes without issues or errors.

The other thing I thought could cause it was high IOH temperature (my board is an X8DTH-if, which has two IOH chips right next to each other that get really hot).
To combat this I stuck an 80mm fan directly on top, but the hang/crash still seems to occur.
I'm going to attempt a fresh install from the latest ISO, but if that doesn't fix it, I'm just going to move to something else.

The last entry in syslog always is the bit about

Code:
rrdcached[4234]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1462685004.325075
I see this task runs every hour, so it might just be that there is nothing else that may be hitting syslog between this and the hang, but it's the only thing that I have to work with.
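
One way to narrow down when the hang really happens would be a heartbeat that writes to syslog much more often than the hourly jobs do, so the last heartbeat bounds the time of the hang to about a minute. A minimal sketch in Python, not something anyone in this thread has run: `heartbeat` and `syslog_emit` are names made up for this example, and the 60 second interval is an arbitrary choice.

```python
import time
from datetime import datetime


def heartbeat(emit, interval_s=60, beats=None, sleep=time.sleep):
    """Call emit(message) every interval_s seconds.

    beats=None runs until the process is killed (or the host hangs);
    a number stops after that many messages, which is handy for testing.
    Returns the number of messages emitted.
    """
    count = 0
    while beats is None or count < beats:
        emit("heartbeat %d at %s" % (count, datetime.now().isoformat()))
        count += 1
        if beats is None or count < beats:
            sleep(interval_s)
    return count


def syslog_emit(message):
    """Send one message to the local syslog (Unix only)."""
    import syslog  # standard library, imported lazily for portability
    syslog.syslog(syslog.LOG_INFO, message)


# To run it on the host, e.g. from a root shell inside screen/tmux:
# heartbeat(syslog_emit)
```

If the heartbeat entries keep appearing right up to the next boot, the hourly rrdcached line is probably just the last thing that happened to be logged rather than the trigger. Log lines can still be lost if the disk or the logging daemon stalls before the rest of the system does.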
 
I have the same problem, and everything has been checked here as well. The one thing I have not checked is the RAID. Is your RAID solution hardware or software?
 
