Proxmox VE 1.9 AER "recovery" loop - leads to partial crash

ryan-tw

New Member
Oct 6, 2010
Hi all

We appear to have an issue with a new server installation.

This server was running in our office perfectly well for over a week leading up to the install date.
On Saturday (10.03) we took this server to the client site and reconfigured the network for their IP range, and all appeared fine.

Late last night (Sunday 11.03) our monitoring software showed a loss of connection to that box at around 11:30 PM. At first we thought they might have an internet-related issue, but we went on-site first thing this morning to check further.
Unfortunately the problem turned out not to be the internet connection but the Proxmox host itself.

I could not get any access to this box on the console, just a black screen. On a visual check of the internet router, the LAN light for the port that vmbr0 connects to was off, indicating no link. It was almost as if the server had been powered off, but it was most certainly running.
After a hard reset things came back to normal, but on checking the log files it became apparent that the server had still been operating to some degree and even our KVM guests had been running, albeit with no networking.

If anyone has ideas on what may have caused this issue, please feel free to suggest something. Below are the pveversion output and some general details, and I have attached a snippet of the syslog on that Proxmox host starting from just before the issue occurred. The end of the attached log shows looping error messages, which continued from that time until the manual restart at around 8 AM this morning (Monday 12.03).

General details:
Dual-socket Xeon E5645
Supermicro Mainboard
24GB RAM
Adaptec A5405 Controller with 1 x 1TB SATA Array & 1 x 300GB SAS Array
2 x onboard Intel 82576 Gigabit Network Cards (plus an IPMI card which is not in use but is enabled in the BIOS)

We run 3 KVM guests, if that is of any interest: 2 x Win2008 and 1 x Linux, all of which were running when this occurred. No cron jobs, vzdumps, etc. were scheduled around the time of the incident.

pveversion -v
pve-manager: 1.9-26 (pve-manager/1.9/6567)
running kernel: 2.6.32-7-pve
proxmox-ve-2.6.32: 1.9-55+ovzfix-2
pve-kernel-2.6.32-6-pve: 2.6.32-55+ovzfix-1
pve-kernel-2.6.32-7-pve: 2.6.32-55+ovzfix-2
qemu-server: 1.1-32
pve-firmware: 1.0-15
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.29-3pve1
vzdump: 1.2-16
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-2
ksm-control-daemon: 1.0-6

The syslog attachment shows host PROXMOXHOST (I renamed this for client confidentiality etc).
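
In case it helps anyone going through the attachment, below is a rough Python sketch for summarising the loop: when the matching messages start, when they stop, and which distinct messages repeat. It rests on assumptions that may not hold for every log: lines follow the traditional syslog layout ("Mmm dd hh:mm:ss host message"), the messages of interest contain the substring "AER", and a kernel uptime stamp in square brackets may or may not be present. The function name summarize_aer and its marker parameter are made up for illustration.

```python
import re
from collections import Counter

# Assumption: traditional syslog layout, "Mmm dd hh:mm:ss host message".
SYSLOG_LINE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")

# Assumption: kernel lines may carry an uptime stamp such as "[12345.678901]".
# It is stripped so that otherwise identical messages are counted together.
UPTIME_STAMP = re.compile(r"\[\s*\d+\.\d+\]\s*")


def summarize_aer(lines, marker="AER"):
    """Summarise syslog lines whose message contains `marker`.

    Returns a dict with the first and last matching timestamps (as strings),
    the total number of matching lines, and a Counter of distinct messages.
    """
    first = last = None
    messages = Counter()
    for line in lines:
        match = SYSLOG_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # line does not fit the assumed layout
        stamp, _host, message = match.groups()
        if marker not in message:
            continue
        if first is None:
            first = stamp
        last = stamp
        messages[UPTIME_STAMP.sub("", message)] += 1
    return {
        "first": first,
        "last": last,
        "total": sum(messages.values()),
        "messages": messages,
    }
```

Something like summarize_aer(open("syslog.txt")) should then give the start and end of the loop plus the most frequent messages via the "messages" counter's most_common() method. This only summarises what is already in the log and does not diagnose the cause.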

On a side note, we ourselves run slightly older but very similar hardware with the same NICs. It also runs 1.9 with the 2.6.32-7 kernel, and we do not see this problem at our end.

Thanks in advance.
 

Attachments

  • syslog.txt
    9 KB · Views: 5
