Host kernel panic during backups

rwadi

New Member
Aug 5, 2013
Hi All,

I have an ongoing issue where one of my two hosts suffers a kernel panic during backups. It does not happen every time, but it happens often enough to be a pain.

From what I can tell so far, the panic happens while the LVM snapshot of the OpenVZ container is being created. The host's console displayed the following:

Code:
kernel:Kernel panic - not syncing: Fatal exception

The backup (vzdump) log for the container in question shows:

Code:
Nov 21 21:30:52 INFO: Starting Backup of VM 101 (openvz)
Nov 21 21:30:52 INFO: CTID 101 exist mounted running
Nov 21 21:30:52 INFO: status = running
Nov 21 21:30:52 INFO: backup mode: snapshot
Nov 21 21:30:52 INFO: ionice priority: 7
Nov 21 21:30:52 INFO: creating lvm snapshot of /dev/mapper/pve-data ('/dev/pve/vzsnap-pm1-0')

The host's memory has been tested and is good. I have a second host with the same hardware and software setup and have yet to see the issue on it. Both hosts back up to a Synology RackStation over NFS.

Any ideas what could be causing this? It has been an ongoing issue since I deployed Proxmox 2.2 on this host and has persisted through the upgrades to 2.3, 3.0 and now 3.1.

The output of pveversion -v is:

Code:
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-16-pve: 2.6.32-82
pve-kernel-2.6.32-22-pve: 2.6.32-107
pve-kernel-2.6.32-17-pve: 2.6.32-83
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1
 
You need to find out why the kernel panics.

Hopefully you have a serial port. There may be other methods, but this is the one I am familiar with.

First, direct the console to the serial port on the problem machine.
Edit /etc/default/grub and add the two console= options in front of quiet:
Code:
GRUB_CMDLINE_LINUX_DEFAULT="console=ttyS0,9600n8 console=tty0 quiet"
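If you would rather script the edit than open an editor, here is a minimal sketch. It assumes GNU sed (for in-place editing with -i, as on Debian-based Proxmox) and that GRUB_CMDLINE_LINUX_DEFAULT sits on a single double-quoted line as in the stock file. The function name add_serial_console is hypothetical, not part of GRUB or Proxmox.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB or Proxmox): add_serial_console FILE
# Inserts "console=ttyS0,9600n8 console=tty0 " right after the opening quote of
# GRUB_CMDLINE_LINUX_DEFAULT in FILE, so "quiet" becomes
# "console=ttyS0,9600n8 console=tty0 quiet". Assumes GNU sed for -i.
add_serial_console() {
    file="$1"
    opts='console=ttyS0,9600n8 console=tty0'

    # Already configured: do nothing, so running it twice is harmless.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*console=ttyS0' "$file"; then
        return 0
    fi

    # Keep a copy of the original next to it before editing in place.
    cp "$file" "$file.bak" || return 1
    # "&" in the replacement is the matched text: GRUB_CMDLINE_LINUX_DEFAULT="
    sed -i "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"/&$opts /" "$file"
}
```

For example, run add_serial_console /etc/default/grub as root, then continue with the update-grub step. The untouched original is left as /etc/default/grub.bak.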

Now update grub:
Code:
update-grub

Connect a null-modem serial cable from the problem server to another server.
On the other server, install some utilities:
Code:
apt-get install screen ttylog

Open a screen session:
Code:
screen

In the screen session, run ttylog and redirect its output to a file:
Code:
ttylog -b 9600 -d /dev/ttyS0 > file.txt

To detach from screen (ttylog keeps running), press Ctrl+A, then D.
To reattach to the screen session, run:
Code:
screen -r

Now reboot the problem node.
Look at file.txt: it should contain the kernel messages printed while the problem node boots. If it does, all is well; if not, something in the setup is wrong.
To view the contents of the file while ttylog is running, use tail:
Code:
tail file.txt
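If nothing shows up in file.txt after the reboot, one thing worth checking on the problem node is whether the running kernel actually picked up the new options; the command line it booted with is in /proc/cmdline. A minimal sketch, where has_serial_console is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: has_serial_console [CMDLINE_FILE]
# Succeeds (exit status 0) if the kernel command line contains a
# console=ttyS0... option. Defaults to /proc/cmdline, i.e. the options
# the currently running kernel was booted with.
has_serial_console() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put one option per line, then look for an option starting with console=ttyS0.
    tr ' ' '\n' < "$cmdline_file" | grep -q '^console=ttyS0'
}
```

On the problem node, has_serial_console && echo ok prints ok when the option is active; if it prints nothing, check that update-grub was run before rebooting.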

Once you have logged the kernel panic, post it here and hopefully someone can decipher it.
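When the panic shows up in file.txt, the useful part to post is the oops and call trace printed just before the "Kernel panic - not syncing" line, not only that last line. A minimal sketch for pulling out that excerpt; the function name panic_excerpt and the default of 60 lines are arbitrary choices, not part of ttylog:

```shell
#!/bin/sh
# Hypothetical helper: panic_excerpt FILE [N]
# Prints up to N lines (default 60) of FILE ending with the first line that
# contains "Kernel panic", i.e. the oops/call trace leading up to the panic.
# Returns 1 if FILE contains no such line.
panic_excerpt() {
    file="$1"
    n="${2:-60}"
    # Line number of the first "Kernel panic" message (empty if none).
    end=$(grep -n -m 1 'Kernel panic' "$file" | cut -d: -f1)
    [ -n "$end" ] || return 1
    # Start N-1 lines earlier, but never before line 1.
    start=$((end - n + 1))
    [ "$start" -ge 1 ] || start=1
    # Serial console captures may contain carriage returns; strip them.
    sed -n "${start},${end}p" "$file" | tr -d '\r'
}
```

For example, panic_excerpt file.txt 100 > panic.txt gives a file you can paste into a reply. If the capture spans several reboots, only the first panic is extracted.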