[SOLVED] Kernel Panicking, Cannot enable crash dumps

FuriousGeorge

There doesn't seem to be any documentation for enabling crash dumps in Debian, and what I've cobbled together does not seem to be working.

Code:
# cat /etc/default/kexec
# Defaults for kexec initscript
# sourced by /etc/init.d/kexec and /etc/init.d/kexec-load

# Load a kexec kernel (true/false)
LOAD_KEXEC=true

# Kernel and initrd image
KERNEL_IMAGE="/boot/vmlinuz-4.4.6-1-pve"
INITRD="/boot/initrd.img-4.4.6-1-pve"

# If empty, use current /proc/cmdline
APPEND=""

# Load the default kernel from grub config (true/false)
USE_GRUB_CONFIG=false

Code:
# cat /etc/default/grub
# If you change this file, run 'update-grub' afterwards to update
# /boot/grub/grub.cfg.
# For full documentation of the options in this file, see:
#   info -f grub -n 'Simple configuration'

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Proxmox Virtual Environment"
GRUB_CMDLINE_LINUX_DEFAULT="crashkernel=128M"
GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/pve-1 boot=zfs crashkernel=128M nmi_watchdog=1"

# Disable os-prober, it might add menu entries for each guest
# root FS on a local partition
GRUB_DISABLE_OS_PROBER=true

Code:
# cat /etc/default/kdump-tools
# kdump-tools configuration
# ---------------------------------------------------------------------------
# USE_KDUMP - controls kdump will be configured
#     0 - kdump kernel will not be loaded
#     1 - kdump kernel will be loaded and kdump is configured
# KDUMP_SYSCTL - controls when a panic occurs, using the sysctl
#     interface.  The contents of this variable should be the
#     "variable=value ..." portion of the 'sysctl -w ' command.
#     If not set, the default value "kernel.panic_on_oops=1" will
#     be used.  Disable this feature by setting KDUMP_SYSCTL=" "
#     Example - also panic on oom:
#         KDUMP_SYSCTL="kernel.panic_on_oops=1 vm.panic_on_oom=1"
#
USE_KDUMP=1
KDUMP_SYSCTL="kernel.panic_on_oops=1"


# ---------------------------------------------------------------------------
# Kdump Kernel:
# KDUMP_KERNEL - A full pathname to a kdump kernel.
# KDUMP_INITRD - A full pathname to the kdump initrd (if used).
#     If these are not set, kdump-config will try to use the current kernel
#     and initrd if it is relocatable.  Otherwise, you will need to specify
#     these manually.
#KDUMP_KERNEL=
#KDUMP_INITRD=


# ---------------------------------------------------------------------------
# vmcore Handling:
# KDUMP_COREDIR - local path to save the vmcore to.
# KDUMP_FAIL_CMD - This variable can be used to cause a reboot or
#     start a shell if saving the vmcore fails.  If not set, "reboot -f"
#     is the default.
#     Example - start a shell if the vmcore copy fails:
#         KDUMP_FAIL_CMD="echo 'makedumpfile FAILED.'; /bin/bash; reboot -f"
KDUMP_COREDIR="/var/crash"
KDUMP_FAIL_CMD="reboot -f"


# ---------------------------------------------------------------------------
# Makedumpfile options:
# DEBUG_KERNEL - a debug version of the running kernel.  If not set,
#     kdump-config will use /usr/lib/debug/vmlinux-$(uname -r) if it is
#     available.  If it is not available, makedumpfile will be limited to
#     dumping all pages in memory.
# MAKEDUMP_ARGS - extra arguments passed to makedumpfile (8).  The default,
#     if unset, is to pass '-c -d 31' telling makedumpfile to use compression
#     and reduce the corefile to in-use kernel pages only.
#DEBUG_KERNEL=
#MAKEDUMP_ARGS="-c -d 31"


# ---------------------------------------------------------------------------
# Kexec/Kdump args
# KDUMP_KEXEC_ARGS - Additional arguments to the kexec command used to load
#     the kdump kernel
#     Example - Use this option on x86 systems with PAE and more than
#     4 gig of memory:
#         KDUMP_KEXEC_ARGS="--elf64-core-headers"
# KDUMP_CMDLINE - The default is to use the contents of /proc/cmdline.
#     Set this variable to override /proc/cmdline.
# KDUMP_CMDLINE_APPEND - Additional arguments to append to the command line
#     for the kdump kernel.  If unset, it defaults to "irqpoll maxcpus=1 nousb"
#KDUMP_KEXEC_ARGS=""
#KDUMP_CMDLINE=""
#KDUMP_CMDLINE_APPEND="irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service"

# --------------------------------------------

The wiki suggests using netconsole. However, this does not work for me either:

Code:
# modprobe netconsole netconsole=@10.5.0.250/,@10.5.0.251/
modprobe: ERROR: could not insert 'netconsole': Device or resource busy

Any help is appreciated.
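
For reference, the netconsole module parameter has the general form [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr], and empty fields fall back to defaults. The error above does not say which field is at fault, so one thing that may be worth trying is spelling every field out. A minimal sketch of a helper that builds such a string follows; the function name is made up for illustration, and the ports are simply the documented defaults (6665 and 6666):

```shell
#!/bin/sh
# netconsole_param SRC_IP DEV TGT_IP [TGT_MAC] [SRC_PORT] [TGT_PORT]
# Prints a netconsole= module parameter with every field written out.
# Hypothetical helper, not part of any package.
netconsole_param() {
    src_ip=$1
    dev=$2
    tgt_ip=$3
    tgt_mac=${4:-}          # empty means "use the broadcast address"
    src_port=${5:-6665}     # kernel default source port
    tgt_port=${6:-6666}     # kernel default target port
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$src_port" "$src_ip" "$dev" "$tgt_port" "$tgt_ip" "$tgt_mac"
}
```

Usage would look like `modprobe netconsole "$(netconsole_param 10.5.0.250 eth0 10.5.0.251)"`, where eth0 is only a placeholder for whichever interface actually carries 10.5.0.250.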
 
Crash dumps work perfectly in Debian. This is my running, working configuration (I had crashes in ZFS with an HP MSA-60, and it dumps correctly).
  • Kernel-Commandline
Code:
crashkernel=256M
  • /etc/default/kdump-tools
Code:
USE_KDUMP=1
KDUMP_COREDIR="/var/crash"
KDUMP_SYSCTL="kernel.panic_on_oops=1 kernel.panic_on_unrecovered_nmi=1"
DEBUG_KERNEL=/vmlinuz
MAKEDUMP_ARGS="-c --message-level 7 -d 11,31"
  • Reboot and Check
Code:
root@backup ~ > uname -a
Linux backup 4.4.8-1-pve #1 SMP Tue May 31 07:12:32 CEST 2016 x86_64 GNU/Linux

root@backup ~ > service kdump-tools status
● kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled)
   Active: active (exited) since So 2016-06-12 13:37:37 CEST; 2 weeks 4 days ago
  Process: 3522 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/SUCCESS)
Main PID: 3522 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump-tools.service

Jun 12 13:37:37 backup kdump-tools[3522]: Starting kdump-tools: loaded kdump kernel.

The general problem is that you need a debug kernel to analyze the dump further, but you do get the dmesg output in the same folder as the core dump.
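
Since the dmesg output lands next to the core dump, it is usually the quickest thing to look at after a crash. Here is a small sketch for finding the most recent one. It assumes that each crash gets its own subdirectory under KDUMP_COREDIR and that the log file name starts with dmesg; both are assumptions about the layout, so check your own /var/crash first. The function name is made up for illustration:

```shell
#!/bin/sh
# latest_dmesg [DIR]: print the path of the newest dmesg* file found one
# directory level below DIR (default: /var/crash). Prints nothing if no
# such file exists. The dmesg* naming pattern is an assumption.
latest_dmesg() {
    dir=${1:-/var/crash}
    # ls -t sorts by modification time, newest first; errors from an
    # unmatched glob are hidden.
    ls -t "$dir"/*/dmesg* 2>/dev/null | head -n 1
}
```

Something like `less "$(latest_dmesg)"` then opens the kernel log of the latest crash.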
 
I've made some progress.

DEBUG_KERNEL was not set on my end. I copied my current kernel to / and called it vmlinuz, so as to match your config exactly.

Code:
# service kdump-tools status
● kdump-tools.service - Kernel crash dump capture service
   Loaded: loaded (/lib/systemd/system/kdump-tools.service; enabled)
   Active: active (exited) since Tue 2016-07-05 02:12:28 EDT; 36s ago
  Process: 2534 ExecStart=/etc/init.d/kdump-tools start (code=exited, status=0/SUCCESS)
 Main PID: 2534 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/kdump-tools.service
Jul 05 02:12:28 ads-proxmox-2 kdump-tools[2534]: Starting kdump-tools: loaded kdump kernel.
Jul 05 02:12:28 ads-proxmox-2 systemd[1]: Started Kernel crash dump capture service.

Then I simulate a crash:

Code:
# sync &&  echo c | tee /proc/sysrq-trigger

However, where I expected a kernel dump and its associated files in /var/crash, I instead get only a link to my debug kernel, which was not there before:

Code:
# ls -la /var/crash/
total 18
drwxr-xr-x  2 root root   4 Jul  5 02:14 .
drwxr-xr-x 12 root root  14 Jun 21 02:04 ..
lrwxrwxrwx  1 root root   8 Jul  5 02:14 kernel_link -> /vmlinuz
-rw-r--r--  1 root root 285 Jul  5 02:14 kexec_cmd

I'm new to this, so any help is much appreciated.
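
Before triggering a test crash, it may help to confirm that the running kernel really reserved memory and that a crash kernel is loaded, because the service status alone does not show the reservation. The sketch below reads /proc/cmdline, /sys/kernel/kexec_crash_size and /sys/kernel/kexec_crash_loaded. The function names are made up for illustration, and the optional file arguments exist only so the checks can be tried against sample files:

```shell
#!/bin/sh
# has_crashkernel CMDLINE: succeed if the given kernel command line
# contains a crashkernel= reservation.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# kdump_preflight [CMDLINE_FILE] [LOADED_FILE] [SIZE_FILE]
# Prints one line per check and returns non-zero if any check fails.
kdump_preflight() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}
    size_file=${3:-/sys/kernel/kexec_crash_size}
    rc=0

    if has_crashkernel "$(cat "$cmdline_file")"; then
        echo "OK: crashkernel= is on the running command line"
    else
        echo "FAIL: no crashkernel= on the running command line"
        rc=1
    fi

    # kexec_crash_size holds the reserved size in bytes (0 if none).
    if [ "$(cat "$size_file" 2>/dev/null)" -gt 0 ] 2>/dev/null; then
        echo "OK: memory is reserved for the crash kernel"
    else
        echo "FAIL: no memory reserved for the crash kernel"
        rc=1
    fi

    # kexec_crash_loaded holds 1 once a crash kernel has been loaded.
    if [ "$(cat "$loaded_file" 2>/dev/null)" = "1" ]; then
        echo "OK: crash kernel is loaded"
    else
        echo "FAIL: crash kernel is not loaded"
        rc=1
    fi

    return $rc
}
```

Running `kdump_preflight` as root with no arguments checks the live system. A FAIL on the first or second line would point at the boot loader configuration rather than at kdump-tools.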
 
I just symlinked mine :-D

I test via a real crash, not a triggered one, and I found myself in the same situation as you. I have uploaded a zip file (tar files are not allowed) containing a kernel module that crashes your system. Just build it with the included build script.
 

The machine crashed on me before I could test the module. Once again, all I got was a link to the debug kernel. Its date matches the time of the crash.
 
How much space is available on /var/crash? Maybe the crash does not fit?

Please try the crash dump and the simulation inside a VM first to play around, and then apply the working setup to the crashing host.
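
A rough way to answer the space question is to compare the free space under /var/crash with the total RAM. That is a conservative upper bound, since a compressed and filtered dump is normally much smaller than RAM. A minimal sketch, with function names made up for illustration:

```shell
#!/bin/sh
# avail_kib DIR: free space in KiB on the filesystem that holds DIR.
avail_kib() {
    # -P forces one line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# mem_total_kib [FILE]: total RAM in KiB from a meminfo-style file
# (default: /proc/meminfo).
mem_total_kib() {
    awk '/^MemTotal:/ {print $2}' "${1:-/proc/meminfo}"
}

# dump_fits AVAIL_KIB NEEDED_KIB: succeed if the available space is at
# least as large as the space needed.
dump_fits() {
    [ "$1" -ge "$2" ]
}
```

For example: `dump_fits "$(avail_kib /var/crash)" "$(mem_total_kib)" && echo "fits" || echo "may not fit"`.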
 
I had to reinstall the system, and then it worked. All I had to do was enable kdump in /etc/default/kdump-tools, reserve 256M of RAM for the dumps in /etc/default/grub, and then run update-grub.

Something I did must have been preventing this from working in the previous installation.
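
For anyone repeating this on a fresh install, the two file changes described above can be sketched as small helpers. This is only an illustration under stated assumptions: the function names are made up, the GRUB helper assumes the value of GRUB_CMDLINE_LINUX_DEFAULT is on one line in double quotes, and it is wise to try it on copies of the files first:

```shell
#!/bin/sh
# set_use_kdump FILE: force USE_KDUMP=1 in a kdump-tools defaults file,
# replacing an existing assignment or appending one if none exists.
set_use_kdump() {
    tmp=$(mktemp) || return 1
    if grep -q '^USE_KDUMP=' "$1"; then
        sed 's/^USE_KDUMP=.*/USE_KDUMP=1/' "$1" > "$tmp"
    else
        { cat "$1"; echo 'USE_KDUMP=1'; } > "$tmp"
    fi
    # Write back through the original file to keep its owner and mode.
    cat "$tmp" > "$1" && rm -f "$tmp"
}

# add_crashkernel FILE SIZE: append crashkernel=SIZE to
# GRUB_CMDLINE_LINUX_DEFAULT unless a crashkernel= entry is already there.
add_crashkernel() {
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*crashkernel=' "$1"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    sed "s/^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"/GRUB_CMDLINE_LINUX_DEFAULT=\"\1 crashkernel=$2\"/" "$1" > "$tmp"
    cat "$tmp" > "$1" && rm -f "$tmp"
}
```

As root, that would be `set_use_kdump /etc/default/kdump-tools` and `add_crashkernel /etc/default/grub 256M`, followed by `update-grub` and a reboot.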