Crash - Call Trace analysis

tincboy

Yesterday one of my servers went down twice and needed a reboot each time.
In the syslog file I found two call traces logged right before the server crashed.
Here is one of them:
Code:
Aug  7 01:17:01 c36 kernel: Pid: 21434, comm: sh Not tainted 2.6.35-2-pve #1 DH67CL/
Aug  7 01:17:01 c36 kernel: RIP: 0010:[<ffffffff81112a54>]  [<ffffffff81112a54>] mem_cgroup_charge_statistics+0x1f/0x4f
Aug  7 01:17:01 c36 kernel: RSP: 0018:ffff8805a81dfc58  EFLAGS: 00010202
Aug  7 01:17:01 c36 kernel: RAX: ffffffffffffffff RBX: ffff8807f4a616f0 RCX: 247c89028bec4589
Aug  7 01:17:01 c36 kernel: RDX: 0000000000000000 RSI: ffff88080d0ccdd0 RDI: ffff8805fc469000
Aug  7 01:17:01 c36 kernel: RBP: ffff8805a81dfc58 R08: ffff880001f185b0 R09: 000000000000000e
Aug  7 01:17:01 c36 kernel: R10: 000000000000001c R11: ffff8804b18e8558 R12: ffff8805fc469000
Aug  7 01:17:01 c36 kernel: R13: 00000000f2585801 R14: ffff88080d0ccdd0 R15: ffffea0012b47c80
Aug  7 01:17:01 c36 kernel: FS:  00007fdcb5fdf700(0000) GS:ffff880001f00000(0000) knlGS:0000000000000000
Aug  7 01:17:01 c36 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug  7 01:17:01 c36 kernel: CR2: 00007f1e0c0613a0 CR3: 00000007f1e9d000 CR4: 00000000000426e0
Aug  7 01:17:01 c36 kernel: DR0: 0000000000000003 DR1: 00000000000000b0 DR2: 0000000000000001
Aug  7 01:17:01 c36 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug  7 01:17:01 c36 kernel: Process sh (pid: 21434, threadinfo ffff8805a81de000, task ffff8807f4a616f0)
Aug  7 01:17:01 c36 kernel: Stack:
Aug  7 01:17:01 c36 kernel: ffff8805a81dfc98 ffffffff81114bae 0000000012b47c80 ffff8807f2585808
Aug  7 01:17:01 c36 kernel: <0> ffffea0012b47c80 ffff8807f2585820 00000000000add90 000000000007c2e8
Aug  7 01:17:01 c36 kernel: <0> ffff8805a81dfca8 ffffffff81114c09 ffff8805a81dfcd8 ffffffff810e0ad0
Aug  7 01:17:01 c36 kernel: Call Trace:
Aug  7 01:17:01 c36 kernel: [<ffffffff81114bae>] __mem_cgroup_uncharge_common+0x1ab/0x1f6
Aug  7 01:17:01 c36 kernel: [<ffffffff81114c09>] mem_cgroup_uncharge_cache_page+0x10/0x12
Aug  7 01:17:01 c36 kernel: [<ffffffff810e0ad0>] __remove_mapping+0xd0/0xf4
Aug  7 01:17:01 c36 kernel: [<ffffffff810e0b0a>] remove_mapping+0x16/0x2f
Aug  7 01:17:01 c36 kernel: [<ffffffff810dfb75>] invalidate_inode_page+0x84/0x8d
Aug  7 01:17:01 c36 kernel: [<ffffffff810dfc0c>] invalidate_mapping_pages+0x8e/0x114
Aug  7 01:17:01 c36 kernel: [<ffffffff81138853>] drop_pagecache_sb+0x7f/0xd4
Aug  7 01:17:01 c36 kernel: [<ffffffff811387d4>] ? drop_pagecache_sb+0x0/0xd4
Aug  7 01:17:01 c36 kernel: [<ffffffff8111c253>] iterate_supers+0x77/0xc0
Aug  7 01:17:01 c36 kernel: [<ffffffff811387ac>] drop_caches_sysctl_handler+0x30/0x58
Aug  7 01:17:01 c36 kernel: [<ffffffff8116aa9c>] proc_sys_call_handler+0x90/0xb6
Aug  7 01:17:01 c36 kernel: [<ffffffff8116aad6>] proc_sys_write+0x14/0x16
Aug  7 01:17:01 c36 kernel: [<ffffffff8111ae72>] vfs_write+0xb0/0x10a
Aug  7 01:17:01 c36 kernel: [<ffffffff8111af9a>] sys_write+0x4c/0x75
Aug  7 01:17:01 c36 kernel: [<ffffffff81009d32>] system_call_fastpath+0x16/0x1b
Aug  7 01:17:01 c36 kernel: Code: de 27 3a 00 58 5b 41 5c 41 5d c9 c3 55 48 89 e5 0f 1f 44 00 00 48 8b 8f 10 03 00 00 80 fa 01 19 c0 83 c8 01 f6 06 02 48 98 74 06 <65> 48 01 01 eb 05 65 48 01 41 08 84 d2 48 8b 8f 10 03 00 00 74
Aug  7 01:17:01 c36 kernel: RIP  [<ffffffff81112a54>] mem_cgroup_charge_statistics+0x1f/0x4f
Aug  7 01:17:01 c36 kernel: RSP <ffff8805a81dfc58>
Aug  7 01:17:01 c36 kernel: ---[ end trace bb7b44cc5df6739e ]---

How can I find out what is wrong with that server? Is it a hardware issue?
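
As a first step in reading a trace like the one above, it can help to reduce it to the bare function names. Read from the bottom up, this one shows `sh` doing a write that ends in `drop_caches_sysctl_handler` (the handler for the vm.drop_caches sysctl) and faulting in the memory cgroup accounting code. The following is only a rough sketch: `trace_functions` is a made-up helper name, and `/var/log/syslog` is assumed to be where the kernel messages end up.

```shell
#!/bin/sh
# Hypothetical helper: print the function name of every call-trace frame
# read from stdin, in the order the kernel logged them (innermost first).
trace_functions() {
    awk '/kernel: \[<[0-9a-f]+>\]/ {
        fn = $NF               # last field, e.g. vfs_write+0xb0/0x10a
        sub(/\+0x.*/, "", fn)  # strip the +offset/size suffix
        print fn
    }'
}

# Demo on three lines copied from the log above. The RIP line is not a
# trace frame and is skipped. On the server itself one would run:
#   trace_functions < /var/log/syslog   (path is an assumption)
trace_functions <<'EOF'
Aug  7 01:17:01 c36 kernel: [<ffffffff811387ac>] drop_caches_sysctl_handler+0x30/0x58
Aug  7 01:17:01 c36 kernel: [<ffffffff8116aa9c>] proc_sys_call_handler+0x90/0xb6
Aug  7 01:17:01 c36 kernel: RIP  [<ffffffff81112a54>] mem_cgroup_charge_statistics+0x1f/0x4f
EOF
```

Frames that the kernel marks with `?` are still printed; they are less reliable and may be stale stack contents rather than real callers.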
 
2.6.35-2-pve is VERY outdated.

Upgrade to the latest 2.x.
 
> 2.6.35-2-pve is VERY outdated.
>
> Upgrade to the latest 2.x.

This server is already on the latest version; I upgraded it from Proxmox 1.9:
Code:
pve-manager: 2.1-13 (pve-manager/2.1/bdd3663d)
running kernel: 2.6.35-2-pve
proxmox-ve-2.6.32: 2.1-72
pve-kernel-2.6.32-13-pve: 2.6.32-72
pve-kernel-2.6.32-6-pve: 2.6.32-47
pve-kernel-2.6.35-2-pve: 2.6.35-13
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-47
pve-firmware: 1.0-17
libpve-common-perl: 1.0-28
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-29
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-6
ksm-control-daemon: 1.1-1
 
But you are still running the old kernel:

> running kernel: 2.6.35-2-pve

Reboot into the latest kernel, pve-kernel-2.6.32-13-pve, by choosing it in the boot loader, and then remove the old one:

> aptitude remove pve-kernel-2.6.35-2-pve
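
Before removing the old kernel package it is worth confirming that the machine is no longer running it. A minimal sketch of such a guard follows; `safe_to_remove` is a hypothetical helper name, and the only assumption is the `pve-kernel-<version>` package naming visible in the version list above.

```shell
#!/bin/sh
# Hypothetical guard: succeeds only if the given kernel package does NOT
# belong to the running kernel.
#   $1  package name, e.g. pve-kernel-2.6.35-2-pve
#   $2  running kernel release (optional, defaults to `uname -r`)
safe_to_remove() {
    pkg="$1"
    running="${2:-$(uname -r)}"
    [ "$pkg" != "pve-kernel-$running" ]
}

if safe_to_remove pve-kernel-2.6.35-2-pve; then
    echo "2.6.35-2-pve is not the running kernel; the package can be removed"
else
    echo "still running 2.6.35-2-pve; reboot into 2.6.32-13-pve first"
fi
```

The check only compares names; it does not look at what the boot loader will start on the next reboot.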
 
I have no 2.6.32-13 entry in my menu.lst:
Code:
## ## End Default Options ##


title           Chainload into GRUB 2
root            (hd0,0)
kernel          /grub/core.img


title           ─────────────────────────────────────────────────────────────────────
root


title           When you have verified GRUB 2 works, you can use this command to
root


title           complete the upgrade:  upgrade-from-grub-legacy
root


title           ─────────────────────────────────────────────────────────────────────
root


title           Debian GNU/Linux, kernel 2.6.35-2-pve
root            (hd0,0)
kernel          /vmlinuz-2.6.35-2-pve root=/dev/mapper/pve-root ro
initrd          /initrd.img-2.6.35-2-pve


title           Debian GNU/Linux, kernel 2.6.32-6-pve
root            (hd0,0)
kernel          /vmlinuz-2.6.32-6-pve root=/dev/mapper/pve-root ro
initrd          /initrd.img-2.6.32-6-pve


title           Debian GNU/Linux, kernel memtest86+
root            (hd0,0)
kernel          /memtest86+.bin


### END DEBIAN AUTOMAGIC KERNELS LIST
Do you think running 2.6.32-13 will fix the call traces and the server going down?
 
GRUB 2 does not really use a menu.lst. Did you follow the upgrade howto (e.g. run upgrade-from-grub-legacy)?
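
To see which kernels GRUB 2 itself offers, one can look at its generated configuration rather than at menu.lst. This is only a sketch: `list_menuentries` is a made-up helper, and `/boot/grub/grub.cfg` is assumed to be the config path, as on a default Debian install.

```shell
#!/bin/sh
# Hypothetical helper: print the title of every menuentry in a GRUB 2 config.
#   $1  path to the config file
list_menuentries() {
    sed -n "s/^[[:space:]]*menuentry ['\"]\([^'\"]*\)['\"].*/\1/p" "$1"
}

# Assumed default location; nothing is printed if the file does not exist.
if [ -f /boot/grub/grub.cfg ]; then
    list_menuentries /boot/grub/grub.cfg
fi
```

If 2.6.32-13-pve shows up here but not in menu.lst, the entry exists only on the GRUB 2 side of the chainload.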