Kernel Panic

Ashley

Hello,

I have 3 Proxmox compute nodes backed by a Ceph cluster (no OSDs on the compute nodes). Over the last few days I have been getting a kernel panic on one particular server; it happens roughly every 24 hours, and the server reboots itself and recovers, but it would be good to find the underlying issue. The last reboot moved the kernel from 4.4.35-1-pve to 4.4.35-2-pve, so I am not sure whether this particular issue is fixed in that release.
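In case it helps, this is roughly how I am confirming which kernel each node actually booted into after the reset (a quick sketch; n1 and n2 are placeholder names for my other two nodes, only n3 appears in the logs below):

for n in n1 n2 n3; do
    ssh root@$n 'echo -n "$(hostname): "; uname -r'   # running kernel per node
done
pveversion -v | grep -i kernel    # on n3: installed pve-kernel package versions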

From the logs at the time of the panic I am able to find the following:

Jan 29 09:00:26 n3 kernel: [259633.174020] Modules linked in: veth rbd libceph nfsv3 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_ad$
Jan 29 09:00:26 n3 kernel: [259633.176755] CPU: 30 PID: 12509 Comm: kvm Tainted: P O L 4.4.35-1-pve #1
Jan 29 09:00:26 n3 kernel: [259633.176757] Hardware name: HP ProLiant DL160 Gen9/ProLiant DL160 Gen9, BIOS U20 09/12/2016
Jan 29 09:00:26 n3 kernel: [259633.176760] 0000000000000086 00000000bee9a772 ffff883fff585b90 ffffffff813f9743
Jan 29 09:00:26 n3 kernel: [259633.176763] 0000000000000000 0000000000000000 ffff883fff585ba8 ffffffff8113bfbf
Jan 29 09:00:26 n3 kernel: [259633.176766] ffff883ff1c10000 ffff883fff585be0 ffffffff81184eb8 0000000000000001
Jan 29 09:00:26 n3 kernel: [259633.176769] Call Trace:
Jan 29 09:00:26 n3 kernel: [259633.176771] <NMI> [<ffffffff813f9743>] dump_stack+0x63/0x90
Jan 29 09:00:26 n3 kernel: [259633.176788] [<ffffffff8113bfbf>] watchdog_overflow_callback+0xbf/0xd0
Jan 29 09:00:26 n3 kernel: [259633.176793] [<ffffffff81184eb8>] __perf_event_overflow+0x88/0x1d0
Jan 29 09:00:26 n3 kernel: [259633.176796] [<ffffffff81185a84>] perf_event_overflow+0x14/0x20
Jan 29 09:00:26 n3 kernel: [259633.176802] [<ffffffff8100c6a1>] intel_pmu_handle_irq+0x1e1/0x490
Jan 29 09:00:26 n3 kernel: [259633.176809] [<ffffffff811cf09c>] ? vunmap_page_range+0x20c/0x330
Jan 29 09:00:26 n3 kernel: [259633.176812] [<ffffffff811cf1d1>] ? unmap_kernel_range_noflush+0x11/0x20
Jan 29 09:00:26 n3 kernel: [259633.176816] [<ffffffff814c6fde>] ? ghes_copy_tofrom_phys+0x11e/0x2a0
Jan 29 09:00:26 n3 kernel: [259633.176818] [<ffffffff814c71f8>] ? ghes_read_estatus+0x98/0x170
Jan 29 09:00:26 n3 kernel: [259633.176826] [<ffffffff810058dd>] perf_event_nmi_handler+0x2d/0x50
Jan 29 09:00:26 n3 kernel: [259633.176832] [<ffffffff810325d6>] nmi_handle+0x66/0x120
Jan 29 09:00:26 n3 kernel: [259633.176836] [<ffffffff81032b40>] default_do_nmi+0x40/0x100
Jan 29 09:00:26 n3 kernel: [259633.176838] [<ffffffff81032ce2>] do_nmi+0xe2/0x130
Jan 29 09:00:26 n3 kernel: [259633.176845] [<ffffffff8185e751>] end_repeat_nmi+0x1a/0x1e
Jan 29 09:00:26 n3 kernel: [259633.176851] [<ffffffff81406a19>] ? delay_tsc+0x39/0x50
Jan 29 09:00:26 n3 kernel: [259633.176854] [<ffffffff81406a19>] ? delay_tsc+0x39/0x50
Jan 29 09:00:26 n3 kernel: [259633.176857] [<ffffffff81406a19>] ? delay_tsc+0x39/0x50
Jan 29 09:00:26 n3 kernel: [259633.176858] <<EOE>> [<ffffffff8140691f>] __delay+0xf/0x20
Jan 29 09:00:26 n3 kernel: [259633.176899] [<ffffffffc0585aeb>] wait_lapic_expire+0x12b/0x130 [kvm]
Jan 29 09:00:26 n3 kernel: [259633.176915] [<ffffffffc0569a28>] kvm_arch_vcpu_ioctl_run+0x608/0x1460 [kvm]
Jan 29 09:00:26 n3 kernel: [259633.176929] [<ffffffffc0550eca>] kvm_vcpu_ioctl+0x31a/0x5e0 [kvm]
Jan 29 09:00:26 n3 kernel: [259633.176933] [<ffffffff810c3f18>] ? __wake_up_locked_key+0x18/0x20
Jan 29 09:00:26 n3 kernel: [259633.176938] [<ffffffff8125caf0>] ? eventfd_write+0xd0/0x270
Jan 29 09:00:26 n3 kernel: [259633.176941] [<ffffffff81222f22>] do_vfs_ioctl+0x2d2/0x4b0
Jan 29 09:00:26 n3 kernel: [259633.176945] [<ffffffff8118ae6b>] ? fire_user_return_notifiers+0x3b/0x50
Jan 29 09:00:26 n3 kernel: [259633.176949] [<ffffffff81003360>] ? exit_to_usermode_loop+0xb0/0xd0
Jan 29 09:00:26 n3 kernel: [259633.176951] [<ffffffff81223179>] SyS_ioctl+0x79/0x90
Jan 29 09:00:26 n3 kernel: [259633.176954] [<ffffffff81003c38>] ? syscall_return_slowpath+0x98/0x110
Jan 29 09:00:26 n3 kernel: [259633.176958] [<ffffffff8185c276>] entry_SYSCALL_64_fastpath+0x16/0x75
Jan 29 09:00:26 n3 kernel: [259644.705814] Modules linked in: veth rbd libceph nfsv3 ip_set ip6table_filter ip6_tables iptable_filter ip_tables softdog x_tables nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_ad$
Jan 29 09:00:26 n3 kernel: [259644.705892] CPU: 14 PID: 12511 Comm: kvm Tainted: P O L 4.4.35-1-pve #1
Jan 29 09:00:26 n3 kernel: [259644.705894] Hardware name: HP ProLiant DL160 Gen9/ProLiant DL160 Gen9, BIOS U20 09/12/2016
Jan 29 09:00:26 n3 kernel: [259644.705896] task: ffff88364ab64600 ti: ffff883bb1b08000 task.ti: ffff883bb1b08000
Jan 29 09:00:26 n3 kernel: [259644.705898] RIP: 0010:[<ffffffff8110445b>] [<ffffffff8110445b>] smp_call_function_single+0xdb/0x130
Jan 29 09:00:26 n3 kernel: [259644.705909] RSP: 0018:ffff883bb1b0bc58 EFLAGS: 00000202
Jan 29 09:00:26 n3 kernel: [259644.705911] RAX: 0000000000000000 RBX: 000000000000001e RCX: 0000000000000000
Jan 29 09:00:26 n3 kernel: [259644.705912] RDX: 0000000000000003 RSI: 0000000000000200 RDI: 0000000000000292
Jan 29 09:00:26 n3 kernel: [259644.705914] RBP: ffff883bb1b0bca0 R08: 0000000000000007 R09: 0000000000000000
Jan 29 09:00:26 n3 kernel: [259644.705915] R10: 0000000000000008 R11: 0000000000000000 R12: ffffffff810727a0
Jan 29 09:00:26 n3 kernel: [259644.705917] R13: 000000000000001e R14: ffffffff810727a0 R15: ffff883bb1b0bcf8
Jan 29 09:00:26 n3 kernel: [259644.705919] FS: 00007f36bfdff700(0000) GS:ffff883fff380000(0000) knlGS:0000000000000000
Jan 29 09:00:26 n3 kernel: [259644.705921] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 29 09:00:26 n3 kernel: [259644.705923] CR2: 00007f9739906000 CR3: 0000000a2a62d000 CR4: 00000000003426e0
Jan 29 09:00:26 n3 kernel: [259644.705924] Stack:
Jan 29 09:00:26 n3 kernel: [259644.705926] ffff88364ab64600 0000000000000000 0000000000000000 ffffffff810727a0
Jan 29 09:00:26 n3 kernel: [259644.705929] ffff883bb1b0bcf8 0000000000000003 000000007765096e ffff88006f6b06d0
Jan 29 09:00:26 n3 kernel: [259644.705931] 000000000000000e ffff883bb1b0bce8 ffffffff81104873 000000000000000e
Jan 29 09:00:26 n3 kernel: [259644.705934] Call Trace:
Jan 29 09:00:26 n3 kernel: [259644.705943] [<ffffffff810727a0>] ? do_flush_tlb_all+0x40/0x40
Jan 29 09:00:26 n3 kernel: [259644.705947] [<ffffffff81104873>] smp_call_function_many+0x213/0x260
Jan 29 09:00:26 n3 kernel: [259644.705950] [<ffffffff81072965>] native_flush_tlb_others+0x65/0x170
Jan 29 09:00:26 n3 kernel: [259644.705953] [<ffffffff81072ba3>] flush_tlb_mm_range+0x63/0x160
Jan 29 09:00:26 n3 kernel: [259644.705960] [<ffffffff811bd78c>] tlb_flush_mmu_tlbonly+0x6c/0xd0
Jan 29 09:00:26 n3 kernel: [259644.705963] [<ffffffff811be5e4>] tlb_finish_mmu+0x14/0x50
Jan 29 09:00:26 n3 kernel: [259644.705966] [<ffffffff811c043a>] zap_page_range+0xda/0x130
Jan 29 09:00:26 n3 kernel: [259644.705970] [<ffffffff811d398e>] SyS_madvise+0x38e/0x720
Jan 29 09:00:26 n3 kernel: [259644.705974] [<ffffffff81103b65>] ? SyS_futex+0x85/0x180
Jan 29 09:00:26 n3 kernel: [259644.705980] [<ffffffff8185c276>] entry_SYSCALL_64_fastpath+0x16/0x75


There are quite a few more call traces before the server hard reboots; if you need the full list of traces, let me know.
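Since the box hard-resets, I am not certain the tail of the trace ever makes it to disk, so as a next step I plan to do something like the following to preserve the full output across the reset (a sketch, assuming a stock Debian-based Proxmox install; kdump-tools is the plain Debian package, nothing Proxmox-specific):

mkdir -p /var/log/journal          # make the systemd journal persistent across reboots
systemctl restart systemd-journald

apt-get install kdump-tools        # optional: capture a full crash dump, not just logs;
                                   # needs a crashkernel= boot parameter in GRUB
                                   # (set it manually if the package does not add it)
kdump-config status                # verify after the next reboot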

Thanks,
Ashley
 
