proxmox kernel: [ 9888.120017] BUG: unable to handle kernel paging request at 00000

sysV

New Member
Feb 4, 2016
Strange kernel BUG

I have 3 identical hardware nodes running Proxmox:
Debian Jessie
proxmox-ve 4.1-34
pve-kernel-4.2.6-1-pve

32GB RAM

From time to time, all of them hang with a message like this:
Feb 6 14:41:46 proxmox5 kernel: [ 9888.120017] BUG: unable to handle kernel paging request at 00000000fc0e5000

Is there anything I can do to prevent the servers from hanging?
 
Can you provide more information? We cannot help with only one line of error. Normally, there is some "cut here" message with the full backtrace.

BTW: Are you using swap on ZFS?
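
For anyone gathering the full backtrace: the sketch below is one possible way to pull complete BUG/oops reports out of a kernel log instead of a single line. It assumes a report begins at a line containing "BUG: " or "[ cut here ]" and ends at the "---[ end trace ... ]---" line. The function name extract_oops_blocks is made up for illustration, and the log path in the usage note is an assumption that depends on the syslog setup.

```python
import re

# Markers that commonly open and close a BUG/oops report in kernel logs.
# These patterns are assumptions based on typical oops output.
START = re.compile(r"BUG: |\[ cut here \]")
END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def extract_oops_blocks(lines):
    """Return a list of reports; each report is a list of log lines."""
    blocks = []
    current = None  # lines of the report being collected, if any
    for line in lines:
        line = line.rstrip("\n")
        if current is None and START.search(line):
            current = []  # a new report starts on this line
        if current is not None:
            current.append(line)
            if END.search(line):
                blocks.append(current)  # report is complete
                current = None
    if current:
        # The log ended before an "end trace" line; keep the partial report.
        blocks.append(current)
    return blocks
```

It could be used roughly like this (the path /var/log/kern.log is an assumption): open the file, pass the file object to extract_oops_blocks, and print each block separated by a blank line before posting it.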
 
Hello. Thanks for the reply.
I see messages like these:

Code:
Feb  6 07:07:26 proxmox4 kernel: [56364.643663] BUG: unable to handle kernel paging request at 00000000fc0a1000
Feb  6 07:07:26 proxmox4 kernel: [56364.643694] IP: [<ffffffffc031a497>] kvm_zap_rmapp+0x47/0x60 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.643734] PGD 0
Feb  6 07:07:26 proxmox4 kernel: [56364.643753] Oops: 0000 [#1] SMP
Feb  6 07:07:26 proxmox4 kernel: [56364.643774] Modules linked in: binfmt_misc xt_REDIRECT nf_nat_redirect iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_set ip6table_filter ip6_tables iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd input_leds serio_raw i2c_i801 wmi 8250_fintek intel_lpss_acpi fujitsu_laptop intel_lpss video mac_hid acpi_pad tcp_htcp vhost_net vhost macvtap macvlan autofs4 btrfs xor raid6_pq raid1 e1000e(O) ptp ahci pps_core libahci pinctrl_sunrisepoint i2c_hid pinctrl_intel hid
Feb  6 07:07:26 proxmox4 kernel: [56364.644119] CPU: 3 PID: 68 Comm: ksmd Tainted: G           O    4.2.6-1-pve #1
Feb  6 07:07:26 proxmox4 kernel: [56364.644159] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
Feb  6 07:07:26 proxmox4 kernel: [56364.644205] task: ffff88080cec5880 ti: ffff88080c94c000 task.ti: ffff88080c94c000
Feb  6 07:07:26 proxmox4 kernel: [56364.644246] RIP: 0010:[<ffffffffc031a497>]  [<ffffffffc031a497>] kvm_zap_rmapp+0x47/0x60 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.644300] RSP: 0018:ffff88080c94fbd8  EFLAGS: 00010206
Feb  6 07:07:26 proxmox4 kernel: [56364.644330] RAX: 0000000000000000 RBX: ffffc90005bd0c40 RCX: 0000000000037488
Feb  6 07:07:26 proxmox4 kernel: [56364.644356] RDX: 00000000fc0a1000 RSI: 00000000fc0a1000 RDI: ffff880025af8000
Feb  6 07:07:26 proxmox4 kernel: [56364.644383] RBP: ffff88080c94fbe8 R08: 0000000000000001 R09: 0000000000000000
Feb  6 07:07:26 proxmox4 kernel: [56364.644413] R10: 000000000efd2f80 R11: 00000000d1daac1c R12: ffff880025af8000
Feb  6 07:07:26 proxmox4 kernel: [56364.644439] R13: ffffffffc031a4b0 R14: 0000000000000000 R15: ffffc90012be1198
Feb  6 07:07:26 proxmox4 kernel: [56364.644466] FS:  0000000000000000(0000) GS:ffff8808314c0000(0000) knlGS:0000000000000000
Feb  6 07:07:26 proxmox4 kernel: [56364.644507] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb  6 07:07:26 proxmox4 kernel: [56364.644531] CR2: 00000000fc0a1000 CR3: 0000000112cda000 CR4: 00000000003426e0
Feb  6 07:07:26 proxmox4 kernel: [56364.644558] Stack:
Feb  6 07:07:26 proxmox4 kernel: [56364.644575]  0000000000000000 ffff880025af8000 ffff88080c94fbf8 ffffffffc031a4be
Feb  6 07:07:26 proxmox4 kernel: [56364.644618]  ffff88080c94fcb8 ffffffffc03172bc ffff880025af8048 ffff880025af8048
Feb  6 07:07:26 proxmox4 kernel: [56364.644665]  ffff880025af8038 00007f7bbe289000 00007f7bbe288000 ffffc90012beb008
Feb  6 07:07:26 proxmox4 kernel: [56364.644707] Call Trace:
Feb  6 07:07:26 proxmox4 kernel: [56364.644737]  [<ffffffffc031a4be>] kvm_unmap_rmapp+0xe/0x20 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.644769]  [<ffffffffc03172bc>] kvm_handle_hva_range+0x13c/0x1b0 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.644803]  [<ffffffffc0321397>] kvm_unmap_hva_range+0x17/0x20 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.644833]  [<ffffffffc02fce23>] kvm_mmu_notifier_invalidate_range_start+0x53/0x90 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.644877]  [<ffffffff811d63b9>] __mmu_notifier_invalidate_range_start+0x59/0x80
Feb  6 07:07:26 proxmox4 kernel: [56364.644918]  [<ffffffff811d810a>] try_to_merge_with_ksm_page+0x4ba/0x6c0
Feb  6 07:07:26 proxmox4 kernel: [56364.644945]  [<ffffffff811d8995>] ksm_scan_thread+0x685/0x1030
Feb  6 07:07:26 proxmox4 kernel: [56364.644971]  [<ffffffff810bd790>] ? wait_woken+0x90/0x90
Feb  6 07:07:26 proxmox4 kernel: [56364.644995]  [<ffffffff811d8310>] ? try_to_merge_with_ksm_page+0x6c0/0x6c0
Feb  6 07:07:26 proxmox4 kernel: [56364.645022]  [<ffffffff8109acaa>] kthread+0xea/0x100
Feb  6 07:07:26 proxmox4 kernel: [56364.645046]  [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
Feb  6 07:07:26 proxmox4 kernel: [56364.645073]  [<ffffffff8180879f>] ret_from_fork+0x3f/0x70
Feb  6 07:07:26 proxmox4 kernel: [56364.645097]  [<ffffffff8109abc0>] ? kthread_create_on_node+0x1f0/0x1f0
Feb  6 07:07:26 proxmox4 kernel: [56364.645123] Code: eb 15 4c 89 e7 e8 4a ff ff ff 48 8b 13 b8 01 00 00 00 48 85 d2 74 1b f6 c2 01 48 89 d6 74 07 48 83 e2 fe 48 8b 32 48 85 f6 74 07 <f6> 06 01 75 d2 0f 0b 5b 41 5c 5d c3 31 c0 c3 66 2e 0f 1f 84 00
Feb  6 07:07:26 proxmox4 kernel: [56364.645249] RIP  [<ffffffffc031a497>] kvm_zap_rmapp+0x47/0x60 [kvm]
Feb  6 07:07:26 proxmox4 kernel: [56364.645285]  RSP <ffff88080c94fbd8>
Feb  6 07:07:26 proxmox4 kernel: [56364.645305] CR2: 00000000fc0a1000
Feb  6 07:07:26 proxmox4 kernel: [56364.645820] ---[ end trace 8e231bd79531436a ]---

I don't use ZFS. Swap is enabled, but it is not actively used.
For example:
Code:
root@proxmox4 ~ # free -m  | grep Swap
               total       used       free     shared    buffers     cached
Swap:        16367       1354      15013
 
Several times I have also seen messages like these:
Code:
Feb  5 08:16:50 proxmox2 kernel: [65634.606129] pte_list_remove:  ffff8803727cc2a0 1->BUG
Feb  5 08:16:50 proxmox2 kernel: [65634.606171] ------------[ cut here ]------------
Feb  5 08:16:50 proxmox2 kernel: [65634.606194] kernel BUG at arch/x86/kvm/mmu.c:972!
Feb  5 08:16:50 proxmox2 kernel: [65634.606217] invalid opcode: 0000 [#3] SMP
Feb  5 08:16:50 proxmox2 kernel: [65634.606239] Modules linked in: ip_set ip6table_filter ip6_tables xt_REDIRECT nf_nat_redirect iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink intel_rapl iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd input_leds serio_raw i2c_i801 wmi 8250_fintek fujitsu_laptop video intel_lpss_acpi intel_lpss acpi_pad mac_hid tcp_htcp vhost_net vhost macvtap macvlan autofs4 btrfs xor raid6_pq raid1 e1000e(O) ptp pps_core ahci libahci pinctrl_sunrisepoint i2c_hid pinctrl_intel hid
Feb  5 08:16:50 proxmox2 kernel: [65634.606576] CPU: 2 PID: 1700 Comm: kvm Tainted: G      D    O    4.2.6-1-pve #1
Feb  5 08:16:50 proxmox2 kernel: [65634.606616] Hardware name: FUJITSU D3401-H1/D3401-H1, BIOS V5.0.0.11 R1.7.0.SR.2 for D3401-H1x                11/25/2015
Feb  5 08:16:50 proxmox2 kernel: [65634.606662] task: ffff880806e93b00 ti: ffff8807ca88c000 task.ti: ffff8807ca88c000
Feb  5 08:16:50 proxmox2 kernel: [65634.606702] RIP: 0010:[<ffffffffc03185e4>]  [<ffffffffc03185e4>] pte_list_remove+0x114/0x130 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.606757] RSP: 0018:ffff8807ca88fb48  EFLAGS: 00010246
Feb  5 08:16:50 proxmox2 kernel: [65634.606780] RAX: 0000000000000029 RBX: ffff8803727cc2a0 RCX: 0000000000000029
Feb  5 08:16:50 proxmox2 kernel: [65634.606806] RDX: 0000000000000000 RSI: ffff88083148ea58 RDI: ffff88083148ea58
Feb  5 08:16:50 proxmox2 kernel: [65634.606832] RBP: ffff8807ca88fb48 R08: 0000000000000000 R09: 0000000000000000
Feb  5 08:16:50 proxmox2 kernel: [65634.606859] R10: 000000000000050d R11: ffffffff81a61f40 R12: ffff8807ca930000
Feb  5 08:16:50 proxmox2 kernel: [65634.606885] R13: ffff88035fc27bd0 R14: ffff8807ca932b08 R15: ffff88035fc273f0
Feb  5 08:16:50 proxmox2 kernel: [65634.606912] FS:  00007f507f5e8b40(0000) GS:ffff880831480000(0000) knlGS:0000000000000000
Feb  5 08:16:50 proxmox2 kernel: [65634.606953] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb  5 08:16:50 proxmox2 kernel: [65634.606977] CR2: 000000000ba77000 CR3: 00000007cfee8000 CR4: 00000000003426e0
Feb  5 08:16:50 proxmox2 kernel: [65634.607003] Stack:
Feb  5 08:16:50 proxmox2 kernel: [65634.607020]  ffff8807ca88fb78 ffffffffc031c444 0000000000000002 00000001f3caff77
Feb  5 08:16:50 proxmox2 kernel: [65634.607062]  ffff8803727cc2a0 ffff8807ca930000 ffff8807ca88fba8 ffffffffc031c8dd
Feb  5 08:16:50 proxmox2 kernel: [65634.607104]  ffff88035fc27bd0 00000000000002a8 ffff88035fc27bd0 ffff8807ca930000
Feb  5 08:16:50 proxmox2 kernel: [65634.607146] Call Trace:
Feb  5 08:16:50 proxmox2 kernel: [65634.607176]  [<ffffffffc031c444>] drop_spte+0x84/0x90 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607206]  [<ffffffffc031c8dd>] mmu_page_zap_pte+0xdd/0xf0 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607237]  [<ffffffffc031c946>] kvm_mmu_prepare_zap_page+0x56/0x320 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607271]  [<ffffffffc0323c97>] kvm_mmu_invalidate_zap_all_pages+0xc7/0x120 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607317]  [<ffffffffc03157be>] kvm_arch_flush_shadow_memslot+0xe/0x10 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607362]  [<ffffffffc02ff8ef>] __kvm_set_memory_region+0x8ef/0xa80 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607393]  [<ffffffffc02ffaaf>] kvm_set_memory_region+0x2f/0x50 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607423]  [<ffffffffc02ffed9>] kvm_vm_ioctl+0x409/0x730 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607450]  [<ffffffff81211bd0>] ? poll_select_copy_remaining+0x140/0x140
Feb  5 08:16:50 proxmox2 kernel: [65634.607477]  [<ffffffff810a6100>] ? wake_up_q+0x70/0x70
Feb  5 08:16:50 proxmox2 kernel: [65634.607501]  [<ffffffff81210f94>] do_vfs_ioctl+0x2c4/0x4a0
Feb  5 08:16:50 proxmox2 kernel: [65634.607526]  [<ffffffff810fbc15>] ? SyS_futex+0x85/0x180
Feb  5 08:16:50 proxmox2 kernel: [65634.607550]  [<ffffffff812111e9>] SyS_ioctl+0x79/0x90
Feb  5 08:16:50 proxmox2 kernel: [65634.607574]  [<ffffffff81808372>] entry_SYSCALL_64_fastpath+0x16/0x75
Feb  5 08:16:50 proxmox2 kernel: [65634.607600] Code: 8b 01 48 89 06 eb c4 48 89 fe 31 c0 48 c7 c7 a0 94 34 c0 e8 3a 6a 4e c1 0f 0b 48 89 fe 31 c0 48 c7 c7 80 94 34 c0 e8 27 6a 4e c1 <0f> 0b 48 89 fe 31 c0 48 c7 c7 56 86 34 c0 e8 14 6a 4e c1 0f 0b
Feb  5 08:16:50 proxmox2 kernel: [65634.607715] RIP  [<ffffffffc03185e4>] pte_list_remove+0x114/0x130 [kvm]
Feb  5 08:16:50 proxmox2 kernel: [65634.607750]  RSP <ffff8807ca88fb48>
Feb  5 08:16:50 proxmox2 kernel: [65634.608264] ---[ end trace 09545810173fdfb5 ]---
 
Thank you.

I can't disable KSM; I use it heavily. I have filed a bug report and hope it helps.
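
To put a number on how heavily KSM is used on a node, something like the sketch below could be attached to the bug report. It only reads the standard KSM counters under /sys/kernel/mm/ksm and does not change any setting or work around the crash. The function names read_ksm_stats and saved_mib are made up for illustration, and the 4096-byte page size is an assumption (the usual x86-64 default).

```python
import os

KSM_DIR = "/sys/kernel/mm/ksm"  # standard sysfs location for KSM counters
FIELDS = ("run", "pages_shared", "pages_sharing", "pages_unshared", "full_scans")


def read_ksm_stats(base=KSM_DIR, fields=FIELDS):
    """Read KSM counters into a dict; unreadable entries become None."""
    stats = {}
    for name in fields:
        path = os.path.join(base, name)
        try:
            with open(path) as fh:
                stats[name] = int(fh.read().strip())
        except (OSError, ValueError):
            stats[name] = None  # file missing or content not an integer
    return stats


def saved_mib(stats, page_size=4096):
    """Rough estimate of memory saved by KSM, in MiB.

    pages_sharing counts the extra mappings that point at already
    shared pages, so it approximates the number of pages saved.
    page_size=4096 is an assumption; adjust it if the system differs.
    """
    sharing = stats.get("pages_sharing")
    if sharing is None:
        return None
    return sharing * page_size / (1024 * 1024)
```

Calling read_ksm_stats() on the host and passing the result to saved_mib() gives an approximate figure; a value of 1 in "run" means the ksmd thread is active.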