Kernel panic, possibly caused by a kernel bug

Eris

This morning one of our nodes crashed while making backups, with the following stack trace.

I may have found a related entry at https://webcache.googleusercontent....org/patch/10174717/+&cd=1&hl=de&ct=clnk&gl=at
But I'm not sure whether this patch is already included in the kernel running on this node.



Linux pm02 4.15.17-3-pve #1 SMP PVE 4.15.17-14 (Wed, 27 Jun 2018 17:18:05 +0200) x86_64 GNU/Linux

2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058681] WARNING: CPU: 11 PID: 14737 at arch/x86/kvm/mmu.c:734 mmu_spte_clear_track_bits+0x90/0x120 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058684] Modules linked in: xt_mac xt_NFLOG veth binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack libcrc32c ip_set_hash_net ip_set dm_round_robin iptable_filter softdog nfnetlink_log nfnetlink edac_mce_amd mgag200 ttm kvm_amd drm_kms_helper kvm snd_pcm drm snd_timer snd i2c_algo_bit fb_sys_fops ipmi_si syscopyarea sysfillrect ipmi_devintf joydev input_leds irqbypass sysimgblt soundcore ipmi_msghandler serio_raw pcspkr k10temp shpchp mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vhost_net
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058726] vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi sunrpc scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp igb(O) i2c_piix4 dca ahci ptp libahci pps_core
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058756] CPU: 11 PID: 14737 Comm: kvm Tainted: P O 4.15.17-3-pve #1
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058757] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.5a 04/08/2015
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058778] RIP: 0010:mmu_spte_clear_track_bits+0x90/0x120 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058780] RSP: 0018:ffffb68da99aba98 EFLAGS: 00010246
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058782] RAX: 0000000000000000 RBX: 0000000000400000 RCX: ffff9b91dffd54df
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058783] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe6cb00010000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058784] RBP: ffffb68da99abab8 R08: 0000000000000000 R09: 0000000000000001
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058785] R10: 0000000000000000 R11: 0400000000000000 R12: 0000000000000400
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058786] R13: 0000000000000007 R14: 000000000003abfb R15: 0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058788] FS: 00007f91f573b700(0000) GS:ffff9b97dfcc0000(0000) knlGS:0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058789] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058790] CR2: 00007fb1af4a0480 CR3: 000000060b630000 CR4: 00000000000006e0
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058791] Call Trace:
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058814] drop_spte+0x1a/0xb0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058834] mmu_set_spte+0xbc/0x2e0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058855] __direct_map.part.127+0x1a3/0x220 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058875] tdp_page_fault+0x264/0x290 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058895] kvm_mmu_page_fault+0x62/0x160 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058902] npf_interception+0x4c/0xa0 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058905] handle_exit+0x128/0xa10 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058925] kvm_arch_vcpu_ioctl_run+0x92e/0x16c0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058929] ? svm_vcpu_load+0x115/0x140 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058947] ? kvm_arch_vcpu_load+0x68/0x250 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058964] kvm_vcpu_ioctl+0x339/0x620 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058980] ? kvm_vcpu_ioctl+0x339/0x620 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058985] ? __wake_up_locked_key+0x1b/0x20
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058987] do_vfs_ioctl+0xa6/0x620
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058991] ? SyS_futex+0x83/0x180
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058993] SyS_ioctl+0x79/0x90
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058996] do_syscall_64+0x73/0x130
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.058999] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059000] RIP: 0033:0x7fa2ad550dd7
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059001] RSP: 002b:00007fa1a03fc538 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059003] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fa2ad550dd7
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059004] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001e
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059005] RBP: 00007fa2a1f3a000 R08: 0000555e5b3e0dd0 R09: 00000000000000ff
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059006] R10: 00007fa2c5a63000 R11: 0000000000000246 R12: 0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059007] R13: 00007fa2c5a62000 R14: 0000000000000000 R15: 00007fa2a1f3a000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059009] Code: 27 fe ff 84 c0 75 26 4c 89 e0 48 c1 e0 06 48 03 05 8e 26 b5 da 48 8b 50 20 48 8d 4a ff 83 e2 01 48 0f 45 c1 8b 40 1c 85 c0 75 02 <0f> 0b 48 b8 00 00 00 00 00 00 00 40 48 21 d8 49 89 c5 75 50 48
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059040] ---[ end trace ee69285ea2f6dcb1 ]---
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059044] pte_list_remove: 0000000078ba78a3 0->BUG
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059100] ------------[ cut here ]------------
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059102] kernel BUG at arch/x86/kvm/mmu.c:1209!
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059138] invalid opcode: 0000 [#1] SMP NOPTI
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059168] Modules linked in: xt_mac xt_NFLOG veth binfmt_misc nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack libcrc32c ip_set_hash_net ip_set dm_round_robin iptable_filter softdog nfnetlink_log nfnetlink edac_mce_amd mgag200 ttm kvm_amd drm_kms_helper kvm snd_pcm drm snd_timer snd i2c_algo_bit fb_sys_fops ipmi_si syscopyarea sysfillrect ipmi_devintf joydev input_leds irqbypass sysimgblt soundcore ipmi_msghandler serio_raw pcspkr k10temp shpchp mac_hid dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vhost_net
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059574] vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi sunrpc scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp igb(O) i2c_piix4 dca ahci ptp libahci pps_core
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059750] CPU: 11 PID: 14737 Comm: kvm Tainted: P W O 4.15.17-3-pve #1
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059794] Hardware name: Supermicro H8DGT /H8DGT , BIOS 3.5a 04/08/2015
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059863] RIP: 0010:pte_list_remove+0x11c/0x120 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059896] RSP: 0018:ffffb68da99abab8 EFLAGS: 00010286
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059928] RAX: 0000000000000028 RBX: ffff9b90e0e73fd8 RCX: 0000000000000006
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.059970] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff9b97dfcd6490
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060012] RBP: ffffb68da99abab8 R08: 0000000000000000 R09: 000000000000069d
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060053] R10: ffff9b97b6500008 R11: 00000000ffffffff R12: ffff9b93c3240000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060096] R13: 0000000000000007 R14: 000000000003abfb R15: 0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060137] FS: 00007f91f573b700(0000) GS:ffff9b97dfcc0000(0000) knlGS:0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060184] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060219] CR2: 00007fb1af4a0480 CR3: 000000060b630000 CR4: 00000000000006e0
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060261] Call Trace:
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060302] drop_spte+0x80/0xb0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060348] mmu_set_spte+0xbc/0x2e0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060395] __direct_map.part.127+0x1a3/0x220 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060448] tdp_page_fault+0x264/0x290 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060495] kvm_mmu_page_fault+0x62/0x160 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060529] npf_interception+0x4c/0xa0 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060561] handle_exit+0x128/0xa10 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060609] kvm_arch_vcpu_ioctl_run+0x92e/0x16c0 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060645] ? svm_vcpu_load+0x115/0x140 [kvm_amd]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.060693] ? kvm_arch_vcpu_load+0x68/0x250 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.062335] kvm_vcpu_ioctl+0x339/0x620 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.063907] ? kvm_vcpu_ioctl+0x339/0x620 [kvm]
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.065415] ? __wake_up_locked_key+0x1b/0x20
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.066890] do_vfs_ioctl+0xa6/0x620
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.068333] ? SyS_futex+0x83/0x180
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.069754] SyS_ioctl+0x79/0x90
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.071176] do_syscall_64+0x73/0x130
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.072595] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.074022] RIP: 0033:0x7fa2ad550dd7
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.075424] RSP: 002b:00007fa1a03fc538 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.076845] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fa2ad550dd7
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.078248] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001e
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.079626] RBP: 00007fa2a1f3a000 R08: 0000555e5b3e0dd0 R09: 00000000000000ff
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.081000] R10: 00007fa2c5a63000 R11: 0000000000000246 R12: 0000000000000000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.082374] R13: 00007fa2c5a62000 R14: 0000000000000000 R15: 00007fa2a1f3a000
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.083708] Code: b0 f1 12 c1 e8 96 6a 7f d9 0f 0b 48 89 fe 48 c7 c7 90 f1 12 c1 e8 85 6a 7f d9 0f 0b 48 89 fe 48 c7 c7 46 e0 12 c1 e8 74 6a 7f d9 <0f> 0b 66 90 0f 1f 44 00 00 48 8b 06 48 39 c6 0f 84 bc 00 00 00
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.086464] RIP: pte_list_remove+0x11c/0x120 [kvm] RSP: ffffb68da99abab8
2018-07-26T02:09:21+02:00 pm02 kernel: [1763716.087828] ---[ end trace ee69285ea2f6dcb2 ]---
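
To compare a trace like this against a candidate patch, it can help to pull out just the source locations of the WARNING and BUG events. The following is a minimal sketch, not an official tool: the function name `extract_events` and both regular expressions are my own assumptions, written only against the two header formats visible in the log above.

```python
import re

# Patterns for the two event headers seen in the trace above (assumed formats).
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) (?P<symbol>[^\s+]+)"
)
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")


def extract_events(log_text):
    """Return (kind, source_file, line_number, symbol) for each WARNING/BUG header."""
    events = []
    for raw in log_text.splitlines():
        match = WARNING_RE.search(raw)
        if match:
            events.append(
                ("WARNING", match["file"], int(match["line"]), match["symbol"])
            )
            continue
        match = BUG_RE.search(raw)
        if match:
            # The BUG header carries no symbol; the RIP line names the function.
            events.append(("BUG", match["file"], int(match["line"]), None))
    return events
```

Fed the log above, this should report the WARNING at arch/x86/kvm/mmu.c:734 in mmu_spte_clear_track_bits and the BUG at arch/x86/kvm/mmu.c:1209, which are the two locations worth checking in the kernel source or a patch diff.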

The best part is that the VMs were auto-migrated by HA but failed to start, because their boot disk config had automatically changed to a device that doesn't exist (from scsi0 to virtio0). I'm not sure why this happens.
Maybe the iSCSI MSA (with multipath) didn't connect fast enough on boot, but even then it shouldn't change the boot device, should it?
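
One way to spot affected VMs is to check whether the configured boot disk still points at a device defined in the same config. This is only a rough sketch under assumptions: it expects the config text to consist of `key: value` lines with a `bootdisk` entry, and the helper names and the `msa` storage ID in the usage example are hypothetical.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of the main section (stops at the first [section] header)."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Assumed: bracketed headers start snapshot sections, which we skip.
            break
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def bootdisk_is_defined(config):
    """True if 'bootdisk' names a device key that is present in the config."""
    bootdisk = config.get("bootdisk")
    return bootdisk is not None and bootdisk in config


# Usage with a made-up config that mirrors the symptom described above.
broken = parse_vm_config("bootdisk: virtio0\nscsi0: msa:vm-100-disk-1,size=32G\n")
print(bootdisk_is_defined(broken))
```

Running it over each VM's config text would flag the ones where the boot disk names a missing device, without explaining why the value changed.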
 
