Some VMs just crashing after upgrade 3.4->4.2

BloodyIron

Hi Folks,

I recently upgraded my 2-node cluster from 3.4 to 4.2. At first glance it all went really well, except now I have instability in just a few VMs, and I cannot work out why.

It happens consistently only with specific VMs, most notably my ownCloud VM, during routine work such as upgrading packages with apt. The whole VM crashes, and I get a trace in /var/log/syslog on the Proxmox node itself.

This is the kind of error I get:

Code:
Apr  8 13:01:47 REDACTED kernel: [  434.002921] ------------[ cut here ]------------
Apr  8 13:01:47 REDACTED kernel: [  434.002951] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/emulate.c:5410 x86_emulate_insn+0xbb2/0xe30 [kvm]()
Apr  8 13:01:47 REDACTED kernel: [  434.002953] Modules linked in: nfsv3 ip_set ip6table_filter ip6_tables binfmt_misc iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink amdkfd amd_iommu_v2 radeon kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm ttm snd_hda_codec drm_kms_helper input_leds drm snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul i2c_algo_bit pcspkr aesni_intel serio_raw ppdev aes_x86_64 snd_timer snd lrw soundcore parport_pc gf128mul glue_helper parport asus_atk0110 ablk_helper 8250_fintek cryptd wmi shpchp k10temp fam15h_power i2c_piix4 edac_core edac_mce_amd mac_hid vhost_net vhost macvtap macvlan autofs4 dm_mirror dm_region_hash dm_log hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp ahci r8169 mii libahci
Apr  8 13:01:47 REDACTED kernel: [  434.003007] CPU: 4 PID: 2244 Comm: kvm Not tainted 4.2.8-1-pve #1
Apr  8 13:01:47 REDACTED kernel: [  434.003010] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
Apr  8 13:01:47 REDACTED kernel: [  434.003012]  0000000000000000 000000003d928ed0 ffff8806678b3b98 ffffffff81803b2b
Apr  8 13:01:47 REDACTED kernel: [  434.003014]  0000000000000000 0000000000000000 ffff8806678b3bd8 ffffffff8107bc4a
Apr  8 13:01:47 REDACTED kernel: [  434.003017]  ffff8800a7d0e3a0 ffff8800a7d0e3a0 0000000000000006 ffffffffc0359840
Apr  8 13:01:47 REDACTED kernel: [  434.003019] Call Trace:
Apr  8 13:01:47 REDACTED kernel: [  434.003027]  [<ffffffff81803b2b>] dump_stack+0x45/0x57
Apr  8 13:01:47 REDACTED kernel: [  434.003031]  [<ffffffff8107bc4a>] warn_slowpath_common+0x8a/0xc0
Apr  8 13:01:47 REDACTED kernel: [  434.003034]  [<ffffffff8107bd7a>] warn_slowpath_null+0x1a/0x20
Apr  8 13:01:47 REDACTED kernel: [  434.003047]  [<ffffffffc0349622>] x86_emulate_insn+0xbb2/0xe30 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003059]  [<ffffffffc032cb4d>] x86_emulate_instruction+0x1bd/0x730 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003064]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003067]  [<ffffffffc0393f52>] ud_interception+0x22/0x40 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003070]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003074]  [<ffffffffc0396fd2>] handle_exit+0x132/0x990 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003084]  [<ffffffffc032218c>] ? kvm_set_cr8+0x1c/0x20 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003087]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003099]  [<ffffffffc0330f26>] kvm_arch_vcpu_ioctl_run+0x656/0x1200 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003110]  [<ffffffffc032a868>] ? kvm_arch_vcpu_load+0x58/0x1a0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003119]  [<ffffffffc03198c0>] kvm_vcpu_ioctl+0x320/0x5c0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003123]  [<ffffffff8102cedc>] ? x86_pmu_enable+0x25c/0x2e0
Apr  8 13:01:47 REDACTED kernel: [  434.003132]  [<ffffffff81173662>] ? perf_pmu_enable+0x22/0x30
Apr  8 13:01:47 REDACTED kernel: [  434.003134]  [<ffffffff81174dbb>] ? perf_event_context_sched_in+0x8b/0xb0
Apr  8 13:01:47 REDACTED kernel: [  434.003140]  [<ffffffff81211834>] do_vfs_ioctl+0x2c4/0x4a0
Apr  8 13:01:47 REDACTED kernel: [  434.003144]  [<ffffffff810fc2c5>] ? SyS_futex+0x85/0x180
Apr  8 13:01:47 REDACTED kernel: [  434.003147]  [<ffffffff81211a89>] SyS_ioctl+0x79/0x90
Apr  8 13:01:47 REDACTED kernel: [  434.003150]  [<ffffffff8180ab72>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr  8 13:01:47 REDACTED kernel: [  434.003152] ---[ end trace ad7354c7a139129b ]---
Apr  8 13:01:47 REDACTED kernel: [  434.003154] ------------[ cut here ]------------
Apr  8 13:01:47 REDACTED kernel: [  434.003165] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/x86.c:337 exception_type+0x49/0x50 [kvm]()
Apr  8 13:01:47 REDACTED kernel: [  434.003166] Modules linked in: nfsv3 ip_set ip6table_filter ip6_tables binfmt_misc iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink amdkfd amd_iommu_v2 radeon kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm ttm snd_hda_codec drm_kms_helper input_leds drm snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul i2c_algo_bit pcspkr aesni_intel serio_raw ppdev aes_x86_64 snd_timer snd lrw soundcore parport_pc gf128mul glue_helper parport asus_atk0110 ablk_helper 8250_fintek cryptd wmi shpchp k10temp fam15h_power i2c_piix4 edac_core edac_mce_amd mac_hid vhost_net vhost macvtap macvlan autofs4 dm_mirror dm_region_hash dm_log hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp ahci r8169 mii libahci
Apr  8 13:01:47 REDACTED kernel: [  434.003204] CPU: 4 PID: 2244 Comm: kvm Tainted: G        W       4.2.8-1-pve #1
Apr  8 13:01:47 REDACTED kernel: [  434.003205] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
Apr  8 13:01:47 REDACTED kernel: [  434.003207]  0000000000000000 000000003d928ed0 ffff8806678b3bb8 ffffffff81803b2b
Apr  8 13:01:47 REDACTED kernel: [  434.003209]  0000000000000000 0000000000000000 ffff8806678b3bf8 ffffffff8107bc4a
Apr  8 13:01:47 REDACTED kernel: [  434.003211]  0000000000000000 0000000000000000 0000000000000046 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.003214] Call Trace:
Apr  8 13:01:47 REDACTED kernel: [  434.003216]  [<ffffffff81803b2b>] dump_stack+0x45/0x57
Apr  8 13:01:47 REDACTED kernel: [  434.003219]  [<ffffffff8107bc4a>] warn_slowpath_common+0x8a/0xc0
Apr  8 13:01:47 REDACTED kernel: [  434.003222]  [<ffffffff8107bd7a>] warn_slowpath_null+0x1a/0x20
Apr  8 13:01:47 REDACTED kernel: [  434.003232]  [<ffffffffc0321a59>] exception_type+0x49/0x50 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003243]  [<ffffffffc032cd23>] x86_emulate_instruction+0x393/0x730 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003246]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003249]  [<ffffffffc0393f52>] ud_interception+0x22/0x40 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003252]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003255]  [<ffffffffc0396fd2>] handle_exit+0x132/0x990 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003265]  [<ffffffffc032218c>] ? kvm_set_cr8+0x1c/0x20 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003269]  [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr  8 13:01:47 REDACTED kernel: [  434.003280]  [<ffffffffc0330f26>] kvm_arch_vcpu_ioctl_run+0x656/0x1200 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003292]  [<ffffffffc032a868>] ? kvm_arch_vcpu_load+0x58/0x1a0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003300]  [<ffffffffc03198c0>] kvm_vcpu_ioctl+0x320/0x5c0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003303]  [<ffffffff8102cedc>] ? x86_pmu_enable+0x25c/0x2e0
Apr  8 13:01:47 REDACTED kernel: [  434.003306]  [<ffffffff81173662>] ? perf_pmu_enable+0x22/0x30
Apr  8 13:01:47 REDACTED kernel: [  434.003308]  [<ffffffff81174dbb>] ? perf_event_context_sched_in+0x8b/0xb0
Apr  8 13:01:47 REDACTED kernel: [  434.003311]  [<ffffffff81211834>] do_vfs_ioctl+0x2c4/0x4a0
Apr  8 13:01:47 REDACTED kernel: [  434.003314]  [<ffffffff810fc2c5>] ? SyS_futex+0x85/0x180
Apr  8 13:01:47 REDACTED kernel: [  434.003316]  [<ffffffff81211a89>] SyS_ioctl+0x79/0x90
Apr  8 13:01:47 REDACTED kernel: [  434.003319]  [<ffffffff8180ab72>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr  8 13:01:47 REDACTED kernel: [  434.003321] ---[ end trace ad7354c7a139129c ]---

(second half of snip is in second post, THANKS 10,000 CHAR LIMIT)

Now, I've updated the BIOS to try to address this, to the latest version, but it did not correct it.

I'm using kernel 4.2.8-1-pve, and I upgraded yesterday.

I now consider the two VMs this is happening to 100% unstable, as I can't get any work done on them at all.

The only idea that I have so far is rolling back to an earlier kernel, but otherwise I have no clue what is causing this.

I was NOT having this kind of crashing before the major upgrade.

Please help.
 
second block of crash log:

Code:
Apr  8 13:01:47 REDACTED kernel: [  434.003323] ------------[ cut here ]------------
Apr  8 13:01:47 REDACTED kernel: [  434.003333] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/x86.c:337 exception_type+0x49/0x50 [kvm]()
Apr  8 13:01:47 REDACTED kernel: [  434.003334] Modules linked in: nfsv3 ip_set ip6table_filter ip6_tables binfmt_misc iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink amdkfd amd_iommu_v2 radeon kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm ttm snd_hda_codec drm_kms_helper input_leds drm snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul i2c_algo_bit pcspkr aesni_intel serio_raw ppdev aes_x86_64 snd_timer snd lrw soundcore parport_pc gf128mul glue_helper parport asus_atk0110 ablk_helper 8250_fintek cryptd wmi shpchp k10temp fam15h_power i2c_piix4 edac_core edac_mce_amd mac_hid vhost_net vhost macvtap macvlan autofs4 dm_mirror dm_region_hash dm_log hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp ahci r8169 mii libahci
Apr  8 13:01:47 REDACTED kernel: [  434.003372] CPU: 4 PID: 2244 Comm: kvm Tainted: G        W       4.2.8-1-pve #1
Apr  8 13:01:47 REDACTED kernel: [  434.003373] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101    12/02/2014
Apr  8 13:01:47 REDACTED kernel: [  434.003375]  0000000000000000 000000003d928ed0 ffff8806678b3ca8 ffffffff81803b2b
Apr  8 13:01:47 REDACTED kernel: [  434.003377]  0000000000000000 0000000000000000 ffff8806678b3ce8 ffffffff8107bc4a
Apr  8 13:01:47 REDACTED kernel: [  434.003379]  ffff8800a7d0eea0 ffff8800a7d0bdc0 0000000000000000 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.003381] Call Trace:
Apr  8 13:01:47 REDACTED kernel: [  434.003384]  [<ffffffff81803b2b>] dump_stack+0x45/0x57
Apr  8 13:01:47 REDACTED kernel: [  434.003387]  [<ffffffff8107bc4a>] warn_slowpath_common+0x8a/0xc0
Apr  8 13:01:47 REDACTED kernel: [  434.003389]  [<ffffffff8107bd7a>] warn_slowpath_null+0x1a/0x20
Apr  8 13:01:47 REDACTED kernel: [  434.003399]  [<ffffffffc0321a59>] exception_type+0x49/0x50 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003411]  [<ffffffffc033159f>] kvm_arch_vcpu_ioctl_run+0xccf/0x1200 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003422]  [<ffffffffc032a868>] ? kvm_arch_vcpu_load+0x58/0x1a0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003431]  [<ffffffffc03198c0>] kvm_vcpu_ioctl+0x320/0x5c0 [kvm]
Apr  8 13:01:47 REDACTED kernel: [  434.003433]  [<ffffffff8102cedc>] ? x86_pmu_enable+0x25c/0x2e0
Apr  8 13:01:47 REDACTED kernel: [  434.003436]  [<ffffffff81173662>] ? perf_pmu_enable+0x22/0x30
Apr  8 13:01:47 REDACTED kernel: [  434.003438]  [<ffffffff81174dbb>] ? perf_event_context_sched_in+0x8b/0xb0
Apr  8 13:01:47 REDACTED kernel: [  434.003441]  [<ffffffff81211834>] do_vfs_ioctl+0x2c4/0x4a0
Apr  8 13:01:47 REDACTED kernel: [  434.003443]  [<ffffffff810fc2c5>] ? SyS_futex+0x85/0x180
Apr  8 13:01:47 REDACTED kernel: [  434.003446]  [<ffffffff81211a89>] SyS_ioctl+0x79/0x90
Apr  8 13:01:47 REDACTED kernel: [  434.003448]  [<ffffffff8180ab72>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr  8 13:01:47 REDACTED kernel: [  434.003450] ---[ end trace ad7354c7a139129d ]---
Apr  8 13:01:47 REDACTED kernel: [  434.003453] KVM: FAILED VMRUN WITH VMCB:
Apr  8 13:01:47 REDACTED kernel: [  434.003477] VMCB Control Area:
Apr  8 13:01:47 REDACTED kernel: [  434.003493] cr_read:            0011
Apr  8 13:01:47 REDACTED kernel: [  434.003510] cr_write:           0011
Apr  8 13:01:47 REDACTED kernel: [  434.003557] dr_read:            00ff
Apr  8 13:01:47 REDACTED kernel: [  434.003574] dr_write:           00ff
Apr  8 13:01:47 REDACTED kernel: [  434.003591] exceptions:         000600c2
Apr  8 13:01:47 REDACTED kernel: [  434.003609] intercepts:         00002e7fbdc48037
Apr  8 13:01:47 REDACTED kernel: [  434.003630] pause filter count: 3000
Apr  8 13:01:47 REDACTED kernel: [  434.003647] iopm_base_pa:       0000000804f54000
Apr  8 13:01:47 REDACTED kernel: [  434.003668] msrpm_base_pa:      00000006683ae000
Apr  8 13:01:47 REDACTED kernel: [  434.003692] tsc_offset:         fffffe89fd2c2cba
Apr  8 13:01:47 REDACTED kernel: [  434.003713] asid:               1589
Apr  8 13:01:47 REDACTED kernel: [  434.003733] tlb_ctl:            0
Apr  8 13:01:47 REDACTED kernel: [  434.003751] int_ctl:            010f0100
Apr  8 13:01:47 REDACTED kernel: [  434.003771] int_vector:         00000000
Apr  8 13:01:47 REDACTED kernel: [  434.003789] int_state:          00000000
Apr  8 13:01:47 REDACTED kernel: [  434.003808] exit_code:          ffffffff
Apr  8 13:01:47 REDACTED kernel: [  434.003827] exit_info1:         0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.003849] exit_info2:         0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.003871] exit_int_info:      00000000
Apr  8 13:01:47 REDACTED kernel: [  434.003889] exit_int_info_err:  00000000
Apr  8 13:01:47 REDACTED kernel: [  434.003908] nested_ctl:         1
Apr  8 13:01:47 REDACTED kernel: [  434.003924] nested_cr3:         000000066a12b000
Apr  8 13:01:47 REDACTED kernel: [  434.003945] event_inj:          800003ff
Apr  8 13:01:47 REDACTED kernel: [  434.003963] event_inj_err:      00000000
Apr  8 13:01:47 REDACTED kernel: [  434.003982] lbr_ctl:            0
Apr  8 13:01:47 REDACTED kernel: [  434.003998] next_rip:           0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004022] VMCB State Save Area:
Apr  8 13:01:47 REDACTED kernel: [  434.004039] es:   s: 0000 a: 0000 l: ffffffff b: 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004065] cs:   s: 0010 a: 029b l: ffffffff b: 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004091] ss:   s: 0018 a: 0c93 l: ffffffff b: 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004120] ds:   s: 0000 a: 0000 l: ffffffff b: 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004146] fs:   s: 0000 a: 0000 l: ffffffff b: 00007f55eadcd7c0
Apr  8 13:01:47 REDACTED kernel: [  434.004172] gs:   s: 0000 a: 0000 l: ffffffff b: ffff88007fc00000
Apr  8 13:01:47 REDACTED kernel: [  434.004200] gdtr: s: 0000 a: 0000 l: 0000007f b: ffff88007fc0a000
Apr  8 13:01:47 REDACTED kernel: [  434.004226] ldtr: s: 0000 a: 0000 l: 0000ffff b: 0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.004252] idtr: s: 0000 a: 0000 l: 00000fff b: ffffffffff576000
Apr  8 13:01:47 REDACTED kernel: [  434.004279] tr:   s: 0040 a: 008b l: 00002087 b: ffff88007fc11900
Apr  8 13:01:47 REDACTED kernel: [  434.004307] cpl:            0                efer:         0000000000001d01
Apr  8 13:01:47 REDACTED kernel: [  434.004335] cr0:            000000008005003b cr2:          00007fd2b8883000
Apr  8 13:01:47 REDACTED kernel: [  434.004363] cr3:            0000000036be0000 cr4:          00000000000007f0
Apr  8 13:01:47 REDACTED kernel: [  434.004391] dr6:            00000000ffff0ff0 dr7:          0000000000000400
Apr  8 13:01:47 REDACTED kernel: [  434.004420] rip:            ffffffff8104ed58 rflags:       0000000000000046
Apr  8 13:01:47 REDACTED kernel: [  434.004480] rsp:            ffff88003679f8d0 rax:          0000000000000005
Apr  8 13:01:47 REDACTED kernel: [  434.004509] star:           0023001000000000 lstar:        ffffffff8172c5f0
Apr  8 13:01:47 REDACTED kernel: [  434.004538] cstar:          ffffffff8172e510 sfmask:       0000000000043700
Apr  8 13:01:47 REDACTED kernel: [  434.005563] kernel_gs_base: 0000000000000000 sysenter_cs:  0000000000000010
Apr  8 13:01:47 REDACTED kernel: [  434.006629] sysenter_esp:   0000000000000000 sysenter_eip: 000000008172e2e0
Apr  8 13:01:47 REDACTED kernel: [  434.007639] gpat:           0007040600070406 dbgctl:       0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.008663] br_from:        0000000000000000 br_to:        0000000000000000
Apr  8 13:01:47 REDACTED kernel: [  434.009678] excp_from:      0000000000000000 excp_to:      0000000000000000
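
For anyone trying to read the VMCB dump above: the `event_inj` value can be decoded by hand. Below is a minimal sketch, assuming the EVENTINJ bit layout from the AMD64 architecture manual (vector in bits 7:0, type in bits 10:8, error-code-valid in bit 11, valid in bit 31). The function name and the dictionary keys are made up for illustration.

```python
# Sketch: decode the 32-bit event_inj field printed in the VMCB dump.
# Assumed layout (AMD64 EVENTINJ): bits 7:0 vector, bits 10:8 type,
# bit 11 error-code-valid, bit 31 valid.
EVENT_TYPES = {
    0: "external interrupt",
    2: "NMI",
    3: "exception",
    4: "software interrupt",
}


def decode_event_inj(value: int) -> dict:
    """Split an event_inj value into its documented bit fields."""
    return {
        "valid": bool((value >> 31) & 1),
        "error_code_valid": bool((value >> 11) & 1),
        "type": EVENT_TYPES.get((value >> 8) & 0x7, "reserved"),
        "vector": value & 0xFF,
    }


# Value taken from the "event_inj:" line of the dump in this post.
decoded = decode_event_inj(0x800003FF)
print(decoded)
```

If that layout is right, the host was trying to inject an exception with vector 0xff, which is outside the architectural exception range (0 to 31). That would at least be consistent with VMRUN being rejected and `exit_code` reading ffffffff, but treat this as a reading of the dump, not a confirmed diagnosis.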
 
Switching to 4.2.2-1-pve seems to have corrected it. I cannot reproduce the crashing no matter how hard I kick the crap out of my environment.

FYI I'm on an FX-8320 with ASUS M5A78L-M/USB3 motherboards.

I'm not sure in which kernel version the issue first appears, but it must be somewhere after 4.2.2-1 and no later than 4.2.8-1.

How should I pass this info to the devs?
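
One way to condense the trace before attaching it to a report is to count the kernel WARNINGs by source location. The following is a rough sketch, not an official Proxmox tool: the function name is hypothetical, and the regular expression is based only on the log lines posted in this thread.

```python
import re
from collections import Counter

# Matches lines such as:
#   WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/emulate.c:5410 x86_emulate_insn+0xbb2/0xe30 [kvm]()
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+) (\S+) \[(\w+)\]")


def summarize_kvm_warnings(lines):
    """Count WARNINGs per (source location, symbol) and count failed VMRUNs."""
    warnings = Counter()
    vmrun_failures = 0
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            warnings[(match.group(1), match.group(2))] += 1
        elif "KVM: FAILED VMRUN WITH VMCB" in line:
            vmrun_failures += 1
    return warnings, vmrun_failures


# Sample lines copied from the crash log earlier in this thread.
SAMPLE = [
    "Apr  8 13:01:47 REDACTED kernel: [  434.002951] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/emulate.c:5410 x86_emulate_insn+0xbb2/0xe30 [kvm]()",
    "Apr  8 13:01:47 REDACTED kernel: [  434.003165] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/x86.c:337 exception_type+0x49/0x50 [kvm]()",
    "Apr  8 13:01:47 REDACTED kernel: [  434.003333] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/x86.c:337 exception_type+0x49/0x50 [kvm]()",
    "Apr  8 13:01:47 REDACTED kernel: [  434.003453] KVM: FAILED VMRUN WITH VMCB:",
]

warnings, vmrun_failures = summarize_kvm_warnings(SAMPLE)
for (location, symbol), count in warnings.items():
    print(f"{count}x {location} {symbol}")
print(f"failed VMRUNs: {vmrun_failures}")
```

To run it against a real log, pass the lines of /var/log/syslog from the affected node instead of `SAMPLE`. The short summary plus the kernel version is usually easier to read in a report than the full module list repeated three times.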
 
We encounter the same problem on several Proxmox 4.4 servers with kernel 4.4.49-1-pve. The same VMs keep crashing, and on the host the following shows up in dmesg:

Code:
[  326.151126] WARNING: CPU: 1 PID: 2865 at arch/x86/kvm/emulate.c:5581 x86_emulate_insn+0xbb2/0xe30 [kvm]()
[  326.151128] Modules linked in: nfsv3 veth ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_mac ipt_REJECT nf_reject_ipv4 xt_physdev xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_tcpudp xt_mark xt_set xt_addrtype xt_multiport xt_conntrack nf_conntrack ip_set_hash_net ip_set iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink kvm_amd kvm irqbypass snd_pcm joydev crct10dif_pclmul crc32_pclmul input_leds ghash_clmulni_intel amd64_edac_mod snd_timer aesni_intel shpchp edac_mce_amd aes_x86_64 i2c_piix4 snd edac_core k10temp fam15h_power lrw soundcore 8250_fintek gf128mul pcspkr glue_helper
[  326.151171]  mac_hid serio_raw ablk_helper cryptd vhost_net vhost macvtap macvlan ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler autofs4 hid_generic usbkbd usbmouse usbhid hid psmouse pata_acpi igb(O) dca ahci ptp pata_atiixp libahci pps_core arcmsr fjes
[  326.151192] CPU: 1 PID: 2865 Comm: kvm Tainted: G           O    4.4.49-1-pve #1
[  326.151194] Hardware name: Supermicro H8DGU/H8DGU, BIOS 3.5        11/25/2013
[  326.151198]  0000000000000286 00000000d9cf98dd ffff8803e2c0bc18 ffffffff813fa693
[  326.151201]  0000000000000000 ffffffffc037a33b ffff8803e2c0bc50 ffffffff81081916
[  326.151203]  ffff8804036026a0 0000000000000006 ffffffffc036fa40 0000000000000000
[  326.151206] Call Trace:
[  326.151215]  [<ffffffff813fa693>] dump_stack+0x63/0x90
[  326.151220]  [<ffffffff81081916>] warn_slowpath_common+0x86/0xc0
[  326.151223]  [<ffffffff81081a5a>] warn_slowpath_null+0x1a/0x20
[  326.151242]  [<ffffffffc035de52>] x86_emulate_insn+0xbb2/0xe30 [kvm]
[  326.151258]  [<ffffffffc0340f1d>] x86_emulate_instruction+0x1bd/0x6e0 [kvm]
[  326.151264]  [<ffffffffc01e1ebe>] ud_interception+0x1e/0x40 [kvm_amd]
[  326.151267]  [<ffffffffc01e4e18>] handle_exit+0x158/0xa40 [kvm_amd]
[  326.151280]  [<ffffffffc0335dc6>] ? kvm_set_cr8+0x26/0x40 [kvm]
[  326.151295]  [<ffffffffc0344bb0>] kvm_arch_vcpu_ioctl_run+0x760/0x1460 [kvm]
[  326.151309]  [<ffffffffc033eaba>] ? kvm_arch_vcpu_load+0x5a/0x220 [kvm]
[  326.151321]  [<ffffffffc032beca>] kvm_vcpu_ioctl+0x31a/0x5e0 [kvm]
[  326.151324]  [<ffffffff810c3f88>] ? __wake_up_locked_key+0x18/0x20
[  326.151327]  [<ffffffff8125d220>] ? eventfd_write+0xd0/0x270
[  326.151330]  [<ffffffff812235b2>] do_vfs_ioctl+0x2d2/0x4b0
[  326.151333]  [<ffffffff81103c05>] ? SyS_futex+0x85/0x180
[  326.151335]  [<ffffffff81223809>] SyS_ioctl+0x79/0x90
[  326.151339]  [<ffffffff81860336>] entry_SYSCALL_64_fastpath+0x16/0x75
[  326.151340] ---[ end trace eeb281b939fdf943 ]---
[  326.151342] ------------[ cut here ]------------

This problem started after the upgrade from Proxmox 3 to 4. Any idea what's going on?
 
Hi,

please send the output of
pveversion -v
 
Here's the output:

Code:
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.49-1-pve: 4.4.49-86
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-49
qemu-server: 4.0-110
pve-firmware: 1.1-11
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-97
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80

We managed to temporarily fix this problem by starting the VMs on the console without the '+kvm_pv_unhalt' option.
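
For reference, the workaround amounts to taking the KVM command line for the VM and removing one entry from the comma-separated `-cpu` argument. Below is a minimal sketch of that edit. The helper name is hypothetical, and the sample command line is an illustrative guess at what such a command looks like (for example as printed by `qm showcmd <vmid>`), not the real one from our servers.

```python
import shlex


def drop_cpu_flag(cmdline: str, flag: str = "+kvm_pv_unhalt") -> str:
    """Return cmdline with `flag` removed from every -cpu argument."""
    args = shlex.split(cmdline)
    for i, arg in enumerate(args[:-1]):
        if arg == "-cpu":
            # The value after -cpu is "model,flag,flag,...".
            parts = [p for p in args[i + 1].split(",") if p != flag]
            args[i + 1] = ",".join(parts)
    return shlex.join(args)


# Hypothetical, shortened command line for illustration only.
original = "/usr/bin/kvm -id 858 -cpu kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,enforce -m 16384"
patched = drop_cpu_flag(original)
print(patched)
```

The patched command line is then started by hand on the console. Note that a VM started this way bypasses the normal start path, so this is only a stopgap until the kernel side is fixed.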
 
Can you send the config of a VM that shows the problem?
 
Sure :)

Code:
args: -vnc 127.0.0.1:853,password
boot: cdn
bootdisk: virtio0
cores: 4
ide2: local:iso/gparted-live-0.20.0-2-i486.iso,media=cdrom,size=219M
kvm: 1
memory: 16384
name: mta2.hix.nl
net0: e1000=3A:AB:CA:4B:A0:39,bridge=vmbr0,firewall=1
net1: e1000=5A:C9:47:94:64:CA,bridge=vmbr1
onboot: 1
ostype: l26
reboot: 1
smbios1: uuid=7bcc22d6-920d-4a07-99b9-93289ac24699
sockets: 1
virtio0: local:858/vm-858-disk-1.raw,format=raw,size=160G

We have 3 VMs with problems, all Ubuntu 14.x.

Update: The same problem occurred during a migration of a Debian VM :)
 
Maybe this KVM bug is related?
https://patchwork.kernel.org/patch/9521019/
 
I actually haven't seen this problem in a long time. I did a major-version upgrade of Proxmox VE a few months ago and undid the kernel version pinning I had been doing. I'm on "Virtual Environment 4.4-13/7ea56165" and haven't seen any stability issues since!

I have VMs with Ubuntu 12.04, 14.04, 16.04, and all sorts of other versions and OSes too! Most VMs use the Opteron5 CPU type, some the KVM CPU type.
 
Thanks @BloodyIron! We still have the same problem but we can fix it by starting the VM without the ,+kvm_pv_unhalt option.

What do you mean by kernel version pinning? We are using kernel 4.4.49-1-pve at the moment.
 
I configured GRUB to boot only a specific version of the Linux kernel, as newer versions (at the time) were causing instability in a few of my VMs. Fortunately this has been solved since then. I am not entirely sure what solved it, but I am no longer "pinning" my Linux kernel version.
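
Pinning only works if the older kernel package is still installed, so it helps to check which kernels are on the box first. Here is a small sketch that pulls the running kernel and the installed `pve-kernel-*` packages out of `pveversion -v` output. The parser is hypothetical and assumes the `name: version` format shown in the output posted earlier in this thread.

```python
def parse_pveversion(text: str):
    """Return (running_kernel, installed_kernels) from `pveversion -v` text."""
    running = None
    kernels = []
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        name, rest = name.strip(), rest.strip()
        if name == "proxmox-ve" and "running kernel:" in rest:
            # e.g. "4.4-86 (running kernel: 4.4.49-1-pve)"
            running = rest.split("running kernel:")[1].strip(" )")
        elif name.startswith("pve-kernel-"):
            kernels.append(name[len("pve-kernel-"):])
    return running, kernels


# First lines of the pveversion output posted in this thread.
SAMPLE = """\
proxmox-ve: 4.4-86 (running kernel: 4.4.49-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.4.19-1-pve: 4.4.19-66
pve-kernel-4.4.49-1-pve: 4.4.49-86
lvm2: 2.02.116-pve3
"""

running, kernels = parse_pveversion(SAMPLE)
print("running:", running)
print("installed:", ", ".join(kernels))
```

Any kernel in the installed list other than the running one is a candidate to boot into. The exact GRUB menu entry to select depends on the system, so it is deliberately left out here.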


 
