Hi Folks,
I recently upgraded my 2-node Proxmox cluster from 3.4 to 4.2. At first everything appeared to go well, but now a few VMs are unstable and I cannot work out why.
It happens consistently only with specific VMs, namely my ownCloud VM, during routine tasks like upgrading packages with apt. The whole VM crashes, and I get a trace in /var/log/syslog on the Proxmox node itself.
This is the kind of error I get:
Code:
Apr 8 13:01:47 REDACTED kernel: [ 434.002921] ------------[ cut here ]------------
Apr 8 13:01:47 REDACTED kernel: [ 434.002951] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/emulate.c:5410 x86_emulate_insn+0xbb2/0xe30 [kvm]()
Apr 8 13:01:47 REDACTED kernel: [ 434.002953] Modules linked in: nfsv3 ip_set ip6table_filter ip6_tables binfmt_misc iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink amdkfd amd_iommu_v2 radeon kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm ttm snd_hda_codec drm_kms_helper input_leds drm snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul i2c_algo_bit pcspkr aesni_intel serio_raw ppdev aes_x86_64 snd_timer snd lrw soundcore parport_pc gf128mul glue_helper parport asus_atk0110 ablk_helper 8250_fintek cryptd wmi shpchp k10temp fam15h_power i2c_piix4 edac_core edac_mce_amd mac_hid vhost_net vhost macvtap macvlan autofs4 dm_mirror dm_region_hash dm_log hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp ahci r8169 mii libahci
Apr 8 13:01:47 REDACTED kernel: [ 434.003007] CPU: 4 PID: 2244 Comm: kvm Not tainted 4.2.8-1-pve #1
Apr 8 13:01:47 REDACTED kernel: [ 434.003010] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
Apr 8 13:01:47 REDACTED kernel: [ 434.003012] 0000000000000000 000000003d928ed0 ffff8806678b3b98 ffffffff81803b2b
Apr 8 13:01:47 REDACTED kernel: [ 434.003014] 0000000000000000 0000000000000000 ffff8806678b3bd8 ffffffff8107bc4a
Apr 8 13:01:47 REDACTED kernel: [ 434.003017] ffff8800a7d0e3a0 ffff8800a7d0e3a0 0000000000000006 ffffffffc0359840
Apr 8 13:01:47 REDACTED kernel: [ 434.003019] Call Trace:
Apr 8 13:01:47 REDACTED kernel: [ 434.003027] [<ffffffff81803b2b>] dump_stack+0x45/0x57
Apr 8 13:01:47 REDACTED kernel: [ 434.003031] [<ffffffff8107bc4a>] warn_slowpath_common+0x8a/0xc0
Apr 8 13:01:47 REDACTED kernel: [ 434.003034] [<ffffffff8107bd7a>] warn_slowpath_null+0x1a/0x20
Apr 8 13:01:47 REDACTED kernel: [ 434.003047] [<ffffffffc0349622>] x86_emulate_insn+0xbb2/0xe30 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003059] [<ffffffffc032cb4d>] x86_emulate_instruction+0x1bd/0x730 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003064] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003067] [<ffffffffc0393f52>] ud_interception+0x22/0x40 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003070] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003074] [<ffffffffc0396fd2>] handle_exit+0x132/0x990 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003084] [<ffffffffc032218c>] ? kvm_set_cr8+0x1c/0x20 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003087] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003099] [<ffffffffc0330f26>] kvm_arch_vcpu_ioctl_run+0x656/0x1200 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003110] [<ffffffffc032a868>] ? kvm_arch_vcpu_load+0x58/0x1a0 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003119] [<ffffffffc03198c0>] kvm_vcpu_ioctl+0x320/0x5c0 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003123] [<ffffffff8102cedc>] ? x86_pmu_enable+0x25c/0x2e0
Apr 8 13:01:47 REDACTED kernel: [ 434.003132] [<ffffffff81173662>] ? perf_pmu_enable+0x22/0x30
Apr 8 13:01:47 REDACTED kernel: [ 434.003134] [<ffffffff81174dbb>] ? perf_event_context_sched_in+0x8b/0xb0
Apr 8 13:01:47 REDACTED kernel: [ 434.003140] [<ffffffff81211834>] do_vfs_ioctl+0x2c4/0x4a0
Apr 8 13:01:47 REDACTED kernel: [ 434.003144] [<ffffffff810fc2c5>] ? SyS_futex+0x85/0x180
Apr 8 13:01:47 REDACTED kernel: [ 434.003147] [<ffffffff81211a89>] SyS_ioctl+0x79/0x90
Apr 8 13:01:47 REDACTED kernel: [ 434.003150] [<ffffffff8180ab72>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr 8 13:01:47 REDACTED kernel: [ 434.003152] ---[ end trace ad7354c7a139129b ]---
Apr 8 13:01:47 REDACTED kernel: [ 434.003154] ------------[ cut here ]------------
Apr 8 13:01:47 REDACTED kernel: [ 434.003165] WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/x86.c:337 exception_type+0x49/0x50 [kvm]()
Apr 8 13:01:47 REDACTED kernel: [ 434.003166] Modules linked in: nfsv3 ip_set ip6table_filter ip6_tables binfmt_misc iptable_filter ip_tables x_tables softdog nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfnetlink_log nfnetlink amdkfd amd_iommu_v2 radeon kvm_amd snd_hda_codec_hdmi snd_hda_intel kvm ttm snd_hda_codec drm_kms_helper input_leds drm snd_hda_core snd_hwdep snd_pcm crct10dif_pclmul crc32_pclmul i2c_algo_bit pcspkr aesni_intel serio_raw ppdev aes_x86_64 snd_timer snd lrw soundcore parport_pc gf128mul glue_helper parport asus_atk0110 ablk_helper 8250_fintek cryptd wmi shpchp k10temp fam15h_power i2c_piix4 edac_core edac_mce_amd mac_hid vhost_net vhost macvtap macvlan autofs4 dm_mirror dm_region_hash dm_log hid_generic usbkbd usbmouse usbhid hid pata_acpi psmouse pata_atiixp ahci r8169 mii libahci
Apr 8 13:01:47 REDACTED kernel: [ 434.003204] CPU: 4 PID: 2244 Comm: kvm Tainted: G W 4.2.8-1-pve #1
Apr 8 13:01:47 REDACTED kernel: [ 434.003205] Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
Apr 8 13:01:47 REDACTED kernel: [ 434.003207] 0000000000000000 000000003d928ed0 ffff8806678b3bb8 ffffffff81803b2b
Apr 8 13:01:47 REDACTED kernel: [ 434.003209] 0000000000000000 0000000000000000 ffff8806678b3bf8 ffffffff8107bc4a
Apr 8 13:01:47 REDACTED kernel: [ 434.003211] 0000000000000000 0000000000000000 0000000000000046 0000000000000000
Apr 8 13:01:47 REDACTED kernel: [ 434.003214] Call Trace:
Apr 8 13:01:47 REDACTED kernel: [ 434.003216] [<ffffffff81803b2b>] dump_stack+0x45/0x57
Apr 8 13:01:47 REDACTED kernel: [ 434.003219] [<ffffffff8107bc4a>] warn_slowpath_common+0x8a/0xc0
Apr 8 13:01:47 REDACTED kernel: [ 434.003222] [<ffffffff8107bd7a>] warn_slowpath_null+0x1a/0x20
Apr 8 13:01:47 REDACTED kernel: [ 434.003232] [<ffffffffc0321a59>] exception_type+0x49/0x50 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003243] [<ffffffffc032cd23>] x86_emulate_instruction+0x393/0x730 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003246] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003249] [<ffffffffc0393f52>] ud_interception+0x22/0x40 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003252] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003255] [<ffffffffc0396fd2>] handle_exit+0x132/0x990 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003265] [<ffffffffc032218c>] ? kvm_set_cr8+0x1c/0x20 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003269] [<ffffffffc03933d0>] ? nested_svm_get_tdp_cr3+0x20/0x20 [kvm_amd]
Apr 8 13:01:47 REDACTED kernel: [ 434.003280] [<ffffffffc0330f26>] kvm_arch_vcpu_ioctl_run+0x656/0x1200 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003292] [<ffffffffc032a868>] ? kvm_arch_vcpu_load+0x58/0x1a0 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003300] [<ffffffffc03198c0>] kvm_vcpu_ioctl+0x320/0x5c0 [kvm]
Apr 8 13:01:47 REDACTED kernel: [ 434.003303] [<ffffffff8102cedc>] ? x86_pmu_enable+0x25c/0x2e0
Apr 8 13:01:47 REDACTED kernel: [ 434.003306] [<ffffffff81173662>] ? perf_pmu_enable+0x22/0x30
Apr 8 13:01:47 REDACTED kernel: [ 434.003308] [<ffffffff81174dbb>] ? perf_event_context_sched_in+0x8b/0xb0
Apr 8 13:01:47 REDACTED kernel: [ 434.003311] [<ffffffff81211834>] do_vfs_ioctl+0x2c4/0x4a0
Apr 8 13:01:47 REDACTED kernel: [ 434.003314] [<ffffffff810fc2c5>] ? SyS_futex+0x85/0x180
Apr 8 13:01:47 REDACTED kernel: [ 434.003316] [<ffffffff81211a89>] SyS_ioctl+0x79/0x90
Apr 8 13:01:47 REDACTED kernel: [ 434.003319] [<ffffffff8180ab72>] entry_SYSCALL_64_fastpath+0x16/0x75
Apr 8 13:01:47 REDACTED kernel: [ 434.003321] ---[ end trace ad7354c7a139129c ]---
(The second half of the snip is in the second post, thanks to the 10,000-character limit.)
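For anyone wanting to check whether the same warning sites keep recurring across crashes, here is a minimal Python sketch that tallies kernel WARNING lines by source file, line number, and function. It assumes the syslog line format shown in the trace above; the names `WARNING_RE` and `summarize_warnings` are made up for illustration and are not part of any Proxmox tooling.

```python
import re
from collections import Counter

# Matches the WARNING header of a kernel trace, for example:
#   ... WARNING: CPU: 4 PID: 2244 at arch/x86/kvm/emulate.c:5410 x86_emulate_insn+0xbb2/0xe30 [kvm]()
# Assumption: the format is the one shown in the log excerpt above.
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>[\w.]+)\+"
)


def summarize_warnings(lines):
    """Count kernel WARNING sites as (source file, line, function) tuples."""
    counts = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            key = (match["file"], int(match["line"]), match["func"])
            counts[key] += 1
    return counts
```

To use it, pass any iterable of log lines, for instance an open file handle on /var/log/syslog, and print `summarize_warnings(handle).most_common()` to see which sites fire most often.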
I've updated the BIOS to the latest version to try to address this, but that did not fix it.
I'm using kernel 4.2.8-1-pve, and I upgraded yesterday.
I now consider the two affected VMs 100% unstable, as I can't get any work done on them at all.
The only idea I have so far is rolling back to an earlier kernel; beyond that, I have no clue what is causing this.
I was NOT seeing this kind of crash before the major upgrade.
Please help.