Dear All,
I am currently running a Win10 VM on pve-6.2-4 with kernel 5.4.41 and pve-qemu-kvm_5.0.0. My hardware is a Xeon E-2244G, 2x 16G ECC RAM, and a GTX 1660 Super (passed through to the VM). I found that after using Win10 for a short time (about 10-20 min, after the message "perf: interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 79750" appears) and then shutting the VM down, a CPU gets soft locked up by kvm. The dmesg output is listed below:
[Fri May 29 00:01:03 2020] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kvm:6504]
[Fri May 29 00:01:03 2020] Modules linked in: tcp_diag(E) inet_diag(E) ebtable_filter(E) ebtables(E) ip_set(E) ip6table_raw(E) iptable_raw(E) ip6table_filter(E) ip6_tables(E) xt_nat(E) xt_tcpudp(E) veth(E) xt_conntrack(E) xt_MASQUERADE(E) nf_conntrack_netlink(E) xfrm_user(E) xfrm_algo(E) xt_addrtype(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) aufs(E) iptable_filter(E) bpfilter(E) overlay(E) softdog(E) nfnetlink_log(E) nfnetlink(E) intel_rapl_msr(E) intel_rapl_common(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) zfs(POE) aesni_intel(E) zunicode(POE) crypto_simd(E) zlua(POE) cryptd(E) glue_helper(E) zavl(POE) icp(POE) intel_cstate(E) ipmi_ssif(E) intel_rapl_perf(E) pcspkr(E) wmi_bmof(E) snd_hda_intel(E) 8250_dw(E) snd_intel_dspcfg(E) joydev(E) snd_hda_codec(E) input_leds(E) snd_hda_core(E) snd_hwdep(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) mei_me(E) mei(E) ie31200_edac(E)
[Fri May 29 00:01:03 2020] intel_pch_thermal(E) zcommon(POE) ipmi_si(E) ipmi_devintf(E) znvpair(POE) ipmi_msghandler(E) spl(OE) vhost_net(E) vhost(E) tap(E) ib_iser(E) acpi_tad(E) mac_hid(E) rdma_cm(E) iw_cm(E) ib_cm(E) ib_core(E) iscsi_tcp(E) libiscsi_tcp(E) libiscsi(E) scsi_transport_iscsi(E) vfio_pci(E) vfio_virqfd(E) irqbypass(E) vfio_iommu_type1(E) vfio(E) sunrpc(E) ip_tables(E) x_tables(E) autofs4(E) btrfs(E) xor(E) zstd_compress(E) raid6_pq(E) usbmouse(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) usbkbd(E) dm_bufio(E) libcrc32c(E) hid_generic(E) usbhid(E) hid(E) ast(E) drm_vram_helper(E) ttm(E) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) i2c_i801(E) drm(E) igb(E) intel_lpss_pci(E) ahci(E) xhci_pci(E) dca(E) intel_lpss(E) i2c_algo_bit(E) libahci(E) idma64(E) virt_dma(E) xhci_hcd(E) wmi(E) video(E) pinctrl_cannonlake(E) pinctrl_intel(E)
[Fri May 29 00:01:03 2020] CPU: 5 PID: 6504 Comm: kvm Tainted: P OE 5.4.41-1-pve #1
[Fri May 29 00:01:03 2020] Hardware name: Supermicro Super Server/X11SCL-IF, BIOS 1.3 02/21/2020
[Fri May 29 00:01:03 2020] RIP: 0010:_raw_spin_unlock_irqrestore+0x15/0x20
[Fri May 29 00:01:03 2020] Code: c0 5d c3 b8 01 00 00 00 5d c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 c6 07 00 0f 1f 40 00 48 89 f7 57 9d <0f> 1f 44 00 00 5d c3 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 8b 07
[Fri May 29 00:01:03 2020] RSP: 0018:ffffb13789767ac8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[Fri May 29 00:01:03 2020] RAX: 0000000000000000 RBX: ffff9a169a1fa4a4 RCX: 0000000000000000
[Fri May 29 00:01:03 2020] RDX: 001f000000000000 RSI: 0000000000000246 RDI: 0000000000000246
[Fri May 29 00:01:03 2020] RBP: ffffb13789767ac8 R08: 0000000000000000 R09: ffffffff9a372900
[Fri May 29 00:01:03 2020] R10: ffff9a1692cc92a0 R11: 0000000000000001 R12: 0000000000000001
[Fri May 29 00:01:03 2020] R13: ffff9a169a1fa428 R14: ffff9a169a1fa400 R15: 0000000000000246
[Fri May 29 00:01:03 2020] FS: 00007fe39a3ff700(0000) GS:ffff9a169eb40000(0000) knlGS:0000000000000000
[Fri May 29 00:01:03 2020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Fri May 29 00:01:03 2020] CR2: 00007fe39edff9d0 CR3: 000000028ce0a004 CR4: 00000000003626e0
[Fri May 29 00:01:03 2020] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Fri May 29 00:01:03 2020] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Fri May 29 00:01:03 2020] Call Trace:
[Fri May 29 00:01:03 2020] __synchronize_hardirq+0x6f/0xd0
[Fri May 29 00:01:03 2020] __free_irq+0x145/0x2c0
[Fri May 29 00:01:03 2020] free_irq+0x32/0x70
[Fri May 29 00:01:03 2020] vfio_intx_set_signal+0x39/0x1d0 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_intx_disable+0x3a/0x60 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_pci_set_intx_trigger+0x117/0x180 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_pci_set_irqs_ioctl+0x87/0xb0 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_pci_disable+0x58/0x4a0 [vfio_pci]
[Fri May 29 00:01:03 2020] ? vfio_pci_disable+0x4a0/0x4a0 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_pci_release+0x4d/0x50 [vfio_pci]
[Fri May 29 00:01:03 2020] vfio_device_fops_release+0x22/0x40 [vfio]
[Fri May 29 00:01:03 2020] __fput+0xc6/0x260
[Fri May 29 00:01:03 2020] ____fput+0xe/0x10
[Fri May 29 00:01:03 2020] task_work_run+0x9d/0xc0
[Fri May 29 00:01:03 2020] do_exit+0x367/0xab0
[Fri May 29 00:01:03 2020] do_group_exit+0x47/0xb0
[Fri May 29 00:01:03 2020] get_signal+0x140/0x850
[Fri May 29 00:01:03 2020] ? __fpu__restore_sig+0x48d/0x610
[Fri May 29 00:01:03 2020] ? __set_current_blocked+0x3b/0x60
[Fri May 29 00:01:03 2020] do_signal+0x34/0x6e0
[Fri May 29 00:01:03 2020] ? __x64_sys_futex+0x143/0x17f
[Fri May 29 00:01:03 2020] ? restore_altstack+0x51/0x70
[Fri May 29 00:01:03 2020] exit_to_usermode_loop+0x90/0x130
[Fri May 29 00:01:03 2020] do_syscall_64+0x160/0x190
[Fri May 29 00:01:03 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Fri May 29 00:01:03 2020] RIP: 0033:0x7fe7d771629c
[Fri May 29 00:01:03 2020] Code: Bad RIP value.
[Fri May 29 00:01:03 2020] RSP: 002b:00007fe39a3fa308 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[Fri May 29 00:01:03 2020] RAX: fffffffffffffe00 RBX: 00007fe7ca139280 RCX: 00007fe7d771629c
[Fri May 29 00:01:03 2020] RDX: 0000000000000002 RSI: 0000000000000080 RDI: 000055ac99eb08c0
[Fri May 29 00:01:03 2020] RBP: 0000000000000000 R08: 000055ac99eb08c0 R09: 000055ac99eb07c0
[Fri May 29 00:01:03 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
[Fri May 29 00:01:03 2020] R13: 000055ac99eb08c0 R14: 0000000000000000 R15: 00007fe7ca1392a8
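To make a trace like the one above easier to read, the call-trace frames can be pulled out and grouped by kernel module. The following is only a minimal sketch in Python: the names `FRAME`, `parse_frames`, and `SAMPLE` are made up for illustration, and the regular expression assumes the human-readable timestamp format shown in this log (as printed by `dmesg -T`). Frames prefixed with "?" are ones the kernel found on the stack but could not confirm as part of the real call chain, so the sketch marks them as unreliable.

```python
import re

# Matches a call-trace frame such as
#   [Fri May 29 00:01:03 2020] vfio_intx_disable+0x3a/0x60 [vfio_pci]
# Assumption: every line starts with a bracketed timestamp, as in the log above.
FRAME = re.compile(
    r"^\[[^\]]*\]\s+"                      # bracketed timestamp
    r"(?P<guess>\?\s+)?"                   # optional "?" marking an unverified frame
    r"(?P<symbol>[A-Za-z_][\w.]*)"         # function name
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"           # offset/size
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"     # optional [module]
)


def parse_frames(dmesg_text):
    """Return (symbol, module, reliable) tuples for every call-trace frame.

    module is None for frames that belong to the core kernel image.
    reliable is False for frames the kernel printed with a leading "?".
    """
    frames = []
    for line in dmesg_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            frames.append(
                (match.group("symbol"),
                 match.group("module"),
                 match.group("guess") is None)
            )
    return frames


# A few lines copied from the trace above, used as sample input.
SAMPLE = """\
[Fri May 29 00:01:03 2020] Call Trace:
[Fri May 29 00:01:03 2020] free_irq+0x32/0x70
[Fri May 29 00:01:03 2020] vfio_intx_disable+0x3a/0x60 [vfio_pci]
[Fri May 29 00:01:03 2020] ? vfio_pci_disable+0x4a0/0x4a0 [vfio_pci]
"""

if __name__ == "__main__":
    # Print only the frames that belong to a loadable module.
    for symbol, module, reliable in parse_frames(SAMPLE):
        if module is not None:
            marker = "" if reliable else " (unverified)"
            print(f"{module}: {symbol}{marker}")
```

Run against the full log, this would list the vfio_pci and vfio frames that sit between free_irq and the process exit path, which is the part of the trace most relevant to the passthrough device.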
I have no idea what is going on and I wonder if anyone could kindly help me! Thanks!
Best,
Harold