Hi All,
Can anyone have a look at this and help pinpoint the cause of a crash I had today?
Symptoms
Proxmox had locked up hard
Couldn't log in via web interface
Couldn't ping any of the ethernet interfaces
Wake on LAN didn't work
Config
Intel NUC 9th Gen
6c/12t
64GB RAM
2x 2TB Intel NVMe drives in ZFS mirror
PCI Intel 10GbE dual RJ45 card
Clues?
Over the last few days I had been fiddling with remote display types for the VMs, but couldn't find anything that was effective for me.
I also installed WireGuard and Netmaker on the PVE host in order to access it remotely, but I hadn't actually used them to connect yet...
I had to get someone to go onsite, use the power button to shut the machine down, then boot it back up. It seems to have come back OK.
Here is the syslog from the time of the crash. I also have the log from when we booted back up, if that helps.
Apr 25 13:00:43 pve01 kernel: [1640777.491800] ------------[ cut here ]------------
Apr 25 13:00:43 pve01 kernel: [1640777.491804] WARNING: CPU: 6 PID: 91 at include/linux/mm.h:1221 try_grab_page+0xec/0x100
Apr 25 13:00:43 pve01 kernel: [1640777.491810] Modules linked in: veth 8021q garp mrp tcp_diag inet_diag wireguard curve25519_x86_64
libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libblake2s blake2s_x86_64 libcurve25519_generic libchacha libblake2s_generic ip6_
udp_tunnel udp_tunnel ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_
tables bonding tls softdog nfnetlink_log nfnetlink snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_cnl sn
d_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_d
sp intel_rapl_msr snd_sof intel_rapl_common intel_tcc_cooling snd_soc_hdac_hda x86_pkg_temp_thermal snd_hda_ext_core intel_powerclam
p snd_soc_acpi_intel_match snd_soc_acpi coretemp snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core iwlmvm kvm_intel snd
_hwdep soundwire_bus ledtrig_audio snd_soc_core mei_hdcp mac80211 libarc4 kvm snd_compress ac97_bus
Apr 25 13:00:43 pve01 kernel: [1640777.491837] crct10dif_pclmul snd_pcm_dmaengine ghash_clmulni_intel btusb snd_pcm iwlwifi aesni_i
ntel btrtl crypto_simd snd_timer cryptd btbcm input_leds snd btintel rapl intel_cstate intel_wmi_thunderbolt i915 serio_raw pcspkr s
oundcore wmi_bmof efi_pstore ee1004 8250_dw bluetooth cfg80211 drm_kms_helper ecdh_generic ecc cec rc_core fb_sys_fops syscopyarea s
ysfillrect sysimgblt mei_me mei intel_pch_thermal ucsi_acpi typec_ucsi typec mac_hid acpi_pad acpi_tad vhost_net vhost vhost_iotlb t
ap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi vfio_pci vfio_virqfd irqbypass vfio_iomm
u_type1 vfio drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) sp
l(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c uas usb_storage crc32_pclmul i2c_i801 psmouse e1000e i2c_smbus igb i
2c_algo_bit nvme xhci_pci intel_lpss_pci xhci_pci_renesas thunderbolt ahci nvme_core ixgbe intel_lpss libahci
Apr 25 13:00:43 pve01 kernel: [1640777.491882] xhci_hcd idma64 xfrm_algo dca mdio wmi video pinctrl_cannonlake
Apr 25 13:00:43 pve01 kernel: [1640777.491911] CPU: 6 PID: 91 Comm: ksmd Tainted: P W O 5.13.19-6-pve #1
Apr 25 13:00:43 pve01 kernel: [1640777.491914] Hardware name: Intel(R) Client Systems NUC9i7QNX/NUC9i7QNB, BIOS QXCFL579.0034.2019.1
125.1436 11/25/2019
Apr 25 13:00:43 pve01 kernel: [1640777.491916] RIP: 0010:try_grab_page+0xec/0x100
Apr 25 13:00:43 pve01 kernel: [1640777.491918] Code: fa 8b 47 34 85 c0 7e 22 f0 ff 47 34 b8 01 00 00 00 c3 c3 48 8b 48 08 ba 00 04 0
0 00 83 e1 01 74 a9 eb 84 0f 0b e9 2b ff ff ff <0f> 0b 31 c0 c3 0f 0b 31 c0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
Apr 25 13:00:43 pve01 kernel: [1640777.491921] RSP: 0018:ffff9968003dfd50 EFLAGS: 00010246
Apr 25 13:00:43 pve01 kernel: [1640777.491923] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
Apr 25 13:00:43 pve01 kernel: [1640777.491941] RDX: fffff9eab34da6c7 RSI: 0000000000000004 RDI: fffff9eab34d3e80
Apr 25 13:00:43 pve01 kernel: [1640777.491943] RBP: ffff9968003dfdb0 R08: 8000000cd34fa807 R09: fffff9eab0149480
Apr 25 13:00:43 pve01 kernel: [1640777.491944] R10: 0000000c05252067 R11: 0000000000000807 R12: fffff9eab01494a8
Apr 25 13:00:43 pve01 kernel: [1640777.491946] R13: ffff8a13c5252cb0 R14: ffff8a0caef52960 R15: fffff9eab34d3e80
Apr 25 13:00:43 pve01 kernel: [1640777.491948] FS: 0000000000000000(0000) GS:ffff8a1820d00000(0000) knlGS:0000000000000000
Apr 25 13:00:43 pve01 kernel: [1640777.491950] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 13:00:43 pve01 kernel: [1640777.491951] CR2: 00007f1a6d102a18 CR3: 0000000485e10005 CR4: 00000000003726e0
Apr 25 13:00:43 pve01 kernel: [1640777.491953] Call Trace:
Apr 25 13:00:43 pve01 kernel: [1640777.491954] <TASK>
Apr 25 13:00:43 pve01 kernel: [1640777.491956] ? follow_page_pte+0x2ba/0x4b0
Apr 25 13:00:43 pve01 kernel: [1640777.491959] follow_page_mask+0x4a9/0x810
Apr 25 13:00:43 pve01 kernel: [1640777.491961] follow_page+0x37/0x90
Apr 25 13:00:43 pve01 kernel: [1640777.491963] ksm_scan_thread+0xb6e/0x1c30
Apr 25 13:00:43 pve01 kernel: [1640777.491966] ? __wake_up_pollfree+0x40/0x40
Apr 25 13:00:43 pve01 kernel: [1640777.491968] ? try_to_merge_with_ksm_page+0xd0/0xd0
Apr 25 13:00:43 pve01 kernel: [1640777.491970] kthread+0x12b/0x150
Apr 25 13:00:43 pve01 kernel: [1640777.491972] ? set_kthread_struct+0x50/0x50
Apr 25 13:00:43 pve01 kernel: [1640777.491975] ret_from_fork+0x22/0x30
Apr 25 13:00:43 pve01 kernel: [1640777.491978] </TASK>
Apr 25 13:00:43 pve01 kernel: [1640777.491979] ---[ end trace 5dac039dc4b9baa2 ]---