Hi all,
Because they're really cheap, I got a used Lenovo M600 with an N3700 and paired it with a 2 TB SSD to test PBS in my homelab.
However, the machine crashes regularly and I'm having trouble finding the root cause.
I previously disabled swap because earlier crashes reported "read error on swap device" as the reason, but now I don't get any display output at all when I connect a monitor to the crashed device.
This is the syslog output from right before the last crash:
Sep 05 00:16:42 zilean kernel: i915 0000:00:02.0: drm_WARN_ON((intel_uncore_read(&dev_priv->uncore, ((const i915_reg_t){ .reg = (0x130090) })) & mask) != mask)
Sep 05 00:16:42 zilean kernel: WARNING: CPU: 3 PID: 9294 at drivers/gpu/drm/i915/vlv_suspend.c:396 vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:16:42 zilean kernel: Modules linked in: bonding tls sunrpc binfmt_misc intel_rapl_msr intel_rapl_common intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass punit_atom_debug crct10dif_pclmul polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel mei_pxp mei_hdcp crypto_simd i915 snd_hda_intel snd_intel_dspcfg cryptd snd_intel_sdw_acpi snd_hda_codec snd_hda_core drm_buddy snd_hwdep ttm snd_pcm intel_cstate think_lmi serio_raw drm_display_helper snd_timer wmi_bmof firmware_attributes_class ov5693 hci_uart cec mei_txe v4l2_cci btqca pcspkr snd intel_xhci_usb_role_switch mei at24 rc_core btrtl v4l2_fwnode btintel soundcore v4l2_async btbcm i2c_algo_bit bluetooth videodev mc ecdh_generic ecc rfkill_gpio intel_int0002_vgpio mac_hid zfs(PO) spl(O) efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c spi_intel_platform spi_intel ahci xhci_pci xhci_pci_renesas libahci r8169 crc32_pclmul i2c_i801 xhci_hcd i2c_smbus
Sep 05 00:16:42 zilean kernel: lpc_ich realtek i2c_hid_acpi i2c_hid video wmi hid
Sep 05 00:16:42 zilean kernel: CPU: 3 PID: 9294 Comm: kworker/3:1 Tainted: P W O 6.8.8-3-pve #1
Sep 05 00:16:42 zilean kernel: Hardware name: LENOVO 10G9000QGE/BRASWELL, BIOS M00KT33AUS 08/12/2015
Sep 05 00:16:42 zilean kernel: Workqueue: pm pm_runtime_work
Sep 05 00:16:42 zilean kernel: RIP: 0010:vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:16:42 zilean kernel: Code: 8b 7b 08 4c 8b 6f 50 4d 85 ed 74 25 e8 bd cc 38 cc 48 c7 c1 b0 34 8a c1 4c 89 ea 48 c7 c7 a8 fe 8d c1 48 89 c6 e8 b4 f3 99 cb <0f> 0b e9 86 f9 ff ff 4c 8b 2f eb d6 0f 1f 84 00 00 00 00 00 90 90
Sep 05 00:16:42 zilean kernel: RSP: 0018:ffffa7914942fca0 EFLAGS: 00010246
Sep 05 00:16:42 zilean kernel: RAX: 0000000000000000 RBX: ffff8e05c4bb4000 RCX: 0000000000000000
Sep 05 00:16:42 zilean kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 05 00:16:42 zilean kernel: RBP: ffffa7914942fcc0 R08: 0000000000000000 R09: 0000000000000000
Sep 05 00:16:42 zilean kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e05c4bb5cc8
Sep 05 00:16:42 zilean kernel: R13: ffff8e05c13f3f10 R14: ffff8e05c4bb5cc8 R15: ffff8e05c4bb4000
Sep 05 00:16:42 zilean kernel: FS: 0000000000000000(0000) GS:ffff8e063bd80000(0000) knlGS:0000000000000000
Sep 05 00:16:42 zilean kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 05 00:16:42 zilean kernel: CR2: 00007aaa47602000 CR3: 000000010c436000 CR4: 00000000001006f0
Sep 05 00:16:42 zilean kernel: Call Trace:
Sep 05 00:16:42 zilean kernel: <TASK>
Sep 05 00:16:42 zilean kernel: ? show_regs+0x6d/0x80
Sep 05 00:16:42 zilean kernel: ? __warn+0x89/0x160
Sep 05 00:16:42 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:16:42 zilean kernel: ? report_bug+0x17e/0x1b0
Sep 05 00:16:42 zilean kernel: ? handle_bug+0x46/0x90
Sep 05 00:16:42 zilean kernel: ? exc_invalid_op+0x18/0x80
Sep 05 00:16:42 zilean kernel: ? asm_exc_invalid_op+0x1b/0x20
Sep 05 00:16:42 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:16:42 zilean kernel: intel_runtime_suspend+0xe4/0x2c0 [i915]
Sep 05 00:16:42 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:16:42 zilean kernel: pci_pm_runtime_suspend+0x6a/0x1f0
Sep 05 00:16:42 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:16:42 zilean kernel: __rpm_callback+0x50/0x170
Sep 05 00:16:42 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:16:42 zilean kernel: rpm_callback+0x6d/0x80
Sep 05 00:16:42 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:16:42 zilean kernel: rpm_suspend+0x122/0x6b0
Sep 05 00:16:42 zilean kernel: ? __schedule+0x409/0x15e0
Sep 05 00:16:42 zilean kernel: pm_runtime_work+0xc6/0xe0
Sep 05 00:16:42 zilean kernel: process_one_work+0x16d/0x350
Sep 05 00:16:42 zilean kernel: worker_thread+0x306/0x440
Sep 05 00:16:42 zilean kernel: ? __pfx_worker_thread+0x10/0x10
Sep 05 00:16:42 zilean kernel: kthread+0xf2/0x120
Sep 05 00:16:42 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:16:42 zilean kernel: ret_from_fork+0x47/0x70
Sep 05 00:16:42 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:16:42 zilean kernel: ret_from_fork_asm+0x1b/0x30
Sep 05 00:16:42 zilean kernel: </TASK>
Sep 05 00:16:42 zilean kernel: ---[ end trace 0000000000000000 ]---
Sep 05 00:17:01 zilean CRON[9430]: pam_unix(cron:account): account root has password changed in future
Sep 05 00:17:01 zilean CRON[9430]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 05 00:17:01 zilean CRON[9431]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Sep 05 00:17:01 zilean CRON[9430]: pam_unix(cron:session): session closed for user root
Sep 05 00:17:54 zilean kernel: ------------[ cut here ]------------
Sep 05 00:17:54 zilean kernel: i915 0000:00:02.0: drm_WARN_ON((intel_uncore_read(&dev_priv->uncore, ((const i915_reg_t){ .reg = (0x130090) })) & mask) != mask)
Sep 05 00:17:54 zilean kernel: WARNING: CPU: 3 PID: 9139 at drivers/gpu/drm/i915/vlv_suspend.c:396 vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:17:54 zilean kernel: Modules linked in: bonding tls sunrpc binfmt_misc intel_rapl_msr intel_rapl_common intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass punit_atom_debug crct10dif_pclmul polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel mei_pxp mei_hdcp crypto_simd i915 snd_hda_intel snd_intel_dspcfg cryptd snd_intel_sdw_acpi snd_hda_codec snd_hda_core drm_buddy snd_hwdep ttm snd_pcm intel_cstate think_lmi serio_raw drm_display_helper snd_timer wmi_bmof firmware_attributes_class ov5693 hci_uart cec mei_txe v4l2_cci btqca pcspkr snd intel_xhci_usb_role_switch mei at24 rc_core btrtl v4l2_fwnode btintel soundcore v4l2_async btbcm i2c_algo_bit bluetooth videodev mc ecdh_generic ecc rfkill_gpio intel_int0002_vgpio mac_hid zfs(PO) spl(O) efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c spi_intel_platform spi_intel ahci xhci_pci xhci_pci_renesas libahci r8169 crc32_pclmul i2c_i801 xhci_hcd i2c_smbus
Sep 05 00:17:54 zilean kernel: lpc_ich realtek i2c_hid_acpi i2c_hid video wmi hid
Sep 05 00:17:54 zilean kernel: CPU: 3 PID: 9139 Comm: kworker/3:2 Tainted: P W O 6.8.8-3-pve #1
Sep 05 00:17:54 zilean kernel: Hardware name: LENOVO 10G9000QGE/BRASWELL, BIOS M00KT33AUS 08/12/2015
Sep 05 00:17:54 zilean kernel: Workqueue: pm pm_runtime_work
Sep 05 00:17:54 zilean kernel: RIP: 0010:vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:17:54 zilean kernel: Code: 8b 7b 08 4c 8b 6f 50 4d 85 ed 74 25 e8 bd cc 38 cc 48 c7 c1 b0 34 8a c1 4c 89 ea 48 c7 c7 a8 fe 8d c1 48 89 c6 e8 b4 f3 99 cb <0f> 0b e9 86 f9 ff ff 4c 8b 2f eb d6 0f 1f 84 00 00 00 00 00 90 90
Sep 05 00:17:54 zilean kernel: RSP: 0018:ffffa79148fcfca0 EFLAGS: 00010246
Sep 05 00:17:54 zilean kernel: RAX: 0000000000000000 RBX: ffff8e05c4bb4000 RCX: 0000000000000000
Sep 05 00:17:54 zilean kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 05 00:17:54 zilean kernel: RBP: ffffa79148fcfcc0 R08: 0000000000000000 R09: 0000000000000000
Sep 05 00:17:54 zilean kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e05c4bb5cc8
Sep 05 00:17:54 zilean kernel: R13: ffff8e05c13f3f10 R14: ffff8e05c4bb5cc8 R15: ffff8e05c4bb4000
Sep 05 00:17:54 zilean kernel: FS: 0000000000000000(0000) GS:ffff8e063bd80000(0000) knlGS:0000000000000000
Sep 05 00:17:54 zilean kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 05 00:17:54 zilean kernel: CR2: 00007ffdf502d088 CR3: 000000010c436000 CR4: 00000000001006f0
Sep 05 00:17:54 zilean kernel: Call Trace:
Sep 05 00:17:54 zilean kernel: <TASK>
Sep 05 00:17:54 zilean kernel: ? show_regs+0x6d/0x80
Sep 05 00:17:54 zilean kernel: ? __warn+0x89/0x160
Sep 05 00:17:54 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:17:54 zilean kernel: ? report_bug+0x17e/0x1b0
Sep 05 00:17:54 zilean kernel: ? handle_bug+0x46/0x90
Sep 05 00:17:54 zilean kernel: ? exc_invalid_op+0x18/0x80
Sep 05 00:17:54 zilean kernel: ? asm_exc_invalid_op+0x1b/0x20
Sep 05 00:17:54 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:17:54 zilean kernel: intel_runtime_suspend+0xe4/0x2c0 [i915]
Sep 05 00:17:54 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:17:54 zilean kernel: pci_pm_runtime_suspend+0x6a/0x1f0
Sep 05 00:17:54 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:17:54 zilean kernel: __rpm_callback+0x50/0x170
Sep 05 00:17:54 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:17:54 zilean kernel: rpm_callback+0x6d/0x80
Sep 05 00:17:54 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:17:54 zilean kernel: rpm_suspend+0x122/0x6b0
Sep 05 00:17:54 zilean kernel: ? __schedule+0x409/0x15e0
Sep 05 00:17:54 zilean kernel: pm_runtime_work+0xc6/0xe0
Sep 05 00:17:54 zilean kernel: process_one_work+0x16d/0x350
Sep 05 00:17:54 zilean kernel: worker_thread+0x306/0x440
Sep 05 00:17:54 zilean kernel: ? __pfx_worker_thread+0x10/0x10
Sep 05 00:17:54 zilean kernel: kthread+0xf2/0x120
Sep 05 00:17:54 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:17:54 zilean kernel: ret_from_fork+0x47/0x70
Sep 05 00:17:54 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:17:54 zilean kernel: ret_from_fork_asm+0x1b/0x30
Sep 05 00:17:54 zilean kernel: </TASK>
Sep 05 00:17:54 zilean kernel: ---[ end trace 0000000000000000 ]---
Sep 05 00:18:14 zilean kernel: ------------[ cut here ]------------
Sep 05 00:18:14 zilean kernel: i915 0000:00:02.0: drm_WARN_ON((intel_uncore_read(&dev_priv->uncore, ((const i915_reg_t){ .reg = (0x130090) })) & mask) != mask)
Sep 05 00:18:14 zilean kernel: WARNING: CPU: 3 PID: 9294 at drivers/gpu/drm/i915/vlv_suspend.c:396 vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:18:14 zilean kernel: Modules linked in: bonding tls sunrpc binfmt_misc intel_rapl_msr intel_rapl_common intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic kvm irqbypass punit_atom_debug crct10dif_pclmul polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel mei_pxp mei_hdcp crypto_simd i915 snd_hda_intel snd_intel_dspcfg cryptd snd_intel_sdw_acpi snd_hda_codec snd_hda_core drm_buddy snd_hwdep ttm snd_pcm intel_cstate think_lmi serio_raw drm_display_helper snd_timer wmi_bmof firmware_attributes_class ov5693 hci_uart cec mei_txe v4l2_cci btqca pcspkr snd intel_xhci_usb_role_switch mei at24 rc_core btrtl v4l2_fwnode btintel soundcore v4l2_async btbcm i2c_algo_bit bluetooth videodev mc ecdh_generic ecc rfkill_gpio intel_int0002_vgpio mac_hid zfs(PO) spl(O) efi_pstore dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c spi_intel_platform spi_intel ahci xhci_pci xhci_pci_renesas libahci r8169 crc32_pclmul i2c_i801 xhci_hcd i2c_smbus
Sep 05 00:18:14 zilean kernel: lpc_ich realtek i2c_hid_acpi i2c_hid video wmi hid
Sep 05 00:18:14 zilean kernel: CPU: 3 PID: 9294 Comm: kworker/3:1 Tainted: P W O 6.8.8-3-pve #1
Sep 05 00:18:14 zilean kernel: Hardware name: LENOVO 10G9000QGE/BRASWELL, BIOS M00KT33AUS 08/12/2015
Sep 05 00:18:14 zilean kernel: Workqueue: pm pm_runtime_work
Sep 05 00:18:14 zilean kernel: RIP: 0010:vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:18:14 zilean kernel: Code: 8b 7b 08 4c 8b 6f 50 4d 85 ed 74 25 e8 bd cc 38 cc 48 c7 c1 b0 34 8a c1 4c 89 ea 48 c7 c7 a8 fe 8d c1 48 89 c6 e8 b4 f3 99 cb <0f> 0b e9 86 f9 ff ff 4c 8b 2f eb d6 0f 1f 84 00 00 00 00 00 90 90
Sep 05 00:18:14 zilean kernel: RSP: 0018:ffffa7914942fca0 EFLAGS: 00010246
Sep 05 00:18:14 zilean kernel: RAX: 0000000000000000 RBX: ffff8e05c4bb4000 RCX: 0000000000000000
Sep 05 00:18:14 zilean kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 05 00:18:14 zilean kernel: RBP: ffffa7914942fcc0 R08: 0000000000000000 R09: 0000000000000000
Sep 05 00:18:14 zilean kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e05c4bb5cc8
Sep 05 00:18:14 zilean kernel: R13: ffff8e05c13f3f10 R14: ffff8e05c4bb5cc8 R15: ffff8e05c4bb4000
Sep 05 00:18:14 zilean kernel: FS: 0000000000000000(0000) GS:ffff8e063bd80000(0000) knlGS:0000000000000000
Sep 05 00:18:14 zilean kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 05 00:18:14 zilean kernel: CR2: 00007ffdf502d088 CR3: 000000010c436000 CR4: 00000000001006f0
Sep 05 00:18:14 zilean kernel: Call Trace:
Sep 05 00:18:14 zilean kernel: <TASK>
Sep 05 00:18:14 zilean kernel: ? show_regs+0x6d/0x80
Sep 05 00:18:14 zilean kernel: ? __warn+0x89/0x160
Sep 05 00:18:14 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:18:14 zilean kernel: ? report_bug+0x17e/0x1b0
Sep 05 00:18:14 zilean kernel: ? handle_bug+0x46/0x90
Sep 05 00:18:14 zilean kernel: ? exc_invalid_op+0x18/0x80
Sep 05 00:18:14 zilean kernel: ? asm_exc_invalid_op+0x1b/0x20
Sep 05 00:18:14 zilean kernel: ? vlv_suspend_complete+0x6fc/0x710 [i915]
Sep 05 00:18:14 zilean kernel: intel_runtime_suspend+0xe4/0x2c0 [i915]
Sep 05 00:18:14 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:18:14 zilean kernel: pci_pm_runtime_suspend+0x6a/0x1f0
Sep 05 00:18:14 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:18:14 zilean kernel: __rpm_callback+0x50/0x170
Sep 05 00:18:14 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:18:14 zilean kernel: rpm_callback+0x6d/0x80
Sep 05 00:18:14 zilean kernel: ? __pfx_pci_pm_runtime_suspend+0x10/0x10
Sep 05 00:18:14 zilean kernel: rpm_suspend+0x122/0x6b0
Sep 05 00:18:14 zilean kernel: ? __schedule+0x409/0x15e0
Sep 05 00:18:14 zilean kernel: ? add_timer+0x20/0x40
Sep 05 00:18:14 zilean kernel: pm_runtime_work+0xc6/0xe0
Sep 05 00:18:14 zilean kernel: process_one_work+0x16d/0x350
Sep 05 00:18:14 zilean kernel: worker_thread+0x306/0x440
Sep 05 00:18:14 zilean kernel: ? __pfx_worker_thread+0x10/0x10
Sep 05 00:18:14 zilean kernel: kthread+0xf2/0x120
Sep 05 00:18:14 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:18:14 zilean kernel: ret_from_fork+0x47/0x70
Sep 05 00:18:14 zilean kernel: ? __pfx_kthread+0x10/0x10
Sep 05 00:18:14 zilean kernel: ret_from_fork_asm+0x1b/0x30
Sep 05 00:18:14 zilean kernel: </TASK>
Sep 05 00:18:14 zilean kernel: ---[ end trace 0000000000000000 ]---
Sep 05 00:24:01 zilean CRON[9437]: pam_unix(cron:account): account root has password changed in future
Sep 05 00:24:01 zilean CRON[9437]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 05 00:24:01 zilean CRON[9438]: (root) CMD (if [ $(date +%w) -eq 0 ] && [ -x /usr/lib/zfs-linux/trim ]; then /usr/lib/zfs-linux/trim; fi)
Sep 05 00:24:01 zilean CRON[9437]: pam_unix(cron:session): session closed for user root
-- Reboot --
Sep 11 10:39:54 zilean kernel: Linux version 6.8.8-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-3 (2024-07-16T16:16Z) ()
The iGPU (i915) seems to throw these warnings all the time, but they show up roughly every minute even when the device is operating normally, so I'm not sure they are related to the crash.
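To put a number on "roughly every minute", here is a minimal Python sketch for measuring the spacing between the warnings. It assumes the log is in the default syslog/journal short format shown above, and that the year is 2024 (syslog timestamps carry no year; the kernel build date in the boot line only suggests it). The function names are my own, not from any tool.

```python
import re
from datetime import datetime

# Matches the kernel WARNING line raised from vlv_suspend_complete in i915.
WARN_RE = re.compile(
    r"^(?P<ts>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: WARNING: .*vlv_suspend_complete"
)


def warning_times(lines, year=2024):
    """Return the timestamps of all vlv_suspend_complete warnings."""
    times = []
    for line in lines:
        m = WARN_RE.match(line)
        if m:
            # Assumption: the year is not in the log, so it is passed in.
            stamp = f"{year} {m['ts']}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_seconds(times):
    """Seconds between each pair of consecutive warnings."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

Calling `gaps_in_seconds(warning_times(open("crash.log")))` on a saved copy of the log (the file name is just a placeholder) gives the gaps. For the three warnings in the excerpt above they are 72 and 20 seconds, so the interval is not strictly one minute.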
Any advice on settings I should enable, disable or change, or on commands I can run to get more information?
I'm also slightly suspicious of the Chinese SSD. Any tips on how I can check whether it's operating normally, the way PBS expects?
Thank you for your help!