Hello dear community,
I have a problem with my PVE (v7.2-7).
For some time now, one of my servers has not been running properly.
I had a cluster of 3 nodes: 2x Minisforum HM90 and 1x in a VM.
One day my PVE1 dropped out. Sometimes the web interface was still reachable, showing "?" in the Datacenter view, and sometimes the PVE was not reachable at all.
In both cases the only remedy was to power the machine off by hand.
What I found out is that pvestatd hangs. When Uptime-Kuma notified me in time, I could get everything running again with a pvestatd restart.
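For reference, the restart mentioned above can be sketched roughly like this. It is only a minimal sketch, assuming the daemon runs as the systemd unit pvestatd.service; the DRY_RUN switch and the helper names run and restart_pvestatd are hypothetical and only there so the commands can be previewed before they are executed.

```shell
#!/bin/sh
# Sketch: restart the PVE status daemon and show its state afterwards.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
# Set DRY_RUN=0 to actually run them (needs root on the PVE host).
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Assumption: pvestatd is managed by systemd as pvestatd.service.
restart_pvestatd() {
    run systemctl restart pvestatd.service
    run systemctl --no-pager status pvestatd.service
}

restart_pvestatd
```

Note that this only restarts the daemon; it does not address whatever makes it hang in the first place.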
Briefly, the hardware:
1x NVMe SSD + 1x SSD in a ZFS mirror
1x 32 GB RAM
After dissolving the cluster I tried the following, unfortunately without success:
- Reinstalled PVE with version 7.2
- Reinstalled PVE with version 7.1
- Kernel downgrade from 5.15.53-1 to 5.13.19-2
- Installed PVE on the NVMe SSD only (current state)
- Memtest86 for over 25 hours without errors
- smartctl shows no errors on either SSD
I get the following errors in the syslog:
Sep 14 03:10:01 pve1 CRON[416351]: (root) CMD (test -e /run/systemd/system || SERVICE_MODE=1 /sbin/e2scrub_all -A -r)
Sep 14 03:10:01 pve1 CRON[416350]: pam_unix(cron:session): session closed for user root
Sep 14 03:17:01 pve1 CRON[419211]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 14 03:17:01 pve1 CRON[419212]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 14 03:17:01 pve1 CRON[419211]: pam_unix(cron:session): session closed for user root
Sep 14 04:17:01 pve1 CRON[443694]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Sep 14 04:17:01 pve1 CRON[443695]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 14 04:17:01 pve1 CRON[443694]: pam_unix(cron:session): session closed for user root
Sep 14 04:32:55 pve1 kernel: BUG: unable to handle page fault for address: ffffffff8b4cb014
Sep 14 04:32:55 pve1 kernel: #PF: supervisor write access in kernel mode
Sep 14 04:32:55 pve1 kernel: #PF: error_code(0x0003) - permissions violation
Sep 14 04:32:55 pve1 kernel: PGD 5fc415067 P4D 5fc415067 PUD 5fc416063 PMD 5fa8001e1
Sep 14 04:32:55 pve1 kernel: Oops: 0003 [#1] SMP NOPTI
Sep 14 04:32:55 pve1 kernel: CPU: 6 PID: 450224 Comm: ps Tainted: P O 5.15.53-1-pve #1
Sep 14 04:32:55 pve1 kernel: Hardware name: BESSTAR TECH LIMITED HM90/HM90, BIOS 5.16 10/13/2021
Sep 14 04:32:55 pve1 kernel: RIP: 0010:apparmor_ptrace_access_check+0x7a/0x1a0
Sep 14 04:32:55 pve1 kernel: Code: 8c f0 fe ff 83 fb 01 4c 89 e7 19 d2 48 89 c6 49 89 c5 83 e2 fe 83 c2 04 e8 93 fc fe ff 41 89 c7 4d 85 ed 74 1c b8 ff ff ff ff <f0> 41 0f c1 45 00 83 f8 01 0f 84 9e 00 00 00 85 c0 0f 8e a3 00 00
Sep 14 04:32:55 pve1 kernel: RSP: 0018:ffffb27a86f13b88 EFLAGS: 00010286
Sep 14 04:32:55 pve1 kernel: RAX: 00000000ffffffff RBX: 0000000000000001 RCX: 0000000000000000
Sep 14 04:32:55 pve1 kernel: RDX: 0000000000000004 RSI: ffff93f5400536b8 RDI: 0000000000000000
Sep 14 04:32:55 pve1 kernel: RBP: ffffb27a86f13bb0 R08: 0000000000000001 R09: 0000000000000001
Sep 14 04:32:55 pve1 kernel: R10: 000000000000000b R11: 0000000000000000 R12: ffff93f5400536b8
Sep 14 04:32:55 pve1 kernel: R13: ffffffff8b4cb014 R14: 0000000000000001 R15: 0000000000000000
Sep 14 04:32:55 pve1 kernel: FS: 00007f37399b37c0(0000) GS:ffff93fc2f780000(0000) knlGS:0000000000000000
Sep 14 04:32:55 pve1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 14 04:32:55 pve1 kernel: CR2: ffffffff8b4cb014 CR3: 000000016a810000 CR4: 0000000000350ee0
Sep 14 04:32:55 pve1 kernel: Call Trace:
Sep 14 04:32:55 pve1 kernel: <TASK>
Sep 14 04:32:55 pve1 kernel: security_ptrace_access_check+0x2f/0x50
Sep 14 04:32:55 pve1 kernel: __ptrace_may_access+0xdc/0x160
Sep 14 04:32:55 pve1 kernel: ptrace_may_access+0x2f/0x50
Sep 14 04:32:55 pve1 kernel: do_task_stat+0x97/0xd70
Sep 14 04:32:55 pve1 kernel: ? mod_objcg_state+0x185/0x340
Sep 14 04:32:55 pve1 kernel: ? kvmalloc_node+0x28/0xa0
Sep 14 04:32:55 pve1 kernel: ? memcg_slab_post_alloc_hook+0x19e/0x210
Sep 14 04:32:55 pve1 kernel: proc_tgid_stat+0x14/0x20
Sep 14 04:32:55 pve1 kernel: proc_single_show+0x52/0xc0
Sep 14 04:32:55 pve1 kernel: seq_read_iter+0x126/0x4b0
Sep 14 04:32:55 pve1 kernel: seq_read+0xfd/0x150
Sep 14 04:32:55 pve1 kernel: vfs_read+0xa0/0x1a0
Sep 14 04:32:55 pve1 kernel: ksys_read+0x67/0xf0
Sep 14 04:32:55 pve1 kernel: __x64_sys_read+0x1a/0x20
Sep 14 04:32:55 pve1 kernel: do_syscall_64+0x5c/0xc0
Sep 14 04:32:55 pve1 kernel: ? __x64_sys_close+0x12/0x50
Sep 14 04:32:55 pve1 kernel: ? do_syscall_64+0x69/0xc0
Sep 14 04:32:55 pve1 kernel: ? do_syscall_64+0x69/0xc0
Sep 14 04:32:55 pve1 kernel: ? do_syscall_64+0x69/0xc0
Sep 14 04:32:55 pve1 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Sep 14 04:32:55 pve1 kernel: RIP: 0033:0x7f3739df384e
Sep 14 04:32:55 pve1 kernel: Code: c0 e9 b6 fe ff ff 50 48 8d 3d 2e 04 0b 00 e8 a9 fd 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28
Sep 14 04:32:55 pve1 kernel: RSP: 002b:00007ffc1af4e6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Sep 14 04:32:55 pve1 kernel: RAX: ffffffffffffffda RBX: 00007f3739ef7690 RCX: 00007f3739df384e
Sep 14 04:32:55 pve1 kernel: RDX: 0000000000000800 RSI: 000055de2d3c1b50 RDI: 0000000000000006
Sep 14 04:32:55 pve1 kernel: RBP: 0000000000000006 R08: 00000000ffffffff R09: 00007ffc1af4e580
Sep 14 04:32:55 pve1 kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Sep 14 04:32:55 pve1 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 14 04:32:55 pve1 kernel: </TASK>
Sep 14 04:32:55 pve1 kernel: Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables 8021q garp mrp bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd amdgpu kvm_amd kvm snd_hda_codec_hdmi irqbypass crct10dif_pclmul iommu_v2 ghash_clmulni_intel gpu_sched drm_ttm_helper snd_usb_audio mt7921e aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd ttm joydev btusb mt76_connac_lib input_leds snd_usbmidi_lib snd_intel_sdw_acpi cryptd btrtl snd_hda_codec mt76 drm_kms_helper snd_rawmidi btbcm rapl snd_seq_device snd_hda_core snd_pci_acp6x btintel mc mac80211 cec snd_pci_acp5x snd_hwdep snd_pcm bluetooth rc_core i2c_algo_bit snd_timer k10temp efi_pstore pcspkr snd_rn_pci_acp3x ecdh_generic cfg80211 fb_sys_fops ecc snd syscopyarea snd_pci_acp3x sysfillrect sysimgblt soundcore libarc4 ccp cm32181 industrialio mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm
Sep 14 04:32:55 pve1 kernel: ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb usbmouse usbkbd hid_cmedia hid_generic usbhid xhci_pci crc32_pclmul ahci xhci_pci_renesas i2c_piix4 libahci nvme amd_sfh xhci_hcd igc r8169 realtek nvme_core video i2c_hid_acpi i2c_hid hid
Sep 14 04:32:55 pve1 kernel: CR2: ffffffff8b4cb014
Sep 14 04:32:55 pve1 kernel: ---[ end trace 7bdf863b152cf802 ]---
Sep 14 04:32:55 pve1 kernel: RIP: 0010:apparmor_ptrace_access_check+0x7a/0x1a0
Sep 14 04:32:55 pve1 kernel: Code: 8c f0 fe ff 83 fb 01 4c 89 e7 19 d2 48 89 c6 49 89 c5 83 e2 fe 83 c2 04 e8 93 fc fe ff 41 89 c7 4d 85 ed 74 1c b8 ff ff ff ff <f0> 41 0f c1 45 00 83 f8 01 0f 84 9e 00 00 00 85 c0 0f 8e a3 00 00
Sep 14 04:32:55 pve1 kernel: RSP: 0018:ffffb27a86f13b88 EFLAGS: 00010286
Sep 14 04:32:55 pve1 kernel: RAX: 00000000ffffffff RBX: 0000000000000001 RCX: 0000000000000000
Sep 14 04:32:55 pve1 kernel: RDX: 0000000000000004 RSI: ffff93f5400536b8 RDI: 0000000000000000
Sep 14 04:32:55 pve1 kernel: RBP: ffffb27a86f13bb0 R08: 0000000000000001 R09: 0000000000000001
Sep 14 04:32:55 pve1 kernel: R10: 000000000000000b R11: 0000000000000000 R12: ffff93f5400536b8
Sep 14 04:32:55 pve1 kernel: R13: ffffffff8b4cb014 R14: 0000000000000001 R15: 0000000000000000
Sep 14 04:32:55 pve1 kernel: FS: 00007f37399b37c0(0000) GS:ffff93fc2f780000(0000) knlGS:0000000000000000
Sep 14 04:32:55 pve1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 14 04:32:55 pve1 kernel: CR2: ffffffff8b4cb014 CR3: 000000016a810000 CR4: 0000000000350ee0
Sep 14 04:33:25 pve1 kernel: watchdog: BUG: soft lockup - CPU#4 stuck for 26s! [pvestatd:1970]
Sep 14 04:33:25 pve1 kernel: Modules linked in: tcp_diag inet_diag ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter nf_tables 8021q garp mrp bonding tls softdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd amdgpu kvm_amd kvm snd_hda_codec_hdmi irqbypass crct10dif_pclmul iommu_v2 ghash_clmulni_intel gpu_sched drm_ttm_helper snd_usb_audio mt7921e aesni_intel snd_hda_intel snd_intel_dspcfg crypto_simd ttm joydev btusb mt76_connac_lib input_leds snd_usbmidi_lib snd_intel_sdw_acpi cryptd btrtl snd_hda_codec mt76 drm_kms_helper snd_rawmidi btbcm rapl snd_seq_device snd_hda_core snd_pci_acp6x btintel mc mac80211 cec snd_pci_acp5x snd_hwdep snd_pcm bluetooth rc_core i2c_algo_bit snd_timer k10temp efi_pstore pcspkr snd_rn_pci_acp3x ecdh_generic cfg80211 fb_sys_fops ecc snd syscopyarea snd_pci_acp3x sysfillrect sysimgblt soundcore libarc4 ccp cm32181 industrialio mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm
Sep 14 04:33:25 pve1 kernel: ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb usbmouse usbkbd hid_cmedia hid_generic usbhid xhci_pci crc32_pclmul ahci xhci_pci_renesas i2c_piix4 libahci nvme amd_sfh xhci_hcd igc r8169 realtek nvme_core video i2c_hid_acpi i2c_hid hid
Sep 14 04:33:25 pve1 kernel: CPU: 4 PID: 1970 Comm: pvestatd Tainted: P D O 5.15.53-1-pve #1
Sep 14 04:33:25 pve1 kernel: Hardware name: BESSTAR TECH LIMITED HM90/HM90, BIOS 5.16 10/13/2021
Sep 14 04:33:25 pve1 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x79/0x240
Sep 14 04:33:25 pve1 kernel: Code: 2b 08 0f 92 c0 0f b6 c0 c1 e0 08 89 c2 8b 03 30 e4 09 d0 a9 00 01 ff ff 0f 85 13 01 00 00 85 c0 74 0e 8b 03 84 c0 74 08 f3 90 <8b> 03 84 c0 75 f8 b8 01 00 00 00 66 89 03 5b 41 5c 41 5d 41 5e 41
Sep 14 04:33:25 pve1 kernel: RSP: 0018:ffffb27a9424fd30 EFLAGS: 00000202
Sep 14 04:33:25 pve1 kernel: RAX: 0000000000000101 RBX: ffff93f5557af038 RCX: ffffb27a9424fe48
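To pick the relevant kernel messages out of a longer syslog, a filter along these lines can help. It is only a sketch: the function name kernel_faults is hypothetical, and the pattern covers just the three markers visible in the excerpt above (the page fault "BUG:", the "Oops:" line and the watchdog soft lockup), so other kinds of kernel errors would be missed.

```shell
#!/bin/sh
# Sketch: print only the kernel lines that mark a fault or a lockup.
# Reads the files given as arguments, or stdin if none are given.
kernel_faults() {
    grep -E 'kernel: (BUG:|Oops:|watchdog: BUG:)' "$@"
}

# Possible usage (the log path is an assumption, adjust as needed):
#   kernel_faults /var/log/syslog
```

On the excerpt above this would leave the page fault and Oops at 04:32:55 (process ps) and the soft lockup at 04:33:25 (pvestatd), which makes the order of events easier to see.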
smartctl:
root@pve1:~# smartctl -a /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.53-1-pve] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 970 EVO Plus 1TB
Serial Number: S6P7NG0R631117M
Firmware Version: 3B2QEXM7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 6
NVMe Version: 1.3
Number of Namespaces: 1
Namespace 1 Size/Capacity: 1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization: 53,039,779,840 [53.0 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5611515abf
Local Time is: Wed Sep 14 13:38:48 2022 CEST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057): Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Log Page Attributes (0x0f): S/H_per_NS Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 82 Celsius
Critical Comp. Temp. Threshold: 85 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 7.54W - - 0 0 0 0 0 0
1 + 7.54W - - 1 1 1 1 0 200
2 + 7.54W - - 2 2 2 2 0 1000
3 - 0.0500W - - 3 3 3 3 2000 1200
4 - 0.0050W - - 4 4 4 4 500 9500
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 43 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 5%
Data Units Read: 58,154,539 [29.7 TB]
Data Units Written: 76,309,562 [39.0 TB]
Host Read Commands: 1,715,213,497
Host Write Commands: 2,086,121,001
Controller Busy Time: 5,950
Power Cycles: 212
Power On Hours: 3,175
Unsafe Shutdowns: 122
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 1
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 43 Celsius
Temperature Sensor 2: 42 Celsius
Thermal Temp. 1 Transition Count: 4
Thermal Temp. 2 Transition Count: 1
Thermal Temp. 1 Total Time: 446
Thermal Temp. 2 Total Time: 103
Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
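To compare the two SSDs more quickly, the long output can be reduced to a few health counters. This is a minimal sketch; the function name smart_summary and the choice of fields are my own assumptions, and the field names are taken from the NVMe output above (a SATA SSD reports differently named attributes, so it would need its own pattern).

```shell
#!/bin/sh
# Sketch: reduce `smartctl -a` NVMe output to a few health counters.
# Reads the files given as arguments, or stdin if none are given.
smart_summary() {
    grep -E '^(Critical Warning|Percentage Used|Unsafe Shutdowns|Media and Data Integrity Errors|Error Information Log Entries):' "$@"
}

# Possible usage on the host (needs root):
#   smartctl -a /dev/nvme0n1 | smart_summary
```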
Proxmox version:
root@pve1:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.53-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-10
pve-kernel-5.15.53-1-pve: 5.15.53-1
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-3
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1
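Since I switched kernels during testing, it is worth checking which kernel is actually running. A small sketch for reading it out of the output above; the function name running_kernel is hypothetical, and the pattern assumes the "(running kernel: ...)" wording shown in the first line.

```shell
#!/bin/sh
# Sketch: extract the running kernel from `pveversion -v` output.
# Reads the files given as arguments, or stdin if none are given.
running_kernel() {
    sed -n 's/.*running kernel: \([^)]*\)).*/\1/p' "$@"
}

# Possible usage on the host:
#   pveversion -v | running_kernel
#   uname -r    # should report the same version
```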
Since both HM90s are identical apart from the NVMe, but the error only occurs on one of them, I almost fear that the HM90 itself is the cause.
I hope you can help me.
Regards, Frank