Yesterday, my PVE crashed "out of nowhere" (I did not change any configuration or issue any command; it was just the usual VMs running as ever). It has run flawlessly on exactly this hardware for 2 years.
Since the first thing that happened according to journalctl -xeb-1 was that pvestatd.service was killed, I assume the issue may be there.
The symptoms were that none of the services running in LXCs/VMs were reachable, nor was the WebUI, but the server did not shut down. Sadly, I did not have physical access beyond hard-rebooting the machine today, so this is all I know. The tasks log does not have any entry around that time.
Full log of the incident:
Code:
Feb 14 16:15:11 rack0 systemd[1]: pvestatd.service: Main process exited, code=killed, status=9/KILL
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ An ExecStart= process belonging to unit pvestatd.service has exited.
░░
░░ The process' exit code is 'killed' and its exit status is 9.
Feb 14 16:15:11 rack0 kernel: BUG: unable to handle page fault for address: 0000000000006204
Feb 14 16:15:11 rack0 kernel: #PF: supervisor read access in kernel mode
Feb 14 16:15:11 rack0 kernel: #PF: error_code(0x0000) - not-present page
Feb 14 16:15:11 rack0 kernel: PGD 0 P4D 0
Feb 14 16:15:11 rack0 kernel: Oops: 0000 [#1] SMP NOPTI
Feb 14 16:15:11 rack0 kernel: CPU: 1 PID: 5021 Comm: pvestatd Tainted: P O 5.15.116-1-pve #1
Feb 14 16:15:11 rack0 kernel: Hardware name: BIOSTAR Group B560MX-E PRO/B560MX-E PRO, BIOS 5.19 12/21/2021
Feb 14 16:15:11 rack0 kernel: RIP: 0010:pid_nr_ns+0x14/0x40
Feb 14 16:15:11 rack0 kernel: Code: ba e8 50 7d 5a 00 eb b9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 45 31 c0 48 89 e5 48 85 ff 74 15 8b 46 40 <3b> 47 04 77 0d 48 c1 e0 04 48 01 c7 48 39 77 68 74 09 44 89 c0 5d
Feb 14 16:15:11 rack0 kernel: RSP: 0018:ffffb47e0a253d60 EFLAGS: 00010206
Feb 14 16:15:11 rack0 kernel: RAX: 0000000000000000 RBX: ffffffffba08a780 RCX: 0000000000000000
Feb 14 16:15:11 rack0 kernel: RDX: 0000000000040006 RSI: ffffffffba08a780 RDI: 0000000000006200
Feb 14 16:15:11 rack0 kernel: RBP: ffffb47e0a253d60 R08: 0000000000000000 R09: ffffffffba08a780
Feb 14 16:15:11 rack0 kernel: R10: 0000000000000228 R11: ffffb47e0a253ce0 R12: 0000000000006200
Feb 14 16:15:11 rack0 kernel: R13: 000000000004473d R14: 000000000004473d R15: ffffb47e0a253e68
Feb 14 16:15:11 rack0 kernel: FS: 00007fc486078280(0000) GS:ffff93a8d5840000(0000) knlGS:0000000000000000
Feb 14 16:15:11 rack0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 16:15:11 rack0 kernel: CR2: 0000000000006204 CR3: 000000014c506004 CR4: 0000000000772ee0
Feb 14 16:15:11 rack0 kernel: PKRU: 55555554
Feb 14 16:15:11 rack0 kernel: Call Trace:
Feb 14 16:15:11 rack0 kernel: <TASK>
Feb 14 16:15:11 rack0 kernel: ? __die_body.cold+0x1a/0x1f
Feb 14 16:15:11 rack0 kernel: ? __die+0x2b/0x37
Feb 14 16:15:11 rack0 kernel: ? page_fault_oops+0x136/0x2c0
Feb 14 16:15:11 rack0 kernel: ? do_user_addr_fault+0x1e0/0x660
Feb 14 16:15:11 rack0 kernel: ? do_user_addr_fault+0x31a/0x660
Feb 14 16:15:11 rack0 kernel: ? number+0x39a/0x400
Feb 14 16:15:11 rack0 kernel: ? exc_page_fault+0x77/0x170
Feb 14 16:15:11 rack0 kernel: ? asm_exc_page_fault+0x27/0x30
Feb 14 16:15:11 rack0 kernel: ? pid_nr_ns+0x14/0x40
Feb 14 16:15:11 rack0 kernel: next_tgid+0x4a/0x100
Feb 14 16:15:11 rack0 kernel: proc_pid_readdir+0xaf/0x220
Feb 14 16:15:11 rack0 kernel: proc_root_readdir+0x3a/0x50
Feb 14 16:15:11 rack0 kernel: iterate_dir+0x9f/0x1d0
Feb 14 16:15:11 rack0 kernel: __x64_sys_getdents64+0x78/0x110
Feb 14 16:15:11 rack0 kernel: ? __ia32_compat_sys_getdents+0x110/0x110
Feb 14 16:15:11 rack0 kernel: do_syscall_64+0x59/0xc0
Feb 14 16:15:11 rack0 kernel: ? exit_to_user_mode_prepare+0x37/0x1b0
Feb 14 16:15:11 rack0 kernel: ? irqentry_exit_to_user_mode+0x9/0x20
Feb 14 16:15:11 rack0 kernel: ? irqentry_exit+0x1d/0x30
Feb 14 16:15:11 rack0 kernel: ? exc_page_fault+0x89/0x170
Feb 14 16:15:11 rack0 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Feb 14 16:15:11 rack0 kernel: RIP: 0033:0x7fc486176f07
Feb 14 16:15:11 rack0 kernel: Code: 0f 1f 00 48 8b 47 20 c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 81 fa ff ff ff 7f b8 ff ff ff 7f 48 0f 47 d0 b8 d9 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 59 af 10 00 f7 d8 64 89 02 48
Feb 14 16:15:11 rack0 kernel: RSP: 002b:00007fffb71f5a48 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
Feb 14 16:15:11 rack0 kernel: RAX: ffffffffffffffda RBX: 000056367ee182d0 RCX: 00007fc486176f07
Feb 14 16:15:11 rack0 kernel: RDX: 0000000000008000 RSI: 000056367ee18300 RDI: 0000000000000008
Feb 14 16:15:11 rack0 kernel: RBP: 000056367ee18300 R08: 0000000000000030 R09: 00005636784853b0
Feb 14 16:15:11 rack0 kernel: R10: 000056367edb64a8 R11: 0000000000000293 R12: ffffffffffffff80
Feb 14 16:15:11 rack0 kernel: R13: 000056367ee182d4 R14: 0000000000000000 R15: 000056367edb64a8
Feb 14 16:15:11 rack0 kernel: </TASK>
Feb 14 16:15:11 rack0 kernel: Modules linked in: tcp_diag udp_diag inet_diag binfmt_misc cfg80211 8021q garp mrp wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_mark nft_compat rpcsec_gss_krb5 nfsv4 n>
Feb 14 16:15:11 rack0 kernel: snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec irqbypass crct10dif_pclmul ghash_clmulni_intel snd_hda_core aesni_intel snd_hwdep cec snd_pcm rc_core crypto_simd mei_hdcp i2c_algo_bit cryptd fb_sys_fops snd_timer intel_cstate ee1004 snd syscopyarea mei_me sysfillrect soundcore>
Feb 14 16:15:11 rack0 kernel: CR2: 0000000000006204
Feb 14 16:15:11 rack0 kernel: ---[ end trace c87fe6b1c9027956 ]---
Feb 14 16:15:11 rack0 kernel: RIP: 0010:pid_nr_ns+0x14/0x40
Feb 14 16:15:11 rack0 kernel: Code: ba e8 50 7d 5a 00 eb b9 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 0f 1f 44 00 00 55 45 31 c0 48 89 e5 48 85 ff 74 15 8b 46 40 <3b> 47 04 77 0d 48 c1 e0 04 48 01 c7 48 39 77 68 74 09 44 89 c0 5d
Feb 14 16:15:11 rack0 kernel: RSP: 0018:ffffb47e0a253d60 EFLAGS: 00010206
Feb 14 16:15:11 rack0 kernel: RAX: 0000000000000000 RBX: ffffffffba08a780 RCX: 0000000000000000
Feb 14 16:15:11 rack0 kernel: RDX: 0000000000040006 RSI: ffffffffba08a780 RDI: 0000000000006200
Feb 14 16:15:11 rack0 kernel: RBP: ffffb47e0a253d60 R08: 0000000000000000 R09: ffffffffba08a780
Feb 14 16:15:11 rack0 kernel: R10: 0000000000000228 R11: ffffb47e0a253ce0 R12: 0000000000006200
Feb 14 16:15:11 rack0 kernel: R13: 000000000004473d R14: 000000000004473d R15: ffffb47e0a253e68
Feb 14 16:15:11 rack0 kernel: FS: 00007fc486078280(0000) GS:ffff93a8d5840000(0000) knlGS:0000000000000000
Feb 14 16:15:11 rack0 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 14 16:15:11 rack0 kernel: CR2: 0000000000006204 CR3: 000000014c506004 CR4: 0000000000772ee0
Feb 14 16:15:11 rack0 kernel: PKRU: 55555554
Feb 14 16:15:11 rack0 systemd[1]: pvestatd.service: Failed with result 'signal'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pvestatd.service has entered the 'failed' state with result 'signal'.
Feb 14 16:15:11 rack0 systemd[1]: pvestatd.service: Consumed 14h 56min 28.993s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit pvestatd.service completed and consumed the indicated resources.
Feb 14 16:15:12 rack0 kernel: BUG: Bad page map in process ksmd pte:800000065c87d307 pmd:4bd1a8067
Feb 14 16:15:12 rack0 kernel: addr:00007f7e50de71b0 vm_flags:a8120073 anon_vma:ffff93a540921410 mapping:0000000000000000 index:7f7e50de7
Feb 14 16:15:12 rack0 kernel: file:(null) fault:0x0 mmap:0x0 readpage:0x0
Is this issue known? Would it be mitigated by e.g. auto-restarting pvestatd.service upon kill/crash?
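To make clear what I mean by auto-restarting: a systemd drop-in roughly like the one below. This is an untested sketch; the drop-in file name and the 5-second delay are my own assumptions, not something taken from the Proxmox documentation. I am also aware that it would only bring the daemon back up and would not address the kernel BUG itself.

```ini
# Hypothetical drop-in, e.g. created via "systemctl edit pvestatd.service",
# which would store it as /etc/systemd/system/pvestatd.service.d/override.conf
[Service]
# Restart the service when it exits uncleanly, including being killed by a signal
Restart=on-failure
# Wait a few seconds before restarting (arbitrary value chosen for illustration)
RestartSec=5s
```

If the file is created by hand instead of with systemctl edit, a systemctl daemon-reload would be needed for it to take effect.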