One of my PVE hosts hung today. The web UI was alive and stats were updating, but no actions were possible and there was no access to VM/LXC consoles or the host shell. The host responded to ping but not to SSH. There were no obvious issues in the stats, but they were pretty much flat for a couple of hours, which isn't realistic. The physical console had a ton of messages about killing leftover systemd-journal processes. A ctrl+alt+del on the console threw up a load more similar messages about systemd-journald and other processes "running after unit stopped". Ultimately, it got stuck after trying to terminate some other processes and I had to hard reset the machine.
Looking at journalctl on the host after the reset, I can see a lot of this:
Oct 13 00:20:11 orinoco systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Oct 13 00:20:11 orinoco kernel: BUG: unable to handle page fault for address: 0000000000200000
Oct 13 00:20:11 orinoco kernel: #PF: supervisor read access in kernel mode
Oct 13 00:20:11 orinoco kernel: #PF: error_code(0x0000) - not-present page
Oct 13 00:20:11 orinoco kernel: PGD 0 P4D 0
Oct 13 00:20:11 orinoco kernel: Oops: Oops: 0000 [#2679] PREEMPT SMP NOPTI
Oct 13 00:20:11 orinoco kernel: CPU: 8 UID: 0 PID: 3563143 Comm: sadc Tainted: P D W O 6.14.11-3-pve #1
Oct 13 00:20:11 orinoco kernel: Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [W]=WARN, [O]=OOT_MODULE
Oct 13 00:20:11 orinoco kernel: Hardware name: Gigabyte Technology Co., Ltd. B760M DS3H DDR4/B760M DS3H DDR4, BIOS F21 06/19/2025
Oct 13 00:20:11 orinoco kernel: RIP: 0010:softnet_seq_show+0x34/0xa0
Oct 13 00:20:11 orinoco kernel: Code: 57 45 31 ff 41 56 41 55 41 54 49 89 fc 53 48 89 f3 44 8b b6 00 01 00 00 44 8b 6e 20 e8 95 2d 32 ff 48 8b 43 40 48 85 c0 74 03 <44> 8b 38 e8 b4 63 32 ff 8b 53 28 4c 89 e7 45 31 >
Oct 13 00:20:11 orinoco kernel: RSP: 0018:ffffcecd45b4bb00 EFLAGS: 00010206
Oct 13 00:20:11 orinoco kernel: RAX: 0000000000200000 RBX: ffff8e2a1f7b8600 RCX: 00000000000003c8
Oct 13 00:20:11 orinoco kernel: RDX: 0000000000000000 RSI: ffff8e2a1f7b8600 RDI: ffff8e228503a7f8
Oct 13 00:20:11 orinoco kernel: RBP: ffffcecd45b4bb28 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:11 orinoco kernel: R10: 0000000000000195 R11: 000000000000000a R12: ffff8e228503a7f8
Oct 13 00:20:11 orinoco kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 13 00:20:11 orinoco kernel: FS: 000074ae808b2740(0000) GS:ffff8e2a1f600000(0000) knlGS:0000000000000000
Oct 13 00:20:11 orinoco kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:20:11 orinoco kernel: CR2: 0000000000200000 CR3: 00000001572f2005 CR4: 0000000000f72ef0
Oct 13 00:20:11 orinoco kernel: PKRU: 55555554
Oct 13 00:20:11 orinoco kernel: Call Trace:
Oct 13 00:20:11 orinoco kernel: <TASK>
Oct 13 00:20:11 orinoco kernel: seq_read_iter+0x2c5/0x490
Oct 13 00:20:11 orinoco kernel: proc_reg_read_iter+0x58/0x90
Oct 13 00:20:11 orinoco kernel: vfs_read+0x2b4/0x390
Oct 13 00:20:11 orinoco kernel: ksys_read+0x70/0xf0
Oct 13 00:20:11 orinoco kernel: __x64_sys_read+0x19/0x30
Oct 13 00:20:11 orinoco kernel: x64_sys_call+0x1ba0/0x2310
Oct 13 00:20:11 orinoco kernel: do_syscall_64+0x7e/0x170
Oct 13 00:20:11 orinoco kernel: ? apparmor_file_permission+0x77/0x1c0
Oct 13 00:20:11 orinoco kernel: ? mutex_lock+0x12/0x50
Oct 13 00:20:11 orinoco kernel: ? seq_read_iter+0x21c/0x490
Oct 13 00:20:11 orinoco kernel: ? proc_reg_read_iter+0x2c/0x90
Oct 13 00:20:11 orinoco kernel: ? vfs_read+0x2b4/0x390
Oct 13 00:20:11 orinoco kernel: ? ksys_read+0x70/0xf0
Oct 13 00:20:11 orinoco kernel: ? arch_exit_to_user_mode_prepare.isra.0+0x22/0x120
Oct 13 00:20:11 orinoco kernel: ? syscall_exit_to_user_mode+0x38/0x1d0
Oct 13 00:20:11 orinoco kernel: ? do_syscall_64+0x8a/0x170
Oct 13 00:20:11 orinoco kernel: ? do_syscall_64+0x8a/0x170
Oct 13 00:20:11 orinoco kernel: ? arch_exit_to_user_mode_prepare.isra.0+0x22/0x120
Oct 13 00:20:11 orinoco kernel: ? syscall_exit_to_user_mode+0x38/0x1d0
Oct 13 00:20:11 orinoco kernel: ? do_syscall_64+0x8a/0x170
Oct 13 00:20:11 orinoco kernel: ? do_syscall_64+0x8a/0x170
Oct 13 00:20:11 orinoco kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 13 00:20:11 orinoco kernel: RIP: 0033:0x74ae80a34687
Oct 13 00:20:11 orinoco kernel: Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 >
Oct 13 00:20:11 orinoco kernel: RSP: 002b:00007fff583ce4c0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
Oct 13 00:20:11 orinoco kernel: RAX: ffffffffffffffda RBX: 000074ae808b2740 RCX: 000074ae80a34687
Oct 13 00:20:11 orinoco kernel: RDX: 0000000000000400 RSI: 000057361a6aeb30 RDI: 0000000000000004
Oct 13 00:20:11 orinoco kernel: RBP: 000074ae80b8afd0 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:11 orinoco kernel: R10: 0000000000000000 R11: 0000000000000202 R12: 000074ae80b8ae80
Oct 13 00:20:11 orinoco kernel: R13: 000057361a6aef30 R14: 00000000000003b0 R15: 000057361a6b5300
Oct 13 00:20:11 orinoco kernel: </TASK>
Oct 13 00:20:11 orinoco kernel: Modules linked in: dm_snapshot tcp_diag inet_diag nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_i>
Oct 13 00:20:11 orinoco kernel: snd_hda_ext_core snd_soc_core snd_compress x86_pkg_temp_thermal intel_powerclamp ac97_bus coretemp snd_pcm_dmaengine snd_hda_intel kvm_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_>
Oct 13 00:20:11 orinoco kernel: dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq usbmouse usbkbd hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme xhci_pci i>
Oct 13 00:20:11 orinoco kernel: CR2: 0000000000200000
Oct 13 00:20:11 orinoco kernel: ---[ end trace 0000000000000000 ]---
Oct 13 00:20:11 orinoco kernel: RIP: 0010:softnet_seq_show+0x34/0xa0
Oct 13 00:20:11 orinoco kernel: Code: 57 45 31 ff 41 56 41 55 41 54 49 89 fc 53 48 89 f3 44 8b b6 00 01 00 00 44 8b 6e 20 e8 95 2d 32 ff 48 8b 43 40 48 85 c0 74 03 <44> 8b 38 e8 b4 63 32 ff 8b 53 28 4c 89 e7 45 31 >
Oct 13 00:20:11 orinoco kernel: RSP: 0018:ffffcecd4a69fd40 EFLAGS: 00010206
Oct 13 00:20:11 orinoco kernel: RAX: 0000000000200000 RBX: ffff8e2a1f7b8600 RCX: 00000000000003c8
Oct 13 00:20:11 orinoco kernel: RDX: 0000000000000000 RSI: ffff8e2a1f7b8600 RDI: ffff8e22971e9348
Oct 13 00:20:11 orinoco kernel: RBP: ffffcecd4a69fd68 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:11 orinoco kernel: R10: 0000000000000195 R11: 000000000000000a R12: ffff8e22971e9348
Oct 13 00:20:11 orinoco kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 13 00:20:11 orinoco kernel: FS: 000074ae808b2740(0000) GS:ffff8e2a1f600000(0000) knlGS:0000000000000000
Oct 13 00:20:11 orinoco kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:20:11 orinoco kernel: CR2: 0000000000200000 CR3: 00000001572f2005 CR4: 0000000000f72ef0
Oct 13 00:20:11 orinoco kernel: PKRU: 55555554
Oct 13 00:20:11 orinoco kernel: note: sadc[3563143] exited with irqs disabled
Oct 13 00:20:11 orinoco systemd[1]: sysstat-collect.service: Main process exited, code=killed, status=9/KILL
Oct 13 00:20:11 orinoco systemd[1]: sysstat-collect.service: Failed with result 'signal'.
Oct 13 00:20:11 orinoco systemd[1]: Failed to start sysstat-collect.service - system activity accounting tool.
Oct 13 00:20:37 orinoco kernel: BUG: unable to handle page fault for address: 0000000000200000
Oct 13 00:20:37 orinoco kernel: #PF: supervisor read access in kernel mode
Oct 13 00:20:37 orinoco kernel: #PF: error_code(0x0000) - not-present page
Oct 13 00:20:37 orinoco kernel: PGD 0 P4D 0
Oct 13 00:20:37 orinoco kernel: Oops: Oops: 0000 [#2680] PREEMPT SMP NOPTI
Oct 13 00:20:37 orinoco kernel: CPU: 11 UID: 100000 PID: 3563338 Comm: sadc Tainted: P D W O 6.14.11-3-pve #1
Oct 13 00:20:37 orinoco kernel: Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [W]=WARN, [O]=OOT_MODULE
Oct 13 00:20:37 orinoco kernel: Hardware name: Gigabyte Technology Co., Ltd. B760M DS3H DDR4/B760M DS3H DDR4, BIOS F21 06/19/2025
Oct 13 00:20:37 orinoco kernel: RIP: 0010:softnet_seq_show+0x34/0xa0
Oct 13 00:20:37 orinoco kernel: Code: 57 45 31 ff 41 56 41 55 41 54 49 89 fc 53 48 89 f3 44 8b b6 00 01 00 00 44 8b 6e 20 e8 95 2d 32 ff 48 8b 43 40 48 85 c0 74 03 <44> 8b 38 e8 b4 63 32 ff 8b 53 28 4c 89 e7 45 31 >
Oct 13 00:20:37 orinoco kernel: RSP: 0018:ffffcecd4bd53d40 EFLAGS: 00010206
Oct 13 00:20:37 orinoco kernel: RAX: 0000000000200000 RBX: ffff8e2a1f7b8600 RCX: 00000000000003c8
Oct 13 00:20:37 orinoco kernel: RDX: 0000000000000000 RSI: ffff8e2a1f7b8600 RDI: ffff8e229888f4b0
Oct 13 00:20:37 orinoco kernel: RBP: ffffcecd4bd53d68 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:37 orinoco kernel: R10: 0000000000000195 R11: 000000000000000a R12: ffff8e229888f4b0
Oct 13 00:20:37 orinoco kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 13 00:20:37 orinoco kernel: FS: 0000786b1e0b2740(0000) GS:ffff8e2a1f780000(0000) knlGS:0000000000000000
Oct 13 00:20:37 orinoco kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:20:37 orinoco kernel: CR2: 0000000000200000 CR3: 000000010b91a005 CR4: 0000000000f72ef0
Oct 13 00:20:37 orinoco kernel: PKRU: 55555554
Oct 13 00:20:37 orinoco kernel: Call Trace:
Oct 13 00:20:37 orinoco kernel: <TASK>
Oct 13 00:20:37 orinoco kernel: seq_read_iter+0x2c5/0x490
Oct 13 00:20:37 orinoco kernel: proc_reg_read_iter+0x58/0x90
Oct 13 00:20:37 orinoco kernel: vfs_read+0x2b4/0x390
Oct 13 00:20:37 orinoco kernel: ksys_read+0x70/0xf0
Oct 13 00:20:37 orinoco kernel: __x64_sys_read+0x19/0x30
Oct 13 00:20:37 orinoco kernel: x64_sys_call+0x1ba0/0x2310
Oct 13 00:20:37 orinoco kernel: do_syscall_64+0x7e/0x170
Oct 13 00:20:37 orinoco kernel: ? do_syscall_64+0x8a/0x170
Oct 13 00:20:37 orinoco kernel: ? exc_page_fault+0x96/0x1e0
Oct 13 00:20:37 orinoco kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Oct 13 00:20:37 orinoco kernel: RIP: 0033:0x786b1df1ba91
Oct 13 00:20:37 orinoco kernel: Code: 00 48 8b 15 89 73 0e 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 40 c4 01 00 f3 0f 1e fa 80 3d b5 f5 0e 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 >
Oct 13 00:20:37 orinoco kernel: RSP: 002b:00007ffeda01f308 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
Oct 13 00:20:37 orinoco kernel: RAX: ffffffffffffffda RBX: 00005b1bdb8752e0 RCX: 0000786b1df1ba91
Oct 13 00:20:37 orinoco kernel: RDX: 0000000000000400 RSI: 00005b1bdb86dbf0 RDI: 0000000000000004
Oct 13 00:20:37 orinoco kernel: RBP: 00007ffeda01f340 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:37 orinoco kernel: R10: 0000786b1dfb1fc0 R11: 0000000000000246 R12: 0000786b1e002030
Oct 13 00:20:37 orinoco kernel: R13: 0000786b1e001ee0 R14: 00005b1bdb86dff0 R15: 00005b1bdb8752e0
Oct 13 00:20:37 orinoco kernel: </TASK>
Oct 13 00:20:37 orinoco kernel: Modules linked in: dm_snapshot tcp_diag inet_diag nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE xt_set nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_i>
Oct 13 00:20:37 orinoco kernel: snd_hda_ext_core snd_soc_core snd_compress x86_pkg_temp_thermal intel_powerclamp ac97_bus coretemp snd_pcm_dmaengine snd_hda_intel kvm_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_>
Oct 13 00:20:37 orinoco kernel: dmi_sysfs ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq usbmouse usbkbd hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio nvme xhci_pci i>
Oct 13 00:20:37 orinoco kernel: CR2: 0000000000200000
Oct 13 00:20:37 orinoco kernel: ---[ end trace 0000000000000000 ]---
Oct 13 00:20:37 orinoco kernel: RIP: 0010:softnet_seq_show+0x34/0xa0
Oct 13 00:20:37 orinoco kernel: Code: 57 45 31 ff 41 56 41 55 41 54 49 89 fc 53 48 89 f3 44 8b b6 00 01 00 00 44 8b 6e 20 e8 95 2d 32 ff 48 8b 43 40 48 85 c0 74 03 <44> 8b 38 e8 b4 63 32 ff 8b 53 28 4c 89 e7 45 31 >
Oct 13 00:20:37 orinoco kernel: RSP: 0018:ffffcecd4a69fd40 EFLAGS: 00010206
Oct 13 00:20:37 orinoco kernel: RAX: 0000000000200000 RBX: ffff8e2a1f7b8600 RCX: 00000000000003c8
Oct 13 00:20:37 orinoco kernel: RDX: 0000000000000000 RSI: ffff8e2a1f7b8600 RDI: ffff8e22971e9348
Oct 13 00:20:37 orinoco kernel: RBP: ffffcecd4a69fd68 R08: 0000000000000000 R09: 0000000000000000
Oct 13 00:20:37 orinoco kernel: R10: 0000000000000195 R11: 000000000000000a R12: ffff8e22971e9348
Oct 13 00:20:37 orinoco kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 13 00:20:37 orinoco kernel: FS: 0000786b1e0b2740(0000) GS:ffff8e2a1f780000(0000) knlGS:0000000000000000
Oct 13 00:20:37 orinoco kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 13 00:20:37 orinoco kernel: CR2: 0000000000200000 CR3: 000000010b91a005 CR4: 0000000000f72ef0
Oct 13 00:20:37 orinoco kernel: PKRU: 55555554
Oct 13 00:20:37 orinoco kernel: note: sadc[3563338] exited with irqs disabled
and in the journal of two of the LXCs, both Ubuntu 24.04 systems:
Oct 13 11:00:37 cctv2 systemd[1]: Starting sysstat-collect.service - system activity accounting tool...
Oct 13 11:00:37 cctv2 systemd[1]: sysstat-collect.service: Main process exited, code=killed, status=9/KILL
Oct 13 11:00:37 cctv2 systemd[1]: sysstat-collect.service: Failed with result 'signal'.
Oct 13 11:00:37 cctv2 systemd[1]: Failed to start sysstat-collect.service - system activity accounting tool.
Both issues repeat about every 10 minutes up to the time at which the CPU graph flattens. From that point on, the LXC journals recorded nothing until I rebooted the host.
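To get a quick overview of how often this happens and from where, the kernel lines can be tallied from a saved copy of the journal. Below is a rough Python sketch that relies only on the line formats visible in the excerpt above (the `Oops: ... [#N]` counter, the `CPU: ... UID: ... PID: ... Comm: ...` line and the kernel-mode `RIP: 0010:` line). The function names are my own, and the patterns are an assumption that may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Patterns follow the kernel lines as they appear in the journal excerpt.
OOPS_RE = re.compile(r"kernel: Oops: .*\[#(\d+)\]")
TASK_RE = re.compile(r"kernel: CPU: \d+ UID: (\d+) PID: (\d+) Comm: (\S+)")
# 0010 is the kernel code segment; user-space RIP lines (0033) are skipped.
RIP_RE = re.compile(r"kernel: RIP: 0010:(\S+)")


def summarise_oopses(lines):
    """Return one dict per Oops: counter, UID, PID, command, faulting symbol."""
    records = []
    current = None
    for line in lines:
        match = OOPS_RE.search(line)
        if match:
            # A new Oops header starts a new record.
            current = {"count": int(match.group(1))}
            records.append(current)
            continue
        if current is None:
            continue
        match = TASK_RE.search(line)
        if match and "uid" not in current:
            current["uid"] = int(match.group(1))
            current["pid"] = int(match.group(2))
            current["comm"] = match.group(3)
            continue
        match = RIP_RE.search(line)
        if match and "rip" not in current:
            # Only the first kernel RIP after the header is kept.
            current["rip"] = match.group(1)
    return records


def tally(records):
    """Count Oopses per (command, faulting function), ignoring the offset."""
    counts = Counter()
    for rec in records:
        symbol = rec.get("rip", "?").split("+")[0]
        counts[(rec.get("comm", "?"), symbol)] += 1
    return counts
```

Saving the previous boot's kernel messages with something like `journalctl -k -b -1 --no-pager > host-journal.txt` (the file name is just an example) and passing `open("host-journal.txt")` to `summarise_oopses` should give one record per Oops. In the two shown above, the counter is already at #2679 and #2680, the command is `sadc` both times, and the UIDs are 0 and 100000. I take the latter to be root inside an unprivileged container, though that is my assumption.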
Given that it correlates with sysstat-collect errors on two guests, I've disabled sysstat, sysstat-collect and sysstat-collect.timer in the LXCs for now. The third guest active on the system, a Debian-based VM, showed no issues. The PVE UI charts now have a gap for the 2 hours where previously they were showing flat/flat-ish values, so I guess no data was really collected in that period.
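For reference, this is roughly what disabling those units amounts to, written as a small Python sketch with a dry-run default so nothing is changed by accident. The unit names are what I believe the Debian/Ubuntu sysstat package ships (worth confirming with `systemctl list-unit-files 'sysstat*'`), and the container ID 101 in the usage note is purely hypothetical.

```python
import subprocess

# Assumed unit names; verify them on the system before running for real.
SYSSTAT_UNITS = ["sysstat.service", "sysstat-collect.service", "sysstat-collect.timer"]


def disable_commands(units=SYSSTAT_UNITS, ctid=None):
    """Build one `systemctl disable --now` call per unit.

    With ctid set, each call is wrapped in `pct exec <ctid> --` so it can be
    issued from the PVE host against a running container.
    """
    prefix = ["pct", "exec", str(ctid), "--"] if ctid is not None else []
    return [prefix + ["systemctl", "disable", "--now", unit] for unit in units]


def run(commands, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling `run(disable_commands(ctid=101))` only prints the commands. Passing `dry_run=False` would execute them, and leaving out `ctid` targets the system the script runs on.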
Looks to me like this might be a kernel and/or LXC issue. Has anyone come across this or something similar, and does anyone have an idea how to identify what caused it, how to fix it, and where to report it? Should I disable sysstat-collect on the host too, or will that upset Proxmox too much?
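One thing I may try in order to narrow it down: as far as I can tell from the kernel source, `softnet_seq_show` is the handler behind `/proc/net/softnet_stat`, which would fit the `read` call in the trace coming from `sadc`. If that assumption holds, simply reading that file should hit the same fault without sysstat being involved. A minimal Python sketch of such a check follows. It does the read in a child process, so if the kernel kills the reader the parent still gets to report it. The helper names are mine, and only the first three columns are parsed because the later ones vary between kernel versions.

```python
import subprocess
import sys

SOFTNET_STAT = "/proc/net/softnet_stat"


def read_in_child(path=SOFTNET_STAT, timeout=10):
    """Read `path` in a child interpreter and return (returncode, text).

    A negative return code means the child was killed by a signal,
    e.g. -9 for SIGKILL, which is what systemd logged for sadc.
    """
    child_code = "import sys; sys.stdout.write(open(sys.argv[1]).read())"
    proc = subprocess.run(
        [sys.executable, "-c", child_code, path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def parse_softnet(text):
    """Return (processed, dropped, time_squeeze) per CPU line; values are hex."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            rows.append(tuple(int(field, 16) for field in fields[:3]))
    return rows
```

On a healthy kernel `read_in_child()` should return code 0 and one line per CPU. If it comes back with a negative code and a fresh Oops shows up in the host journal, that would point at the kernel rather than at sysstat itself. Bear in mind that on an affected host this would deliberately trigger another Oops.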
Thanks!