PVE 7.2-x with kernel 5.19 crashes, need help

hootigger

New Member
Oct 28, 2022
The log from just before the crash:
Code:
Oct 28 03:42:37 pve kernel: [386108.744402]     (t=15001 jiffies g=48257437 q=65876 ncpus=4)
Oct 28 03:42:37 pve kernel: [386108.744406] NMI backtrace for cpu 1
Oct 28 03:42:37 pve kernel: [386108.744408] CPU: 1 PID: 1257485 Comm: EmbyServer Tainted: P     U     O      5.19.7-2-pve #1
Oct 28 03:42:37 pve kernel: [386108.744410] Hardware name: CncTion N5105-4L/N5105-4L, BIOS 5.19 06/17/2022
Oct 28 03:42:37 pve kernel: [386108.744412] Call Trace:
Oct 28 03:42:37 pve kernel: [386108.744414]  <IRQ>
Oct 28 03:42:37 pve kernel: [386108.744432]  dump_stack_lvl+0x49/0x63
Oct 28 03:42:37 pve kernel: [386108.744438]  dump_stack+0x10/0x16
Oct 28 03:42:37 pve kernel: [386108.744440]  nmi_cpu_backtrace.cold+0x4d/0x95
Oct 28 03:42:37 pve kernel: [386108.744443]  ? lapic_can_unplug_cpu+0x90/0x90
Oct 28 03:42:37 pve kernel: [386108.744446]  nmi_trigger_cpumask_backtrace+0xf8/0x110
Oct 28 03:42:37 pve kernel: [386108.744450]  arch_trigger_cpumask_backtrace+0x19/0x20
Oct 28 03:42:37 pve kernel: [386108.744453]  trigger_single_cpu_backtrace+0x44/0x4f
Oct 28 03:42:37 pve kernel: [386108.744455]  rcu_dump_cpu_stacks+0xfb/0x13f
Oct 28 03:42:37 pve kernel: [386108.744457]  rcu_sched_clock_irq.cold+0x3a4/0x715
Oct 28 03:42:37 pve kernel: [386108.744460]  ? __cgroup_account_cputime_field+0x3b/0x60
Oct 28 03:42:37 pve kernel: [386108.744462]  ? account_system_index_time+0x9b/0xc0
Oct 28 03:42:37 pve kernel: [386108.744465]  update_process_times+0x63/0xa0
Oct 28 03:42:37 pve kernel: [386108.744468]  ? tick_sched_do_timer+0xa0/0xa0
Oct 28 03:42:37 pve kernel: [386108.744471]  tick_sched_handle+0x29/0x70
Oct 28 03:42:37 pve kernel: [386108.744473]  ? tick_sched_do_timer+0xa0/0xa0
Oct 28 03:42:37 pve kernel: [386108.744474]  tick_sched_timer+0x6f/0x90
Oct 28 03:42:37 pve kernel: [386108.744476]  __hrtimer_run_queues+0x106/0x260
Oct 28 03:42:37 pve kernel: [386108.744479]  ? clockevents_program_event+0xa8/0x130
Oct 28 03:42:37 pve kernel: [386108.744481]  hrtimer_interrupt+0x101/0x230
Oct 28 03:42:37 pve kernel: [386108.744483]  __sysvec_apic_timer_interrupt+0x61/0x110
Oct 28 03:42:37 pve kernel: [386108.744485]  sysvec_apic_timer_interrupt+0x7b/0x90
Oct 28 03:42:37 pve kernel: [386108.744488]  </IRQ>
Oct 28 03:42:37 pve kernel: [386108.744488]  <TASK>
Oct 28 03:42:37 pve kernel: [386108.744489]  asm_sysvec_apic_timer_interrupt+0x1b/0x20
Oct 28 03:42:37 pve kernel: [386108.744492] RIP: 0010:xas_start+0x30/0x130
Oct 28 03:42:37 pve kernel: [386108.744495] Code: 53 48 89 fb 48 83 ec 10 4c 8b 67 18 4c 89 e0 83 e0 03 0f 84 82 00 00 00 48 83 f8 02 75 09 49 81 fc 05 c0 ff ff 77 39 48 8b 03 <4c> 8b 63 08 48 8b 40 08 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d
Oct 28 03:42:37 pve kernel: [386108.744497] RSP: 0000:ffffb1e104c2f9a0 EFLAGS: 00000202
Oct 28 03:42:37 pve kernel: [386108.744500] RAX: ffff9f9b7d391130 RBX: ffffb1e104c2fa60 RCX: 0000000000000000
Oct 28 03:42:37 pve kernel: [386108.744502] RDX: 00000000000000e5 RSI: fffffffffffffffe RDI: ffffb1e104c2fa60
Oct 28 03:42:37 pve kernel: [386108.744503] RBP: ffffb1e104c2f9c0 R08: ffffb1e104c2fb30 R09: 000000000000a024
Oct 28 03:42:37 pve kernel: [386108.744504] R10: 0000000000000000 R11: ffffffffffffffff R12: 0000000000000003
Oct 28 03:42:37 pve kernel: [386108.744505] R13: ffffb1e104c2fa60 R14: 0000000000000402 R15: ffff9f9b7d391130
Oct 28 03:42:37 pve kernel: [386108.744508]  xas_load+0x1f/0xf0
Oct 28 03:42:37 pve kernel: [386108.744510]  xas_find+0x184/0x1e0
Oct 28 03:42:37 pve kernel: [386108.744512]  find_get_entries+0x72/0x1d0
Oct 28 03:42:37 pve kernel: [386108.744515]  shmem_undo_range+0x2fb/0x7f0
Oct 28 03:42:37 pve kernel: [386108.744520]  shmem_evict_inode+0x108/0x260
Oct 28 03:42:37 pve kernel: [386108.744522]  ? swake_up_one+0x70/0x70
Oct 28 03:42:37 pve kernel: [386108.744525]  evict+0xcd/0x1e0
Oct 28 03:42:37 pve kernel: [386108.744528]  iput.part.0+0x183/0x1e0
Oct 28 03:42:37 pve kernel: [386108.744530]  iput+0x1c/0x30
Oct 28 03:42:37 pve kernel: [386108.744532]  dentry_unlink_inode+0xcc/0x130
Oct 28 03:42:37 pve kernel: [386108.744534]  __dentry_kill+0xec/0x1a0
Oct 28 03:42:37 pve kernel: [386108.744536]  dput+0x1c6/0x3c0
Oct 28 03:42:37 pve kernel: [386108.744537]  __fput+0xf0/0x260
Oct 28 03:42:37 pve kernel: [386108.744540]  ____fput+0xe/0x20
Oct 28 03:42:37 pve kernel: [386108.744542]  task_work_run+0x61/0xa0
Oct 28 03:42:37 pve kernel: [386108.744545]  do_exit+0x33b/0xaa0
Oct 28 03:42:37 pve kernel: [386108.744548]  ? wake_up_state+0x10/0x20
Oct 28 03:42:37 pve kernel: [386108.744550]  do_group_exit+0x35/0xa0
Oct 28 03:42:37 pve kernel: [386108.744552]  __x64_sys_exit_group+0x18/0x20
Oct 28 03:42:37 pve kernel: [386108.744555]  do_syscall_64+0x59/0x90
Oct 28 03:42:37 pve kernel: [386108.744556]  ? exit_to_user_mode_prepare+0x8f/0x180
Oct 28 03:42:37 pve kernel: [386108.744559]  ? irqentry_exit_to_user_mode+0x9/0x20
Oct 28 03:42:37 pve kernel: [386108.744561]  ? irqentry_exit+0x3b/0x50
Oct 28 03:42:37 pve kernel: [386108.744563]  ? exc_page_fault+0x87/0x180
Oct 28 03:42:37 pve kernel: [386108.744565]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
Oct 28 03:42:37 pve kernel: [386108.744568] RIP: 0033:0x7f96ec4f4059
Oct 28 03:42:37 pve kernel: [386108.744571] Code: Unable to access opcode bytes at RIP 0x7f96ec4f402f.
Oct 28 03:42:37 pve kernel: [386108.744572] RSP: 002b:00007ffcf8d2e1e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
Oct 28 03:42:37 pve kernel: [386108.744574] RAX: ffffffffffffffda RBX: 00007f96ec5f6880 RCX: 00007f96ec4f4059
Oct 28 03:42:37 pve kernel: [386108.744575] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000003
Oct 28 03:42:37 pve kernel: [386108.744576] RBP: 0000000000000003 R08: ffffffffffffff58 R09: 0000000000000000
Oct 28 03:42:37 pve kernel: [386108.744577] R10: 00007f96eca34f5d R11: 0000000000000246 R12: 00007f96ec5f6880
Oct 28 03:42:37 pve kernel: [386108.744578] R13: 00000000000000db R14: 00007f96ec5fbe08 R15: 0000000000000000
Oct 28 03:42:37 pve kernel: [386108.744580]  </TASK>
Oct 28 03:44:30 pve lvm[481]: Monitoring thin pool pve-data-tpool.
Oct 28 03:44:30 pve kernel: [    0.000000] Linux version 5.19.7-2-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PREEMPT_DYNAMIC PVE 5.19.7-2 (Tue, 04 Oct 2022 17:18:40 + ()
Oct 28 03:44:30 pve kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.19.7-2-pve root=/dev/mapper/pve-root ro quiet intel_iommu=on iommu=pt

PVE server info:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.19.7-2-pve)
pve-manager: 7.2-11 (running version: 7.2-11/b76d3178)
pve-kernel-5.15: 7.2-13
pve-kernel-5.19: 7.2-13
pve-kernel-helper: 7.2-13
pve-kernel-5.19.7-2-pve: 5.19.7-2
pve-kernel-5.15.64-1-pve: 5.15.64-1
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-3
libpve-guest-common-perl: 4.1-4
libpve-http-server-perl: 4.1-4
libpve-storage-perl: 7.2-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.7-1
proxmox-backup-file-restore: 2.2.7-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-3
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-6
pve-firmware: 3.5-6
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.6-pve1

It crashes about every 3 days, so I changed the kernel parameters (with the script below) to make it reboot automatically. Can anyone help me figure out what the problem is?
English is not my native language; I hope you can understand what I mean.

Code:
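#!/bin/bash
# Show the current values before changing anything.
echo "before..."
sysctl kernel.watchdog_thresh
sysctl kernel.panic
sysctl kernel.hung_task_panic
sysctl kernel.softlockup_panic
sysctl kernel.perf_cpu_time_max_percent

echo "after..."
# Set the lockup watchdog threshold to 30 (the default is 10).
sysctl -w kernel.watchdog_thresh=30
# Reboot automatically 5 seconds after a kernel panic.
sysctl -w kernel.panic=5
#sysctl -w kernel.hung_task_panic=1
# Panic (and therefore reboot) when a soft lockup is detected.
sysctl -w kernel.softlockup_panic=1
# Allow perf sampling to use at most 40% of CPU time.
sysctl -w kernel.perf_cpu_time_max_percent=40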
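
Values set with `sysctl -w` are lost on the next boot, so after an automatic reboot the script has to be run again. A minimal sketch of one way to keep them, assuming a drop-in file under /etc/sysctl.d/ (the file name `99-lockup-reboot.conf` and the function name `write_lockup_conf` are hypothetical choices for illustration, not something from my current setup):

```shell
#!/bin/bash
# Sketch only: write the same four values into a sysctl drop-in file
# so they are applied again at boot. The function name and the file
# name used in the usage note are assumptions made for this example.

write_lockup_conf() {
    # $1: destination file, e.g. /etc/sysctl.d/99-lockup-reboot.conf
    cat > "$1" <<'EOF'
kernel.watchdog_thresh = 30
kernel.panic = 5
kernel.softlockup_panic = 1
kernel.perf_cpu_time_max_percent = 40
EOF
}
```

Usage, as root: call `write_lockup_conf /etc/sysctl.d/99-lockup-reboot.conf` and then run `sysctl --system` to load all drop-in files without rebooting. The function takes the path as an argument so it can be tried against a temporary file first.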
 
Please let me know if you need any other information; I would be happy to provide it. Thanks again!